1 Study materials

1.1 Relational operators

For more information read the help for relational operators (?Comparison).

Carry out the relational operator activities and exercises.


1.2 Boolean (logical) operators

Read Section 12.3 of R for Data Science (2e).

For more information read the help for logical operators (?Logic).

Carry out the Boolean operator activities and exercises.


1.3 Logical indexing to subset vectors and data frames

Carry out the logical indexing activities and exercises.


2 Activities

2.1 Relational operators

Refer to the relational operators section and perform the following tasks.

You have seen binary arithmetic operators such as +, -, *, /, ^. The arithmetic operators take two numeric values or variables and compute a new numeric value. For example, 22+35 returns the value 57, which is of type double:

x <- 22+35
x
## [1] 57
typeof(x)
## [1] "double"

The relational operators are also binary since they take two variables or values, but instead of returning a numeric value, they return a boolean (TRUE/FALSE) value depending on whether the relation is true or not.
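
As a rough illustration of the paragraph above, the sketch below applies a few relational operators to two made-up numbers. The variable names a, b, and is_larger are only for illustration and do not come from the course materials.

```r
# Two arbitrary numeric values (illustrative only)
a <- 7
b <- 3.5

a > b     # is a greater than b?
a == b    # are a and b equal?
a != b    # are a and b different?

# The result of a comparison can be stored like any other value
is_larger <- a >= b
typeof(is_larger)
```

Running each line separately in the console makes it easier to see which comparison produced which result.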

  1. Insert a code chunk and run the statement 5 > 1.

  2. The returned value can be assigned to a variable using the assignment operator <-. Assign the value of a relational statement using the == operator to a variable and determine the variable’s datatype.

  3. Comparisons need not be between numerical variables, but can occur between other datatypes as well. Assign the string “James” to a variable and use the != operator to compare it to the string “james”. Explain the result.

  4. Comparisons can also be made between data structures such as vectors and data frames. Instead of a single logical value, a vector of logical values is returned. Assign the first 10 elements of the Sepal.Length column of the iris data frame to a variable. Check the datatype of the variable. Then compare the variable to 4.6 using the <= relational operator and assign the result to a different variable. Print out both the variables. Do the results make sense?

  5. Because TRUE and FALSE are treated as 1 and 0, respectively, in arithmetic, summing the elements of a logical vector tells us how many comparisons were true. The sum function (see ?sum in console) sums the values of all its arguments. Use sum to determine how many of the flowers above had sepal length less than or equal to 4.6.

  6. Assign the first 10 rows and first 4 columns of the iris data frame to a variable. Compare the variable to 3.4 with the >= operator and assign the result to a new variable. Print out the value of the new variable. Use the class() and typeof() functions to determine the new variable's data structure and data type, respectively.
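
The same ideas can be tried on a different dataset before attempting the tasks above. The sketch below uses the built-in mtcars data frame rather than iris; the threshold values and the variable names (hp_head, is_powerful, small, is_big) are arbitrary choices made for illustration.

```r
# Horsepower of the first six cars in the built-in mtcars data frame
hp_head <- mtcars$hp[1:6]
hp_head

# Comparing a vector to a single number gives one logical value per element
is_powerful <- hp_head >= 110
is_powerful

# Each TRUE counts as 1 when summed, so this counts the TRUE values
sum(is_powerful)

# Comparing a small data frame to a number gives a logical value per cell
small <- mtcars[1:3, c("hp", "wt")]
is_big <- small > 100
is_big
```

Printing hp_head next to is_powerful is a quick way to check that each logical value matches the comparison you expect.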


2.2 Boolean (logical) operators

Very often we would like to combine the results of two or more comparisons. For example, one may want to know if there are any pickup trucks with highway mileage greater than 20 miles per gallon. To do this, we would perform two comparisons: 1) which vehicles have the value "pickup" for the class variable and 2) which vehicles have hwy greater than 20. Then, we would combine the comparisons to determine the vehicles for which both conditions are met.

  1. We can combine logical values/variables using boolean/logical operators. The AND (&) operator returns TRUE when both operands are true. Create a code chunk to perform the AND between different combinations of TRUE and FALSE.

  2. The boolean operators don’t just work between scalar logical values and variables, but between logical vectors as well. Use the combine function c() to create two logical vectors encapsulating all four pairs of TRUE and FALSE. Use the data.frame function to create a data frame with three columns called IN1, IN2, and OUT, where IN1 and IN2 are the two vectors and OUT is the result of a logical AND between the two vectors. Print out the data frame.

  3. Create and print the truth table for the logical OR (|) as was done for AND.

  4. Write a statement to test which elements of the class column of the mpg data frame are equal to "pickup" and assign the result to a variable. Do the same to test which trucks have hwy greater than 20 and assign the result to a different variable. In order to determine which trucks meet both conditions, we will use the logical AND operator &. Write a statement that determines which elements of both logical vectors are TRUE by using the & operator. Print out its value and length. Then determine how many pickup trucks have mileage greater than 20 mpg using the sum function.
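
The pattern in the last task (two comparisons combined with &) can be sketched on another dataset. The example below assumes the built-in mtcars data frame instead of mpg; the conditions and the variable names (is_four_cyl, is_strong, both, either) are made up for illustration.

```r
# Condition 1: which cars have four cylinders?
is_four_cyl <- mtcars$cyl == 4

# Condition 2: which cars have more than 100 horsepower?
is_strong <- mtcars$hp > 100

# AND: TRUE only where both conditions are TRUE
both <- is_four_cyl & is_strong

# OR: TRUE where at least one condition is TRUE
either <- is_four_cyl | is_strong

length(both)   # one logical value per row of mtcars
sum(both)      # number of cars meeting both conditions
sum(either)    # number of cars meeting at least one condition
```

Note that the combined vector has the same length as the inputs, because & and | work element by element.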

2.3 Logical indexing to subset vectors and data frames

In the example above, there is only one pickup with hwy mileage greater than 20 mpg. It would be great to figure out which one it is. In Module 2 Study Guide A, we learnt how to subset vectors and data frames using the indices of the elements, rows, or columns. However, to subset by index you have to know the position of the element you are looking for beforehand, and a range such as 5:10 limits you to contiguous sets of elements. R provides another, very powerful way to subset vectors and data frames, known as logical indexing.

  1. In logical indexing, instead of specifying the numeric indices of the elements we would like to subset, we specify a logical vector of the same dimensions as the vector/data frame. R then returns those elements where the logical vector is TRUE. The seq() function returns a numeric vector with evenly spaced numbers (see ?seq in console). Use the seq function to create a numeric vector of length 5 with values ranging from 1 to 5. Print it out. Also create a logical vector of length 5 with two elements TRUE and the rest FALSE. Print it out. Index the former by the latter and assign the result to a new variable. What is its length? Print out its value. Which elements of the original vector remain?

  2. This approach can also be used to subset data frames. Use data.frame() and rnorm() to create a data frame with 5 rows and 3 columns. Create two logical vectors, one of length 5 and the other of length 3, with some values TRUE and others FALSE. Print all of them out. Subset the rows using logical indexing with the 5-element logical vector. Check the dimensions of the subset and print its value out. What has changed? Similarly, subset the columns with the 3-element logical vector.

  3. Use logical indexing to subset the mpg data frame to identify the pickup truck with highway mileage greater than 20 mpg.

  4. It is also possible to combine multiple conditions using more than one Boolean operator. The main thing to remember when using multiple Boolean operators is that, just like arithmetic operators, they have precedence rules, and lower precedence operators must be protected (if needed) from higher precedence ones with parentheses. The order of precedence is ! > & > |. How do the outputs of the two statements FALSE & FALSE | TRUE and FALSE & (FALSE | TRUE) differ? Why?

  5. Load the palmerpenguins library. Use logical indexing to subset the penguins dataset for Adelie penguins that live on either Biscoe or Dream island. Determine how many there are.
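
A minimal sketch of logical indexing is given below, assuming a made-up numeric vector and the built-in mtcars data frame rather than mpg or penguins. All variable names and thresholds are illustrative. The last two lines show operator precedence on a pair of statements that is different from the pair in task 4.

```r
# Logical indexing of a vector: keep elements where the condition is TRUE
scores <- c(12, 45, 7, 30, 18)
keep <- scores > 15
kept <- scores[keep]
kept

# Logical indexing of data frame rows: the logical vector goes before the comma
is_heavy_six <- mtcars$cyl == 6 & mtcars$wt > 3.3
heavy_six <- mtcars[is_heavy_six, c("cyl", "wt", "hp")]
heavy_six

# Logical indexing of columns: the logical vector goes after the comma
wanted_cols <- names(mtcars) == "hp" | names(mtcars) == "wt"
first_rows <- mtcars[1:3, wanted_cols]
first_rows

# Precedence: & is evaluated before | unless parentheses say otherwise
without_parens <- TRUE | TRUE & FALSE      # same as TRUE | (TRUE & FALSE)
with_parens    <- (TRUE | TRUE) & FALSE
```

Checking dim() on the subset before and after indexing is a simple way to confirm that only the intended rows or columns were kept.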


2.4 Additional exercises

2.4.1 Relational operators

  1. The rnorm(n) function generates n random numbers from the Normal distribution. Use ?rnorm in the console to look up its arguments. Assign a single random number to a variable, print out the value, and then test whether it is less than 0 using a relational statement. Run the code chunk repeatedly until you have seen both TRUE and FALSE.

  2. The sample(x, size) function generates a random sample of size size from vector x. Assign the manufacturer column of the mpg data frame to a variable. Generate a random sample of 10 manufacturer names using sample. Print out the sample of names. Use a relational statement to determine which ones match "ford" and assign the result to a variable. Print out its value. Determine how many match "ford".

  3. Comparisons can also be made between two vectors of equal length. Assign the first four columns of the first flower in the iris dataset to a variable and do the same for the fourth flower. Which flower is bigger?
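
When two vectors of equal length are compared, the comparison is made element by element. The sketch below uses two made-up measurement vectors (not iris data), and the helper functions any() and all() are one possible way to summarise the result; they are not required by the exercise.

```r
# Two made-up sets of three measurements
first  <- c(5.0, 3.2, 1.4)
second <- c(4.8, 3.2, 1.6)

# Element-by-element comparison: one logical value per position
is_greater <- first > second
is_greater

# any() is TRUE if at least one element is TRUE
any(is_greater)

# all() is TRUE only if every element is TRUE
all(first >= second)
```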

2.4.2 Boolean (logical) operators

  1. Follow the procedures in the activities for Boolean operators to create and print the truth table for the xor logical operator. Note: the syntax for xor is different from & or | (see ?Logic in console). What is the difference between | and xor?

  2. Use the procedures above to create a truth table for the NOT (!) operator. Note: ! is a unary instead of a binary operator, so you will need only one input logical vector with the two possible values of a logical variable. Also, be sure to use something other than in as the variable name since in is a reserved word. What is the function of the NOT operator?

  3. In the mpg dataset, use relational and Boolean operators to determine how many subcompact cars with 6 cylinders were made in 1999.

  4. Using relational and Boolean operators, determine how many Chinstrap penguins had body mass between 3000g and 3300g.

  5. In the mpg dataset, how many cars except pickup trucks or suvs had highway mileage less than 18?
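
Conditions such as "between two values" and "except for certain groups" can be built from the operators already introduced. The sketch below assumes the built-in mtcars data frame; the thresholds and variable names are arbitrary and only illustrate the pattern.

```r
# "Between" is two comparisons joined by AND
in_range <- mtcars$hp >= 100 & mtcars$hp <= 150

# "Except A or B" is a NOT applied to an OR; the parentheses matter
# because ! has higher precedence than |
not_four_or_six <- !(mtcars$cyl == 4 | mtcars$cyl == 6)

# An equivalent way to write the same exclusion
not_four_or_six_alt <- mtcars$cyl != 4 & mtcars$cyl != 6

# Count the cars meeting both the range and the exclusion
sum(in_range & not_four_or_six)
```

Writing the exclusion both ways and comparing the two vectors with identical() is a useful check that the parentheses are in the right place.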

2.4.3 Logical indexing to subset vectors and data frames

  1. Refer to question 3 in the Boolean operator exercises. Use logical indexing to determine which subcompact cars made in 1999 had 6 cylinders.

  2. Refer to question 4 in the Boolean operator exercises. Use logical indexing to determine which islands the Chinstrap penguins with body mass between 3000g and 3300g lived on.

  3. Refer to question 5 in the Boolean operator exercises. Use logical indexing to determine which cars, except pickup trucks or suvs, had highway mileage less than 18 in the mpg dataset.

  4. As we did in Module 1 HW, create a scatterplot of flipper length vs. body mass, putting the former on the y-axis and the latter on the x-axis, and use color to show sex. The is.na() function (see ?is.na in console) returns TRUE if its argument is NA (Not Available, which is used to indicate missing data). If the argument is a vector, is.na returns a logical vector with TRUE in the positions with NAs and FALSE elsewhere. Use is.na and Boolean operators to create a logical vector which identifies the penguins whose sex is not NA. Use this logical vector to remove the rows which have NA for sex and assign the result to a variable. Use this variable to recreate the plot.
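
The is.na pattern described in the last task can be sketched on the built-in airquality data frame, whose Ozone column contains missing values. This is an illustration on a different dataset, not the penguins solution, and the variable names are made up.

```r
# TRUE where Ozone is missing
ozone_missing <- is.na(airquality$Ozone)

# NOT flips the vector: TRUE where Ozone is present
ozone_known <- !ozone_missing

# Keep only the rows where Ozone is present
complete_ozone <- airquality[ozone_known, ]

# Compare the number of rows before and after
nrow(airquality)
nrow(complete_ozone)
sum(ozone_missing)
```

The number of rows removed should equal the number of TRUE values in ozone_missing.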


End of Module 2 Study Guide B