For more information, read the help for relational operators (?Comparison).
Carry out the relational operator activities and exercises.
Read Section 12.3 of R for Data Science (2e).
For more information, read the help for logical operators (?Logic).
Carry out the Boolean operator activities and exercises.
Carry out the logical indexing activities and exercises.
Refer to the relational operators section and perform the following tasks.
You have seen binary arithmetic operators such as +, -, *, /, and ^. The arithmetic operators take two numeric values or variables and compute a new numeric value. For example, 22+35 returns the value 57, which is of type double:
x <- 22+35
x
## [1] 57
typeof(x)
## [1] "double"
The relational operators are also binary since they take two variables or values, but instead of returning a numeric value, they return a boolean (TRUE/FALSE) value depending on whether the relation is true or not.
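As a minimal sketch of this idea (the values and the variable name is_smaller are chosen here purely for illustration and are not part of the activities):

```r
# A relational operator compares two values and returns a logical value.
3 < 7
7 < 3

# The result can be stored in a variable like any other value.
is_smaller <- 3 < 7
typeof(is_smaller)
```

Running the chunk prints one logical value per comparison, followed by the type of the stored result.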
Insert a code chunk and run the statement 5 > 1.
The returned value can be assigned to a variable using the assignment operator <-. Assign the value of a relational statement using the == operator to a variable and determine the variable’s datatype.
Comparisons need not be between numerical variables, but can occur between other datatypes as well. Assign the string “James” to a variable and use the != operator to compare it to the string “james”. Explain the result.
Comparisons can also be made between data structures such as vectors and data frames. Instead of a single logical value, a vector of logical values is returned. Assign the first 10 elements of the Sepal.Length column of the iris data frame to a variable. Check the datatype of the variable. Then compare the variable to 4.6 using the <= relational operator and assign the result to a different variable. Print out both variables. Do the results make sense?
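A small sketch of an element-wise comparison, using made-up values rather than the iris data (heights and is_short are hypothetical names):

```r
# Illustrative values only; not taken from the iris data frame.
heights <- c(4.2, 5.1, 4.6, 4.9)

# The single value 4.6 is compared with each element in turn,
# so the result has the same length as heights.
is_short <- heights <= 4.6

heights
is_short
length(is_short)
```

Printing the two vectors together makes it easy to check each comparison against the value it came from.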
Given that TRUE and FALSE are equivalent to 1 and 0 respectively, summing the elements of a logical vector tells us how many comparisons were true. The sum function (see ?sum in console) sums the values of all its arguments. Use sum to determine how many flowers above had sepal length less than or equal to 4.6.
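The counting idea can be sketched on a hand-written logical vector (passed is a hypothetical name used only for this illustration):

```r
# A hand-written logical vector for illustration.
passed <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# TRUE is treated as 1 and FALSE as 0 when converted to numbers.
as.numeric(passed)

# Summing therefore counts how many elements are TRUE.
n_passed <- sum(passed)
n_passed
```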
Assign the first 10 rows and first 4 columns of the iris data frame to a variable. Compare the variable to 3.4 with the >= operator and assign the result to a new variable. Print out the value of the new variable. Use the class() and typeof() functions to determine the kind of data structure and the data type of the variable, respectively.
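A brief sketch of comparing a whole data frame to a single value, using a tiny made-up data frame (scores and at_least are hypothetical names, and the numbers are invented):

```r
# A small made-up data frame with two numeric columns.
scores <- data.frame(quiz = c(2.5, 4.0, 3.4),
                     exam = c(3.9, 3.1, 3.6))

# Every cell is compared with 3.4; the result keeps the
# row-by-column layout of the original data frame.
at_least <- scores >= 3.4
at_least

dim(at_least)      # same number of rows and columns as scores
typeof(at_least)   # data type of the individual cells
```

Calling class() on the result, as the task asks, shows what kind of structure holds those cells.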
Very often we would like to combine the results of two or more comparisons. For example, one may want to know if there are any pickup trucks with highway mileage greater than 20 miles per gallon. To do this, we would perform two comparisons: 1) which trucks have the class variable equal to "pickup" and 2) which trucks have hwy greater than 20. Then we would combine the comparisons to determine the trucks for which both conditions are met.
We can combine logical values/variables using boolean/logical operators. The AND (&) operator returns TRUE when both operands are true. Create a code chunk to perform the AND between different combinations of TRUE and FALSE.
The boolean operators don’t just work between scalar logical values and variables, but between logical vectors as well. Use the combine function c() to create two logical vectors encapsulating all four pairs of TRUE and FALSE. Use the data.frame function to create a data frame with three columns called IN1, IN2, and OUT, where IN1 and IN2 are the two vectors and OUT is the result of a logical AND between the two vectors. Print out the data frame.
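A minimal sketch of the same pattern on a different, deliberately incomplete example (is_weekend, is_sunny, and go_hiking are hypothetical names, and the vectors do not cover all four input pairs):

```r
# Two logical vectors of equal length.
is_weekend <- c(TRUE, FALSE, TRUE)
is_sunny   <- c(TRUE, TRUE, FALSE)

# & works element by element, so the third column records
# whether both conditions hold in each row.
plans <- data.frame(is_weekend,
                    is_sunny,
                    go_hiking = is_weekend & is_sunny)
plans
```

Placing the inputs and the result side by side in one data frame is what makes the output readable as a table.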
Create and print the truth table for the logical OR (|) as was done for AND.
Write a statement to test which elements of the class column of the mpg data frame are equal to "pickup" and assign the result to a variable. Do the same to test which trucks have hwy greater than 20 and assign the result to a different variable. In order to determine which trucks meet both conditions, we will use the logical AND operator &. Write a statement that determines which elements of both logical vectors are TRUE by using the & operator. Print out its value and length. Then determine how many pickup trucks have mileage greater than 20 mpg using the sum function.
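The same sequence of steps can be sketched on the built-in mtcars data frame instead of mpg, so that no extra package is needed. The choice of mtcars, the thresholds, and the variable names are assumptions made only for this illustration:

```r
# Condition 1: which cars have 4 cylinders?
four_cyl <- mtcars$cyl == 4

# Condition 2: which cars get more than 30 miles per gallon?
efficient <- mtcars$mpg > 30

# Combine element by element: TRUE only where both hold.
both <- four_cyl & efficient

length(both)   # one logical value per row of mtcars
sum(both)      # number of rows meeting both conditions
```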
In the example above, there is only one pickup with hwy mileage greater than 20 mpg. It would be great to figure out which one it is. In Module 2 Study Guide A, we learnt how to subset vectors and data frames using the indices of the elements, rows, or columns. However, to subset by index you have to know the position of the element you are looking for beforehand, and you are limited to contiguous sets of elements when you use a range such as 5:10. R provides another, very powerful way to subset vectors and data frames, known as logical indexing.
In logical indexing, instead of specifying the numeric indices of the elements we would like to subset, we specify a logical vector of the same dimensions as the vector/data frame. R then returns those elements where the logical vector is TRUE. The seq() function returns a numeric vector with evenly spaced numbers (see ?seq in console). Use the seq function to create a numeric vector of length 5 with values ranging from 1 to 5. Print it out. Also create a logical vector of length 5 with two elements TRUE and the rest FALSE. Print it out. Index the former by the latter and assign the result to a new variable. What is its length? Print out its value. Which elements of the original vector remain?
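A small sketch of logical indexing on a vector, with values and names (temps, keep, kept) invented for illustration:

```r
# A numeric vector with evenly spaced values: 10, 20, 30, 40.
temps <- seq(10, 40, by = 10)

# One logical value per element of temps.
keep <- c(FALSE, TRUE, FALSE, TRUE)

# Only the elements lined up with TRUE are returned.
kept <- temps[keep]

length(kept)
kept
```

The positions of the TRUE values, not their count alone, decide which elements survive.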
This approach can also be used to subset data frames. Use data.frame() and rnorm() to create a data frame with 5 rows and 3 columns. Create two logical vectors, one of length 5 and the other of length 3, with some values TRUE and others FALSE. Print all of them out. Subset the rows using logical indexing with the 5-element logical vector. Check the dimensions of the subset and print its value out. What has changed? Similarly, subset the columns with the 3-element logical vector.
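A minimal sketch of row and column subsetting with logical vectors, using a small hand-written data frame rather than random numbers (all names and values here are hypothetical):

```r
# A hand-written data frame with 4 rows and 3 columns.
df <- data.frame(id    = 1:4,
                 score = c(90, 72, 85, 60),
                 group = c("a", "b", "a", "b"))

keep_rows <- c(TRUE, FALSE, TRUE, FALSE)   # one value per row
keep_cols <- c(TRUE, TRUE, FALSE)          # one value per column

# The logical vector before the comma selects rows.
row_subset <- df[keep_rows, ]
dim(row_subset)

# The logical vector after the comma selects columns.
col_subset <- df[, keep_cols]
dim(col_subset)
```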
Use logical indexing to subset the mpg data frame to identify the pickup truck with highway mileage greater than 20 mpg.
It is also possible to combine multiple conditions using more than one Boolean operator. The main thing to remember when using multiple Boolean operators is that, just like arithmetic operators, they have precedence rules, and lower precedence operators must be protected (if needed) from higher precedence ones. The order of precedence, from highest to lowest, is ! > & > |. How do the outputs of the two statements FALSE & FALSE | TRUE and FALSE & (FALSE | TRUE) differ? Why?
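The effect of precedence can be sketched with a different pair of statements than the ones in the task (the variable names are hypothetical):

```r
# & binds more tightly than |, so this is read as TRUE | (FALSE & FALSE).
no_parens <- TRUE | FALSE & FALSE

# Parentheses force the | to be evaluated first.
with_parens <- (TRUE | FALSE) & FALSE

# ! binds more tightly than &, so this is read as (!TRUE) & FALSE.
not_first <- !TRUE & FALSE

# Parentheses apply ! to the result of the & instead.
not_last <- !(TRUE & FALSE)

c(no_parens, with_parens, not_first, not_last)
```

When in doubt, adding parentheses makes the intended grouping explicit and costs nothing.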
Load the palmerpenguins library. Use logical indexing to subset the penguins dataset for Adelie penguins that live on either Biscoe or Dream. Determine how many there are.
The rnorm(n) function generates n random numbers from the Normal distribution. Use ?rnorm in the console to look up its arguments. Assign a single random number to a variable, print out the value, and then test whether it is less than 0 using a relational statement. Run the code chunk repeatedly until you have seen both TRUE and FALSE.
The sample(x, size) function generates a random sample of size size from vector x. Assign the manufacturer column of the mpg data frame to a variable. Generate a random sample of 10 manufacturer names using sample. Print out the sample of names. Use a relational statement to determine which ones match ford and assign the result to a variable. Print out its value. Determine how many match ford.
Comparisons can also be made between two vectors of equal length. Assign the first four columns of the first flower in the iris dataset to a variable and do the same for the fourth flower. Which flower is bigger?
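A short sketch of comparing two equal-length vectors, using made-up measurements (box_a, box_b, and a_larger are hypothetical names):

```r
# Two named numeric vectors with the same length and the same order.
box_a <- c(length = 5.0, width = 3.0, height = 2.0)
box_b <- c(length = 4.5, width = 3.2, height = 2.0)

# Elements are compared pairwise: first with first, second with second, ...
a_larger <- box_a > box_b
a_larger

# Counting the TRUE values summarises how many measurements are larger.
sum(a_larger)
```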
Follow the procedures in the activities for Boolean operators to create and print the truth table for the xor logical operator. Note: the syntax for xor is different from & or | (see ?Logic in console). What is the difference between | and xor?
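To illustrate only the difference in syntax, here is a sketch on two short vectors that intentionally do not cover every input pair (a, b, and result are hypothetical names):

```r
# | is written between its operands ...
TRUE | FALSE

# ... whereas xor is called like a function with two arguments.
xor(TRUE, FALSE)

# Like & and |, xor also works element by element on vectors.
a <- c(TRUE, FALSE)
b <- c(FALSE, FALSE)
result <- xor(a, b)
result
```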
Use the procedures above to create a truth table for the NOT (!) operator. Note: ! is a unary instead of a binary operator, so you will need only one input logical vector with the two possible values of a logical variable. Also, be sure to use something other than in as the variable name since in is a reserved word. What is the function of the NOT operator?
In the mpg dataset, use relational and Boolean operators to determine how many subcompact cars with 6 cylinders were made in 1999.
Using relational and Boolean operators, determine how many Chinstrap penguins had body mass between 3000g and 3300g.
In the mpg dataset, how many cars except pickup trucks or SUVs had highway mileage less than 18?
Refer to question 3 in the Boolean operator exercises. Use logical indexing to determine which subcompact cars made in 1999 had 6 cylinders.
Refer to question 4 in the Boolean operator exercises. Use logical indexing to determine which islands the Chinstrap penguins with body mass between 3000g and 3300g lived on.
Refer to question 5 in the Boolean operator exercises. Use logical indexing to determine which cars except pickup trucks or SUVs had highway mileage less than 15 in the mpg dataset.
As we did in Module 1 HW, create a scatterplot of flipper length vs. body mass, putting the former on the y-axis and the latter on the x-axis, and use color to show sex. The is.na() function (see ?is.na in console) returns TRUE if its argument is NA (not available, which is used to indicate missing data). If the argument is a vector, is.na returns a logical vector with TRUE in the positions with NAs and FALSE elsewhere. Use is.na and Boolean operators to create a logical vector which identifies the penguins whose sex is not NA. Use this logical vector to remove the rows which have NA for sex and assign the result to a variable. Use this variable to recreate the plot.
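The filtering step can be sketched on a small made-up data frame rather than the penguins data (readings, has_value, and complete are hypothetical names):

```r
# A made-up data frame in which two values are missing.
readings <- data.frame(id = 1:4,
                       value = c(2.1, NA, 3.8, NA))

# TRUE marks the positions where value is missing.
is.na(readings$value)

# Negating with ! marks the positions where value is present.
has_value <- !is.na(readings$value)

# Logical indexing keeps only the rows with a value.
complete <- readings[has_value, ]
complete
```

The filtered data frame can then be passed to the plotting code in place of the original one.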
End of Module 2 Study Guide B