Please run this code chunk before you start.

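library(vembedr)        # embeds online videos in the rendered document
library(ggplot2)        # plotting; also provides the mpg dataset
library(palmerpenguins) # provides the penguins dataset
load("../../Homeworks/Module2/Homework.RData")  # loads the homework objects used below, such as FFF through JJJ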

1 R data types and structures

  1. Insert a code chunk to determine the datatype of the variables FFF, GGG, HHH, and III.

  2. Inspect the values of GGG and HHH in RStudio's Environment tab. What kinds of data structures are they?

  3. Why are GGG and HHH of different types even though they have the same values?

  4. Print out the 31st value of FFF.

  5. Print out the 13th through 23rd values of III.

  6. Print out the value of the variable JJJ. What kind of data structure is JJJ?

  7. Write code to print out the number of rows and the number of columns of JJJ.

  8. Use the summary() function to determine the datatype of the col3 column of JJJ.

  9. Print out the 17th entry in the 2nd column of JJJ.

  10. Print out the entire col1 column of JJJ.

  11. Print out the 11th row of JJJ.
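The sketch below illustrates the tools these exercises rely on (typeof(), class(), vector indexing, and row and column indexing of a data frame). It uses hypothetical toy objects (int_vec, dbl_vec, chr_vec, toy) rather than the homework variables, so it shows the syntax without answering the questions directly.

```r
# Hypothetical toy objects; these are NOT the homework variables FFF through JJJ
int_vec <- 1:5               # the colon operator produces an integer vector
dbl_vec <- c(1, 2, 3, 4, 5)  # plain numbers typed into c() are stored as doubles
chr_vec <- c("a", "b", "c")

typeof(int_vec)              # "integer"
typeof(dbl_vec)              # "double"
class(chr_vec)               # "character"
identical(int_vec, dbl_vec)  # FALSE: same values, different storage types
int_vec == dbl_vec           # every element still compares equal

# Vector indexing in R starts at 1
dbl_vec[3]                   # the 3rd element
dbl_vec[2:4]                 # the 2nd through 4th elements

# A small hypothetical data frame
toy <- data.frame(col1 = c(10, 20, 30),
                  col2 = c("x", "y", "z"),
                  col3 = c(TRUE, FALSE, TRUE))
nrow(toy)                    # number of rows
ncol(toy)                    # number of columns
toy[2, 3]                    # entry in row 2, column 3
toy$col1                     # the entire col1 column, selected by name
toy[, "col1"]                # the same column via bracket indexing
toy[3, ]                     # the entire 3rd row, returned as a one-row data frame
summary(toy)                 # numeric columns get quartiles; others report class or mode
```

The same bracket pattern, [row, column], applies to any data frame; leaving one position empty selects everything along that dimension.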


2 Working with and visualizing datasets

Note: Give every plot a title and proper x-axis, y-axis, and color (if applicable) labels, with units where available.

2.1 Plotting distributions and relationships

  1. Read about the chickwts dataset. What kind of variable is feed? What would be an appropriate way to visualize its distribution? Generate the plot.

  2. What kind of variable is weight? What would be an appropriate plot for visualizing its distribution? Visualize its distribution.

  3. What would be an appropriate way to visualize the distribution of weight as it depends on feed? Visualize it. What may be concluded?

  4. In the mpg dataset, what would be an appropriate plot for visualizing the proportions of transmission types used in each class of vehicle? Make the plot. In which class of vehicle are manual transmissions most common? In which class are automatic transmissions most common?

  5. In the mpg dataset, visualize the relationship between engine displacement and highway mileage. Use facets to show how this relationship changes with the drive train (drv). What may be concluded?

  6. The trees dataset contains the diameter (Girth), height, and volume of black cherry trees. Use ?trees in the console to read about the variables. What kind of variables are these three? What would be an appropriate type of plot for visualizing the relationship among all three variables? Make the plot. What can you infer about how the three variables are related?
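As a hedged illustration of the labeling note at the top of this section, the sketch below uses the built-in mtcars data rather than the homework datasets: labs() sets the title and the x, y, and color labels, and facet_wrap() splits the plot into one panel per group. The units in the labels come from ?mtcars; the names cars_df and transmission are hypothetical.

```r
library(ggplot2)

# Built-in mtcars data (not one of the homework datasets). Per ?mtcars:
# wt is weight in 1000 lb, mpg is fuel economy in miles per US gallon,
# and am is the transmission (0 = automatic, 1 = manual).
cars_df <- transform(mtcars,
                     transmission = factor(am, levels = c(0, 1),
                                           labels = c("Automatic", "Manual")))

p <- ggplot(cars_df, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  facet_wrap(~ transmission) +   # one panel per transmission type
  labs(title = "Fuel economy vs. weight, by transmission (mtcars)",
       x = "Weight (1000 lb)",
       y = "Fuel economy (miles per US gallon)",
       color = "Cylinders")
print(p)
```

Converting a 0/1 code into a labeled factor before faceting makes the panel titles readable without extra labeller arguments.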


3 Relational operators

  1. Use a relational operator to write an expression that checks whether the body mass of the 6th penguin in the penguins dataset is greater than or equal to that of the 33rd penguin, and assign the result to a variable. Print out the value of the variable. What is its datatype? Is the 6th penguin's body mass greater than or equal to the 33rd penguin's?

  2. The call sample(x, size, replace = TRUE) draws size elements at random from the vector x, with replacement (the same element of x can be sampled multiple times). Assign a vector of the numbers 1 through 6, representing the possible outcomes of a die roll, to a variable. Use sample() and this vector to simulate 100 die rolls and assign the result to another variable. What data structure does sample() produce? What are its dimensions? How many of the rolls do you expect to equal 2? Write an expression that checks which rolls resulted in 2's and assign the result to a variable. What are the data structure and datatype of this variable? Write an expression that determines how many rolls resulted in 2's. Did the result match your expectation?

  3. Use indexing to assign the 3rd through 6th values of the 8th row (penguin) of the penguins dataset to a variable. Do the same for the 150th row. Write a relational statement checking whether the 150th penguin has smaller measurements than the 8th, and assign the result to a variable. What is the data structure of the result? What is its datatype?
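A minimal sketch of the relational-operator ideas in these exercises, with a simulated coin and tiny hypothetical vectors standing in for the die and the penguins data: comparisons return logical vectors, sum() counts the TRUE values, and comparing two one-row data frames returns a logical matrix. set.seed() is included only so the random draws are reproducible; the exercises do not require it.

```r
set.seed(42)                      # arbitrary seed so the random draws are reproducible
coin <- c("H", "T")               # hypothetical two-sided coin
flips <- sample(coin, size = 50, replace = TRUE)   # 50 flips, with replacement

length(flips)                     # a plain vector has a length
dim(flips)                        # but no dim attribute, so this returns NULL
is_heads <- flips == "H"          # the comparison is applied to every element
class(is_heads)                   # "logical"
sum(is_heads)                     # TRUE counts as 1 and FALSE as 0
50 * 1/2                          # expected number of heads

x <- c(4.2, 3.9, 5.1)             # hypothetical measurements
x[1] >= x[3]                      # compares two single elements: FALSE

# Comparing two one-row data frames gives a logical matrix, one entry per column
toy <- data.frame(a = c(1, 5), b = c(7, 2))
toy[1, ] < toy[2, ]
```

Because every comparison yields TRUE or FALSE per element, wrapping it in sum() is a common way to count matches.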


4 Boolean (logical) operators

  1. Use relational and Boolean operators to determine how many Adelie penguins live on Biscoe Island.

  2. Use the approach from problem 2 of the Relational operators exercises to sample 1000 die rolls and assign them to a variable. Sample a second time, representing a second die, and assign the result to another variable. Then use relational and Boolean operators to count the rolls in which both dice show 1 or both dice show 2. How many do you expect? Did the result match your expectation?

  3. In the mpg dataset, determine how many subcompacts were made by Ford, Honda, and Subaru in the year 2008.
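The sketch below shows, under the assumption that a different dataset makes a fair illustration, how & and | combine conditions: it uses the built-in mtcars data and two simulated coins instead of the penguins, mpg, and dice from the exercises. %in% is shown as a compact alternative to chaining several == tests with |. The variable names are hypothetical.

```r
# Built-in mtcars data (not a homework dataset); am is 0 = automatic, 1 = manual
four_cyl_manual <- mtcars$cyl == 4 & mtcars$am == 1   # & requires both conditions
sum(four_cyl_manual)                                   # number of rows where both hold

# | requires at least one condition; %in% matches any of several values
sum(mtcars$gear == 4 | mtcars$gear == 5)
sum(mtcars$gear %in% c(4, 5))                          # same count as the line above

# Two simulated coins (hypothetical example): both heads OR both tails
set.seed(1)                                            # arbitrary seed for reproducibility
coin1 <- sample(c("H", "T"), size = 200, replace = TRUE)
coin2 <- sample(c("H", "T"), size = 200, replace = TRUE)
same_face <- (coin1 == "H" & coin2 == "H") | (coin1 == "T" & coin2 == "T")
sum(same_face)                                         # observed count
200 * (1/4 + 1/4)   # expected count: each matching outcome has probability 1/2 * 1/2

# Use the vectorized & and | here; && and || expect single TRUE/FALSE values
```

The parentheses around each & pair make the intended grouping explicit, even though & already binds more tightly than | in R.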


5 Logical indexing to subset vectors and data frames

  1. Refer to problem 1 in the Boolean operator exercises above. Use logical indexing to subset the penguins dataset to Adelie penguins living on Biscoe, and plot flipper length against body mass for only those penguins.

  2. Refer to question 3 in the Boolean operator exercises above. Use logical indexing to determine which subcompact models were made by Ford, Honda, and Subaru in 2008.

  3. Are there any flowers in the iris dataset with petal length more than 10 times petal width? What species have such thin petals? Use relational and Boolean operators and logical indexing to answer.
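A hedged sketch of logical indexing on a data frame, again using the built-in mtcars data instead of the homework datasets: a logical vector with one entry per row selects the TRUE rows, and base R's subset() gives the same result. The last lines illustrate a pitfall worth knowing before subsetting data with missing values (the penguins data has some): an NA in a logical index produces an NA element rather than dropping it, while which() keeps only the TRUE positions. All variable names are hypothetical.

```r
# Logical indexing with the built-in mtcars data (not a homework dataset)
is_small_manual <- mtcars$cyl == 4 & mtcars$am == 1        # one TRUE/FALSE per row
small_manual <- mtcars[is_small_manual, c("mpg", "wt")]    # TRUE rows, two columns
nrow(small_manual)
rownames(small_manual)   # mtcars stores the car model names as row names

# The same subset with subset(), which also drops rows where the condition is NA
small_manual2 <- subset(mtcars, cyl == 4 & am == 1, select = c(mpg, wt))

# Caution: an NA in a logical index returns NA instead of dropping the element
v <- c(1, NA, 3)
v[v > 1]          # NA 3
v[which(v > 1)]   # 3: which() keeps only the TRUE positions
```

Either form works for plotting a subset: pass the subsetted data frame to ggplot() in place of the full dataset.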


End of Module 2 HW