📝 Practical: Introduction
Getting started with R and the tidyverse
Recap of material covered
This week you’ve seen why biologists need statistics, and how R will help us work with data.
Why statistics?
Biology is messy: organisms differ, environments vary, and measurements are never perfect. Statistics helps us separate real patterns from chance, and much of the field grew directly out of biological problems (think of “Student’s” t-test, or Fisher’s tea-tasting experiment).
Why R?
R is a free and powerful tool for data analysis. Unlike point-and-click software, R lets us write scripts that are reproducible, transparent, and easy to share. RStudio provides a user-friendly environment to organise code, results, and plots.
Why the tidyverse?
The tidyverse is a set of R packages that makes data handling easier. With it you can quickly import your data, clean and reshape it, calculate summaries, and create clear, publication-quality plots.
This worksheet gives you hands-on practice with these ideas. You’ll start with R as a calculator, create objects and vectors, and then use tidyverse functions to organise, summarise, and visualise data.
Before you begin
- Make sure you have both R and RStudio installed. Get them from here: https://posit.co/download/rstudio-desktop/
- Open RStudio. You should see four main panels: script editor, console, environment/history, and plots/files/packages/help.
R and RStudio are free (libre and gratis). R is developed by the R Core Team and distributed through the Comprehensive R Archive Network (CRAN). RStudio Open Source Edition is provided by Posit. Although Posit does offer some paid-for services, you do not need them for this unit or for your research projects.
Getting started with RStudio
Follow these steps to set up your first R session.
- Open RStudio
- Find and launch RStudio on your computer.
- You should see four main panels:
- Script editor (top left)
- Console (bottom left)
- Environment/history (top right)
- Files/plots/packages/help (bottom right)
Your RStudio environment should look something like this. Key areas are highlighted here:
- Create a new script
- Go to File → New File → R Script or click the ‘new script’ button.
- A blank script window will appear in the top left.
- This is where you’ll write and save your code.
- Save your script
- Save your file with a meaningful name (e.g. intro_practical.R).
- R scripts have the extension .R.
- Save by pressing Ctrl + S (Windows/Linux) or Cmd + S (Mac), or click on the blue “disk” icon (similar to this emoji: 💾). Make sure you save regularly!
- Run your first command
- Type 2 + 2 into the script editor.
- Highlight it and press Ctrl + Enter (Windows/Linux) or Cmd + Enter (Mac).
- The result will appear in the console.
- Add comments
Start a line with # to add a note to yourself.
Example:
# This line calculates 2 + 2
2 + 2
- Check your workspace
- Look at the Environment panel (top right).
- Any objects you create (like variables or data tables) will appear here.
- Install a package (only once)
In the console, type:
install.packages("tidyverse")
This downloads the tidyverse package.
- Load a package (every session)
In the console or script, type:
library(tidyverse)
Now the tidyverse functions are ready to use.
👉 You are ready to start the worksheet!
If you hover your mouse in code boxes that appear on the worksheet, a ‘Copy to clipboard’ button will appear in the top right corner. You can also select code and copy/paste it into your RStudio script window in the normal way. However, you might find that you retain a better understanding of what you are doing if you type out the commands instead.
Objective 1. Learn how to use R as a calculator
Try typing these into the console and see what happens.
2 + 2
6 * 7
(5 + 3) / 2
📝 Tasks: Perform the following calculations:
- 12^2
- the square root of 225 (sqrt())
- the log base 10 of 1000 (log10())
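Before trying the tasks, it may help to see how R orders operations. The lines below are a small sketch using made-up numbers (not taken from the tasks); the comments show the value each line should print.

```r
# Multiplication is evaluated before addition unless brackets say otherwise
2 + 3 * 4         # 14
(2 + 3) * 4       # 20

# ^ is evaluated before a leading minus sign
-3^2              # -9
(-3)^2            # 9

# log() is the natural logarithm by default; other bases are available
log(exp(1))       # 1
log10(100)        # 2
log(8, base = 2)  # 3
```

If a result surprises you, adding brackets is usually the quickest way to make the intended order explicit.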
Objective 2. Creating objects in R
Objects let you store values and use them later. For example:
x <- 42
y <- sqrt(x)
y
# Delete (remove) the object y
rm(y)
📝 Tasks:
- Create an object called age_y with your age in years.
- Calculate your age in months and save it as age_months.
- Delete the object age_y.
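The sketch below walks through the same create, reuse, and remove cycle with a different example. The object names temp_c and temp_f are invented for illustration and are not part of the worksheet.

```r
# Hypothetical example: store a temperature and derive a second object from it
temp_c <- 20
temp_f <- temp_c * 9 / 5 + 32
temp_f              # 68

# Reassigning temp_c does not update temp_f automatically
temp_c <- 30
temp_f              # still 68

# List the objects in the workspace, remove one, and confirm it has gone
ls()
rm(temp_c)
exists("temp_c")    # FALSE
```

The point worth noticing is that an object stores a value, not a formula: temp_f keeps its old value until you recalculate it yourself.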
Objective 3. Vectors in R
Vectors are a type of R object that holds multiple values.
height_m <- c(1.6, 1.7, 1.8, 1.65, 1.75)
mean(height_m)
📝 Tasks:
- Create a new vector called weight_kg with 5 values.
- Use mean() and sd() to calculate the mean and standard deviation of height_m and weight_kg.
- What happens if you try height_m + weight_kg? Does this make any sense?
- Calculate BMI <- weight_kg / height_m^2
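Arithmetic on vectors works one element at a time, which is what makes the BMI calculation possible in a single line. Here is a minimal sketch with invented leaf measurements (leaf_cm and leaf_width_cm are hypothetical names):

```r
# Hypothetical measurements: leaf lengths in centimetres
leaf_cm <- c(4.2, 5.1, 3.8, 6.0)

# Arithmetic with a single number is applied to every element
leaf_m <- leaf_cm / 100
leaf_m

# Arithmetic between two vectors of equal length works element by element
leaf_width_cm <- c(2.0, 2.5, 1.9, 3.1)
leaf_cm * leaf_width_cm

# length() reports how many values a vector holds
length(leaf_cm)   # 4
```

R will carry out the arithmetic whether or not the result is biologically meaningful, so it is up to you to check that the units make sense.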
Objective 4. Extending R functionality through packages and the tidyverse
The tidyverse is a collection of packages that make R easier to use.
# install.packages("tidyverse") # only needed the first time you use a package
library(tidyverse) # every time you start R
You can also install packages using Tools → Install packages… and typing in the package name.
Other packages we will use in this unit include lme4 and nlme.
📝 Tasks:
- Install lme4 using install.packages("lme4"). The package name must be in quotes.
- Install nlme using Tools → Install packages….
- Reflect: which method do you prefer? Which method do you think allows for greater reproducibility? (Hint: one method you can record in your R script, the other you cannot.)
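One way to keep package installation recorded in a script, without reinstalling every time the script runs, is to check what is already installed first. This is only a sketch of one common pattern, not a required part of the practical; the object names needed, installed, and missing_pkgs are invented here.

```r
# Packages this script relies on (names taken from this worksheet)
needed <- c("tidyverse", "lme4", "nlme")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

# Install only what is missing; package names are passed as quoted strings
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

Anyone who receives the script can then run it on a fresh computer and end up with the same packages available.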
Objective 5. Using tibbles to store data
Tibbles are another type of object in R. We typically use tibbles to store data. A tibble is like a spreadsheet, where different columns represent different types of data, and rows represent different data points. If you’ve used base R in the past, this is similar to a data.frame, only with slightly tweaked features.
Let’s create a tibble of the height and mass of four students:
students <- tibble(
name = c("Alice", "Ben", "Cara", "Dan"),
height_m = c(1.62, 1.74, 1.80, 1.32),
weight_kg = c(54, 68, 72, 50)
)
students
To access one column of a tibble, we can use the $ notation:
students$height_m # Will return the vector height_m
students$weight_kg # Will return the vector weight_kg
📝 Tasks:
- Use mean() and sd() to calculate the mean and standard deviation of height_m and weight_kg.
- Calculate BMI (weight in kg divided by height in m squared) using students$weight_kg and students$height_m.
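A few functions are handy for checking what a tibble contains before you calculate anything. The sketch below uses an invented tibble called seedlings (hypothetical data, not part of the worksheet).

```r
library(tidyverse)

# Hypothetical data: three seedlings with a treatment label and a height
seedlings <- tibble(
  treatment = c("control", "watered", "watered"),
  height_cm = c(10.2, 14.8, 13.5)
)

nrow(seedlings)      # number of rows: 3
names(seedlings)     # column names
glimpse(seedlings)   # compact overview of column types and values

# $ and [[ ]] both pull one column out as a vector
seedlings$height_cm
seedlings[["height_cm"]]

# Assigning to a column name that does not exist yet adds a new column
seedlings$height_m <- seedlings$height_cm / 100
seedlings
```

Because a column pulled out with $ is an ordinary vector, everything from the vectors section (mean(), sd(), arithmetic) works on it unchanged.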
Objective 6. Getting a grip on pipes
Pipes are special operators that send objects to functions, or the output of one function to another function. Let’s try some examples:
Return a tibble with just student names and heights:
students |>
select(name, height_m)
Add a new column called BMI to students. Remember that you need to store the new object you’ve created; otherwise, you are only printing it to the screen.
students <- students |>
mutate(BMI = weight_kg / height_m^2)
📝 Tasks:
- Use the pipe and select() to select only the columns containing height and weight from students.
- Use the pipe and filter() to create a new tibble containing only people taller than 1.7 m.
- Use the pipe and mutate() to create a new column containing height in inches (1 inch = 2.54 cm).
Recall that mutate() allows us to make new columns, and filter() allows us to retain only rows that match certain characteristics.
💡 Remember: you can enter ?function_name in the console (e.g. ?mutate) to bring up the help file and remind yourself what a function does.
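To see what the pipe is doing, it can help to compare a piped call with the equivalent ordinary function call, and then chain several steps together. The tibble mice and its columns are invented for this sketch. The |> pipe needs R 4.1 or later.

```r
library(tidyverse)

# Hypothetical data: body mass of five mice from two strains
mice <- tibble(
  id     = c("m1", "m2", "m3", "m4", "m5"),
  strain = c("A", "A", "B", "B", "B"),
  mass_g = c(21.5, 24.0, 19.8, 26.3, 22.1)
)

# These two lines do the same thing: the pipe passes mice in as the first argument
filter(mice, mass_g > 22)
mice |> filter(mass_g > 22)

# Steps can be chained, reading top to bottom, and the result stored under a new name
heavy_mice <- mice |>
  filter(mass_g > 22) |>
  mutate(mass_kg = mass_g / 1000) |>
  select(id, mass_kg)

heavy_mice
```

Storing the result under a new name (heavy_mice) leaves the original mice tibble untouched, which is often safer than overwriting it.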
❓Challenge questions
Use what you’ve learned this week to complete the following questions. Remember to use the script editor and to save your code as a .R script.
Normally you will upload your saved .R script to Canvas before 5PM on the day of the practical. However, this is not required for the intro practical.
Question 1:
Create a numeric vector of at least 10 values representing measurements (e.g. weights of mice).
1. Calculate the mean, median, and standard deviation.
2. Use logical operators to identify which values are greater than the mean.
3. Plot a basic histogram of your values using hist().
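Logical operators have not appeared earlier in this worksheet, so here is a brief sketch of how comparisons behave on a vector. The data and the 2.3 mm cut-off are made up for illustration; the question itself asks you to compare against the mean instead.

```r
# Hypothetical data: wing lengths of six flies, in millimetres
wing_mm <- c(2.1, 2.6, 1.9, 2.4, 2.8, 2.2)

# A comparison returns one TRUE or FALSE per value
long_wing <- wing_mm > 2.3
long_wing             # FALSE TRUE FALSE TRUE TRUE FALSE

# The logical vector can be used to pick out values or to count them
wing_mm[long_wing]    # 2.6 2.4 2.8
sum(long_wing)        # 3, because TRUE counts as 1

# & (and) and | (or) combine conditions
wing_mm[wing_mm > 2.0 & wing_mm < 2.5]   # 2.1 2.4 2.2
```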
Question 2:
- Create a new tibble with at least 6 rows of your own made-up data (e.g. plant growth under different watering treatments).
- Add a new calculated column of your choosing. Remember you need to assign the new output to an object name!
- View the new tibble by clicking on it in the environment viewer.
Question 3:
Make a tibble recording the heights and sexes of 8 imaginary plants.
1. Add a new column that converts height from cm to m using mutate().
2. Use group_by() and summarise() to calculate the mean height for each sex.
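group_by() and summarise() were not demonstrated in the objectives above, so the following is a rough sketch of the general pattern using a different, invented dataset (birds, with hypothetical columns island and beak_mm).

```r
library(tidyverse)

# Hypothetical data: beak depth of six birds from two islands
birds <- tibble(
  island  = c("north", "north", "north", "south", "south", "south"),
  beak_mm = c(9.0, 10.0, 11.0, 12.0, 13.0, 14.0)
)

# group_by() marks the groups; summarise() then returns one row per group
beak_summary <- birds |>
  group_by(island) |>
  summarise(
    mean_beak_mm = mean(beak_mm),
    n_birds      = n()
  )

beak_summary
```

The result has one row per island, with the mean beak depth and the number of birds that went into it.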
Question 4:
- Create a tibble of 12 students with columns for name and favourite_colour.
- Use count() to tally how many students chose each colour.
- Which colour was most popular?
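As a hint for how count() behaves, here is a small sketch on an unrelated, invented tibble (frogs, with a hypothetical habitat column).

```r
library(tidyverse)

# Hypothetical data: the habitat in which each of seven frogs was found
frogs <- tibble(
  habitat = c("pond", "stream", "pond", "marsh", "pond", "stream", "marsh")
)

# count() returns one row per distinct value, with the tally in a column called n
habitat_counts <- frogs |>
  count(habitat, sort = TRUE)   # sort = TRUE puts the largest count first

habitat_counts
```

With sort = TRUE the most common category appears in the first row, which makes the "most popular" question easy to read off.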
Question 5:
Work with a built-in example dataset, Edgar Anderson’s Iris Data, iris, which gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.
- Load the dataset iris with as_tibble(iris). Don’t forget to assign it a name using the assignment operator <-.
- Inspect the new dataset object you created.
- Use select() to retain only the columns Sepal.Length, Petal.Length, and Species.
Recap: this week you’ve practised
- How to use R as a calculator and create objects
- How to work with vectors and basic functions like mean() and sd()
- How to install and load packages, including the tidyverse
- How to create and explore tibbles to store data
- How to use pipes (|>) with functions such as select(), filter(), and mutate()
These steps form the foundation for reproducible data analysis in R, which we will build on in later practicals.