📝 Practical: Classical statistics in R

Hypothesis testing to correlation

Author

BIOL33031/BIOL65161

Practical info

This worksheet will help you practise the key ideas behind hypothesis testing (the null hypothesis, p-values, test selection, and interpretation) using simulated and real data in R.

You’ll apply tests to different types of data (continuous and categorical), interpret p-values, and connect statistical decisions with biological questions.

BIOL65161 students

Please upload your saved .R script to Canvas before 5PM on the day of the practical.

Part 1: Understanding hypotheses

In your own words, what does the null hypothesis (H₀) represent?
Suppose you’re testing whether a new antibiotic reduces bacterial growth compared with a control.
Write out suitable null (H₀) and alternative (H₁) hypotheses.

Part 2: t-tests

We’ll use simulated data to explore how p-values behave.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

# Simulate bacterial growth under control vs antibiotic
set.seed(42)
growth <- tibble(
  treatment = rep(c("Control", "Antibiotic"), each = 15),
  OD600 = c(rnorm(15, mean = 0.8, sd = 0.05),
            rnorm(15, mean = 0.7, sd = 0.05))
)

growth |> head()

# A tibble: 6 × 2
  treatment OD600
  <chr>     <dbl>
1 Control   0.869
2 Control   0.772
3 Control   0.818
4 Control   0.832
5 Control   0.820
6 Control   0.795

2.1 Visualise the data

Make a boxplot showing the distribution of growth (OD600) by treatment group. Remember that the line inside each box is the median, not the mean.
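One possible approach is sketched below. A boxplot draws the median and quartiles, so the sketch also overlays each group mean as a cross using stat_summary(). That overlay, the axis labels, and the object name p are illustrative choices, not requirements of the task. The growth tibble is recreated so the block runs on its own.

```r
library(tidyverse)

# Recreate the simulated data from Part 2
set.seed(42)
growth <- tibble(
  treatment = rep(c("Control", "Antibiotic"), each = 15),
  OD600 = c(rnorm(15, mean = 0.8, sd = 0.05),
            rnorm(15, mean = 0.7, sd = 0.05))
)

# Boxplot (median and quartiles) with the group mean added as a cross
p <- ggplot(growth, aes(x = treatment, y = OD600)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 4, size = 3) +
  labs(x = "Treatment", y = "OD600")

print(p)
```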

2.2 Run a two-sample t-test

Use a t-test to test whether the mean growth differs between treatments.
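A minimal sketch, assuming the growth tibble simulated earlier (recreated here so the block is self-contained). By default t.test() runs Welch's test, which does not assume equal variances; setting var.equal = TRUE gives Student's version instead. The object name welch is illustrative.

```r
library(tidyverse)

# Recreate the simulated data from Part 2
set.seed(42)
growth <- tibble(
  treatment = rep(c("Control", "Antibiotic"), each = 15),
  OD600 = c(rnorm(15, mean = 0.8, sd = 0.05),
            rnorm(15, mean = 0.7, sd = 0.05))
)

# Two-sided Welch two-sample t-test: does mean OD600 differ between groups?
welch <- t.test(OD600 ~ treatment, data = growth)
welch

# Individual pieces of the result can be pulled out by name
welch$estimate   # group means
welch$conf.int   # 95% CI for the difference in means
welch$p.value
```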

2.3 Check your interpretation

If the p-value is 0.012, what does that mean in context?
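To build intuition for what a p-value does and does not say, a small simulation can help. The sketch below is an illustration rather than part of the exercise: both groups are drawn from the same distribution, so H₀ is true by construction, and the p-value is recorded each time. The seed, the number of replicates (2000), and the name p_null are arbitrary choices.

```r
set.seed(1)

# Repeat the experiment 2000 times with NO true difference between groups
p_null <- replicate(2000, {
  a <- rnorm(15, mean = 0.8, sd = 0.05)
  b <- rnorm(15, mean = 0.8, sd = 0.05)
  t.test(a, b)$p.value
})

# Proportion of "significant" results when H0 is true
mean(p_null < 0.05)

# Under H0 the p-values are spread roughly evenly between 0 and 1
hist(p_null, breaks = 20, main = "p-values when H0 is true", xlab = "p-value")
```

The proportion below 0.05 should come out close to 0.05: when H₀ is true, about 1 in 20 tests is "significant" by chance alone.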

2.4 Run a one-tailed t-test

We might instead ask whether the antibiotic specifically reduces growth.

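Re-run the t-test as a one-tailed test (hint: use alternative = "less"). R orders the groups alphabetically, so with these data "less" tests whether the Antibiotic mean is lower than the Control mean.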
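A sketch of one way to do this. The direction of a one-tailed test depends on which group R treats as the first, so the block below sets the factor levels explicitly instead of relying on alphabetical order. The names two_sided and one_sided are illustrative.

```r
library(tidyverse)

# Recreate the simulated data from Part 2
set.seed(42)
growth <- tibble(
  treatment = rep(c("Control", "Antibiotic"), each = 15),
  OD600 = c(rnorm(15, mean = 0.8, sd = 0.05),
            rnorm(15, mean = 0.7, sd = 0.05))
)

# Make the group order explicit: the first level is compared with the second
growth <- growth |>
  mutate(treatment = factor(treatment, levels = c("Antibiotic", "Control")))

# Two-sided: is there any difference in means?
two_sided <- t.test(OD600 ~ treatment, data = growth)

# One-sided: is mean(Antibiotic) - mean(Control) less than 0?
one_sided <- t.test(OD600 ~ treatment, data = growth, alternative = "less")

c(two_sided = two_sided$p.value, one_sided = one_sided$p.value)
```

When the observed difference lies in the hypothesised direction, the one-sided p-value is half the two-sided one. That is why the direction should be chosen before looking at the data.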

Part 3: Working with categorical data

Now suppose we have counts of resistant and sensitive isolates from two species.

microbiology <- tibble(
  Species = c(rep("E. coli", 100), rep("P. aeruginosa", 100)),
  Resistance = c(rep(c("Resistant", "Sensitive"), times = c(45, 55)),
                 rep(c("Resistant", "Sensitive"), times = c(70, 30)))
)

table(microbiology)

               Resistance
Species         Resistant Sensitive
  E. coli              45        55
  P. aeruginosa        70        30

3.1 Create a contingency table

Use table() to summarise the counts by Species and Resistance.

3.2 Perform a chi-squared test

Run a χ² test to check if resistance is associated with species.
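A minimal sketch. To keep the block self-contained, the contingency table is typed in directly from the counts shown above instead of being built from the microbiology tibble; the names tab and chi are illustrative. For a 2 × 2 table, chisq.test() applies Yates' continuity correction by default (correct = TRUE).

```r
# 2 x 2 contingency table of the counts given in Part 3
tab <- matrix(
  c(45, 55,
    70, 30),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Species    = c("E. coli", "P. aeruginosa"),
    Resistance = c("Resistant", "Sensitive")
  )
)

# Chi-squared test of independence between species and resistance
chi <- chisq.test(tab)
chi

# Expected counts under H0; the usual rule of thumb wants these to be at least 5
chi$expected
```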

3.3 Fisher’s exact test

Run a Fisher’s exact test on the same data. Does the outcome differ? When should you use Fisher’s exact test?
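A sketch using the same counts, again typed in directly so the block runs on its own (tab and fisher are illustrative names). For a 2 × 2 table, fisher.test() also reports an odds ratio estimate with a confidence interval, which gives a sense of effect size alongside the p-value.

```r
# Same 2 x 2 table of counts as in Part 3
tab <- matrix(
  c(45, 55,
    70, 30),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Species    = c("E. coli", "P. aeruginosa"),
    Resistance = c("Resistant", "Sensitive")
  )
)

# Fisher's exact test: exact p-value, no large-sample approximation
fisher <- fisher.test(tab)
fisher

fisher$estimate  # odds ratio (conditional maximum likelihood estimate)
fisher$conf.int  # 95% CI for the odds ratio
```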

Part 4: Correlation and relationships

We’ll now explore relationships between two continuous variables using the Palmer Penguins dataset.

install.packages("palmerpenguins") # If not already installed

library(palmerpenguins)


Attaching package: 'palmerpenguins'

The following objects are masked from 'package:datasets':

    penguins, penguins_raw

penguins |> glimpse()

Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

4.1 Visualise a relationship

Make a scatter plot of bill_length_mm vs flipper_length_mm, coloured by species, using ggplot() and geom_point().
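One possible version is sketched below. It refers to palmerpenguins::penguins explicitly, because recent versions of R also ship a penguins dataset in the datasets package with different column names. Putting bill length on the y-axis, the axis labels, and na.rm = TRUE (which silences the warning about rows with missing measurements) are illustrative choices.

```r
library(tidyverse)
library(palmerpenguins)

# Bill length against flipper length, one colour per species
p <- ggplot(palmerpenguins::penguins,
            aes(x = flipper_length_mm, y = bill_length_mm, colour = species)) +
  geom_point(na.rm = TRUE) +
  labs(x = "Flipper length (mm)", y = "Bill length (mm)", colour = "Species")

print(p)
```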

4.2 Calculate the correlation coefficient

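Use cor() to find the correlation between bill length and flipper length for Adélie penguins only. Both columns contain missing values, so cor() returns NA unless you remove those rows first or set use = "complete.obs".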
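A minimal sketch, with adelie and r as illustrative names. In the dataset the species level is spelled "Adelie", without the accent. The use = "complete.obs" argument tells cor() to ignore rows where either measurement is missing.

```r
library(tidyverse)
library(palmerpenguins)

# Keep only the Adelie penguins
adelie <- palmerpenguins::penguins |>
  filter(species == "Adelie")

# Pearson correlation, ignoring rows with a missing value in either column
r <- cor(adelie$bill_length_mm, adelie$flipper_length_mm,
         use = "complete.obs")
r
```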

4.3 Test correlation significance

Use cor.test() to determine if the correlation is statistically significant.
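A sketch for the Adélie subset, assuming the same filtering as in the previous step (adelie and ct are illustrative names). cor.test() uses Pearson's correlation by default and drops incomplete pairs itself; method = "spearman" is available if a rank-based test suits the data better.

```r
library(tidyverse)
library(palmerpenguins)

# Keep only the Adelie penguins
adelie <- palmerpenguins::penguins |>
  filter(species == "Adelie")

# Test H0: the true correlation between the two measurements is zero
ct <- cor.test(adelie$bill_length_mm, adelie$flipper_length_mm)
ct

ct$estimate  # sample correlation coefficient
ct$conf.int  # 95% confidence interval for the correlation
ct$p.value
```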

Part 5: Reflection

What does it mean if a result is statistically significant but not biologically meaningful?
Why do we say we “fail to reject” H₀ instead of “accepting” it?
Why is correlation not the same thing as causation?
When might a one-tailed test be more appropriate than a two-tailed test?
What kind of data are suitable for a t-test compared with a chi-squared test?
What does a p-value actually represent in hypothesis testing?
Why should we check assumptions (like normality or independence) before running a test?
What kind of biological question would be best addressed using a correlation analysis?