📝 Practical: Classical statistics in R

Hypothesis testing to correlation

Author

BIOL33031/BIOL65161

Practical info

This worksheet will help you practise the key ideas behind hypothesis testing (the null hypothesis, p-values, test selection, and interpretation) using simulated and real data in R.

You’ll apply tests to different types of data (continuous and categorical), interpret p-values, and connect statistical decisions with biological questions.

BIOL65161 students

Please upload your saved .R script to Canvas before 5PM on the day of the practical.


Part 1: Understanding hypotheses

  1. In your own words, what does the null hypothesis (H₀) represent?

  2. Suppose you’re testing whether a new antibiotic reduces bacterial growth compared with a control.
    Write out suitable null (H₀) and alternative (H₁) hypotheses.


Part 2: t-tests

We’ll use simulated data to explore how p-values behave.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Simulate bacterial growth under control vs antibiotic
set.seed(42)
growth <- tibble(
  treatment = rep(c("Control", "Antibiotic"), each = 15),
  OD600 = c(rnorm(15, mean = 0.8, sd = 0.05),
            rnorm(15, mean = 0.7, sd = 0.05))
)

growth |> head()
# A tibble: 6 × 2
  treatment OD600
  <chr>     <dbl>
1 Control   0.869
2 Control   0.772
3 Control   0.818
4 Control   0.832
5 Control   0.820
6 Control   0.795

2.1 Visualise the data

Make a boxplot showing growth (OD600) by treatment group. Remember that the line inside each box marks the median, not the mean.
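One possible way to draw this plot is sketched below. It rebuilds the simulated growth tibble from the start of Part 2 so that it runs on its own; the axis labels and the overlaid jittered points are illustrative choices, not requirements of the exercise.

```r
library(tidyverse)

# Recreate the simulated data from the start of Part 2
set.seed(42)
growth <- tibble(
  treatment = rep(c("Control", "Antibiotic"), each = 15),
  OD600 = c(rnorm(15, mean = 0.8, sd = 0.05),
            rnorm(15, mean = 0.7, sd = 0.05))
)

# Boxplot of OD600 by treatment, with every observation overlaid
growth_plot <- ggplot(growth, aes(x = treatment, y = OD600)) +
  geom_boxplot(outlier.shape = NA) +       # outlier symbols hidden; the jittered points already show all data
  geom_jitter(width = 0.1, alpha = 0.6) +  # small horizontal jitter so points do not overlap
  labs(x = "Treatment", y = "Growth (OD600)")

growth_plot
```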

2.2 Run a two-sample t-test

Use a t-test to test whether the mean growth differs between treatments.
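A minimal sketch of the test, assuming the simulated growth tibble defined above (recreated here so the block is self-contained). The object name growth_test is only illustrative. Note that t.test() runs Welch's version by default, which does not assume equal variances in the two groups.

```r
library(tidyverse)

# Recreate the simulated data from the start of Part 2
set.seed(42)
growth <- tibble(
  treatment = rep(c("Control", "Antibiotic"), each = 15),
  OD600 = c(rnorm(15, mean = 0.8, sd = 0.05),
            rnorm(15, mean = 0.7, sd = 0.05))
)

# Two-sided Welch two-sample t-test: does mean OD600 differ between treatments?
growth_test <- t.test(OD600 ~ treatment, data = growth)

growth_test           # full printed summary
growth_test$p.value   # the p-value on its own
growth_test$estimate  # the two group means
```

If you want the classical Student's t-test instead, add var.equal = TRUE, but only when equal variances are a reasonable assumption.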

2.3 Check your interpretation

If the p-value is 0.012, what does that mean in context?
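To see what a p-value is measured against, it can help to simulate many experiments in which the null hypothesis is true by construction. The sketch below does this with the same group size and spread as the growth data; the seed and the number of repeats (2000) are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, chosen only for reproducibility

# Simulate 2000 experiments where both groups share the same true mean,
# so any difference between them is due to sampling noise alone
p_null <- replicate(2000, {
  control    <- rnorm(15, mean = 0.8, sd = 0.05)
  antibiotic <- rnorm(15, mean = 0.8, sd = 0.05)
  t.test(control, antibiotic)$p.value
})

# Proportion of experiments with p < 0.05 even though there is no real effect
mean(p_null < 0.05)
```

The proportion should come out close to 0.05. A p-value of 0.012 therefore says that a difference at least as large as the one observed would be unusual if the treatments truly had the same mean; it is not the probability that H₀ is true.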

2.4 Run a one-tailed t-test

We might instead ask whether the antibiotic specifically reduces growth.

Re-run the t-test using a one-tailed test (hint: use alternative = "less").
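The direction of a one-tailed test depends on the order of the groups. With the formula interface, R sorts the treatment labels alphabetically, so "Antibiotic" comes first and alternative = "less" tests whether the Antibiotic mean is lower than the Control mean. A sketch, again recreating the simulated data so it runs on its own (object names are illustrative):

```r
library(tidyverse)

# Recreate the simulated data from the start of Part 2
set.seed(42)
growth <- tibble(
  treatment = rep(c("Control", "Antibiotic"), each = 15),
  OD600 = c(rnorm(15, mean = 0.8, sd = 0.05),
            rnorm(15, mean = 0.7, sd = 0.05))
)

# Two-sided test, kept for comparison
two_tailed <- t.test(OD600 ~ treatment, data = growth)

# One-sided test: H1 is mean(Antibiotic) - mean(Control) < 0
one_tailed <- t.test(OD600 ~ treatment, data = growth, alternative = "less")

two_tailed$p.value
one_tailed$p.value
```

Because the observed difference lies in the predicted direction, the one-tailed p-value is half the two-tailed one. The direction should be chosen before looking at the data.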


Part 3: Working with categorical data

Now suppose we have counts of resistant and sensitive isolates from two species.

microbiology <- tibble(
  Species = c(rep("E. coli", 100), rep("P. aeruginosa", 100)),
  Resistance = c(rep(c("Resistant", "Sensitive"), times = c(45, 55)),
                 rep(c("Resistant", "Sensitive"), times = c(70, 30)))
)

table(microbiology)
               Resistance
Species         Resistant Sensitive
  E. coli              45        55
  P. aeruginosa        70        30

3.1 Create a contingency table

Use table() to summarise the counts by Species and Resistance.
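A short sketch of this step, assuming the microbiology tibble defined above (recreated here so the block runs on its own). Storing the table in an object, here given the illustrative name resistance_table, makes it easy to reuse in later tests; prop.table() is an optional extra that converts counts to row proportions.

```r
library(tidyverse)

# Recreate the isolate data from the start of Part 3
microbiology <- tibble(
  Species = c(rep("E. coli", 100), rep("P. aeruginosa", 100)),
  Resistance = c(rep(c("Resistant", "Sensitive"), times = c(45, 55)),
                 rep(c("Resistant", "Sensitive"), times = c(70, 30)))
)

# 2 x 2 contingency table: rows are species, columns are resistance status
resistance_table <- table(Species = microbiology$Species,
                          Resistance = microbiology$Resistance)
resistance_table

# Proportion resistant and sensitive within each species (rows sum to 1)
prop.table(resistance_table, margin = 1)
```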

3.2 Perform a chi-squared test

Run a χ² test to check if resistance is associated with species.
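A minimal sketch, assuming the same microbiology data as above (recreated so the block is self-contained; object names are illustrative). For a 2 × 2 table, chisq.test() applies Yates' continuity correction by default; set correct = FALSE to get the uncorrected statistic.

```r
library(tidyverse)

# Recreate the isolate data from the start of Part 3
microbiology <- tibble(
  Species = c(rep("E. coli", 100), rep("P. aeruginosa", 100)),
  Resistance = c(rep(c("Resistant", "Sensitive"), times = c(45, 55)),
                 rep(c("Resistant", "Sensitive"), times = c(70, 30)))
)

resistance_table <- table(microbiology$Species, microbiology$Resistance)

# Chi-squared test of independence between species and resistance
chisq_result <- chisq.test(resistance_table)
chisq_result

# Expected counts under H0 (no association); the usual rule of thumb is that all should be at least 5
chisq_result$expected
```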

3.3 Fisher’s exact test

Run a Fisher’s exact test on the same data. Does the outcome differ? When should you use Fisher’s exact test?
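A sketch of the same comparison with fisher.test(), assuming the microbiology data above (recreated here; object names are illustrative). For a 2 × 2 table the output also includes an estimated odds ratio with a confidence interval.

```r
library(tidyverse)

# Recreate the isolate data from the start of Part 3
microbiology <- tibble(
  Species = c(rep("E. coli", 100), rep("P. aeruginosa", 100)),
  Resistance = c(rep(c("Resistant", "Sensitive"), times = c(45, 55)),
                 rep(c("Resistant", "Sensitive"), times = c(70, 30)))
)

resistance_table <- table(microbiology$Species, microbiology$Resistance)

# Fisher's exact test: computes the p-value exactly rather than relying on
# the large-sample chi-squared approximation
fisher_result <- fisher.test(resistance_table)
fisher_result

fisher_result$p.value   # compare with the chi-squared p-value
fisher_result$estimate  # odds ratio for the 2 x 2 table
```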


Part 4: Correlation and relationships

We’ll now explore relationships between two continuous variables using the Palmer Penguins dataset.

install.packages("palmerpenguins") # If not already installed
Installing package into '/home/mqbssdgg/R/x86_64-pc-linux-gnu-library/4.5'
(as 'lib' is unspecified)
library(palmerpenguins)

Attaching package: 'palmerpenguins'
The following objects are masked from 'package:datasets':

    penguins, penguins_raw
penguins |> glimpse()
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

4.1 Visualise a relationship

Make a scatter plot of bill_length_mm vs flipper_length_mm, coloured by species, using ggplot() and geom_point().
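One way to build this plot is sketched below. Putting bill length on the y-axis is an assumption about what "vs" means here, and the axis labels are illustrative. The data are referenced as palmerpenguins::penguins to avoid any confusion with the penguins object that recent versions of R ship in the datasets package.

```r
library(tidyverse)
library(palmerpenguins)

# Scatter plot of bill length against flipper length, one colour per species
penguin_plot <- ggplot(palmerpenguins::penguins,
                       aes(x = flipper_length_mm,
                           y = bill_length_mm,
                           colour = species)) +
  geom_point(na.rm = TRUE) +  # silently drop the rows with missing measurements
  labs(x = "Flipper length (mm)",
       y = "Bill length (mm)",
       colour = "Species")

penguin_plot
```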

4.2 Calculate the correlation coefficient

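Use cor() to find the correlation between bill length and flipper length for Adélie penguins only. Note that the species is coded as "Adelie" (no accent) in the data, and that the missing values (NA) shown above must be handled or cor() will return NA.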
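A minimal sketch of this calculation. The object names adelie and adelie_r are illustrative; use = "complete.obs" tells cor() to drop any penguin with a missing measurement.

```r
library(tidyverse)
library(palmerpenguins)

# Keep only the Adelie penguins (the factor level has no accent)
adelie <- palmerpenguins::penguins |>
  filter(species == "Adelie")

# Pearson correlation between bill length and flipper length,
# using only rows where both measurements are present
adelie_r <- cor(adelie$bill_length_mm,
                adelie$flipper_length_mm,
                use = "complete.obs")
adelie_r
```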

4.3 Test correlation significance

Use cor.test() to determine if the correlation is statistically significant.
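A sketch of the significance test for the same pair of variables, again restricted to Adelie penguins (object names are illustrative). Unlike cor(), cor.test() drops incomplete pairs automatically, and it runs a Pearson correlation test unless another method is requested.

```r
library(tidyverse)
library(palmerpenguins)

# Keep only the Adelie penguins
adelie <- palmerpenguins::penguins |>
  filter(species == "Adelie")

# Test H0: the true correlation between bill and flipper length is zero
adelie_test <- cor.test(adelie$bill_length_mm, adelie$flipper_length_mm)
adelie_test

adelie_test$estimate  # the correlation coefficient r
adelie_test$p.value   # p-value for H0: r = 0
adelie_test$conf.int  # 95% confidence interval for r
```

If the relationship looks monotonic but not linear, method = "spearman" gives a rank-based alternative.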


Part 5: Reflection

  1. What does it mean if a result is statistically significant but not biologically meaningful?

  2. Why do we say we “fail to reject” H₀ instead of “accepting” it?

  3. Why is correlation not the same thing as causation?

  4. When might a one-tailed test be more appropriate than a two-tailed test?

  5. What kind of data are suitable for a t-test compared with a chi-squared test?

  6. What does a p-value actually represent in hypothesis testing?

  7. Why should we check assumptions (like normality or independence) before running a test?

  8. What kind of biological question would be best addressed using a correlation analysis?