Output of summary() for the Poisson model of the InsectSprays data (see "Fitting and checking models"):

Call:
glm(formula = count ~ spray, family = poisson, data = InsectSprays)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.67415    0.07581  35.274  < 2e-16 ***
sprayB       0.05588    0.10574   0.528    0.597
sprayC      -1.94018    0.21389  -9.071  < 2e-16 ***
sprayD      -1.08152    0.15065  -7.179 7.03e-13 ***
sprayE      -1.42139    0.17192  -8.268  < 2e-16 ***
sprayF       0.13926    0.10367   1.343    0.179
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 409.041  on 71  degrees of freedom
Residual deviance:  98.329  on 66  degrees of freedom
AIC: 376.59

Number of Fisher Scoring iterations: 5
Generalised linear modelling
Understanding response distributions
Link to slides: PowerPoint and PDF.
So far, we’ve assumed that our response variable is continuous and that the model residuals are roughly normally distributed, for example when using lm() or lmer().
But many biological datasets don’t fit that assumption: we often measure counts, proportions, or binary outcomes.
To deal with these cases, we use generalised linear models (GLMs), which let us specify a different distribution for the response.
When normality doesn’t make sense
Let’s imagine two examples:
Counting pigeons in a garden
The data are counts (0, 1, 2, …) that can’t be negative.
Counts like these are often modelled with a Poisson distribution, which is defined by a single parameter, λ (lambda), equal to both the mean and the variance.
Counting growth on plates
You might have four plates and record “growth” or “no growth” for each, a binary outcome.
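The number of plates showing growth follows a binomial distribution, which has two parameters (named as in R's binomial functions, such as rbinom()):
- size: number of trials (e.g. number of plates)
- prob: probability of success (e.g. probability of growth on a plate)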
Both datasets are numeric, but neither is likely to have normally distributed residuals, so a model with normal errors is inappropriate.
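To make the two examples concrete, here is a small simulation sketch in R. The parameter values are illustrative assumptions, not course data: a mean of 3 pigeons per count, and four plates with a 0.3 chance of growth each. The sketch shows three things. Poisson counts are never negative. Their variance is close to their mean. A binomial count with size = 4 can only take the values 0 to 4.

```r
# Simulation sketch; lambda, size and prob are hypothetical values.
set.seed(1)

# Pigeon counts: Poisson with lambda = 3 (assumed mean count)
pigeons <- rpois(10000, lambda = 3)
mean(pigeons)  # close to 3
var(pigeons)   # also close to 3: for a Poisson, mean and variance are both lambda
min(pigeons)   # counts are never negative

# Plate growth: number of plates (out of size = 4) showing growth,
# each with an assumed probability of growth prob = 0.3
plates <- rbinom(10000, size = 4, prob = 0.3)
table(plates)  # only the values 0 to 4 can occur
mean(plates)   # close to size * prob = 1.2
```

Neither simulated variable is continuous or symmetric, and both are bounded below by 0. That is why a normal-error model fits them poorly.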
Rather than transforming the data to fit a normal model, GLMs let us specify an appropriate response distribution directly. We do this with the family argument of R functions such as glm() or glmer() (from the lme4 package). For example, use family = poisson for counts or family = binomial for binary outcomes.
The video also briefly mentions other types of models: multinomial, negative binomial, and survival models. These follow similar logic but suit different data types. Later videos show how to fit and interpret these models in practice.
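Below is a minimal sketch of the family argument in glm(), using simulated data. The variables feeders and temp, and all parameter values, are hypothetical. Each observation here is a batch of 4 plates, so the binomial response has more than one trial per observation. One standard way to handle this in R is to pass the successes and failures as cbind(successes, failures).

```r
set.seed(42)

# Hypothetical pigeon data: counts in 50 gardens with 0 to 4 feeders;
# the expected count rises with feeders on the log scale.
feeders <- rep(0:4, each = 10)
pigeons <- rpois(length(feeders), lambda = exp(0.5 + 0.3 * feeders))

# Count response: Poisson family (log link by default)
fit_pois <- glm(pigeons ~ feeders, family = poisson)

# Hypothetical plate data: 40 batches of 4 plates at two temperatures,
# recording how many plates in each batch show growth.
temp <- rep(c("cool", "warm"), each = 20)
growth <- rbinom(40, size = 4, prob = ifelse(temp == "warm", 0.7, 0.3))

# Binomial family with several trials per row: successes and failures via cbind()
fit_binom <- glm(cbind(growth, 4 - growth) ~ temp, family = binomial)

summary(fit_pois)
summary(fit_binom)
```

Both fits use each family's default link: log for Poisson and logit for binomial. Their coefficients are therefore on those scales. For a single yes/no outcome per row, the response can instead be a 0/1 variable.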
Fitting and checking models
Link to slides: PowerPoint and PDF.
We now know that GLMs let us choose a distribution for the response. This video shows how to fit these models and how to check whether they suit the data.
Example: insect counts and spray treatments
We’ll use a classic dataset, InsectSprays (built into R), in which insects were counted on agricultural plots treated with six different sprays (A to F). These are count data, so we use a Poisson model.
This model asks whether insect counts depend on the type of spray. Conceptually it is similar to a one-way ANOVA, but it uses a Poisson error distribution instead of a normal one.
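The model described above can be fitted as below. The call matches the one in the model output, and glm_pois is the object name used in the dispersion check. The exp() and anova() lines are additions. They sketch one common way to interpret the coefficients and to run the ANOVA-like test for an overall spray effect.

```r
# InsectSprays ships with base R (datasets package):
# insect counts for sprays A to F, 12 units each.
data(InsectSprays)
str(InsectSprays)

# Poisson GLM: does the expected insect count differ between sprays?
glm_pois <- glm(count ~ spray, family = poisson, data = InsectSprays)
summary(glm_pois)

# Coefficients are on the log scale (default Poisson link);
# exp() turns them into multiplicative effects relative to spray A.
exp(coef(glm_pois))

# ANOVA-style test: drop in deviance from adding spray, compared
# against a chi-squared distribution (likelihood-ratio test)
anova(glm_pois, test = "Chisq")
```

exp() of the intercept is about 14.5, the observed mean count for spray A. exp() of each sprayX coefficient is the ratio of that spray's expected count to spray A's. In the anova() table, the deviance drops by 409.041 - 98.329 ≈ 310.7 on 5 degrees of freedom.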
Understanding the output
The model summary looks familiar, but there are a few new elements:
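- The null deviance and residual deviance replace the residual variance. The null deviance measures the lack of fit of a model with only an intercept. The residual deviance measures the lack of fit of the model that includes spray.
- For Poisson models, the mean and variance are expected to be equal. This is why the output says "Dispersion parameter for poisson family taken to be 1".
- We can check this by dividing the residual deviance by its degrees of freedom.
- If that ratio is around 1, great. If it is much higher, we may have overdispersion: more variability than the model expects.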
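dispersion <- glm_pois$deviance / glm_pois$df.residual  # residual deviance / residual degrees of freedom
dispersion
If this ratio is well above 1, the data are more variable than a Poisson model would expect. For the insect spray model it is 98.329 / 66 ≈ 1.49, which suggests some overdispersion.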
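As a rough cross-check of the mean-variance assumption, the sketch below compares the observed mean and variance of counts within each spray group. It also computes a Pearson-based dispersion estimate. This is a common alternative to the deviance-based ratio, but the video does not cover it.

```r
data(InsectSprays)

# Under a Poisson model the variance within each group should be
# roughly equal to its mean; compare them spray by spray.
group_mean <- tapply(InsectSprays$count, InsectSprays$spray, mean)
group_var  <- tapply(InsectSprays$count, InsectSprays$spray, var)
round(cbind(mean = group_mean, variance = group_var,
            ratio = group_var / group_mean), 2)

# Deviance-based dispersion (as in the text) and Pearson-based dispersion
glm_pois <- glm(count ~ spray, family = poisson, data = InsectSprays)
dev_disp <- deviance(glm_pois) / df.residual(glm_pois)
pearson_disp <- sum(residuals(glm_pois, type = "pearson")^2) / df.residual(glm_pois)
c(deviance = dev_disp, pearson = pearson_disp)
```

If the variance-to-mean ratios in several groups are well above 1, they point to the same overdispersion as the deviance ratio.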