Linear mixed-effects models

In this workshop we will use the nlme package in R to fit linear mixed-effects models to data. See the "fit models" submenu of the Rtips page for help. You might also need to refer to the "display" page to help visualize your results.

Linear mixed-effects models are used when you have random effects, which occurs when your subjects or units are grouped. The groups are assumed to be randomly sampled from a "population" of groups. Example situations include

when you divide up plots and apply separate treatments to the parts
when your sampling design is nested (quadrats within transects; transects within woodlots; woodlots within districts)
when you have measurements on related individuals
when you measure subjects or units repeatedly

Linear models for mixed effects are implemented in the R command lme in the nlme library, which is included in your basic R installation.

To begin using methods for fitting linear mixed-effects models, load the nlme library:

library(nlme)

Repeatability of a sexual signal trait

This first data set was extracted from a paper by Griffith and Sheldon (2001, Animal Behaviour 61: 987–993), who measured the white forehead patch of 30 male collared flycatchers in two years on the Swedish island of Gotland. The patch is important in mate attraction. Our goal here will be to estimate the repeatability of patch length (mm).
The data are here.

Read and examine the data

Read the data from the file.
View the first few lines of data to make sure it was read correctly.
Create a plot showing the pair of measurements for each individual flycatcher. Use a plot that shows you the difference between repeat measurements made on the same individual.

Fit a linear mixed-effects model

Fit a linear mixed-effects model to the data, treating the individual birds as the random groups. Store the results in an lme object.
Note: The two measurements on each bird were taken in successive years of the study. For simplicity here, do not include "year" as a factor in the model.
(OK, if you really want to try including "year" as a fixed factor in the model, go ahead. Just make sure to convert year to a factor in R or your intercept will be hard to interpret. Recalculate repeatability with this model as described in steps (2) and (3) below. How is the interpretation of repeatability changed?)
Extract parameter estimates from the saved lme object. What does the "Intercept" corresponding to the fixed effect refer to? What do the standard deviations ("StdDev") for the random effects refer to?
Extract the variance components from the fitted model. Use these to estimate the repeatability of patch length from year to year*.
Interpret the measure of repeatability obtained in (3). If the repeatability you obtained is less than 1.0, what is the source of the variation among measurements within individuals. Is it measurement error alone?
(Optional: Produce a plot of residuals against fitted (predicted) values. Notice anything odd? This isn't a mistake. To confirm what is happening, plot the original data in a stripchart and superimpose the lme fitted (predicted) values (see the repeatability example under "Fit a linear mixed-effects model" in the R tips pages for help and additional information on what is happening)).
(Optional: Fit a linear regression of the 1999 measurements on the 1998 measurements. What slope did you obtain? Was it less than, equal to, or greater than 1? Did males with the largest patches in 1998 have larger, equal, or smaller patches in 1999 on average? Now switch the variables: fit a linear regression of the 1998 measurements on the 1999 measurements. What slope did you obtain? Was it less than, equal to, or greater than 1? Did males with the largest patches in 1999 have larger, equal, or smaller patches in 1998 on average? What name is given to this seemingly paradoxical pattern? Can you explain why it occurs?).

* You've done this correctly if you end up with the repeatability value 0.776.

Goldie's vision

Cronly-Dillon and Muntz (1965; J. Exp. Biol 42: 481-493) used the optomotor response to measure color vision in the goldfish. We looked at some of these data already in lecture. Here we will examine the full set of wavelengths tested. Each of 5 fish was tested at all the wavelength in random order. A large value of sensitivity indicates that the fish can detect a low light intensity. An important feature of the optomotor response is that fish don't habituate, and it is unlikely that a measurement of visual sensitivity under one wavelength would have an effect on later measurements at another wavelength. The data are here.

Read and examine the data

Read the data from the file.
View the first few lines of data to make sure it was read correctly.
Use an interaction plot to compare the responses of individual fish across the different light wavelength treatments. Adjust the options so that all the labels are visible along the x-axis.

Fit a linear mixed-effects model

Fit a linear mixed-effects model to the data. Store the results in an lme object.
Use an interaction plot to show the fitted (predicted) values instead of the observed data*. Note that the predicted lines for different fish are parallel -- this is because you have fit a model without an interaction term (including an interaction term is not possible with these data because there is only one measurement per fish per wavelength).
Compare the predicted values from (2) with the observed values (these are indicated in the interaction plot of the data). The difference between the predicted and observed values are the residuals, whose variance is also estimated by lme.
What assumptions are you making in (1)?
Create a plot of residuals against fitted values to check one of these assumptions.
Extract parameter estimates from the saved lme object. The parameters for the fixed effects have the same interpretation as in the case of a categorical variable analyzed using lm (arbitrarily, the group "nm426" is set as the "control"). What do the standard deviations ("StdDev") for the random effects refer to?
Generate the ANOVA table for the lme object. What effects are tested here, the random effects or the fixed effects? Interpret the ANOVA results.

* If you've fitted the model correctly, your interaction plot will have 5 lines overlaid almost on top of one another, resembling a single line. This implies that there was very little variance among individual fish in this study. Check the variance components from your lme fit to see if your estimate of variance among individuals agrees with this interpretation. Note that the result doesn't necessarily imply that the variance among individuals in the population is truly that small. Look at the confidence limits on the variance components to determine the range of values for the between-fish variance in the population are not ruled out by the data.

Yukon yarrow

The Kluane project experimentally investigated the effects of fertilization and herbivory on vegetation dynamics in the boreal forest ecosystem of Kluane National Park in the Yukon (Krebs, C.J., Boutin, S. & Boonstra, R., eds (2001a) Ecosystem dynamics of the Boreal Forest. The Kluane Project. Oxford University Press, New York). The current data are from a study of the effects of plant resources and herbivory on the defensive chemistry of understory plant species. Sixteen 5x5 m plots were randomly assigned one of four treatments. Four of the plots were surrounded by a fence exclosure to exclude herbivores; four were fertilized with N-P-K fertilizer; four received both the exclosure and fertilizer; and four were left as untreated controls. Each of the 16 plots was divided in two. One side of each plot received its treatment continually over the 20 years of study. The other half of each plot received the treatment only for the first ten years, after which it was left to reverse to its untreated state. The file, accessible here, contains measurements of the concentration of phenolics (a crude measure of plant defense compounds) in yarrow (Achillea millefolium), a herb common in the plots. The measurement units are mg Tannic Acid Equivalent per g dry weight. The data were kindly provided by P. de Koning and R. Turkington.

Read and examine the data

Read the data from the file.
View the first few lines of data to make sure it was read correctly. Plot and treatment are self-explanatory. Treatment is given as a single variable with four levels (let's stick with this approach rather than converting to two variables, enclosure and fertilizer, for a 2x2 factorial design). Duration indicates whether the half-plots received the treament for the full 20 years or whether the treatment was stopped ("reversed") after 10 years. The variable "phen.ach" is the concentration of phenolics in yarrow.
Generate a graph that illustrates the concentrations of phenolics in yarrow in the different treatments and durations. For now, treat "treatment" as a single variable with four levels. Later I will ask you to add the fitted (predicted) values to the display, so consider using a type of graph that will permit this.

Fit a linear mixed-effects model

The experiment used a split-plot design, in which whole plots were randomly assigned different treatments, and then different levels of a second treatment (duration) were assigned to plot halves. Fit an appropriate linear mixed model to the data. Include the interaction between the two fixed effects in the model. Store the results in an lme object.
Check the residuals in relation to the fitted values to investigate one of the assumptions of lme. If necessary, use a transformation to remedy any deficiencies and refit the model.
Add the fitted (predicted) values from the lme model to your graph of the data created in the previous section. Describe in words how the model is fitting the data.
Extract the parameters from the fitted model object. In the output generated, you will see two quantities given for "StdDev" under the label "Random effects". Explain what these quantities refer to.
Generate the ANOVA table for the fixed effects. Summarize the conclusions that may be drawn from the results. By default, lme will fit the model terms sequentially rather than use the "drop one" approach. Repeat the ANOVA table using "drop one" fits instead. Are the results any different? How? Why?
(Optional: Break the single treatment variable into two variables, one for fertilizer treatment and one for exclosure treatment. This takes advantage of the 2x2 design of these factors, and allows interactions between the two treatments to be investigated. Fit a new, linear mixed model to the data and describe the results.)