Phylogenetic comparative methods
In this workshop will explore some of the tools available in R for analyzing species as data points. For help with R commands to carry out resampling, see the phylogeny tab on the R tips help pages.Rockfish evolution
Rockfish (genus Sebastes) are a hugely diverse group of mainly Pacific Ocean fishes. There are over 100 recognized species inhabiting everything from the intertidal zone to waters over 2000 m deep. They have some of the highest longevities of any fish, with a maximum reported age of 205 years. Data on maximum body size (length), age (lifespan) and habitat depth of 56 species is provided here in a .csv file. The data were gathered from FishBase, from Love (2002), and by Travis Ingram.A phylogenetic tree of the 56 species is provided in a text file here. The tree, from Hyde & Vetter (2007), is a consensus Bayesian tree based on 7 mitochondrial and 2 nuclear genes.
- Download the file containing the phylogenetic tree. Inspect to determine whether it is in newick or nexus format and input into R using the appropriate command.
- Plot the phylogeny. You may have to adjust an option to minimize overlapping the labels. Take a moment to admire the structure of the tree. Branch lengths are intended to reflect time. Does it look like the genus diversified mainly in a sudden early burst, a recent explosion, or a steady growth in the number of species through time? Notice that all the tips are contemporaneous.
- Obtain the species measurements from the data file and input to a data frame. Check that the species names in the trait file and in the phylogenetic tree are identical and are listed in the same order (this is a requirement for the methods we will be using from the ape package).
TIPS analysis
Let's begin with an analysis that ignores phylogeny, so that we have a baseline for comparison.- Inspect scatter plots of the species data. Make any necessary transformations here to help meet the assumptions of linear models.
- Choose one pair of variables in the data set to compare (suitably transformed). Carry out a linear model fit ignoring phylogeny for now.
- Examine the diagnostic plots in (2) to check your assumptions.
- Extract the regression coefficients and standard errors from the model object created in (2). Take note of the values for the slope estimate.
- Obtain the correlation coefficient between the variables used in (2). Take note of the value of the estimate.
PICs (phylogenetically independent contrasts)
Let's use the same variables and apply phylogenetically independent contrasts instead.- Convert the same two variables used in your TIPS analysis to phylogenetically independent contrasts. Create a scatter plot of the contrasts in your two variables. Are they associated?
- Fit a linear model to the independent contrasts you created in (1). Use the contrasts corresponding to the same response and explanatory variables as in your TIPS analysis. Examine the diagnostic plots to check the linear model assumptions.
- Extract the regression coefficients and standard errors from the model object created in (2). How does the slope* compare with that obtained in your TIPS analysis? Is the standard error from the PICs analysis greater than, less than, or the same as in your TIPS analysis? (Meta-analyses have often found that PICs yield a similar answer to an analysis that ignores phylogeny, but your specific case might or might not.)
- Calculate the correlation coefficient** on the independent contrasts (consult the R tips pages if necessary).
** 0.625
General least squares
- Carry out the equivalent analysis to PICs using general least squares instead. Confirm that the slope coefficient and its standard error are identical to that obtained in the analysis of independent contrasts.
- Examine the residual plot from the model fit in (1). Notice that the mean of the residuals is not zero (calculate the mean of the residuals to confirm this). This is because the GLS analysis does not weight each of the observations equally.
- (In effect, GLS fits a linear model to the variables after transforming them according to values computed from the phylogenetic correlation matrix. A disadvantage of the residual plot here is that we aren't seeing the diagnostic plot for the transformed variables, which would be nice. These can be calculated "by hand" but it is no fun*.)
Estimate Pagel's λ
- Does incorporating phylogeny result in a better fit to the data than ignoring phylogeny? Use Pagel's λ to help decide this. Using GLS, fit your model again while fixing Pagel's λ = 1. Refit, but this time using λ = 0. Which fit has the higher log-likelihood? How large is the difference in their AIC scores?
- Find the maximum-likelihood estimate of λ*. Is the AIC score improved when a linear model is fitted using this maximum likelihood value for λ ?
- Repeat the analyses with the other pairs of variables to decide whether including phylogeny generally improves the fit of linear models to these data.