Selecting among candidate models requires a criterion for
evaluating and comparing models, and a strategy for searching the
possibilities. In this workshop we will explore some of the tools
available in R for model selection. You might need to download and
install the MuMIn
package from the CRAN website to carry
out all the exercises.
For help with these exercises, see the “Fit model” link on the R tips web page.
Savage et al. (2004, Functional Ecology 18: 257-282) used data to reevaluate competing claims for the value of the allometric scaling parameter \(\beta\) relating whole-organism metabolic rate to body mass in endotherms: \[BMR = \alpha M^\beta\] In this formula \(BMR\) is basal metabolic rate, \(M\) is body mass, and \(\alpha\) is a constant. On a log scale this can be written as \[\log(BMR)=\log(\alpha)+\beta\log(M)\] where \(\beta\) is now a slope parameter of an ordinary linear regression – a linear model. Theory based on optimization of hydrodynamic flows through the circulation system predicts that the exponent should be \(\beta = 3/4\), whereas we would expect \(\beta =2/3\) if metabolic rate scales with heat dissipation and therefore body surface area. These alternative scaling relationships represent distinct biophysical hypotheses. We will use them as candidate models and apply model selection procedures to compare their fits to data.
Savage et al. compiled data from 626 species of mammals. To simplify, and reduce possible effects of non-independence of species data points, they took the average of \(\log(BMR)\) among species in small intervals of \(\log(M)\).
The resulting values of basal metabolic rate and mass can be downloaded here. Body mass is in grams, whereas basal metabolic rate is in watts.
Plot the data. Is the relationship between mass and metabolic rate linear on a log scale?
Fit a linear model to the log-transformed data (original data are not on the log scale). What is the estimate of slope?
Produce a 95% confidence interval for the slope. Does the interval include either of the candidate values for the scaling parameter \(\beta\)?
Add the least squares regression line from (2) to your plot.
Now let’s use model selection to compare the fits of the two candidate models to the data using the following steps.
To begin, you need to force regression lines having specified slopes through the (log-transformed) data. Replot the data indicating the relationship between \(\log(M)\) and \(\log(BMR)\). Add to this plot the best-fit line having slope 3/4. Repeat this for the slope 2/3. By eye, which line appears to fit the data best?
Compare the residual sum of squares of the two models you fit in (5). Which has the smaller value? Do these values agree with your visual assessment of your plots in (6)?
Calculate the log-likelihood of each model fitted in (5). Which has the higher value?
Calculate AIC for the two models, and the AIC difference*. By this criterion, which model is best? How big is the AIC difference?
In general terms, what does AIC score attempt to measure?
Calculate the Akaike weights of the two models**. Which has the higher weight of evidence in its favor? These weights would be used in Multimodel Inference (such as model averaging), which we won’t go into in this course. The weights should sum to 1. (They are sometimes wrongly interpreted as the posterior probability that the given model is the “best” model, assuming that the “best” model is one of the set of models being compared. This interpretation could work but it makes assumptions that we won’t go into right now.)
Summarize the overall findings. Do both models have some support, according to standard criteria, or does one of the two models have essentially no support?
Why is it not possible to compare the two models using a conventional log-likelihood ratio test***?
Optional: Both theories mentioned earlier predict that the relationship between basal metabolic rate and body mass will conform to a power law — in other words that the relationship between \(\log(BMR)\) and \(\log(M)\) will be linear. Is the relationship linear in mammals? Use AIC to compare the fit of a linear model fitted to the relationship between \(\log(BMR)\) and \(\log(M)\) with the fit of a quadratic regression of \(\log(BMR)\) on \(\log(M)\) (a model in which both \(\log(M)\) and \((\log(M))^2\) are included as terms). Don’t force a slope of \(2/3\) or \(3/4\). Plot both the linear and quadratic regression curves with the data. Which model has the most support? Which has the least? On the basis of this analysis, does the relationship between basal metabolic rate and body mass in mammals conform to a power law?
* 23.73591
** 9.99e-01 7.01e-06
***The models are not
nested.
In the current example we are going data dredging, unlike the previous example, with all its attendant risks. We have no candidate models. Let’s just try all possibilities and see what turns up. The data include a set of possible explanatory variables and we want to known which model, of all possible models, is the “best”. Sensibly, we also wish to identify those models that are near-best and should be kept under consideration (e.g., for use in planning, or subsequent multimodel inference).
The response variable is the abundance of forest birds in 56 forest fragment in southeastern Australia by Loyn (1987, cited in Quinn and Keough [2024] and analyzed in their Box 8.2). Abundance is measured as the number of birds encountered in a timed survey (units aren’t explained). Six predictor variables were measured in each fragment:The data can be downloaded here.
Using histograms, scatter plots, or the pairs
command, explore the frequency distributions of the variables. Several
of the variables are highly skewed, which will lead to outliers having
excessive leverage. Transform the highly skewed variables to solve this
problem. (I log-transformed area
, dist
and
ldist
. The results are not perfect.)
Use the cor
command to estimate the correlation
between pairs of explanatory variables. The results will be easier to
read if you round to just a couple of decimals. Which are the most
highly correlated variables?
Using the model selection tool dredge()
in the
MuMIn
package, determine which linear model best predicts
bird abundance (use AIC as the criterion). dredge()
carries
out an automated model search using subsets of the ‘global’ model
provided. Ignore interactions for this exercise. (You will need to
install the MuMIn
package if you haven’t yet done
so.)
How many variables are included in the best model*?
Count the number of models in total having an AIC difference less than or equal to 7. This is one way to decide which models have some support and remain under consideration.
Another way to determine the set of models that have support is
to use AIC weights. Calculate the Akaike weights of all the models from
your dredge()
analysis. How much weight is given to the
best model**? Are there common features shared among the models having
the highest weights?
How many models are in the “confidence set” whose cumulative weights reach 0.95***?
Use a linear model to fit the “best” model to the data. Produce a
summary of the results. Use visreg()
to visualize the
conditional relationship between bird abundance and each of the three
variables in the “best” model one at a time. Visually, which variable
seems to have the strongest relationship with bird abundance in the
model?
Generate an ANOVA table for the best model. Use Type 2 or Type 3 sums of squares so that the order of entry of the main effects in the formula don’t affect the tests (there are no interactions). Why should we view the resulting \(P\)-values with a great deal of skepticism****?
Notice that in your ANOVA table, not all terms in the best model are stastically significant at \(P < 0.05\) and so would not be retained in a stepwise multiple regression process. Are you OK with this? Good.
* 3 plus intercept (plus variance of residuals makes “df” = 5
parameters estimated)
** 0.127
*** 20
**** Because we
arrived at this model by data dredging.
Let’s try analyzing the data using stepAIC()
from the
MASS
package instead. Despite its name the method is not
carrying out stepwise multiple regression. Rather, it is using a
stepwise search strategy (hopefully) to find the “best” model (the model
minimizing the AIC score) given certain restrictions. Restrictions
include higher order terms (e.g., interaction between two variables) not
being fitted without including corresponding lower order terms (e.g.,
main effects of those same variables). Unlike dredge()
it
does not test all (restricted) subsets of the global model and so does
not provide a list of all other models that fit the data nearly equally
well as the “best” model. But it can be much faster if there are many
variables.
Return to the data set you just analyzed using
dredge()
and run model selection using
stepAIC()
instead. See the R tips pages on model selection
(under Fit model) for help interpreting the output. Did you arrive at
the same best model?
Run stepAIC
again, but this time use a model that
includes all two-way interaction terms. This is already pushing the data
to the limit, because there are only 56 data points. View the printed
output on the screen to see the sequence of steps that
stepAIC
takes to find the best model.
Estimate the coefficients of the best-fitting model.
Calculate AIC for the best model. How does it compare to the AIC value computed in previously for the best additive model (the best model without interaction terms)?** Does the additive model have “essentially no support”, as defined in lecture**?
* 360.7 vs 371.1
** Yes, because the AIC difference is
large, exceeding 10.
© 2009-2025 Dolph Schluter