In this workshop we explore the “analysis of analyses”. The first exercise will require you to make the step-by-step calculations for fixed and random effects models. The goal is to give a sense of why the two models yield different results. It would be truly wonderful if lm and lmer could take care of all the gritty calculations in a familiar setting, but it is not easy to get the correct variances using these programs. So we’ll start by going step by step. See the “Meta-analysis” tab at the R tips pages for help.


Aggressive bibs

The black throat patch or bib of male house sparrows, Passer domesticus, is called a “badge” because bib size seems to be correlated with male social status. Sanchez-Tojar et al. (2018) compiled results of published and unpublished studies that investigated the relationship between male badge size and measurements of male size, behavior and reproductive success (Meta-analysis challenges a textbook example of status signalling and demonstrates publication bias. eLife 2018;7:e37385). The data are here.

The correlation coefficient \(r\) was the effect size measuring association between bib size and these other traits. 87 estimates of the correlation between bib size and male fighting ability (“status”) are included. The study included both published and unpublished data obtained from researchers. These data are from the Supplement file “Meta4.csv” of Sanchez-Tojar et al. (2018). Most of the correlations were obtained from behavioral observations made of birds in aviaries.

  1. View the data and examine it closely. The data set is not large, but it has aspects in common with larger compilations used in meta-analyses, and raises many of the same questions. First, there are many more entries than published studies. Repeated entries from the same study represent different nearby populations, or measurements taken in different years or on different interacting groups of individuals from the same population. Often these groups are small. Can the 87 effect sizes in the table therefore be considered independent? Should we average all the values from the same population, or from the same study, before continuing? Welcome to meta-analysis.

    For the purposes of this exercise, let’s treat the 87 effect sizes as though they are independent and belong to a single fixed class. I hope this does not shock you.

  2. Create a simple scatter or “funnel” plot depicting the relationship between effect size and sample size for the house sparrow data. Include a dashed line in the plot for \(r = 0\). Does it show the expected funnel shape? Why?

  3. Statistical analysis of correlations begins by converting \(r\) to the Fisher’s \(z\)-scale. The reason is that on the \(z\)-scale, the sampling distribution for the correlation is approximately normal and the standard error is independent of \(z\). The transformed variable is indicated by the variable \(Zr\). The transformation is
    \(z = 0.5 \ln((1 + r)/(1 - r))\), or equivalently, \(z = \tanh^{-1}(r)\).

  4. A convenient feature of the Fisher \(z\) is that the approximate standard error of an estimate depends only on the sample size \(N\) (i.e., number of birds) used in each study. The squared standard error is indicated by the variable \(VZr\). The formula is \(\textrm{SE}_z=1/\sqrt{N-3}\).


Fixed effects meta-analysis

Under the fixed effects model, we assume that all studies in the meta-analysis share the same true effect size. The variation among the studies are assumed to result only from sampling error (i.e., chance). For the purposes of this exercise, let’s begin by fitting the fixed effects model to the sparrow data. Perhaps the fixed effects model is reasonable: the different studies were all carried out on the same species (house sparrow), and they correlated the same variables measured in similar (though not identical) ways. Fitting the fixed effects model involves calculating a weighted average of the separate correlations to yield an estimate the one true effect size.

  1. To estimate the mean effect size, we will weight each correlation according to the magnitude of its sampling variance, which takes into account the different sample sizes of the studies. Each weight is the inverse of the squared standard error. Calculate the weights for Fisher’s \(z\) for the sparrow data. The formula for the standard error is given above.

  2. Fit the model by calculating the weighted mean, \(\bar z\), of the \(z\)-transformed correlations. R will calculate a weighted mean if you ask it. The result is your estimate of the true mean effect size. In what way does it differ from the unweighted mean of the \(z\)-transformed correlations?

  3. Calculate the standard error of the weighted mean. This standard error measures uncertainty of the estimate of true effect size.

  4. Calculate an approximate 95% confidence interval for the effect size using the normal approximation.

  5. Convert your estimated mean effect size from (2) back to the untransformed correlation, to obtain the mean effect size \(\bar r\). This requires back-transforming,* \(\bar r = \tanh(\bar z)\) or, equivalently, \(\bar r = (e^{2\bar z} - 1)/(e^{2\bar z} + 1)\). Add a horizontal line indicating the mean effect size to your funnel plot created in the previous section.

  6. Apply the same back-transformation to the lower and upper limits of your confidence interval in (4) to yield the 95% confidence interval for the mean correlation coefficient.**

* 0.162
** (0.086 0.236)


Random effects meta-analysis

Under the random effects model we assume that each study estimates a system-specific effect size. There is no “one true” effect size under this model, only a mean effect size. This is more realistic than the fixed effects model for most data in ecology and evolution. Even though each study of male bibs was carried out on the same species (house sparrow), there is nevertheless likely to be heterogeneity from population to population, year to year, and even researcher to researcher if study methods are not the same.

  1. To fit the random effects model we need to estimate the variance among the system-specific effect sizes, \(\tau^2\) (“tau squared”). One way to estimate it involves calculating the heterogeneity among the observed effect sizes (\(Q\)), and then “correcting” by subtracting the within-study sampling variance. The correction is needed because the variance among the observed effect sizes among studies is inflated by within-study sampling error. To begin, calculate \(Q\), the weighted heterogeneity among the observed \(Zr\) values.

  2. Then estimate \(\tau^2\) by subtraction, being careful not to allow a negative value (since \(\tau^2\) is a variance, which can’t be negative).*

  3. Using \(\tau^2\), calculate new weights for the effect sizes of each study under the random effects model. Examine these new weights \(w'\) and compare them to the weights \(w\) under the fixed effects model. How are they different? Is as much weight given to large-sample studies, relative to small-sample studies, in the random effects model as in the fixed effects model?

  4. Calculate the weighted mean effect size \(\bar z\) under the random effects model. The procedure is the same as that used before for the fixed effects model except that here we will use the new weights \(w'\) calculated in the previous step. Back-transform to get the estimated mean correlation \(\bar r\).** Add the estimated mean correlation to your funnel plot. Compare your result to the effect size estimated under the fixed effects model. Is it the same?

  5. Calculate the standard error (SE) of the mean \(\bar z\). The formula is the same as that in the fixed-effects model except that here we will use the new weights.

  6. Calculate the 95% confidence interval for the mean effect size under the random effects model. Is the confidence interval narrower, about the same, or wider than that calculated under the fixed effects model? Why?

  7. Finally, back-transform to get the lower and upper limits of the 95% confidence interval for the mean correlation \(\bar r\). ***

* 0.1342
** 0.1513
*** (0.0224 0.2752)


Publication bias?

Use the metafor package to examine and compare correlations from published and unpublished studies.

  1. To begin, use the package to recalculate the quantities that you obtained “by hand” in the above random effects meta-analysis. Analyze the quantities on the Fisher-transformed (\(z\)) scale (You’ll have to back-transform to obtain the results on the \(r\) scale). Did you get the same results? So much easier!

  2. Redo the above analysis but use the restricted maximum likelihood estimates instead (the default) from now on.

  3. Produce a funnel plot of the data.

  4. Produce a forest plot of the data.

  5. Fit a meta-analysis model that includes the moderator variable “publication”. Use the summary command to estimate the difference between the means for published and unpublished studies. (Note that these values are computed on the z-transformed scale.) Do published and unpublished studies yield different results?


Latitudinal diversity gradient

There are more species in the tropics than in the temperate zone, but how strong and how general is this pattern? The latitudinal gradient is steeper for some taxa than for others, and may even be reversed in some groups. It may also vary longitudinally and between the northern and southern hemispheres. There is no consensus on the causes of the latitudinal gradient in species diversity.

Hillebrand (2004; On the generality of the latitudinal diversity gradient. American Naturalist 163:192-211) combed the literature for studies that measured the relationship between diversity and latitude. His database is HERE. Download and read into R. The data file was obtained from Table A1 in Hillebrand (2004).

Inspect the variables in the Hillebrand data set. They include measures of the latitudinal gradient and some grouping variables that will allow us to compare the gradient between situations. His Table B2 explains the various grouping variables. For this exercise, we’ll focus on the estimate from each study of the steepness of the relationship between species diversity and latitude, described by the variables:

Slope.b: The regression slope of the relationship between species diversity and absolute latitude, per degree latitude. A negative slope implies that diversity declines with increasing latitude (from the equator to the poles), whereas a positive slope represents a reversed gradient.

SE.b: The associated standard error of slope.

Measure: The measure of species diversity used in the calculation of regression slope. The most frequent is S, simply species richness (number of species), but studies using other metrics were also included. See Table B2 in the original article for details.

N: The number of sample points used to calculate the slope.

Use the metafor package to examine and compare latitudinal gradients. See the “Meta” tab at the R tips web site for help.

  1. Retain only those studies having a non-missing value for the slope variable, and for which the measure of species diversity is species richness. How many estimates of slope (latitudinal gradients) are included in this data set* ? From how many unique studies are these estimates obtained (as indicated by the variable “Source”)* ? How many unique types of organisms are included, according to the variable “Organisms”* ? What do these numbers indicate about the likelihood that the many estimates of slope included in the meta-analysis represent an independent sample of estimates from nature?

  2. Produce a funnel plot of the relationship between the effect size (slope) and its standard error. You’ll notice that there are some crazy outliers, and I have confirmed from the source that at least one is a mistake. For the purposes of this exercise, retain only those studies having an absolute value of slope less than 40. An estimated rate of change in species diversity of more than 40 species per degree latitude seems a bit unrealistic.

  3. Is the mean slope of the latitudinal gradient positive or negative? Use the metafor package and the REML method to find out**. Should you use the fixed effects or the random effects model? Why? Does your answer agree with the conventional picture of the latitudinal gradient? Note that the confidence interval assumes a normal distribution of effect sizes, so you should take these upper and lower limits with the appropriate degree of skepticism.

  4. Would you expect homeotherms or ectotherms to have a steeper average latitudinal gradient? Analyze the Hillebrand data set to find out. Start by making a table of the frequencies of cases in each category of the variable “Thermoregulation”. When fitting the meta-analysis model, leave out the two cases whose thermoregulation state is blank. Did the answer agree with your expectation?

  5. Is the mean latitudinal gradient in species diversity steepest in marine, terrestrial, or freshwater realms? Leave out the category “Aquatic”, which is ambiguous.

  6. Is the mean latitudinal gradient in species diversity steeper in the northern or southern hemisphere?

* 385; 175; 134. The 385 estimates are unlikely to represent a random sample of estimates from nature.
** -1.8025. On average, species richness drops about 1.8 species per degree latitude. You should use the random effects model, because each taxon is expected to have a different “true” mean latitudinal gradient.

 

© 2009-2025 Dolph Schluter