Meta-analysis


In this workshop we explore the "analysis of analyses". The first exercise will require you to make the step-by-step calculations for fixed and random effects models. My goal is that this will give you a better appreciaton for why the meta-analysis results differ between fixed and random effects models.

It would be truly wonderful if lm and lme could take care of all the gritty calculations in a familiar setting, but it is not easy to get the correct variances using these programs. So we'll start by going step by step. 

Aggressive bibs

The black throat patch or bib of male house sparrows, Passer domesticus, is called a "badge" because bib size seems to be correlated with male social status. Nakagawa et al. compiled results of previously published studies that investigated the relationship between male badge size and measurements of male size, behavior and reproductive success (2007, Assessing the function of House Sparrows' bib size using a flexible meta-analysis method. Behavioral Ecology 18:831–840).

The correlation coefficient r was the effect size measuring association between bib size and these other traits. Fifteen estimates of the correlation between bib size and male fighting ability ("status") are provided here in a .csv file. The data are extracted from Table 1 of Nakagawa et al. (2007).  Most of the correlations were obtained from behavioral observations made of birds in aviaries. 
  1. Print the data and examine it closely. The data set is not large, but it has aspects in common with larger compilations used in meta-analyses. First, there more entries than published studies. In some cases repeated entries from the same study represent different nearby populations. In other cases, repeat entries represent measurements taken in different years from the same population. Can the 15 effect sizes in the table therefore be considered independent? How should they be analyzed? Should we average all the values from the same population, or from the same study, before continuing? Welcome to meta-analysis.
    For the purposes of this exercise, let's treat the 15 effect sizes as though they are independent and belong to a single fixed class. I hope this does not shock you.
  2. Create a simple scatter or "funnel" plot depicting the relationship between effect size and sample size for the house sparrow data. Include a dashed line in the plot for r = 0. What pattern would you expect to see if there was publication bias? Is such a pattern evident in your plot?
  3. Statistical analysis of correlations begins by converting r to the Fisher's z-scale. The reason is that on the z-scale, the sampling distribution for the correlation is approximately normal and the standard error is independent of z. The transformation is

      z <- 0.5 ln((1 + r)/(1 - r))

    or equivalently,

      z <- atanh(r) 

  4. A convenient feature of the Fisher z is that the approximate standard error depends only on the sample size n (i.e., number of birds) used in each study, 

      z.se <- 1/sqrt(n - 3)

    The model fitting will make use of these quantities.

Fixed effects meta-analysis


Under the fixed effects model, we assume that all studies in the meta-analysis share the same true effect size. The variation among the studies are assumed to result only from sampling error (i.e., chance). For the purposes of this exercise, let's begin by fitting the fixed effects model to the sparrow data. Perhaps this is reasonable. The different studies were all carried out on the same species (house sparrow) and they correlated the same variables measured in similar (though not identical) ways. Fitting the fixed effects model involves calculating a weighted average of the separate correlations to yield an estimate the one true effect size.
  1. To estimate the mean effect size, we will weight each correlation according to the magnitude of its sampling variance, which takes into account the different sample sizes of the studies. Each weight is the inverse of the squared standard error,

      w <- 1/(z.se^2)

    Visually examine the weights for the sparrow data. Notice how the weights vary inversely with sample size.
  2. Fit the model by calculating the weighted mean of the z-transformed correlations. I will refer to this quantity as zbar.  (I am assuming that you know how to calculate a weighted mean -- if not, ask R.) This is your estimate of the true mean effect size. In what way does it differ from the unweighted mean of the z-transformed correlations?
  3. It remains to calculate the standard error of the weighted mean. This is calculated as the square root of the inverse of the sum of sample weights,

      SE <- sqrt(1/sum(w))

    Keep in mind the distinction between this standard error (which measures uncertainty of the estimate of true effect size) and the standard error of the individual z-transformed correlations.
  4. Calculate an approximate 95% confidence interval for the mean of the transformed effect sizes. Use the normal approximation. The critical value (1.96) is obtained as

      qnorm(1 - 0.05/2)

  5. Convert your estimated mean effect size from (2) into to the untransformed correlation. This requires back-transforming,*

      rbar <- tanh(zbar)

    or, equivalently,

      rbar <- (exp(2 * zbar) - 1)/(exp(2 * zbar) + 1)

    Add a line with this value to your funnel plot created in the previous section.
  6. Apply the same back-transformation to the lower and upper limits of your confidence interval in (4) to yield the 95% confidence interval for the mean correlation coefficient.**
*  0.4617
** (0.346, 0.563)


Random effects meta-analysis


Under the random effects model we assume that each study estimates a system- specific effect size. There is no "one true" effect size, only a mean effect size. This is more realistic than the fixed effects model for most data in ecology and evolution. Even though each study of male bibs was carried out on the same species (house sparrow), there is nevertheless likely to be heterogeneity from population to population, year to year, and even researcher to researcher if study methods are not identical. 
  1. To fit the random effects model we need to estimate the variance among the system-specific effect sizes, τ2 ("tau squared"). One way to estimate it involves calculating the heterogeneity among the observed effect sizes (Q), and then "correcting" by subtracting the within-study sampling variance. The correction is needed because the variance among the observed effect sizes among studies is inflated by within-study sampling error. Here's how its done.
    First, calculate Q, the weighted heterogeneity among the observed z values,

      Q <- sum(w * (z - zbar)^2)

    Then estimate τ2 by subtraction, being careful not to allow a negative value (since τ2 is a variance, which can't be negative).*

      if (Q <= (length(z)-1)) tau.sq <- 0 else
        tau.sq <- (Q - (length(z)-1))/(sum(w) - sum(w^2)/sum(w))

    z, zbar, and w are as defined in the previous section.
  2. τ2 is then used to calculate new weights for the effect sizes of each study under the random effects model, taking account of both within-study sampling variance and between-study variance in effect sizes,

      wnew <- 1/(SE^2 + tau.sq)

    where SE is the same as that computed in the fixed effects section. Examine these new weights visually and compare them to the weights under the fixed effects model. How are they different? Is as much weight given to large-sample studies, relative to small-sample studies, in the random effects model as in the fixed effects model?
  3. Calculate the weighted mean effect size (z-transformed) under the random effects model. The procedure is the same as that used before for the fixed effects model except that here we will use the new weights calculated in the previous step. Back-transform to get the estimated mean correlation r.** Add the estimated mean correlation to your funnel plot. Compare your result to the effect size estimated under the fixed effects model. Is it the same?
  4. Calculate the standard error of the mean calculated in the previous step. The calculation is the same as that in the fixed-effects model except that here we will use the new weights.
  5. Calculate the 95% confidence interval for the mean effect size under the random effects model. Is the confidence interval narrower, about the same, or wider than that calculated under the fixed effects model? Why?
  6. Finally, back-transform to get the lower and upper limits of the 95% confidence interval for the mean correlation. ***
*   0.0483
**  0.4776
*** (0.321, 0.608)


Use the meta package

The meta package will carry out all these meta-analysis calculations.
  1. Load the meta package in R. It should already be on your computer, but download from CRAN and install if necessary.
  2. Try using the metagen command on the z-transformed correlation coefficients. Provide the z-transformed correlations and their standard errors. Save the results in an object (it will be a "meta" object). The meta command is used as follows,

      metagen(z,SE,data=sparrow)

  3. Use the summary command on the meta object from (2).
  4. Use the plot command on the meta object from (2). The result is called a "forest" or confidence interval plot giving the effect sizes and approximate confidence intervals for each study and for the fixed and random effects means.

Natural selection

How strong is natural selection, typically? Kingsolver et al. surveyed studies published between 1984 and 1997 and tallied up a total of 1364 measurements of directional selection on quantitative phenotypic traits (2001, The strength of phenotypic selection in natural populations. Am. Nat. 157: 245–261). Their database on directional selection is here. (Their additional summaries of non-linear selection are not included). Download and read into R.

Effect sizes were selection coefficients, estimated on traits standardized to have mean 0 and variance 1 before selection. The measures of selection are s and beta (β). s is the directional selection differential, which for survival data is the population mean of the survivors minus the mean before selection (in units of standard deviations). beta is a partial regression coefficient from a multiple regression of relative fitness on a suite of traits. It attempts to estimate selection directly on a trait, while removing those changes to a trait caused by its correlation with other traits under selection.

Most other variables are self-explanatory. The following is not a full list:
  • fit.comp = fitness component (F = Fecundity/Fertility, M = mating success, S= survival, T = total fitness, N = Net reproductive rate, O = parasite burden)
  • traitclass = trait type (MO = Morphology, LH = life history or phenology, BE = behavior, PC = principal component, I = interaction)
  • studytype = study type (C = cross-sectional, L = longitudinal)
  • N = sample size
  1. Examine the first few lines of the data to get a feel for how the data is made up of measurements from many traits and studies. Notice that the same study might be listed multiple times, for example once for each trait measured.
  2. Draw a funnel plot of the selection differential, s. Notice the a huge range of sample sizes. Since the precision of an estimate (standard error, confidence interval) in general scales not with N but with the square root of N, try plotting s against the square root of N or the log of N to better see the patterns. You can also plot s against the inverse of the standard error of s ("se.s"), but you will notice that standard errors are missing for many estimates.
  3. Calculate the mean selection differential. For now, use the ordinary, unweighted mean. Consider that this represents the average shift in the mean of traits before and after selection, in units of standard deviations, averaged over many measurements and studies. Examine the result. If nature was in balance, we might expect the mean to be close to zero (just as much selection upward on traits as downward, overall). How should we interpret the fact that the mean s is not close to zero? (This is a biological question, not a statistics question!)
  4. Let's look at this data in a slightly different way. Draw a funnel plot of the absolute value of the selection coefficient s. What is the mean of the absolute values? This is even larger than before because we are averaging each shift regardless of direction.
  5. Plot the regression line to describe how the absolute value of the selection coefficient changes with sample size (use the measure of sample size from your funnel plot). Notice the pattern of declining selection with increasing sample size. Can publication bias explain this result?(Nobody has come up with a really satisfactory explanation for this pattern.)
  6. Is the absolute value of selection on traits likely to be stronger in association with reproduction or with survival? Why? To test your answer, calculate the median selection differential for different fitness components (median is probably nore reliable than mean because the values are skewed). Also calculate the sample sizes in each category for reference. Repeat the calculation for the selection gradient -- are they the same as for the selection differentials?
  7. If time permits, explore other patterns in the data. Is selection just as strong in plants as in animals? Does selection strength differ depending on whether the study was cross-sectional or longitudinal? Does selection differ among types of traits?  Examine also the sample sizes in each group (plants vs animals, type of study, type of trait). Are some groups overrepresented in the data compared to others?
Postscript: We haven't had a chance to carry out any serious quantitative meta-analyses on these data, as we did with the sparrow data above, because the shortage of standard errors here means that we wouldn't get very far. If we were going to go this route we would at some point  want to take stock of the multiple measurements from each study. It might be prudent to include analyses of summary measures made on data from individual studies to reduce the amount of non-independence in the data.