Planning tools

Before carrying out a time- and fund-consuming experiment, it is useful to get an idea of what to expect from the results. How big an effect are you expecting? What are the chances that you would detect it? What sample size would you need to have a reasonable chance of succeeding? How narrow a confidence interval around the estimated effect would you be happy with? In this workshop we will show how R can be used to address some of these questions.

Random sampling warm-up

To begin, get some practice sampling categorical and normally-distributed data from a population. Consult the "Make data" section of the plan submenu on the R tips pages for help.
  1. Sample 20 observations from a population having two groups of individuals, "infected" and "uninfected", in equal proportions. Use a table to show the frequencies of the two categories obtained.
  2. Repeat the previous step five times to convince yourself that the outcome varies from sample to sample.
  3. Sample 18 individuals from a population having two groups of individuals, "mated" and "unmated", where the proportion mated in the population is 0.7. Summarise the frequencies in a table.
  4. Repeat the previous step five times to convince yourself that the outcome varies from sample to sample.
  5. Sample 30 observations from a normally-distributed population having mean 0 and standard deviation 2. Plot the results in a histogram.
  6. Repeat the previous step 5 times and calculate the mean each time. Convince yourself that the outcome varies from sample to sample.


Detect a preference

Below is an example of a common design to measure mate choice, habitat or food preference, or other sensory capability. 

Consider an experiment to estimate mate preference of females of a species of jumping spiders. Each independent trial involves presenting a female spider with two tethered males. One of the males is from her own species, and the other is from its sister species. Before carrying out the experiment, it is useful to generate data under different scenarios to get a sense of whether the design will accomplish what is intended. 

Estimate weak or no preference

We'll start with the case of no preference: in the population half (0.5) of the females chooses the male from her own species and the other half choose the male of the other species. Will the design of the experiment allow you to reach this conclusion from data? The answer is yes if you end up with an estimate near 0.5 from the data AND a relatively narrow confidence interval. You'll be in a much stronger position to say that the preference is weak if the confidence interval calculated from the data is 0.4 -- 0.6 (a span of 0.2) than if the interval is 0.1 -- 0.9 (a span of 0.8). What sample size is necessary to achieve a reasonably narrow confidence interval?
  1. Randomly sample n = 10 females from a population having equal numbers of "successes" (females who choose males of her own species) and "failures" (females who choose males of the other species). What was the proportion of successes in your sample?
  2. Using the data from step 1, calculate a 95% confidence interval for the population proportion. Use the Agresti-Coull method, which is implemented in the following command named "ci.agresti". Copy and paste the whole command below to the R command line and it will be stored in memory. (If you paste into a text file first, make sure to "paste as ordinary text" to avoid formatting/font issues that can lead to puzzling error messages in R)

      ci.agresti <- function(x,n,conflev=0.95){
        # Computes the Agresti-Coull 'add 4' CI
        # for x successes out of n trials
        # Modified from code at A. Agresti's web site
        # at Univ. Florida

        pprime <- (x+2)/(n+4)

        z <- abs(qnorm((1-conflev)/2))

        stderr <- sqrt(pprime * (1-pprime)/(n+4))

        ul <- pprime + z * stderr

        ll <- pprime - z * stderr
        ll[ll<0] <- 0
        ul[ul>1] <- 1
        conf <- data.frame(lower=ll,upper=ul)
        if(nrow(conf==1)) confint <- unlist(conf)

        return(conf)

        }

    To use the new command, enter

      ci.agresti(x,n)

    where x is the number of "successes" (number of females who chose males of her own species) and n is the sample size. Try it using your data from step 1. What was the span of your confidence interval (upper limit minus lower limit)? Was it as narrow as the desired 0.2?
  3. Repeat steps 1 and 2 five times and keep a record of the confidence intervals you obtained. How often did your confidence interval have a span as low as 0.2? 
  4. You can speed up the effort if you create a "for" loop that automatically repeats steps 1 and 2 as many times as you decide. The basic form of a loop that repeats ten times is as follows (If you paste any of the following lines into a text file, make sure to "paste as ordinary text" to avoid formatting/font issues that can lead to puzzling error messages in R):

      for(i in 1:10){

            (step 1: sample 10 females)
            (count up the number of successes from step 1)
            (step 2: calculate confidence interval)
            (print the confidence interval or just the span)

      }

    The variable "i" in this loop is a counter that starts at 1 and increases by 1 each time the commands in the loop are executed. You'll need to use the "print" command within the loop to see results of each iteration. You'll also need to figure out a way to count up the number of successes each time you sample so that you can plug that number into the ci.agresti command that calculates the confidence interval. Create the loop in a text file, and then cut and paste all the lines into the command window to execute.
  5. Increase the sample size to n = 20 and run the loop from step 4 again. In what fraction of trials did the confidence interval have a span as low as 0.2?
  6. By modifying the sample size and re-running the loop, find a sample size (ballpark, no need to be exact at this point) that usually produces a confidence interval having a span no greater than 0.2. 
    Optional: by this point you might wish to store the results of each iteration in a vector rather than print the results to the screen, so that you can increase the number of iterations (say, to 100 times) and calculate the number of iterations that led to a confidence interval of span 0.2 or less. One way to do this is to modify the loop as follows:

      span <- vector()     # creates a vector with nothing in it

      for(i in 1:10){

            (sample n females)
            (count up the number of successes from step 1)
            (calculate upper and lower limits of confidence interval)
            (store the difference between upper and lower limits in span[i]

      }
      print(span)  # shows values of span from all iterations

  7. Given the results of step 6, you would now have some design options before you. Is the number of trials realistic? If the answer is yes, great, get started! If the answer is no, then you would have some decisions to make:
    • Forget all about doing the experiment. (Consider a thesis based on theory instead.)
    • Revise your concept of what represents a "narrow" confidence interval. Maybe an interval spanning, say, 0.3 -- 0.7 would be good enough to allow you to conclude that the preference was weak or absent. This is still a much better outcome thanan interval of 0.1 -- 0.9.
  8. Repeat the above procedures to find a sample size that usually gives a confidence interval having a span of 0.4 or less.

Detect a preference with hypothesis testing

In this second exercise we assume that the preference really is different from 0.5, and use hypothesis testing to detect it. What strength of preference would we like to be able to detect? To pick an extreme case, if the true proportion of females in the population choosing a male from her own species is 0.51 rather than 0.50, you would need an enormous sample size to detect it. But we don't really care about such a small effect. Let's start instead with the more realistic proportion 0.7. What sample size would be needed to detect it? 
  1. Sample 20 females from a population in which the fraction of "successes" is 0.7
  2. Use the binomial test to test the null hypothesis that the population proportion is 0.5. The binomial test calculates the exact probability of a result as extreme or more extreme as that observed if the null hypothesis is true. The method is implemented in R in the following command,

      z <- binom.test(x,n, p=0.5)

    where x is the observed number of successes in your sample from step 1, and n is the sample size. z here is an object that stores the result. To see the results of the test enter "z" in the command line. If you just want to see the P-value of the test, enter

      z$p.value

    instead. Did you reject the null hypothesis?
  3. Create a loop to repeat steps 1 and 2 ten times. What fraction of the times was the null hypothesis rejected?
  4. By modifying the sample size and re-running the loop several times, find a sample size (ballpark, no need to be exact at this point) that usually results in the null hypothesis being rejected. Compare your results to those from the confidence interval simulation above.
  5. Is the number of trials you obtained from step 4 feasible? If the answer is yes, great! If the answer is no, then you have some decisions to make. You could decide not to run the experiment after all. Or, you could revise your aims. Perhaps your committee would be happy if you if you could detect a preference of 0.8 instead of 0.7. 

Calculate power and sample size

Simulating random samples on the computer, as we did above, is a great way to investigate power and sample size requirements. However, a number of quantitative tools have been developed for mainly simple designs that do the work for you.

Load the "pwr" library and use it to do some of the calculations for you. See the "power tools" section of the plan submenu on the R tips web pages for advice.
  1. Use the pwr package to calculate the approximate minimum sample size needed to detect a preference of 0.6 with a power of 0.80 (i.e., the null hypothesis would be rejected in 80% of experiments). The null hypothesis is that the population proportion p of females who would choose the male from her own population is 0.5. The goal is to design an experiment that can detect a p of 0.6 with reasonable probability.
  2. Repeat the above procedure for a preference of 0.7, 0.8, and 0.9.
  3. Use a line plot (see the "display" tab on the R tips page) to plot the relationship between minimum sample sizes and the different values of p for the case of power=0.80. Are the sample sizes realistic?

Plan a 2 x 2 experiment

In the great tit experiment discussed previously in class, two eggs were removed from 30 nests, causing the attending females to lay one more egg. The 35 unmanipulated nests served as controls. The response variable was incidence of malaria in females at the end of the experiment. The results of the experiment are tabulated below.

 

Control group

Egg-removal group

Malaria

7

15

No malaria

28

15


Imagine that you are considering repeating this experiment on a different species of songbird. What are the chances of detecting an effect? What sample sizes should you plan?
  1. Randomly sample 30 females from a control population in which the fraction of infected birds is 0.2 (the fraction in the tit data). Sample also 30 females from an experimental population in which the fraction of infected birds is 0.5 (the fraction in the tit data). Combined the samples into a data frame. Include a variable indicating treatment.
  2. Display the 2 x 2 frequency table from your random sample. Is there an association?
  3. Display the association in your random samples using a mosaic plot or grouped bar plot.
  4. Repeat steps 1-3 three times to convince yourself that the answer is different each time.
  5. Using the tools in "pwr" calculate the sample size needed to achieve 80% power in this design.
  6. Repeat step 5 holding the probabilities in the control group the same, but varying the probability of "success" in the treatment group from 0.30 to 0.90. Record the total sample size needed to achieve a power of 80% in each case. 
  7. Plot the results of step 6 in a line plot.

Plan a 2-treatment design for a numerical response variable

Imagine that you are considering a two-treatment experiment for a numeric response varaiable. The treatments consist of two grazing regimes and the response variable is  the numbers of plant species in plots at the end of the experiment. How many replicate plots should you set up? As usual, we will investigate only the case of equal sample sizes in the two groups.

We'll assume that the number of plant species in plots has a normal distribution in both treatment and control. We'll round the numbers so that they are integers.
  1. Randomly sample 20 measurements from a normal distribution having a mean of 10 and a standard deviation of the square root of 10. Call this the "control" group. Let's round the numbers so that they are integers.

      control<-round(control,0)

  2. Repeat step 1 for a second sample, this time from a normal distribution having a mean of 15 but the same sample variance, 10. (This is a bit unrealistic, as we would expect the variance in numbers of species to be higher as the mean increases, but never mind for now). Call this the "treatment" group.  In other words, we will investigate the power of this experiment to detect a 1.5-fold change in the mean number of species from control to treatment.
  3. Assemble the samples into a data frame in "long" format, with a second variable indicating which measurements are from the control group and which are from the treatment group. Create a histogram for each of the two samples and compare the distributions by eye.
  4. Using the "power.t.test" command in the basic R stats package, determine the power of the above design -- probability that the experiment will detect a significant difference between the treatment and control means based on random samples.
  5. Using the same command, determine the sample size that would be necessary to achieve a power of 0.80.
  6. Repeat step 5 for a series of differences between means. Keep the mean of the control group the same, but use a series of possible values for the mean of the treatment group, between 11 to 20 in increments. Each time, calculate the sample size needed in each group to achieve a power of 80%. Plot the sample size against the difference between the means using a line plot.