Homework assignments

Try R Markdown

Assignment 1: Improve a graph

Assignment 2: Analyze linear model

Assignment 3: Best genetic model

Assignments will be posted here.

Try R Markdown

Try writing your report in R Markdown. The advantage is that R code chunks embedded in the document are executed when you render the HTML file.

An example R Markdown file is found here. Save the example files to a folder on your hard disk and then open them in RStudio. To render a file, click “Knit” at the top of the Source file pane in RStudio.
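As a rough sketch of what the “Knit” button does, the R code below writes a very small R Markdown file and renders it from the console. The file name `example_report.Rmd` and its contents are made up for illustration. Rendering uses `rmarkdown::render()` and is skipped if the rmarkdown package or pandoc is not available.

```r
# Hypothetical file name and contents, for illustration only.
rmd_file <- file.path(tempdir(), "example_report.Rmd")

fence <- strrep("`", 3)  # three backticks, which open and close a code chunk
rmd_lines <- c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  "The mean of three numbers is computed below.",
  "",
  paste0(fence, "{r}"),
  "x <- c(2, 4, 6)",
  "mean(x)",
  fence
)
writeLines(rmd_lines, rmd_file)

# Render only if the rmarkdown package and pandoc are installed.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, quiet = TRUE)
  print(html_file)  # path of the HTML file that was created
}
```

The code chunk inside the file is run during rendering, so the HTML output shows both the code and its result.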

Email the TA for the course if you have questions.

Assignment 1: Improve a graph

This assignment is due Feb 7.

Find a graph drawn from data and published by your thesis supervisor. If your supervisor is flawless, pick another published graph, e.g., one from a paper published by your lab or department.
Choose a graph that has plenty of room for improvement. Too little improvement means we can’t assign many marks.
Students from the same lab: don’t choose the same or very similar graphs.
In your report, explain the study.
Analyze the graph. What is its goal? Explain what patterns the graph is intended to show. Explain why you think it is not successful. Explain the flaws in the graph. Why do they interfere with the goals of the graph?
Make a new graph in R using principles of effective display (review lecture notes).
Try to obtain and use the raw data; otherwise, extract the data from the graph or simulate raw data.
Analyze your new graph according to principles of good graph design. Remind us of the goal of the graph. Explain how your improvements achieve the goal more effectively than the original. Why does your graph succeed?
Attach your R script at the end (or include it as code chunks in the document if you are using R Markdown).
Email your report to me as a single .pdf file named LASTNAME.FIRSTNAME.ASSIGNMENT1.PDF.
Your grade will be based on: the quality of your analysis of the original graph; the magnitude of improvement in the new graph; your interpretation of the new graph and explanation of how it is improved; and the quality of your R script.
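For the new graph, one possible starting point is sketched below: a bar chart of group means is replaced with a strip chart that shows every observation along with the group means. The data are simulated, and the variable names (`treatment`, `growth`) and axis labels are invented for illustration. Which display works best depends on your data and on the goal of the original graph.

```r
set.seed(1)  # simulated data, for illustration only
dat <- data.frame(
  treatment = factor(rep(c("control", "low", "high"), each = 12),
                     levels = c("control", "low", "high")),
  growth = c(rnorm(12, 10, 2), rnorm(12, 12, 2), rnorm(12, 15, 2))
)

# Mean of the response in each group
group_means <- tapply(dat$growth, dat$treatment, mean)

# Show all observations, jittered sideways so that points do not overlap
stripchart(growth ~ treatment, data = dat, vertical = TRUE,
           method = "jitter", jitter = 0.1, pch = 16,
           col = "grey40", las = 1,
           xlab = "Treatment", ylab = "Growth (mm)")

# Overlay the group means as short horizontal line segments
pos <- seq_along(group_means)
segments(x0 = pos - 0.2, y0 = group_means,
         x1 = pos + 0.2, y1 = group_means, lwd = 2)
```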

Assignment 2: Analyze linear model

This assignment is due Friday, March 14.

Obtain a data set and analyze it by fitting a linear, mixed, or generalized linear model in R.

Obtain a data set from your supervisor or an online data repository (e.g., datadryad.org).
Include just one response variable.
For the explanatory variables, include at least one categorical fixed factor, such as an experimental or observational treatment.
Include at least one, and no more than two, additional explanatory variables (random or fixed factors, blocks, covariates, etc.).

Prepare a thorough report on the analysis and interpretation of the data. Below I list some of the things to include in your report, but note that the list might not be complete.

Include all your writing and graphs in a single PDF file (titled LASTNAME.FIRSTNAME.ASSIGNMENT2.PDF) and email it to me.

Explain (in a paragraph) the purpose of the study that yielded the data.
Explain the specific data set you are using. For example, say where the data are from, give the meaning of the variables, and so on.
Illustrate and describe the main patterns revealed in the data.
State what parameters (magnitudes) you will estimate with these data.
State what hypotheses you will test with these data.
Fit a linear model to the data in R. Explain in words the model you fit.
Interpret the output. To assess biological significance, explain the parameter estimates (magnitudes). What do they mean, and what are your conclusions based on these parameter estimates? To assess statistical significance, explain the null hypotheses and interpret the test results.
Visualize the model fit to the data. Explain what the graph is showing.
Address how well the statistical assumptions of your analysis were met. How did you handle violations?
State the overall conclusions reached from your analyses of biological and statistical significance.
Include your clean R code in an appendix.
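The model-fitting core of such a report might be sketched as follows, using simulated data with one response, one categorical fixed factor, and one numeric covariate. All variable names and effect sizes are invented for illustration. Your own data may call for a mixed or generalized linear model (for example `lmer()` from the lme4 package, or `glm()`) rather than `lm()`.

```r
set.seed(42)  # simulated data, for illustration only
n <- 60
dat <- data.frame(
  treatment = factor(rep(c("control", "fertilized"), each = n / 2)),
  size0 = runif(n, 5, 15)  # covariate: hypothetical initial size
)
dat$biomass <- 2 + 0.8 * dat$size0 +
  3 * (dat$treatment == "fertilized") + rnorm(n, sd = 1.5)

# Fit the model: response ~ covariate + fixed factor (no interaction)
fit <- lm(biomass ~ size0 + treatment, data = dat)

print(summary(fit))  # parameter estimates and standard errors
print(confint(fit))  # 95% confidence intervals for the parameters
print(anova(fit))    # sequential (Type I) F-tests

# Visualize the fit: raw data plus the fitted line for each treatment
cols <- c(control = "black", fertilized = "firebrick")
plot(dat$size0, dat$biomass, pch = 16, las = 1,
     col = cols[as.character(dat$treatment)],
     xlab = "Initial size", ylab = "Biomass")
for (g in levels(dat$treatment)) {
  new <- data.frame(size0 = range(dat$size0),
                    treatment = factor(g, levels = levels(dat$treatment)))
  lines(new$size0, predict(fit, newdata = new), col = cols[g], lwd = 2)
}

# Diagnostic plots for checking assumptions (residuals, normal Q-Q, etc.)
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))
```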

Assignment 3: Best genetic model

This assignment is due April 11, 2025.

Clues to the genetic basis of species differences can be gained by fitting linear models to measurements of traits in parents and hybrids. In this assignment you will use model selection methods to compare the fit of alternative genetic models to oviposition preference data in two host races of the planthopper Nilaparvata lugens (Sezer and Butlin 1998, Proc. Roy. Soc. Lond. B 265: 2399-2405). One race occurs on cultivated rice. The other lives on the aquatic plant Leersia, which is probably the ancestral host.

To accomplish this you will need to choose a criterion (AIC or BIC) to evaluate the fit of models to the data. You need to defend your choice of method vigorously in your report, which will require some independent research. Why did you decide to use it instead of the other criterion? Decide on the criterion before you analyze the data.
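For reference, the standard definitions of the two criteria are given below. The symbols are introduced here and do not come from the assignment text: $\hat{L}$ is the maximized likelihood of the model, $k$ is the number of estimated parameters, and $n$ is the sample size. Under either criterion, the model with the lowest value is preferred.

```latex
\begin{align}
\mathrm{AIC} &= -2 \ln \hat{L} + 2k \\
\mathrm{BIC} &= -2 \ln \hat{L} + k \ln n
\end{align}
```

In R, `AIC()` and `BIC()` compute these quantities for a fitted model object.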

Host oviposition preference data of females can be downloaded HERE.

Preference is the log-transformed ratio of the number of eggs laid on rice to the number laid on Leersia, when both plants were provided by the experimenters. Genotype refers to the parent race on rice (“rice”), the parent race on Leersia (“leer”), their F1 and F2 hybrids (“f1”, “f2”), and the backcrosses between the F1 hybrid and each parent race (“br” for rice and “bl” for Leersia).
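The data layout described above can be explored along the following lines. This is only a sketch: the real file is not reproduced here, so the code simulates a stand-in data frame, and the column names `genotype` and `preference`, the sample sizes, and the simulated values are all assumptions.

```r
# Stand-in for the real data: simulated values and assumed column names.
# With the downloaded file you would instead use something like
#   x <- read.csv("your_downloaded_file.csv")   # hypothetical file name
set.seed(7)
codes <- c("rice", "br", "f1", "f2", "bl", "leer")
x <- data.frame(
  genotype = rep(codes, each = 10),
  preference = rnorm(60, mean = rep(c(1, 0.6, 0.3, 0.2, -0.4, -1), each = 10),
                     sd = 0.5)
)

# Order the genotypes from the rice parent to the Leersia parent
x$genotype <- factor(x$genotype, levels = codes)

# Sample size, mean and standard deviation of preference per genotype
tab <- data.frame(
  n    = as.vector(table(x$genotype)),
  mean = as.vector(tapply(x$preference, x$genotype, mean)),
  sd   = as.vector(tapply(x$preference, x$genotype, sd)),
  row.names = levels(x$genotype)
)
print(round(tab, 2))

# Strip chart of the raw data by genotype
stripchart(preference ~ genotype, data = x, vertical = TRUE,
           method = "jitter", pch = 16, col = "grey40", las = 1,
           xlab = "Genotype", ylab = "Oviposition preference (log ratio)")
```

Setting the factor levels explicitly keeps the genotypes in a meaningful order in both the table and the plot, rather than the default alphabetical order.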

Analyze these data in R according to the following methods. Note that this is not a complete list of expectations for the assignment. Fit linear models with fixed effects only. Assume that all the data for a given cross type are independent. Provide all necessary explanations in your report. No $P$-values are allowed in your report. Include your R commands in an appendix.

1. Visualize the oviposition preference of the different genotypes. What is the pattern in the data? Explain.
2. Create a table of means and standard deviations of the genotypes. Make this a presentation-quality table rather than simply computer output. Don’t worry about font.
3. Add a numeric variable to the data set to represent the proportion of the genome inherited from the rice parent:
   1 for the rice parent genotype
   0 for the Leersia parent genotype
   0.5 for the F1 and F2 hybrids
   0.25 for the backcross to the Leersia population
   0.75 for the backcross to the rice population
   Make sure that the variable is numeric rather than a factor or character.
4. Fit a linear model to the preference data with the numeric variable you created in (3) as the only explanatory variable. This is called the additive model, whereby mean preference for rice increases linearly with the proportion of the genome inherited from the rice parent. Evaluate the model fit. (Remember: no $P$-values!)
5. Add another numeric variable to the data set to represent dominance effects that might be present in the hybrids:
   0 for both parental genotypes
   1 for the F1 hybrid genotype
   0.5 for the remaining three hybrid genotypes
   Make sure that the variable is numeric rather than a factor or character.
6. Fit a second model to the same preference data that includes both of the numeric variables created in (3) and (5). Leave out any interaction terms. This is the additive plus dominance model. Any dominance effects present will displace the mean value of the hybrids toward one or the other of the parents relative to the values predicted by the additive model. Evaluate the model fit.
7. Finally, fit a third model that has the original genotype variable as the only explanatory variable. The fit of this model will deviate from that of the model fitted in (6) if there is interaction (epistasis) between genes inherited from the two parents.
8. Present your results, comparing model fits. Which genetic model best fits the data? Explain and summarize.
9. Explain how the procedure you used above to analyze these data differs from that of conventional null hypothesis significance testing. In your view, would a null hypothesis significance testing approach be a poorer, equivalent, or superior approach to the one used above to decide between the three models? Explain.
10. Include your clean R code in an appendix.
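The coding of the two numeric variables and the comparison of the additive, additive plus dominance, and genotype models might be sketched as follows. The data are simulated, and the column names (`genotype`, `preference`, `additive`, `dominance`) and sample sizes are assumptions rather than properties of the real data set. Both criteria are shown only for illustration; in your report you would use the one you chose in advance.

```r
# Simulated stand-in data with assumed column names (not the real data set)
set.seed(11)
codes <- c("rice", "leer", "f1", "f2", "br", "bl")
x <- data.frame(genotype = rep(codes, each = 15))

# Proportion of the genome inherited from the rice parent
rice_prop <- c(rice = 1, leer = 0, f1 = 0.5, f2 = 0.5, br = 0.75, bl = 0.25)
# Dominance score: 0 for parents, 1 for F1, 0.5 for the other hybrids
dom_score <- c(rice = 0, leer = 0, f1 = 1, f2 = 0.5, br = 0.5, bl = 0.5)

# Look up by name; as.character() guards against indexing by factor codes
x$additive  <- unname(rice_prop[as.character(x$genotype)])
x$dominance <- unname(dom_score[as.character(x$genotype)])
stopifnot(is.numeric(x$additive), is.numeric(x$dominance))

# Simulated response, for illustration only
x$preference <- -1 + 2 * x$additive + 0.5 * x$dominance +
  rnorm(nrow(x), sd = 0.6)

m_add  <- lm(preference ~ additive, data = x)              # additive
m_dom  <- lm(preference ~ additive + dominance, data = x)  # additive + dominance
m_geno <- lm(preference ~ genotype, data = x)              # one mean per genotype

# Compare the three models; lower AIC or BIC indicates a preferred model
models <- list(m_add, m_dom, m_geno)
comparison <- data.frame(
  model = c("additive", "additive + dominance", "genotype"),
  k     = sapply(models, function(m) attr(logLik(m), "df")),
  AIC   = sapply(models, AIC),
  BIC   = sapply(models, BIC)
)
comparison$dAIC <- comparison$AIC - min(comparison$AIC)
comparison$dBIC <- comparison$BIC - min(comparison$BIC)
print(comparison)
```

Here `k` counts the estimated parameters in each model, including the residual variance, and the `dAIC` and `dBIC` columns give each model's difference from the best-scoring model.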

Email your report to me as a single PDF file named LASTNAME.FIRSTNAME.ASSIGNMENT3.PDF.