Instructions
In this assignment you will read the introduction and methods sections of a published research paper. However, the results and discussion sections have been removed, as have most references to the statistical methods used. For this exercise to be useful, please do not look up the original paper. This is not a test; it is a learning exercise.
The redacted version of the paper can be downloaded here.
The dataset the researchers collected can be downloaded here.
Your task is to:
- Identify the research questions and hypotheses of the study.
- Design a set of statistical methods to test those hypotheses, given the data the authors collected, and write the missing Statistical analysis section of the methods section.
- Carry out those statistical tests, and produce tables and figures that could make up the missing Results section of the paper.
Identify the hypotheses
Read through the paper and highlight any sentence that could form a testable hypothesis. For example, on page 3:
“we expected that males perceiving high-density environments would have larger testes and accessory glands (that produce seminal fluid) due to increased sperm competition.”
This is a clear statement of a hypothesis about how perceived population density could have a directional effect on testes and accessory gland size.
Identify at least one hypothesis from the paper in each of the following topics:
- Metabolic rate
- Reproductive investment
- Aggressive behaviours
- Song characteristics
If you develop your own hypotheses while reading, you can add them too.
Understand the data
Before moving any further, now is a good time to see what data you have to work with. I suggest you set up RStudio and an R Markdown file if you haven’t already. Use the guides below to help.
- Make a new folder in your course folder for the exercise (e.g. biob11/writing_assignment)
- Open RStudio
- If you haven’t closed RStudio since the last exercise, I recommend you do so and then re-open it. If it asks if you want to save your R Session data, choose no.
 
- Set your working directory by going to Session -> Set working directory -> Choose directory, then navigate to the folder you just made for this exercise.
- Create a new R Markdown document (File -> New File -> R Markdown...). Give it a clear title.
 
 
 
- library(package-name) to load a package.
- read_csv("filename_of_file.csv") to load a csv file (requires tidyverse).
- filter() to subset the data based on specific conditions (requires tidyverse).
- group_by() to group the data by one or more variables (requires tidyverse).
- summarise() to calculate summary statistics (e.g., mean, median, sum) for each group (requires tidyverse).
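As a rough illustration of how the functions listed above fit together, here is a minimal sketch. It does not use the real dataset: it writes a small, hypothetical csv file to a temporary location (all values are invented, and only three of the real column names are used), then loads and summarises it.

```r
library(tidyverse)

# Hypothetical toy data standing in for the real file; every value is invented.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "cricket_id,juvenile_environment,dis_testes_g",
  "c01,high_density,0.021",
  "c02,high_density,0.025",
  "c03,high_density,NA",
  "c04,low_density,0.018",
  "c05,low_density,0.020",
  "c06,low_density,NA"
), toy_path)

# Load the csv file (for the real data, use its filename instead of toy_path)
crickets <- read_csv(toy_path, show_col_types = FALSE)

# Keep only crickets that were dissected, then summarise each environment
testes_summary <- crickets |>
  filter(!is.na(dis_testes_g)) |>
  group_by(juvenile_environment) |>
  summarise(n = n(), mean_testes_g = mean(dis_testes_g))

testes_summary
```

Filtering out the NA rows first means that n counts only the crickets that actually have a measurement for that variable.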
 
 
 
The dataset (download link at the top of the page) contains 29 variables. You are not required to use them all. A basic definition is given for each variable below, but you will need to consult the redacted paper to fully understand what each one means and to decide how to use it properly in a statistical test. The dataset also contains many NA values, as not all crickets were used for all experiments. Be mindful of this when working with it.
Cricket variables
- cricket_id: identifier of the cricket
- juvenile_environment: either high_density or low_density
Respiration variables
- resp_mass_g: mass in grams of the cricket prior to the respiration measurements
- resp_age_days: age of the cricket in days prior to the respiration measurements
- resp_volume_co2: rate of \(\text{CO}_2\) production (μL/min)
- resp_volume_o2: rate of \(\text{O}_2\) consumption (μL/min)
Dissection variables
- dis_mass_g: mass in grams of the cricket prior to the dissection
- dis_age_days: age of the cricket in days prior to the dissection
- dis_accessory_g: mass in grams of the accessory glands
- dis_testes_g: mass in grams of the testes
Song variables
- song_mass_g: mass in grams of the cricket prior to the song measurements
- song_age_days: age of the cricket in days prior to song measurements
- song_chirp_rate: probably chirps per second
- song_chirp_duration: length of chirps (seconds)
- song_pulses_per_chirp: average number of pulses per chirp
- song_pulse_duration: average pulse duration
- song_dominant_frequency: song dominant frequency (kHz)
Aggression variables
- agg_mass_g: mass in grams of the cricket prior to the aggression trials
- agg_age_days: age of the cricket in days prior to the aggression trials
- agg_trial_id: identifier of the trial (shared between two crickets)
- agg_winner: was the cricket deemed to have won the aggression trial?
- agg_dot_colour: the colour dot that was used to identify the cricket in the trial
- agg_female_present: did the trial have a female present in the center of the arena?
- agg_num_wins: the number of battles the cricket won
- agg_num_battles: the total number of battles in the trial
- agg_first_behaviour: was the cricket the first in the trial to show an aggressive behaviour?
- agg_first_song: was the cricket the first in the trial to sing aggressively?
- agg_first_song_length: length of the first aggressive song the cricket sang.
- agg_first_winner: was the cricket the winner of the first aggressive encounter?
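Because not all crickets were used for all experiments, it can help to check how many non-missing values each variable has before choosing a test. The sketch below uses a small, hypothetical data frame (the values are invented; only the column names come from the lists above) and base R only.

```r
# Hypothetical toy data: each cricket was measured in only some experiments.
toy <- data.frame(
  cricket_id      = c("c01", "c02", "c03", "c04"),
  resp_mass_g     = c(0.41, NA, 0.39, NA),
  song_chirp_rate = c(NA, 2.1, NA, NA)
)

# Number of non-missing values in each column
n_measured <- colSums(!is.na(toy))
n_measured

# Most summary functions return NA if any value is missing...
mean_with_na <- mean(toy$resp_mass_g)

# ...unless the missing values are dropped explicitly
mean_without_na <- mean(toy$resp_mass_g, na.rm = TRUE)
```

The same check on the real dataset shows the sample size actually available for each experiment, which is worth reporting alongside any test.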
 
 
 
Once you have the dataset loaded, and have a general idea of the data available to you, move to the next section.
Design the statistical analysis
Take each of your identified hypotheses and complete the following steps:
- Decide which variables you can use to test it. What are your response and explanatory variables?
- Decide on a test statistic (mean, variance, standard deviation, difference in means, \(F\), \(\chi^2\), correlation, etc.) that would be best to address the hypothesis.
- Turn the hypothesis into a clear null and alternative hypothesis. Make sure to indicate the direction (lesser, greater, two-sided) if appropriate.
- Write a short paragraph that could be used in the statistical analysis section in the paper.
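As one possible illustration of the third step, the testes hypothesis quoted earlier could be written as a directional pair of hypotheses. The symbols are assumptions made here, not notation from the paper: \(\mu_{\text{high}}\) and \(\mu_{\text{low}}\) stand for the mean testes mass (dis_testes_g) of males from the high_density and low_density environments. Whether to account for body mass is a separate decision that this sketch does not make.

```latex
\[
H_0:\ \mu_{\text{high}} - \mu_{\text{low}} = 0
\qquad \text{versus} \qquad
H_A:\ \mu_{\text{high}} - \mu_{\text{low}} > 0
\]
```

The alternative is one-sided ("greater") because the quoted sentence predicts larger testes in the high-density group.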
For each statistical approach, include the following in your text:
- Objective: Describe the rationale for the analysis and how it relates to the study objective.
- Variables: Define the experimental unit and the response and explanatory variables clearly.
- Statistical method: Describe the statistical method (e.g., ANOVA, difference in means, linear regression). How was the null distribution generated? What (if any) \(\alpha\) value will you use (e.g., \(\alpha\) = 0.05)? How will you calculate the p-value?
- Implementation: Describe the function (e.g., calculate() from the infer package), package (e.g., infer), and software (e.g., R version 4.4.1) used and include any appropriate citations.
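To make the points above concrete, here is a minimal sketch of a permutation test for a difference in means using the infer package. It is only one possible approach, not necessarily the one the authors used. The data frame is hypothetical (all values are invented), and the number of permutations (1000), the one-sided direction, and the seed are assumptions chosen for illustration.

```r
library(infer)

# Hypothetical toy data: every value below is invented for illustration only.
toy <- data.frame(
  juvenile_environment = rep(c("high_density", "low_density"), each = 5),
  dis_testes_g = c(0.021, 0.025, 0.024, 0.027, 0.023,
                   0.018, 0.020, 0.019, 0.022, 0.021)
)

set.seed(1)  # makes the random permutations reproducible

# Observed test statistic: mean(high_density) - mean(low_density)
obs_diff <- toy |>
  specify(dis_testes_g ~ juvenile_environment) |>
  calculate(stat = "diff in means",
            order = c("high_density", "low_density"))

# Null distribution: shuffle the environment labels 1000 times
null_dist <- toy |>
  specify(dis_testes_g ~ juvenile_environment) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "diff in means",
            order = c("high_density", "low_density"))

# One-sided p-value: proportion of shuffled differences at least as large
p_value <- null_dist |>
  get_p_value(obs_stat = obs_diff, direction = "greater")
```

Each piece maps onto the text you write: the formula in specify() names the response and explanatory variables, generate() describes how the null distribution was produced, and get_p_value() describes how the p-value was calculated.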
 
 
 
To generate a citation for R itself, you can run citation():
To cite R in publications use:
  R Core Team (2024). _R: A Language and Environment for Statistical
  Computing_. R Foundation for Statistical Computing, Vienna, Austria.
  <https://www.R-project.org/>.
A BibTeX entry for LaTeX users is
  @Manual{,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2024},
    url = {https://www.R-project.org/},
  }
We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also 'citation("pkgname")' for
citing R packages.
 
 
To get the name of the version of R you are running, you can run R.version.string:
[1] "R version 4.4.1 (2024-06-14)"
 
 
To get the citation for any package you are using, you can run citation("package-name"), for example citation("tidyverse") or citation("infer"):
To cite package 'tidyverse' in publications use:
  Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R,
  Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller
  E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V,
  Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to
  the tidyverse." _Journal of Open Source Software_, *4*(43), 1686.
  doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
A BibTeX entry for LaTeX users is
  @Article{,
    title = {Welcome to the {tidyverse}},
    author = {Hadley Wickham and Mara Averick and Jennifer Bryan and Winston Chang and Lucy D'Agostino McGowan and Romain François and Garrett Grolemund and Alex Hayes and Lionel Henry and Jim Hester and Max Kuhn and Thomas Lin Pedersen and Evan Miller and Stephan Milton Bache and Kirill Müller and Jeroen Ooms and David Robinson and Dana Paige Seidel and Vitalie Spinu and Kohske Takahashi and Davis Vaughan and Claus Wilke and Kara Woo and Hiroaki Yutani},
    year = {2019},
    journal = {Journal of Open Source Software},
    volume = {4},
    number = {43},
    pages = {1686},
    doi = {10.21105/joss.01686},
  }
 
 
To cite package 'infer' in publications use:
  Couch et al., (2021). infer: An R package for tidyverse-friendly
  statistical inference. Journal of Open Source Software, 6(65), 3661,
  https://doi.org/10.21105/joss.03661
A BibTeX entry for LaTeX users is
  @Article{,
    title = {{infer}: An {R} package for tidyverse-friendly statistical inference},
    author = {Simon P. Couch and Andrew P. Bray and Chester Ismay and Evgeni Chasnovski and Benjamin S. Baumer and Mine Çetinkaya-Rundel},
    journal = {Journal of Open Source Software},
    year = {2021},
    volume = {6},
    number = {65},
    pages = {3661},
    doi = {10.21105/joss.03661},
  }
 
 
 
 
 
Write a results section
A results section is where you report your findings for all the hypotheses laid out in the introduction and methods. You have already described your statistical analysis in the methods section, so there is no need to go into great detail here. Importantly, in a paper with separate results and discussion sections, you should not discuss your findings in the results section, only report them. Any time you report a result, you should back it up with a statistic and a relevant figure. A helpful two-part structure you can follow for each result is the following:
- Report the overall result in plain language. If the reader reads only this sentence, they should get the whole picture in broad terms.
Perceived population density during development had X effect on reproductive investment.
- Report the result in more specific terms. Make reference to the source of your results, such as statistical tests, tables and figures. You may have several sentences like this for each overall result.
Accessory gland mass [did/did not] differ significantly between males reared at high or low density (difference in means test, diff = X, p = Y, Figure Z).
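A figure to accompany a result like the one above could be sketched as follows. This is only one reasonable layout, not the authors' figure. The data frame is hypothetical (all values are invented), and the axis labels are assumptions; ggplot2 is loaded automatically with the tidyverse.

```r
library(ggplot2)

# Hypothetical toy data: every value below is invented for illustration only.
toy <- data.frame(
  juvenile_environment = rep(c("high_density", "low_density"), each = 5),
  dis_accessory_g = c(0.012, 0.015, 0.014, 0.016, 0.013,
                      0.011, 0.012, 0.010, 0.014, 0.013)
)

# Boxplot of each group with the individual crickets overlaid as points
fig <- ggplot(toy, aes(x = juvenile_environment, y = dis_accessory_g)) +
  geom_boxplot() +
  geom_jitter(width = 0.1) +
  labs(x = "Juvenile environment", y = "Accessory gland mass (g)")

fig
```

Showing the raw points alongside the boxplot lets the reader see the sample size and spread behind the reported statistic.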
Give the paper a title
Often the title of a paper will summarise the results in a single line. Can you think of a good title that summarises what you found?
End of the exercise
Once you are finished with the exercise (or at the end of the exercise session if you have not finished), submit your work to the Canvas assignment. You will not be graded; instead, I will provide broad feedback to the class.