1. Using the dataset we just made, we pipe (|>) it into the next line.
2. We specify() which columns we are interested in. In this case, since there are no explanatory variables, we just have a response.
3. We calculate() our chosen sample statistic. Try to use the infer website or the helpfile for the calculate() function (write ?calculate in your console) to figure out for yourself what you should write here.
4. We assign this to a new object called observed_ovi.
5. We write the name of the object so that its contents (a 1x1 data frame) get printed in our document.
Statistical inference with bootstrap
Exercise 4
Get RStudio set up
Each time we start a new exercise, you should:
- Normally, you would make a new folder in your course folder for the exercise (e.g. biob11/exercise_4). Since we will continue working in the file from Exercise 3, you should keep working in that document initially.
- Open RStudio
  - If you haven't closed RStudio since the last exercise, I recommend you do so and then re-open it. If it asks if you want to save your R session data, choose no.
- Set your working directory by going to Session -> Set working directory -> Choose directory, then navigate to the folder for this exercise.
- Normally, you would create a new Rmarkdown document (File -> New file -> R markdown...) and give it a clear title. This time, open the Rmarkdown document you made in Exercise 3 (the .Rmd file, not the .html file).
You are now ready to start.
Peer review
To help you remember what you did in the last exercise, and to help you learn from each other, we will start today by reviewing our code from Exercise 3.
Tephritis phenotype II
Getting set up
In the last exercise, we used only the tidyverse package. In this exercise, we are going to use one additional package called infer.
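A minimal sketch of loading both packages at the top of your Rmarkdown document, assuming both are already installed (if infer is not, install it once from the console rather than inside your document):

```r
# Run once in the console if infer is not installed yet (not inside your Rmd):
# install.packages("infer")

library(tidyverse)  # as in Exercise 3
library(infer)      # new in this exercise: specify(), generate(), calculate(), ...
```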
Average ovipositor
In this exercise, we are going to calculate a 95% confidence interval around a sample statistic using a bootstrap approach:
Figure: infer inference pipelines
Let's start with a simple question:
What is the average ovipositor length of Tephritis conura?
Data
Plot
Observed sample statistic
While you already have a way to calculate statistics using summarise(), we will avoid this when performing statistical inference. Save summarise() for making pivot tables. To calculate our observed statistic, we are going to use the infer package.
To use infer to calculate an observed statistic, we write:
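As a sketch of what such a pipeline can look like, here it is run on a small simulated data frame. The data frame toy_flies and its column ovipositor_length are hypothetical placeholders filled with arbitrary simulated values, not real Tephritis conura measurements. The example calculates the median rather than the mean, so choosing the right statistic for our question is still up to you:

```r
library(infer)

# Hypothetical placeholder data: arbitrary simulated values,
# NOT real Tephritis conura measurements
set.seed(1)
toy_flies <- data.frame(ovipositor_length = rnorm(30, mean = 1.5, sd = 0.2))

observed_median <-
  toy_flies |>
  specify(response = ovipositor_length) |>  # one response, no explanatory variable
  calculate(stat = "median")                # median here; pick your own statistic

observed_median  # printed as a 1x1 data frame
```

The single column of the output, stat, holds the observed median of the simulated values.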
Bootstrap sampling distribution
Now we need a sampling distribution. To get that, we will use bootstrap resampling.
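The idea behind bootstrap resampling can be sketched in base R, without infer: draw a new sample of the same size from our data with replacement, calculate the statistic, and repeat many times. The vector toy_lengths and the other object names are hypothetical, the values are simulated, and the median stands in for whatever statistic you choose. In the exercise itself, we use infer for this step:

```r
# Hypothetical placeholder data: arbitrary simulated values,
# NOT real Tephritis conura measurements
set.seed(1)
toy_lengths <- rnorm(30, mean = 1.5, sd = 0.2)

# One bootstrap sample: same size as the data, drawn with replacement,
# so some values appear more than once and others not at all
one_resample <- sample(toy_lengths, size = length(toy_lengths), replace = TRUE)

# Repeat 10000 times, keeping the median of each resample;
# together these medians form the bootstrap sampling distribution
boot_medians <- replicate(10000, median(sample(toy_lengths, replace = TRUE)))

summary(boot_medians)
```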
To use infer to generate a bootstrap sampling distribution, we write:
1. Notice how this is almost identical to the code we used to get the observed statistic, but with one extra step: we generate() 10000 bootstrap samples, then calculate the statistic for each of them.
2. We assign this to a new object called bootstrap_ovi.
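As a sketch on a hypothetical, simulated data frame (toy_flies, with a made-up column ovipositor_length, and the median standing in for your own statistic), a bootstrap pipeline in infer could look like this:

```r
library(infer)

# Hypothetical placeholder data: arbitrary simulated values,
# NOT real Tephritis conura measurements
set.seed(1)
toy_flies <- data.frame(ovipositor_length = rnorm(30, mean = 1.5, sd = 0.2))

bootstrap_median <-
  toy_flies |>
  specify(response = ovipositor_length) |>
  generate(reps = 10000, type = "bootstrap") |>  # 10000 resamples with replacement
  calculate(stat = "median")                      # one median per resample

bootstrap_median
```

Each of the 10000 rows is one bootstrap sample: the replicate column numbers the sample and stat holds its median.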
Confidence intervals
We can use the percentile method to calculate a 95% confidence interval.
infer has a helpful function to do this calculation for us:
ovi_ci <-
  bootstrap_ovi |>
  get_confidence_interval(level = ______, type = ______) # 1
ovi_ci
1. Try to use the helpfiles and the infer website to figure out what you should write in the blanks.
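Under the percentile method, a 95% confidence interval runs from the 2.5th to the 97.5th percentile of the bootstrap distribution, cutting off 2.5% of the bootstrap statistics in each tail. A base R sketch of that calculation on simulated placeholder values (all object names hypothetical) follows; infer's percentile interval is in essence the same quantile calculation on the stat column, but for the exercise, use get_confidence_interval() as shown above:

```r
# Hypothetical placeholder data: arbitrary simulated values
set.seed(1)
toy_lengths <- rnorm(30, mean = 1.5, sd = 0.2)
boot_medians <- replicate(10000, median(sample(toy_lengths, replace = TRUE)))

# Percentile method: the middle 95% of the bootstrap distribution
ci_95 <- quantile(boot_medians, probs = c(0.025, 0.975))
ci_95  # lower and upper endpoints, named "2.5%" and "97.5%"
```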
Visualising
infer has a helpful function to quickly plot a distribution of statistics. If you want more control over how it looks, you can also use ggplot().
bootstrap_ovi |>
  visualize()
We can shade the confidence interval we just made on this plot as well:
bootstrap_ovi |>
  visualize() +
  shade_confidence_interval(ovi_ci) # 1
1. Since the output of visualize() is a ggplot, we need to add layers using +.
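If you want more control over the plot than visualize() gives, a ggplot() version can be built from any data frame with one bootstrap statistic per row in a column called stat (the layout bootstrap_ovi has). A sketch using simulated placeholder values, with hypothetical object names and the interval endpoints drawn as dashed vertical lines:

```r
library(ggplot2)

# Hypothetical placeholder data: arbitrary simulated values
set.seed(1)
toy_lengths <- rnorm(30, mean = 1.5, sd = 0.2)
boot_df <- data.frame(
  stat = replicate(10000, median(sample(toy_lengths, replace = TRUE)))
)
ci_95 <- quantile(boot_df$stat, probs = c(0.025, 0.975))

p <- ggplot(boot_df, aes(x = stat)) +
  geom_histogram(bins = 30, fill = "grey70", colour = "white") +
  geom_vline(xintercept = ci_95, linetype = "dashed", colour = "red") +  # CI endpoints
  labs(x = "Bootstrap median (simulated data)", y = "Count")
p
```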
Answering the question
What is the average ovipositor length of Tephritis conura?
Back to your own research question
Return to the question you answered at the end of Exercise 3 (where you had to choose one of six questions to answer).