Lecture 2
BIOC13 - Ekologi
Department of Biology, Lund University
2025-04-25
| region | host_plant | patry | sex | body_length_mm | ovipositor_length_mm | wing_length_mm | wing_width_mm | melanized_percent | baltic |
|---|---|---|---|---|---|---|---|---|---|
| estonia | heterophyllum | sympatry | male | 3.94 | NA | 4.455026 | 1.873016 | 55.96312 | east |
| estonia | heterophyllum | sympatry | female | 4.48 | 1.65 | 4.710000 | 2.170000 | 63.89701 | east |
| estonia | heterophyllum | sympatry | male | 4.61 | NA | 4.990196 | 2.294118 | 62.52199 | east |
| estonia | heterophyllum | sympatry | female | 5.31 | 1.78 | 5.611650 | 2.582524 | 61.37781 | east |
| estonia | heterophyllum | sympatry | male | 4.51 | NA | 4.468750 | 2.093750 | 62.25657 | east |
| estonia | heterophyllum | sympatry | male | 4.74 | NA | 4.906250 | 2.333333 | 57.46406 | east |
Do flies that use "heterophyllum" as a host_plant and flies that use "oleraceum" as a host_plant differ in their ovipositor_length_mm?
obs_stat <-
  tephritis_data |>
  specify(response = ovipositor_length_mm, explanatory = host_plant) |>
  calculate(stat = "diff in means", order = c("heterophyllum", "oleraceum"))
obs_stat
Response: ovipositor_length_mm (numeric)
Explanatory: host_plant (factor)
# A tibble: 1 × 1
   stat
  <dbl>
1 0.108
Example in R for a difference in means:
library(infer)
boot_dist <-
  tephritis_data |>
  # specify which variables we are interested in
  specify(response = ovipositor_length_mm, explanatory = host_plant) |>
  # generate new samples with bootstrap resampling
  generate(reps = 10000, type = "bootstrap") |>
  # calculate the chosen statistic for each new sample generated
  calculate(stat = "diff in means", order = c("heterophyllum", "oleraceum"))
Example in R:
Use infer to calculate a 95% CI with the "percentile" method.
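A minimal sketch of how such a percentile interval could be computed with infer's get_confidence_interval(). The toy_data data frame is simulated here only so that the block runs on its own; it is a hypothetical stand-in for tephritis_data, and its values are not real measurements.

```r
library(infer)

set.seed(1)

# Hypothetical stand-in for tephritis_data (simulated values, not real measurements)
toy_data <- data.frame(
  host_plant = factor(rep(c("heterophyllum", "oleraceum"), each = 30)),
  ovipositor_length_mm = c(rnorm(30, mean = 1.75, sd = 0.08),
                           rnorm(30, mean = 1.65, sd = 0.08))
)

# Bootstrap distribution of the difference in means
boot_dist <-
  toy_data |>
  specify(response = ovipositor_length_mm, explanatory = host_plant) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "diff in means", order = c("heterophyllum", "oleraceum"))

# 95% CI: the 2.5th and 97.5th percentiles of the bootstrap distribution
ci <- get_confidence_interval(boot_dist, level = 0.95, type = "percentile")
ci
```

The percentile method simply reads the interval limits off the bootstrap distribution, so it needs no assumption about the shape of that distribution.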
"heterophyllum" as a host_plant and flies that use "oleraceum" as a host_plant differ in their ovipositor_length_mm?Response: ovipositor_length_mm (numeric)
Explanatory: host_plant (factor)
# A tibble: 1 × 1
stat
<dbl>
1 0.108
"heterophyllum" as a host_plant and flies that use "oleraceum" as a host_plant differ in their ovipositor_length_mm?
"heterophyllum" as a host_plant and flies that use "oleraceum" as a host_plant differ in their ovipositor_length_mm?
Null hypothesis: there is no difference in ovipositor_length_mm between flies that use "heterophyllum" as a host_plant and flies that use "oleraceum" as a host_plant.
Alternative hypothesis: there is a difference in ovipositor_length_mm between flies that use "heterophyllum" as a host_plant and flies that use "oleraceum" as a host_plant.
If the null hypothesis is true:
host_plant variable has no predictive value for ovipositor_length_mm
shuffling host_plant should produce samples similar to the observed sample
If the alternative hypothesis is true:
host_plant variable has predictive value for ovipositor_length_mm
shuffling host_plant should produce samples different from the observed sample
null_dist <-
  tephritis_data |>
  specify(response = ovipositor_length_mm, explanatory = host_plant) |>
  # null hypothesis: response and explanatory variables are independent of each other
  hypothesise(null = "independence") |>
  # generate new samples by permutation
  generate(reps = 10000, type = "permute") |>
  calculate(stat = "diff in means", order = c("heterophyllum", "oleraceum"))
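As a rough sketch, the observed statistic can then be compared with the permutation null distribution using infer's get_p_value(). The toy_data data frame below is simulated and hypothetical (a stand-in for tephritis_data), and the two-sided direction is an assumption that matches a "differ" question.

```r
library(infer)

set.seed(2)

# Hypothetical stand-in for tephritis_data (simulated values, not real measurements)
toy_data <- data.frame(
  host_plant = factor(rep(c("heterophyllum", "oleraceum"), each = 30)),
  ovipositor_length_mm = c(rnorm(30, mean = 1.75, sd = 0.08),
                           rnorm(30, mean = 1.65, sd = 0.08))
)

# Observed difference in means
obs_stat <-
  toy_data |>
  specify(response = ovipositor_length_mm, explanatory = host_plant) |>
  calculate(stat = "diff in means", order = c("heterophyllum", "oleraceum"))

# Null distribution: host_plant labels are shuffled relative to the response
null_dist <-
  toy_data |>
  specify(response = ovipositor_length_mm, explanatory = host_plant) |>
  hypothesise(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "diff in means", order = c("heterophyllum", "oleraceum"))

# Proportion of permuted statistics at least as extreme as the observed one
p_value <- get_p_value(null_dist, obs_stat = obs_stat, direction = "two-sided")
p_value
```

With a finite number of permutations the reported p-value can be exactly 0; infer warns in that case, and the result is better read as "smaller than 1 / reps".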
"heterophyllum" as a host_plant and flies that use "oleraceum" as a host_plant differ in their ovipositor_length_mm?
\[ F = \frac{\text{Variance 1}}{\text{Variance 2}} \]
\[ F = \frac{\text{Mean variance between-group}}{\text{Mean variance within-group}} \]
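Written out for \(k\) groups with \(n_j\) observations in group \(j\) and \(N\) observations in total, the ratio above is commonly expressed as follows. The symbols \(k\), \(n_j\), \(N\), \(\bar{y}_j\) (group mean) and \(\bar{y}\) (overall mean) are notation introduced here for illustration and are not taken from the slides.

\[ F = \frac{\sum_{j=1}^{k} n_j (\bar{y}_j - \bar{y})^2 / (k - 1)}{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2 / (N - k)} \]

When the group means are similar, the numerator is small relative to the denominator and F stays close to 1; large values of F suggest that at least one group mean differs.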
Is the mean ovipositor_length_mm in the heterophyllum host race different from 1.78 mm?
Observed difference:
Think 30 sec, discuss 60 sec
What is our null and alternative hypothesis?
Calculate the test statistic:
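One possible way to set this up with infer is a one-sample test with a point null, sketched below. It assumes the sample mean as the test statistic, which is a choice made here and may differ from the statistic used in the lecture. The toy_data data frame is simulated and hypothetical.

```r
library(infer)

set.seed(3)

# Hypothetical ovipositor lengths for one host race (simulated, not real measurements)
toy_data <- data.frame(
  ovipositor_length_mm = rnorm(40, mean = 1.72, sd = 0.08)
)

# Observed test statistic: the sample mean
obs_mean <-
  toy_data |>
  specify(response = ovipositor_length_mm) |>
  calculate(stat = "mean")

# Null distribution: bootstrap samples recentred on the hypothesised mean of 1.78 mm
null_dist <-
  toy_data |>
  specify(response = ovipositor_length_mm) |>
  hypothesise(null = "point", mu = 1.78) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "mean")

p_value <- get_p_value(null_dist, obs_stat = obs_mean, direction = "two-sided")
p_value
```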
# A tibble: 120 × 1
phenotype
<chr>
1 A-B-
2 A-B-
3 A-B-
4 A-B-
5 A-B-
6 A-B-
7 A-B-
8 A-B-
9 A-B-
10 A-B-
# ℹ 110 more rows
\[ \chi^2 = \sum \frac{(Observed_i - Expected_i)^2}{Expected_i} \]
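The formula can be checked by hand in base R, as in the sketch below. The observed counts, the phenotype class names other than "A-B-", and the 9:3:3:1 expected ratio are all hypothetical values chosen for illustration; none of them come from the lecture data.

```r
# Hypothetical observed counts for four phenotype classes (120 individuals)
observed <- c("A-B-" = 70, "A-bb" = 20, "aaB-" = 22, "aabb" = 8)

# Assumed expected proportions under a 9:3:3:1 ratio
expected_p <- c(9, 3, 3, 1) / 16
expected <- sum(observed) * expected_p

# Chi-squared statistic: sum of (Observed - Expected)^2 / Expected
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq

# The same statistic from the built-in goodness-of-fit test
builtin <- chisq.test(observed, p = expected_p)$statistic
all.equal(chi_sq, unname(builtin))
```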
# A tibble: 120 × 2
treatment germination_success
<chr> <chr>
1 coating_a germinated
2 coating_a germinated
3 coating_a germinated
4 coating_a failed_to_germinate
5 coating_a germinated
6 coating_a germinated
7 coating_a germinated
8 coating_a germinated
9 coating_a germinated
10 coating_a germinated
# ℹ 110 more rows
Correlation coefficient (Pearson):
\[ r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}} \]
\[ r = \frac{\text{Covariance}(x,y)}{\text{Standard deviation}(x) \times \text{Standard deviation}(y)} \]
\[ r = \frac{\text{Cov}(x,y)}{\sigma_x \sigma_y} \]
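A short base R sketch showing that the forms of the formula above agree with each other and with cor(). The vectors x and y are simulated, hypothetical data.

```r
set.seed(4)

# Hypothetical data: y depends linearly on x plus noise
x <- rnorm(50)
y <- 0.6 * x + rnorm(50, sd = 0.8)

# Sum-of-products form
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  (sqrt(sum((x - mean(x))^2)) * sqrt(sum((y - mean(y))^2)))

# Covariance divided by the product of the standard deviations
r_cov <- cov(x, y) / (sd(x) * sd(y))

# Built-in Pearson correlation
r_builtin <- cor(x, y)

c(r_manual, r_cov, r_builtin)
```

The n - 1 factors in cov() and sd() cancel, which is why the covariance form gives the same value as the sum-of-products form.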

Think 30 sec, discuss 90 sec
What is our null and alternative hypothesis? How could we generate data compatible with the null?
\[ y = \text{Slope}\times x + \text{Intercept} \]
\[ y = mx+c \]
\[ y = \beta_1x+\beta_0 \]
\[ y = \beta_0+\beta_1x+\beta_2x_2+\beta_3x_3+\beta_4x_4+\beta_5x_5 \]
\[ y = \beta_1x+\beta_0 \]
\[ y = 3.83x-0.68 \]


Base R:
With infer:
boot_dist <-
  mating_data |>
  specify(reproductive_success ~ mating_success) |>
  generate(reps = 1000, type = "bootstrap") |>
  fit()
boot_dist
# A tibble: 2,000 × 3
# Groups: replicate [1,000]
replicate term estimate
<int> <chr> <dbl>
1 1 intercept -3.75
2 1 mating_success 4.32
3 2 intercept -4.96
4 2 mating_success 4.73
5 3 intercept -0.267
6 3 mating_success 3.80
7 4 intercept 6.72
8 4 mating_success 2.84
9 5 intercept -1.08
10 5 mating_success 3.96
# ℹ 1,990 more rows
null_dist <-
  mating_data |>
  specify(reproductive_success ~ mating_success) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  fit()
null_dist
# A tibble: 2,000 × 3
# Groups: replicate [1,000]
replicate term estimate
<int> <chr> <dbl>
1 1 intercept 20.3
2 1 mating_success 0.0382
3 2 intercept 23.9
4 2 mating_success -0.603
5 3 intercept 25.8
6 3 mating_success -0.951
7 4 intercept 23.7
8 4 mating_success -0.570
9 5 intercept 27.2
10 5 mating_success -1.20
# ℹ 1,990 more rows
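A sketch of how the observed fit might be compared with the permuted fits using get_p_value(). The toy_mating data frame is simulated and hypothetical (a stand-in for mating_data), and the two-sided direction is an assumption.

```r
library(infer)

set.seed(5)

# Hypothetical stand-in for mating_data (simulated values, not real measurements)
toy_mating <- data.frame(mating_success = rpois(60, lambda = 5))
toy_mating$reproductive_success <-
  4 * toy_mating$mating_success + rnorm(60, sd = 5)

# Observed intercept and slope
obs_fit <-
  toy_mating |>
  specify(reproductive_success ~ mating_success) |>
  fit()

# Coefficients refitted on data where the response has been shuffled
null_dist <-
  toy_mating |>
  specify(reproductive_success ~ mating_success) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  fit()

# One p-value per model term
p_values <- get_p_value(null_dist, obs_stat = obs_fit, direction = "two-sided")
p_values
```

The row for mating_success answers the question of interest here: how often shuffled data give a slope at least as extreme as the observed one.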
\[ y = \beta_1x+\beta_0 \]
\[ y = \beta_0+\beta_1x+\beta_2x_2+\beta_3x_3+\beta_4x_4+\beta_5x_5 \]
"permute"
"draw"
"F-statistic" (compare variances to infer if means differ)
"Chi-sq" (compare distributions of categorical data)
"correlation" (do two variables covary linearly?)
fit())