Lecture 2
BIOC13 - Ekologi
Department of Biology, Lund University
2025-04-25
| region | host_plant | patry | sex | body_length_mm | ovipositor_length_mm | wing_length_mm | wing_width_mm | melanized_percent | baltic | 
|---|---|---|---|---|---|---|---|---|---|
| estonia | heterophyllum | sympatry | male | 3.94 | NA | 4.455026 | 1.873016 | 55.96312 | east | 
| estonia | heterophyllum | sympatry | female | 4.48 | 1.65 | 4.710000 | 2.170000 | 63.89701 | east | 
| estonia | heterophyllum | sympatry | male | 4.61 | NA | 4.990196 | 2.294118 | 62.52199 | east | 
| estonia | heterophyllum | sympatry | female | 5.31 | 1.78 | 5.611650 | 2.582524 | 61.37781 | east | 
| estonia | heterophyllum | sympatry | male | 4.51 | NA | 4.468750 | 2.093750 | 62.25657 | east | 
| estonia | heterophyllum | sympatry | male | 4.74 | NA | 4.906250 | 2.333333 | 57.46406 | east | 
"heterophyllum" as a host_plant and flies that use "oleraceum" as a host_plant differ in their ovipositor_length_mm?obs_stat <-
  tephritis_data |> 
  specify(response = ovipositor_length_mm, explanatory = host_plant) |> 
  calculate(stat = "diff in means", order = c("heterophyllum", "oleraceum"))
obs_stat
Response: ovipositor_length_mm (numeric)
Explanatory: host_plant (factor)
# A tibble: 1 × 1
   stat
  <dbl>
1 0.108
Example in R for a difference in means:
library(infer)
boot_dist <-
  tephritis_data |>
  specify(response = ovipositor_length_mm, explanatory = host_plant) |>
  generate(reps = 10000, type = "bootstrap") |>
  calculate(stat = "diff in means", order = c("heterophyllum", "oleraceum"))
specify which variables we are interested in.
generate new samples with bootstrap resampling.
calculate the chosen statistic for each new sample generated.
"heterophyllum" as a host_plant and flies that use "oleraceum" as a host_plant differ in their ovipositor_length_mm?Response: ovipositor_length_mm (numeric)
Explanatory: host_plant (factor)
# A tibble: 1 × 1
   stat
  <dbl>
1 0.108Example in R:
infer to calculate a 95% CI with "percentile" method
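To make the percentile method concrete, here is a minimal base R sketch on simulated data. The data frame `toy_data`, its group means and the seed are assumptions made up for this example and are not the tephritis measurements. With infer, the corresponding step should be `get_confidence_interval(boot_dist, level = 0.95, type = "percentile")`.

```r
set.seed(1)

# Hypothetical data: two host-plant groups with made-up ovipositor lengths
toy_data <- data.frame(
  host_plant = rep(c("heterophyllum", "oleraceum"), each = 50),
  ovipositor_length_mm = c(rnorm(50, mean = 1.8, sd = 0.1),
                           rnorm(50, mean = 1.7, sd = 0.1))
)

# Difference in group means (heterophyllum minus oleraceum)
diff_in_means <- function(d) {
  mean(d$ovipositor_length_mm[d$host_plant == "heterophyllum"]) -
    mean(d$ovipositor_length_mm[d$host_plant == "oleraceum"])
}

# Resample rows with replacement and recompute the statistic each time
boot_stats <- replicate(2000, {
  resampled <- toy_data[sample(nrow(toy_data), replace = TRUE), ]
  diff_in_means(resampled)
})

# Percentile method: the middle 95% of the bootstrap distribution
ci <- quantile(boot_stats, probs = c(0.025, 0.975))
ci
```

The interval is read straight off the bootstrap distribution, so no assumption about its shape is needed.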
"heterophyllum" as a host_plant and flies that use "oleraceum" as a host_plant differ in their ovipositor_length_mm?Response: ovipositor_length_mm (numeric)
Explanatory: host_plant (factor)
# A tibble: 1 × 1
   stat
  <dbl>
1 0.108"heterophyllum" as a host_plant and flies that use "oleraceum" as a host_plant differ in their ovipositor_length_mm?
"heterophyllum" as a host_plant and flies that use "oleraceum" as a host_plant differ in their ovipositor_length_mm?
Null hypothesis: there is no difference in ovipositor_length_mm between flies that use "heterophyllum" as a host_plant and flies that use "oleraceum" as a host_plant.
Alternative hypothesis: there is a difference in ovipositor_length_mm between flies that use "heterophyllum" as a host_plant and flies that use "oleraceum" as a host_plant.
If the null hypothesis is true:
host_plant variable has no predictive value for ovipositor_length_mm
permuting host_plant should produce samples similar to the observed sample
If the alternative hypothesis is true:
host_plant variable has predictive value for ovipositor_length_mm
permuting host_plant should produce samples different from the observed sample
null_dist <-
  tephritis_data |>
  specify(response = ovipositor_length_mm, explanatory = host_plant) |>
  hypothesise(null = "independence") |>
  generate(reps = 10000, type = "permute") |>
  calculate(stat = "diff in means", order = c("heterophyllum", "oleraceum"))
hypothesise: the null is that the response and explanatory variables are independent of each other.
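A permutation null distribution becomes useful once it is compared with the observed statistic. The base R sketch below shows one common way to do that on simulated data; `toy_data`, its group means and the seed are assumptions for illustration only. With infer, the analogous call should be `get_p_value(null_dist, obs_stat = obs_stat, direction = "two-sided")`, which may define the two-sided p-value slightly differently.

```r
set.seed(2)

# Hypothetical data: two host-plant groups with made-up ovipositor lengths
toy_data <- data.frame(
  host_plant = rep(c("heterophyllum", "oleraceum"), each = 50),
  ovipositor_length_mm = c(rnorm(50, mean = 1.9, sd = 0.1),
                           rnorm(50, mean = 1.7, sd = 0.1))
)

# Difference in group means (heterophyllum minus oleraceum)
diff_in_means <- function(response, group) {
  mean(response[group == "heterophyllum"]) - mean(response[group == "oleraceum"])
}

obs_stat <- diff_in_means(toy_data$ovipositor_length_mm, toy_data$host_plant)

# Shuffle the group labels to break any association with the response
null_stats <- replicate(2000, {
  shuffled <- sample(toy_data$host_plant)
  diff_in_means(toy_data$ovipositor_length_mm, shuffled)
})

# Two-sided p-value: share of permuted statistics at least as extreme as observed
p_value <- mean(abs(null_stats) >= abs(obs_stat))
p_value
```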
"heterophyllum" as a host_plant and flies that use "oleraceum" as a host_plant differ in their ovipositor_length_mm?
\[ F = \frac{\text{Variance 1}}{\text{Variance 2}} \]
\[ F = \frac{\text{Mean variance between-group}}{\text{Mean variance within-group}} \]
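The ratio above can be computed by hand. The sketch below does so for a simulated three-group data set and cross-checks the result against base R's ANOVA table; the data frame `toy_data` and its group means are made up for this example.

```r
set.seed(3)

# Hypothetical data: three groups of ten made-up measurements
toy_data <- data.frame(
  group = rep(c("a", "b", "c"), each = 10),
  value = c(rnorm(10, mean = 5), rnorm(10, mean = 6), rnorm(10, mean = 7))
)

grand_mean  <- mean(toy_data$value)
group_means <- tapply(toy_data$value, toy_data$group, mean)
group_sizes <- tapply(toy_data$value, toy_data$group, length)
k <- length(group_means)  # number of groups
n <- nrow(toy_data)       # total number of observations

# Mean variance between groups: spread of group means around the grand mean
ms_between <- sum(group_sizes * (group_means - grand_mean)^2) / (k - 1)

# Mean variance within groups: spread of observations around their own group mean
ms_within <- sum((toy_data$value - group_means[toy_data$group])^2) / (n - k)

f_stat <- ms_between / ms_within

# Cross-check against base R's one-way ANOVA table
f_anova <- anova(lm(value ~ group, data = toy_data))[["F value"]][1]
c(by_hand = f_stat, anova = f_anova)
```

A large F means the group means are spread out more than the within-group noise would suggest.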
Is ovipositor_length_mm in the heterophyllum host race different from 1.78 mm?
Observed difference:
Think 30 sec, discuss 60 sec
What is our null and alternative hypothesis?
Calculate the test statistic:
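The slide text does not show which statistic was used, so the following is only one plausible sketch: take the sample mean as the test statistic and build a null distribution by bootstrapping a sample that has been shifted to have mean 1.78. The vector `lengths_mm` is simulated and is not the real data. In infer, the same idea should correspond to `hypothesise(null = "point", mu = 1.78)` followed by `generate(type = "bootstrap")` and `calculate(stat = "mean")`.

```r
set.seed(4)

# Hypothetical sample of ovipositor lengths (mm); not the real measurements
lengths_mm <- rnorm(40, mean = 1.70, sd = 0.10)
mu_null <- 1.78  # value stated in the question

# Test statistic: the sample mean (its distance from mu_null is the observed difference)
obs_mean <- mean(lengths_mm)

# Shift the sample so that its mean equals mu_null, then bootstrap from it
centred <- lengths_mm - obs_mean + mu_null
null_means <- replicate(2000, mean(sample(centred, replace = TRUE)))

# Two-sided p-value: how often is a null mean at least as far from mu_null?
p_value <- mean(abs(null_means - mu_null) >= abs(obs_mean - mu_null))
c(obs_mean = obs_mean, p_value = p_value)
```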
# A tibble: 120 × 1
   phenotype
   <chr>    
 1 A-B-     
 2 A-B-     
 3 A-B-     
 4 A-B-     
 5 A-B-     
 6 A-B-     
 7 A-B-     
 8 A-B-     
 9 A-B-     
10 A-B-     
# ℹ 110 more rows
\[ \chi^2 = \sum_i \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i} \]
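As a worked instance of the chi-squared formula, the sketch below compares made-up phenotype counts with the counts expected under a 9:3:3:1 ratio. The counts, the class labels other than "A-B-", and the 9:3:3:1 ratio itself are all assumptions for illustration; only the total of 120 individuals matches the data shown.

```r
# Hypothetical counts for four phenotype classes (120 individuals in total)
observed <- c("A-B-" = 70, "A-bb" = 20, "aaB-" = 22, "aabb" = 8)

# Assumed null hypothesis: a 9:3:3:1 ratio
expected_ratio <- c(9, 3, 3, 1) / 16
expected <- sum(observed) * expected_ratio

# Sum of squared deviations, each scaled by its expected count
chi_sq <- sum((observed - expected)^2 / expected)

# Cross-check against base R's goodness-of-fit test
builtin <- unname(chisq.test(observed, p = expected_ratio)$statistic)
c(by_hand = chi_sq, chisq_test = builtin)
```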
# A tibble: 120 × 2
   treatment germination_success
   <chr>     <chr>              
 1 coating_a germinated         
 2 coating_a germinated         
 3 coating_a germinated         
 4 coating_a failed_to_germinate
 5 coating_a germinated         
 6 coating_a germinated         
 7 coating_a germinated         
 8 coating_a germinated         
 9 coating_a germinated         
10 coating_a germinated         
# ℹ 110 more rows
Correlation coefficient (Pearson):
\[ r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}} \]
\[ r = \frac{\text{Covariance}(x,y)}{\text{Standard deviation}(x) \times \text{Standard deviation}(y)} \]
\[ r = \frac{\text{Cov}(x,y)}{\sigma_x \sigma_y} \]
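The sum form and the covariance form of r give the same number, which is easy to check numerically. The vectors `x` and `y` below are simulated purely for this check.

```r
set.seed(5)

# Hypothetical paired measurements with a positive linear relationship
x <- rnorm(30)
y <- 2 * x + rnorm(30)

# Sum form: cross-products over the product of root sums of squares
r_by_hand <- sum((x - mean(x)) * (y - mean(y))) /
  (sqrt(sum((x - mean(x))^2)) * sqrt(sum((y - mean(y))^2)))

# Covariance form: covariance scaled by both standard deviations
r_from_cov <- cov(x, y) / (sd(x) * sd(y))

c(by_hand = r_by_hand, from_cov = r_from_cov, built_in = cor(x, y))
```

The n - 1 factors in the sample covariance and standard deviations cancel, which is why both forms agree.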
 Think 30 sec, discuss 90 sec
What is our null and alternative hypothesis? How could we generate data compatible with the null?
\[ y = \text{Slope}\times x + \text{Intercept} \]
\[ y = mx+c \]
\[ y = \beta_1x+\beta_0 \]
\[ y = \beta_0+\beta_1x+\beta_2x_2+\beta_3x_3+\beta_4x_4+\beta_5x_5 \]
\[ y = \beta_1x+\beta_0 \]
\[ y = 3.83x-0.68 \]


Base R:
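The base R code for this step is not present in the text. A minimal sketch, assuming the slide used `lm()`, could look like the following. The data frame `toy_mating_data` is simulated with an arbitrary slope of 4 and intercept of -1; it stands in for `mating_data`, which is not available here.

```r
set.seed(6)

# Hypothetical stand-in for mating_data, simulated from a known line plus noise
toy_mating_data <- data.frame(mating_success = rep(0:9, times = 5))
toy_mating_data$reproductive_success <-
  -1 + 4 * toy_mating_data$mating_success + rnorm(50, sd = 2)

# Least-squares fit of y = beta_0 + beta_1 * x
model <- lm(reproductive_success ~ mating_success, data = toy_mating_data)

# Named vector: "(Intercept)" is beta_0, "mating_success" is beta_1
coef(model)
```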
With infer:
boot_dist <-
  mating_data |>
  specify(reproductive_success ~ mating_success) |>
  generate(reps = 1000, type = "bootstrap") |>
  fit()
boot_dist
# A tibble: 2,000 × 3
# Groups:   replicate [1,000]
   replicate term           estimate
       <int> <chr>             <dbl>
 1         1 intercept        -3.75 
 2         1 mating_success    4.32 
 3         2 intercept        -4.96 
 4         2 mating_success    4.73 
 5         3 intercept        -0.267
 6         3 mating_success    3.80 
 7         4 intercept         6.72 
 8         4 mating_success    2.84 
 9         5 intercept        -1.08 
10         5 mating_success    3.96 
# ℹ 1,990 more rows
null_dist <-
  mating_data |>
  specify(reproductive_success ~ mating_success) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  fit()
null_dist
# A tibble: 2,000 × 3
# Groups:   replicate [1,000]
   replicate term           estimate
       <int> <chr>             <dbl>
 1         1 intercept       20.3   
 2         1 mating_success   0.0382
 3         2 intercept       23.9   
 4         2 mating_success  -0.603 
 5         3 intercept       25.8   
 6         3 mating_success  -0.951 
 7         4 intercept       23.7   
 8         4 mating_success  -0.570 
 9         5 intercept       27.2   
10         5 mating_success  -1.20  
# ℹ 1,990 more rows
\[ y = \beta_1x+\beta_0 \]
\[ y = \beta_0+\beta_1x+\beta_2x_2+\beta_3x_3+\beta_4x_4+\beta_5x_5 \]
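For the single-predictor model, the permutation idea used for the slope can be sketched in base R: shuffle the response, refit the line, and see how often a shuffled slope is as extreme as the observed one. The data frame `toy_mating_data` and the helper `slope()` are hypothetical and exist only for this illustration.

```r
set.seed(7)

# Hypothetical stand-in for mating_data, simulated from a known line plus noise
toy_mating_data <- data.frame(mating_success = rep(0:9, times = 5))
toy_mating_data$reproductive_success <-
  -1 + 4 * toy_mating_data$mating_success + rnorm(50, sd = 2)

# Helper (hypothetical): least-squares slope of y on x
slope <- function(x, y) unname(coef(lm(y ~ x))[2])

obs_slope <- slope(toy_mating_data$mating_success,
                   toy_mating_data$reproductive_success)

# Null distribution: shuffling y breaks any link between x and y
null_slopes <- replicate(1000, {
  slope(toy_mating_data$mating_success,
        sample(toy_mating_data$reproductive_success))
})

# Two-sided p-value: share of permuted slopes at least as extreme as observed
p_value <- mean(abs(null_slopes) >= abs(obs_slope))
c(obs_slope = obs_slope, p_value = p_value)
```

Under the null the permuted slopes scatter around zero, much like the mating_success estimates in the permuted output above.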
"permute""draw""F-statistic" (compare variances to infer if means differ)"Chi-sq" (compare distributions of categorical data)"correlation" (do two variables covary linearly?)fit())