
Lecture 6
Wednesday 1st April, 2026
Correlation coefficient (Pearson):
\[ r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}} \]
\[ r = \frac{\text{Covariance}(x,y)}{\text{Standard deviation}(x) \times \text{Standard deviation}(y)} \]
\[ r = \frac{\text{Cov}(x,y)}{\sigma_x \sigma_y} \]
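The three forms of the formula above give the same number. A minimal sketch in base R, using a small made-up data set (the vectors `x` and `y` are assumptions for illustration, not lecture data):

```r
# Hypothetical toy data, not taken from the lecture
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Pearson r from the sum-of-products definition
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))

# Same quantity as covariance over the product of standard deviations;
# cov() and sd() both use n - 1, so the denominators cancel
r_cov <- cov(x, y) / (sd(x) * sd(y))

# Built-in Pearson correlation
r_builtin <- cor(x, y)

# Both comparisons should report TRUE
print(all.equal(r_manual, r_builtin))
print(all.equal(r_cov, r_builtin))
```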

Correlation coefficient (\(r\)): \[ r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}} \]
Coefficient of determination (\(r^2\)): \[ r^2 = \left(\frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}}\right)^2 \]
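For a regression with a single explanatory variable, the squared correlation coincides with the R-squared reported by a fitted line. A small sketch, again with made-up vectors `x` and `y` (assumptions, not lecture data):

```r
# Hypothetical toy data, not taken from the lecture
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Correlation coefficient and its square
r <- cor(x, y)
r_squared <- r^2

# R-squared from a simple linear regression of y on x
fit <- lm(y ~ x)
r_squared_lm <- summary(fit)$r.squared

# Should report TRUE for a one-predictor model
print(all.equal(r_squared, r_squared_lm))
```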
# A tibble: 1 × 1
  p_value
    <dbl>
1       0
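A p-value of 0 here is an approximation that depends on the number of reps used in the generate() step: no simulated statistic was as extreme as the observed one, which does not mean the true p-value is exactly zero.

\[ y = \text{Slope}\times x + \text{Intercept} \]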
\[ y = mx+c \]
\[ y = \beta_1x+\beta_0 \]
\[ y = \beta_0+\beta_1x+\beta_2x_2+\beta_3x_3+\beta_4x_4+\beta_5x_5 \]
\[ y = \beta_1x+\beta_0 \]
\[ y = 3.83x-0.68 \]
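As a worked example of reading the fitted line, take a hypothetical value \(x = 2\) (chosen for illustration, not taken from the lecture data):
\[ y = 3.83 \times 2 - 0.68 = 7.66 - 0.68 = 6.98 \]
The slope says that each one-unit increase in \(x\) raises the predicted \(y\) by 3.83; the intercept \(-0.68\) is the predicted \(y\) at \(x = 0\).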


With infer (if no interest in the intercept):
Response: reproductive_success (numeric)
Explanatory: mating_success (numeric)
# A tibble: 1,000 × 2
   replicate  stat
       <int> <dbl>
 1         1  4.32
 2         2  4.73
 3         3  3.80
 4         4  2.84
 5         5  3.96
 6         6  3.98
 7         7  3.38
 8         8  3.60
 9         9  4.08
10        10  3.98
# ℹ 990 more rows
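Output of this shape (one slope per replicate, centred near the fitted slope) is what a bootstrap pipeline in infer produces. The sketch below is an assumption about how such a table could be generated, not the lecture's own code: the data frame `mating_data` is simulated stand-in data, and only the column names are borrowed from the output above.

```r
library(infer)

set.seed(1)
# Hypothetical stand-in data; the lecture's data set is not reproduced here
mating_data <- data.frame(mating_success = rpois(40, lambda = 2))
mating_data$reproductive_success <-
  4 * mating_data$mating_success + rnorm(40, sd = 3)

# Resample rows with replacement and recompute the slope each time
boot_slopes <- mating_data |>
  specify(reproductive_success ~ mating_success) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "slope")

# Percentile confidence interval for the slope
ci <- get_confidence_interval(boot_slopes, level = 0.95, type = "percentile")
print(ci)
```

Only the slope is tracked here, which matches the "no interest in the intercept" case.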
Response: reproductive_success (numeric)
Explanatory: mating_success (numeric...
# A tibble: 1,000 × 2
   replicate    stat
       <int>   <dbl>
 1         1  0.0382
 2         2 -0.603
 3         3 -0.951
 4         4 -0.570
 5         5 -1.20
 6         6 -0.0117
 7         7 -0.548
 8         8 -0.707
 9         9 -0.423
10        10 -0.0407
# ℹ 990 more rows
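Slopes scattered around zero like these are what a permutation-based null distribution looks like. A hedged sketch of such a test with infer follows; `mating_data` is simulated stand-in data (an assumption), and the null hypothesis of independence is assumed rather than confirmed by the truncated output above.

```r
library(infer)

set.seed(1)
# Hypothetical stand-in data; the lecture's data set is not reproduced here
mating_data <- data.frame(mating_success = rpois(40, lambda = 2))
mating_data$reproductive_success <-
  4 * mating_data$mating_success + rnorm(40, sd = 3)

# Observed slope in the (simulated) sample
obs_slope <- mating_data |>
  specify(reproductive_success ~ mating_success) |>
  calculate(stat = "slope")

# Null distribution: shuffle the response to break any association
null_slopes <- mating_data |>
  specify(reproductive_success ~ mating_success) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "slope")

# Proportion of permuted slopes at least as extreme as the observed one
p_val <- get_p_value(null_slopes, obs_stat = obs_slope, direction = "two-sided")
print(p_val)
```

With a strong simulated association, infer may warn that a reported p-value of 0 is only an approximation tied to the number of reps.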
\[ y = \beta_1x+\beta_0 \]
\[ y = \beta_0+\beta_1x+\beta_2x_2+\beta_3x_3+\beta_4x_4+\beta_5x_5+\dots \]

\[ \begin{aligned} \text{flipper_length_mm} = {} & \beta_0 + {} \\ & \beta_1 \times \text{body_mass_g} + {} \\ & \beta_2 \times \text{species} \end{aligned} \]

\[ \begin{aligned} \text{flipper_length_mm} = {} & \beta_0 + {} \\ & \beta_1 \times \text{body_mass_g} + {} \\ & \beta_2 \times \text{species} + {} \\ & \beta_3 \times \text{body_mass_g} \times \text{species} \end{aligned} \]
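The additive and interaction models above can be sketched with base R. This assumes the variables come from the `penguins` data in the palmerpenguins package, which has columns with these names; the lecture's actual data source is not shown here.

```r
# Assumption: the data is palmerpenguins::penguins (package must be installed)
penguins <- palmerpenguins::penguins

# Additive model: common body-mass slope, species-specific intercepts
additive_fit <- lm(flipper_length_mm ~ body_mass_g + species, data = penguins)

# Interaction model: body-mass slope allowed to differ by species
interaction_fit <- lm(flipper_length_mm ~ body_mass_g * species, data = penguins)

print(coef(additive_fit))
print(coef(interaction_fit))

# F-test comparing the two nested models
print(anova(additive_fit, interaction_fit))
```

Because species is categorical with three levels, R expands the single \(\beta_2 \times \text{species}\) term into two indicator coefficients, and the interaction term likewise into two.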
The options in infer for hypothesis testing in multiple linear regression are a bit limited.
An alternative is the parsnip package or the base R lm() function.

\[ \text{logit}(p) = \log\left(\frac{p}{1-p}\right) \]
\[ \text{logit}(p) = \beta_0+\beta_1x+\beta_2x_2+\dots \]
\[ p = \frac{e^{\beta_0+\beta_1x+\beta_2x_2+\dots}}{1 + e^{\beta_0+\beta_1x+\beta_2x_2+\dots}} \]
\(\beta_0 \approx -27.5\): the log-odds when body_mass_g = 0.
\(\beta_1 \approx 0.00623\): the change in log-odds for each one-unit increase in body_mass_g.
\[ \text{logit}(p) = -27.5 + 0.00623 \times \text{body_mass_g} \]
Inverse logit (to get probability): \[ p = \frac{e^{\text{logit}(p)}}{1 + e^{\text{logit}(p)}} \]
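To make the inverse logit concrete, the sketch below plugs the coefficients quoted above into the formula. The three body masses are hypothetical values chosen for illustration.

```r
# Coefficients as quoted in the notes
b0 <- -27.5
b1 <- 0.00623

# Hypothetical body masses in grams (illustrative, not from the data)
body_mass_g <- c(4000, 4500, 5000)

# Linear predictor on the log-odds scale
log_odds <- b0 + b1 * body_mass_g

# Inverse logit written out, and the built-in equivalent
p_manual <- exp(log_odds) / (1 + exp(log_odds))
p_builtin <- plogis(log_odds)

print(data.frame(body_mass_g, log_odds, p = p_builtin))
```

At 4500 g the log-odds are 0.535, which corresponds to a probability of about 0.63; the probability rises steeply across this range because the log-odds cross zero near 4400 g.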
The options in infer for hypothesis testing in multiple logistic regression are a bit limited.
An alternative is the parsnip package.
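As a rough illustration of fitting a logistic regression outside infer, the sketch below uses base R glm(), swapped in here for simplicity; it is the default engine behind parsnip's logistic_reg(). The data frame `sim` and its coefficients are simulated assumptions, not lecture data.

```r
set.seed(42)

# Hypothetical simulated data: binary outcome driven by one predictor
n <- 200
sim <- data.frame(x = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * sim$x))

# Logistic regression: models logit(p) as a linear function of x
fit <- glm(y ~ x, data = sim, family = binomial)

# Estimates, standard errors, z statistics and p-values
print(summary(fit)$coefficients)

# Predicted probabilities (inverse logit applied) at a few x values
new_x <- data.frame(x = c(-1, 0, 1))
pred <- predict(fit, newdata = new_x, type = "response")
print(pred)
```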