
Partitioning variability

1 / 24

Partitioning variability

2 / 24

Why?

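  • $y - \bar{y} = (\hat{y} - \bar{y}) + (y - \hat{y})$

  • $\sum(y - \bar{y})^2 = \sum(\hat{y} - \bar{y})^2 + \sum(y - \hat{y})^2$ (summed over all observations; for least squares with an intercept the cross term sums to zero)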

  • SSTotal = SSModel + SSE

3 / 24
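A minimal sketch that checks the partition numerically, assuming simulated data (the variables `x`, `y`, and `fit` are hypothetical stand-ins, not the Sparrows data):

```r
# Hypothetical simulated data standing in for a response y and predictor x
set.seed(1)
x <- rnorm(50)
y <- 2 + 0.5 * x + rnorm(50)

fit   <- lm(y ~ x)
y_hat <- fitted(fit)
y_bar <- mean(y)

ss_total <- sum((y - y_bar)^2)       # SSTotal
ss_model <- sum((y_hat - y_bar)^2)   # SSModel
sse      <- sum((y - y_hat)^2)       # SSE

# The pieces add up (up to floating-point error) because the cross term
# sum((y_hat - y_bar) * (y - y_hat)) is zero for least squares with an intercept
all.equal(ss_total, ss_model + sse)
```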

Degrees of freedom

  • SSTotal: $n - 1$

  • SSE: $n - 2$

  • SSModel: $n - 1 = 1 + (n - 2)$, so 1!

4 / 24
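A short sketch of where these degrees of freedom show up on a fitted model, assuming hypothetical simulated data with $n = 30$ (not the Sparrows data):

```r
# Hypothetical simulated data with n = 30 observations
set.seed(2)
n <- 30
x <- rnorm(n)
y <- 1 + x + rnorm(n)
fit <- lm(y ~ x)

df_total <- n - 1                 # SSTotal: n - 1
df_error <- df.residual(fit)      # SSE: n - 2
df_model <- df_total - df_error   # SSModel: the 1 df left over

anova(fit)$Df                     # model df (1), then residual df (n - 2 = 28)
```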

Mean Squares

  • $MSModel = \frac{SSModel}{1}$
  • $MSE = \frac{SSE}{n - 2}$
5 / 24

$F = \frac{MSModel}{MSE}$

6 / 24
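A minimal sketch computing the mean squares and the F-statistic by hand and comparing them with what `anova()` reports, assuming hypothetical simulated data (not the Sparrows data):

```r
# Hypothetical simulated data
set.seed(3)
x <- rnorm(40)
y <- 3 - 0.8 * x + rnorm(40)
fit <- lm(y ~ x)
n <- length(y)

ss_model <- sum((fitted(fit) - mean(y))^2)
sse      <- sum(residuals(fit)^2)

ms_model <- ss_model / 1        # SSModel has 1 df
mse      <- sse / (n - 2)       # SSE has n - 2 df
f_stat   <- ms_model / mse      # F = MSModel / MSE

# anova() reports the same mean squares and F-statistic
tab <- anova(fit)
tab[["Mean Sq"]]
tab[["F value"]][1]
```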

F-distribution

Under the null hypothesis

7 / 24

Sparrows

We can see all of these pieces using the anova() function.

library(dplyr)  # provides %>%
data(Sparrows, package = "Stat2Data")
lm(Weight ~ WingLength, data = Sparrows) %>%
  anova()
## Analysis of Variance Table
##
## Response: Weight
## Df Sum Sq Mean Sq F value Pr(>F)
## WingLength 1 355.05 355.05 181.25 < 2.2e-16
## Residuals 114 223.31 1.96
8 / 24
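The arithmetic behind this table can be checked by hand. This sketch simply copies the rounded sums of squares printed above, so the results match the table only up to that rounding; it also recovers the r.squared and sigma values that glance() reports for this model:

```r
# Values copied from the anova() output for the Sparrows model
ss_model <- 355.05   # Sum Sq for WingLength
sse      <- 223.31   # Sum Sq for Residuals
df_model <- 1
df_error <- 114

ms_model <- ss_model / df_model   # 355.05
mse      <- sse / df_error        # about 1.96
f_stat   <- ms_model / mse        # about 181.25

# The same pieces give R-squared and the residual standard error (sigma)
r_squared <- ss_model / (ss_model + sse)   # about 0.614
sigma     <- sqrt(mse)                     # about 1.40

round(c(F = f_stat, r.squared = r_squared, sigma = sigma), 3)
```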

Sparrows

lm(Weight ~ WingLength, data = Sparrows) %>%
  glance()
## # A tibble: 1 x 11
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.614 0.611 1.40 181. 2.62e-25 2 -203. 411. 419.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
  • F-statistic: 181.25
  • p-value: 2.62e-25
  • Where did this p-value come from?
9 / 24

p-value

The probability of getting a statistic as extreme or more extreme than the observed test statistic, given that the null hypothesis is true.

10 / 24
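To make this definition concrete, here is a simulation sketch. It assumes data generated under the null hypothesis (y unrelated to x) with the Sparrows sample size of 116, and uses the hypothetical observed F of 2 from the demonstration later in these slides. The share of null F-statistics at or above 2 should land close to the exact value from pf():

```r
set.seed(5)
n      <- 116    # same sample size as the Sparrows data
f_obs  <- 2      # hypothetical observed F-statistic
n_sims <- 1000

# Simulate data sets where the null hypothesis is true: y does not depend on x
f_null <- replicate(n_sims, {
  x <- rnorm(n)
  y <- rnorm(n)
  summary(lm(y ~ x))$fstatistic[["value"]]
})

# p-value: proportion of null F-statistics as extreme or more extreme than f_obs
p_sim   <- mean(f_null >= f_obs)
p_exact <- pf(f_obs, df1 = 1, df2 = n - 2, lower.tail = FALSE)
c(simulated = p_sim, exact = p_exact)
```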

F-distribution

Under the null hypothesis

11 / 24

Sparrows: Degrees of freedom

  • SSTotal: $n - 1 = 115$
  • SSE: ?
  • SSModel: ?
12 / 24

Sparrows: Degrees of freedom

  • SSTotal: $n - 1 = 115$
  • SSE: $n - 2 = 114$
  • SSModel: $115 - 114 = 1$
13 / 24

Sparrows

To calculate the p-value under the t-distribution we used pt(). What do you think we use to calculate the p-value under the F-distribution?

lm(Weight ~ WingLength, data = Sparrows) %>%
  glance()
## # A tibble: 1 x 11
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.614 0.611 1.40 181. 2.62e-25 2 -203. 411. 419.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
14 / 24

Sparrows

To calculate the p-value under the t-distribution we used pt(). What do you think we use to calculate the p-value under the F-distribution?

lm(Weight ~ WingLength, data = Sparrows) %>%
  glance()
## # A tibble: 1 x 11
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.614 0.611 1.40 181. 2.62e-25 2 -203. 411. 419.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
  • pf()
  • Its main arguments are q, df1, and df2. What do you think df1 and df2 are?
15 / 24

Sparrows

To calculate the p-value under the t-distribution we used pt(). What do you think we use to calculate the p-value under the F-distribution?

lm(Weight ~ WingLength, data = Sparrows) %>%
  glance()
## # A tibble: 1 x 11
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.614 0.611 1.40 181. 2.62e-25 2 -203. 411. 419.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
pf(181.2535, 1, 114, lower.tail = FALSE)
## [1] 2.621946e-25
16 / 24

Sparrows

Why don't we multiply this p-value by 2 when we use pf()?

lm(Weight ~ WingLength, data = Sparrows) %>%
  glance()
## # A tibble: 1 x 11
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.614 0.611 1.40 181. 2.62e-25 2 -203. 411. 419.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
pf(181.2535, 1, 114, lower.tail = FALSE)
## [1] 2.621946e-25
17 / 24

F-Distribution

Under the null hypothesis

  • We observed an F-statistic of 181.25, but for demonstration purposes, let's assume we saw one of 2.
18 / 24

F-Distribution

Under the null hypothesis

  • We observed an F-statistic of 181.25, but for demonstration purposes, let's assume we saw one of 2.
19 / 24

F-Distribution

Under the null hypothesis

  • Are there any negative values in an F-distribution?
20 / 24

F-Distribution

Under the null hypothesis

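  • The p-value is the probability of values "as extreme or more extreme" than the one observed. In the t-distribution, "more extreme" means farther from 0, which can be in either the positive or the negative direction. Not so for the F! An F-statistic is a ratio of mean squares, so it can never be negative, and only larger values count as more extreme.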
21 / 24
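A small sketch of both points, assuming the hypothetical F of 2 from these slides and the Sparrows degrees of freedom (1 and 114). Draws from an F-distribution are never negative, and because F equals t squared when there is a single slope, the one upper tail of F covers both tails of t:

```r
set.seed(7)
# Draws from an F(1, 114) distribution are never negative
draws <- rf(10000, df1 = 1, df2 = 114)
min(draws) >= 0

# With one slope, F = t^2, so "t at least as far from 0 in either direction"
# is the same event as "F at least as large"; that is one upper tail of F
t_obs <- sqrt(2)   # t-statistic that would give the hypothetical F = 2
p_t <- 2 * pt(-abs(t_obs), df = 114)
p_f <- pf(t_obs^2, df1 = 1, df2 = 114, lower.tail = FALSE)
all.equal(p_t, p_f)
```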

Sparrows

Notice the p-value for the F-test is the same as the p-value for the t-test for $\beta_1$!

lm(Weight ~ WingLength, data = Sparrows) %>%
  glance()
## # A tibble: 1 x 11
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.614 0.611 1.40 181. 2.62e-25 2 -203. 411. 419.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
lm(Weight ~ WingLength, data = Sparrows) %>%
  tidy()
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 1.37 0.957 1.43 1.56e- 1
## 2 WingLength 0.467 0.0347 13.5 2.62e-25
22 / 24

Sparrows

What is the F-test testing?

lm(Weight ~ WingLength, data = Sparrows) %>%
  glance()
## # A tibble: 1 x 11
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.614 0.611 1.40 181. 2.62e-25 2 -203. 411. 419.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
  • null hypothesis: the fit of the intercept-only model (with $\hat{\beta}_0$ only) and your model (in this case, $\hat{\beta}_0 + \hat{\beta}_1 x$) are equivalent
  • alternative hypothesis: the intercept-only model fits worse than your model
  • When our model has only one predictor, x, the p-values from the F-test and the t-test for $\beta_1$ are identical
23 / 24
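One way to see these hypotheses directly is to fit both models and compare them with anova(). This is a sketch assuming hypothetical simulated data (not the Sparrows data); the nested-model F-test reproduces the overall F-test for the full model:

```r
# Hypothetical simulated data
set.seed(6)
x <- rnorm(60)
y <- 1 + 0.4 * x + rnorm(60)

null_model <- lm(y ~ 1)   # intercept-only model
full_model <- lm(y ~ x)   # intercept + slope

# Compare the two nested models with an F-test
cmp <- anova(null_model, full_model)
cmp

# The F-statistic matches the overall F-test reported for full_model
summary(full_model)$fstatistic
```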

Sparrows

How are the test statistics related between the F and the t?

lm(Weight ~ WingLength, data = Sparrows) %>%
  glance()
## # A tibble: 1 x 11
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.614 0.611 1.40 181. 2.62e-25 2 -203. 411. 419.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
lm(Weight ~ WingLength, data = Sparrows) %>%
  tidy()
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 1.37 0.957 1.43 1.56e- 1
## 2 WingLength 0.467 0.0347 13.5 2.62e-25
13.5^2
## [1] 182.25
24 / 24
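The gap between 182.25 and 181.25 comes from rounding. The t-statistic printed by tidy() is rounded to 13.5. This sketch uses only numbers already shown in these slides:

```r
# Values from the tidy() output and the pf() call on the earlier slides
t_rounded <- 13.5        # t-statistic for WingLength as printed (rounded)
f_stat    <- 181.2535    # F-statistic passed to pf()

t_rounded^2     # 182.25: off only because 13.5 is rounded
sqrt(f_stat)    # the unrounded t-statistic, about 13.46

# With the fitted model in hand, one could instead square the exact value from
# summary(fit)$coefficients["WingLength", "t value"]
```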
