Drawing inference 

by Dr. Lucy D'Agostino McGowan

Go to RStudio Cloud and open Porsche Price (3)

in·fer·ence

a conclusion reached on the basis of evidence and reasoning.

Inference

So far, we've only been able to make claims about our sample.
For example, we've just been describing $\hat{β}_{1}$, the estimated slope of the relationship between $x$ and $y$.
What if we want to extend these claims to the population?

Sparrow data

So far, we've been looking at a sample of 116 sparrows from Kent Island.


Sparrows

What if this were the true population, and the relationship we saw in our sample arose just by chance?

Sparrows

Ultimately, what do we want to know?

Does the slope in the population differ from 0?

Sparrows

Ultimately, what do we want to know?

Does $β_{1}$ differ from 0?
Notice the lack of a hat: $β_{1}$ is the population slope, while $\hat{β}_{1}$ is its estimate from our sample.

Sparrows

Ultimately, what do we want to know?

Null hypothesis: $H_{0}: β_{1} = 0$
Alternative hypothesis: $H_{A}: β_{1} \neq 0$

Sparrows

How can we quantify how much we'd expect the slope to differ from one random sample to another?

We need a measure of uncertainty. How about the standard error of $\hat{β}_{1}$?

The standard error of $\hat{β}_{1}$ ($SE_{\hat{β}_{1}}$) measures how much we expect the sample slope to vary from one random sample to another.

Sparrows

How can we quantify how much we'd expect the slope to differ from one random sample to another?

library(tidyverse)  # dplyr, purrr, and the %>% pipe
library(broom)      # tidy()
lm(Weight ~ WingLength, data = Sparrows) %>%
  tidy()

## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    1.37     0.957       1.43 1.56e- 1
## 2 WingLength     0.467    0.0347     13.5  2.62e-25
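The `std.error` column above estimates how much the slope would change across repeated random samples. Here is a minimal base-R sketch of that idea under an assumed population that is purely illustrative (wing lengths drawn as Normal(27, 4), a true slope of 0.47, residual noise with SD 1.4; none of these values are estimated from the Sparrows data). It draws 2,000 samples of 116 birds, fits the same model to each, and compares the spread of the slopes with the standard error `lm()` reports from a single sample.

```r
# Hypothetical population, chosen for illustration only (not fit to Sparrows):
# WingLength ~ Normal(27, 4); Weight = 1.4 + 0.47 * WingLength + Normal(0, 1.4) noise
set.seed(1)
n <- 116

draw_sample <- function() {
  WingLength <- rnorm(n, mean = 27, sd = 4)
  Weight <- 1.4 + 0.47 * WingLength + rnorm(n, mean = 0, sd = 1.4)
  data.frame(WingLength, Weight)
}

# Slope estimate from each of 2,000 independent random samples
slopes <- replicate(
  2000,
  coef(lm(Weight ~ WingLength, data = draw_sample()))[["WingLength"]]
)

# Actual spread of the sample slopes across repeated samples
sd(slopes)

# Standard error that lm() reports from just one sample
one_fit <- summary(lm(Weight ~ WingLength, data = draw_sample()))
one_fit$coefficients["WingLength", "Std. Error"]
```

With these made-up settings, the two numbers should land close to each other (around 0.03). That is the sense in which a single sample's standard error describes sample-to-sample variation.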


Sparrows

We need a test statistic that incorporates $\hat{β}_{1}$ and the standard error $SE_{\hat{β}_{1}}$

$t = \frac{\hat{β}_{1}}{SE_{\hat{β}_{1}}}$

0.467 / 0.0347

## [1] 13.45821
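The division above is exactly what the test statistic is. A small base-R check on made-up data (the `x` and `y` values below are simulated for illustration, not the sparrow measurements): divide the slope estimate by its standard error and compare the result with the `t value` column that `summary()` reports, which plays the role of the `statistic` column in the `tidy()` output.

```r
# Made-up data (not the Sparrows data), just to show the arithmetic
set.seed(42)
x <- rnorm(50, mean = 27, sd = 4)
y <- 1.4 + 0.47 * x + rnorm(50, sd = 1.4)
fit <- summary(lm(y ~ x))

est <- fit$coefficients["x", "Estimate"]
se <- fit$coefficients["x", "Std. Error"]

# Test statistic: how many standard errors the estimate sits from 0
t_by_hand <- est / se
t_by_hand
fit$coefficients["x", "t value"]  # should match t_by_hand
```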


Sparrows

How do we interpret this?

"the sample slope is more than 13 standard errors above a slope of zero"

Sparrows

How do we know what values of this statistic are worth paying attention to?

lm(Weight ~ WingLength, data = Sparrows) %>%
  tidy(conf.int = TRUE)

## # A tibble: 2 x 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)    1.37     0.957       1.43 1.56e- 1   -0.531     3.26 
## 2 WingLength     0.467    0.0347     13.5  2.62e-25    0.399     0.536

confidence intervals
p-values


Sparrows

Where do these come from?
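One standard answer, assuming the usual linear regression conditions (independent observations, normally distributed errors with constant variance): with $n - 2$ residual degrees of freedom, the 95% interval is estimate $\pm t^{*} \times$ std.error with $t^{*}$ = `qt(0.975, n - 2)`, and the two-sided p-value is the probability of a t value at least as large as the observed one in absolute value. For the sparrow slope, 0.467 ± qt(0.975, 114) × 0.0347 gives roughly 0.398 to 0.536, in line with the `conf.low` and `conf.high` columns above. The base-R sketch below checks both formulas against `confint()` and `summary()` on made-up data (not the Sparrows data).

```r
# Made-up data (not the Sparrows data); n = 116 mirrors the sparrow sample size
set.seed(7)
n <- 116
x <- rnorm(n, mean = 27, sd = 4)
y <- 1.4 + 0.47 * x + rnorm(n, sd = 1.4)
model <- lm(y ~ x)
coefs <- summary(model)$coefficients

est <- coefs["x", "Estimate"]
se <- coefs["x", "Std. Error"]
resid_df <- n - 2  # residual degrees of freedom in simple linear regression

# 95% confidence interval: estimate +/- t* times the standard error
t_star <- qt(0.975, resid_df)
ci_by_hand <- c(lower = est - t_star * se, upper = est + t_star * se)

# Two-sided p-value for H0: beta_1 = 0
t_stat <- est / se
p_by_hand <- 2 * pt(-abs(t_stat), resid_df)

ci_by_hand
confint(model)["x", ]   # should match ci_by_hand
p_by_hand
coefs["x", "Pr(>|t|)"]  # should match p_by_hand
```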

Sparrows

What if we knew what the distribution of the "statistic" would be under the null hypothesis?

Sparrows

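# WingLength and Weight are drawn independently,
# so the true (population) slope is 0
null_sparrow_data <- data.frame(
  WingLength = rnorm(10, 27, 4),  # 10 simulated birds
  Weight = rnorm(10, 14, 3)
)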
lm(Weight ~ WingLength, data = null_sparrow_data) %>%
  tidy()

## # A tibble: 2 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)    6.17      3.55       1.74  0.121 
## 2 WingLength     0.334     0.129      2.58  0.0326


Sparrows

null_sparrow_data <- data.frame(
  WingLength = rnorm(10, 27, 4),
  Weight = rnorm(10, 14, 3)
)
lm(Weight ~ WingLength, data = null_sparrow_data) %>%
  tidy()

## # A tibble: 2 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)  11.2        5.96      1.88   0.0968
## 2 WingLength    0.0840     0.206     0.407  0.695


Sparrows

null_sparrow_data <- data.frame(
  WingLength = rnorm(10, 27, 4),
  Weight = rnorm(10, 14, 3)
)
lm(Weight ~ WingLength, data = null_sparrow_data) %>%
  tidy()

## # A tibble: 2 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)   11.7       5.94      1.97   0.0846
## 2 WingLength     0.121     0.223     0.543  0.602


Sparrows

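# simulate one dataset where H0 is true and return the slope's t statistic
gen_null_stat <- function() {
  null_sparrow_data <- data.frame(
    WingLength = rnorm(10, 27, 4),
    Weight = rnorm(10, 14, 3)
  )
  lm(Weight ~ WingLength, data = null_sparrow_data) %>%
    tidy() %>%
    filter(term == "WingLength") %>%  # keep only the slope row
    select("statistic")               # a 1-row tibble holding its t statistic
}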

gen_null_stat()

## # A tibble: 1 x 1
##   statistic
##       <dbl>
## 1    -0.661


Sparrows

gen_null_stat()

## # A tibble: 1 x 1
##   statistic
##       <dbl>
## 1    -0.422
gen_null_stat()

## # A tibble: 1 x 1
##   statistic
##       <dbl>
## 1     0.536
gen_null_stat()

## # A tibble: 1 x 1
##   statistic
##       <dbl>
## 1     -2.19

Sparrows

# simulate 10,000 null statistics and stack them into one tibble
null_stats <- map_df(1:10000, ~ gen_null_stat())
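Once we have 10,000 statistics simulated under the null hypothesis, one way to use them is a simulated p-value: the share of null statistics at least as extreme as an observed one. The sketch below is in base R, and `gen_null_t()` is a hypothetical stand-in for `gen_null_stat()` that returns a plain number instead of a one-row tibble. The observed value 2.58 is the WingLength statistic from the first simulated null sample above (10 birds), used only as an example. For samples of 10, the statistic should follow a t distribution with 8 degrees of freedom, so the simulated share should land near the 0.0326 that `lm()` reported for that sample.

```r
set.seed(2021)

# Hypothetical base-R stand-in for gen_null_stat(): 10 birds whose WingLength
# and Weight are generated independently, so the true slope is 0
gen_null_t <- function(n = 10) {
  WingLength <- rnorm(n, mean = 27, sd = 4)
  Weight <- rnorm(n, mean = 14, sd = 3)
  coefs <- summary(lm(Weight ~ WingLength))$coefficients
  coefs["WingLength", "t value"]
}

null_t <- replicate(10000, gen_null_t())

observed <- 2.58  # example statistic from a single null sample of 10 birds

# Share of null statistics at least as extreme, in either direction
p_sim <- mean(abs(null_t) >= abs(observed))

# Theoretical two-sided p-value from a t distribution with 10 - 2 = 8 df
p_theory <- 2 * pt(-abs(observed), df = 8)

c(simulated = p_sim, theoretical = p_theory)
```

With 10,000 draws the two values typically agree to within a few thousandths, suggesting that the simulated null distribution and the t distribution line up here.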
