Porsche Price (3)
Inference: a conclusion reached on the basis of evidence and reasoning.
So far, we've been looking at a sample of 116 sparrows from Kent Island.
What if this were the true population, and the sample that we saw was just related by chance?
Ultimately, what do we want to know?
How can we quantify how much we'd expect the slope to differ from one random sample to another?
lm(Weight ~ WingLength, data = Sparrows) %>% tidy()
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    1.37     0.957       1.43 1.56e- 1
## 2 WingLength     0.467    0.0347     13.5  2.62e-25
We need a test statistic that incorporates $\hat{\beta}_1$ and its standard error $SE_{\hat{\beta}_1}$.
0.467 / 0.0347
## [1] 13.45821
How do we interpret this?
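One way to read this number: the test statistic measures how far the estimate lies from the null value, in units of standard errors. Taking the null value of the slope to be 0 (the hypothesis $\beta_1 = 0$ used in the rest of these slides) and using the rounded values from the table, the arithmetic is:

```latex
t = \frac{\hat{\beta}_1 - 0}{SE_{\hat{\beta}_1}} = \frac{0.467}{0.0347} \approx 13.46
```

So the estimated slope sits about 13.5 standard errors above 0.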
How do we know what values of this statistic are worth paying attention to?
lm(Weight ~ WingLength, data = Sparrows) %>% tidy(conf.int = TRUE)
## # A tibble: 2 x 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)    1.37     0.957       1.43 1.56e- 1   -0.531     3.26
## 2 WingLength     0.467    0.0347     13.5  2.62e-25    0.399     0.536
Where do these come from?
What if we knew what the distribution of the "statistic" would be under the null hypothesis?
null_sparrow_data <- data.frame(
  WingLength = rnorm(10, 27, 4),
  Weight = rnorm(10, 14, 3)
)
lm(Weight ~ WingLength, data = null_sparrow_data) %>% tidy()
## # A tibble: 2 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)    6.17      3.55       1.74  0.121
## 2 WingLength     0.334     0.129      2.58  0.0326
null_sparrow_data <- data.frame(
  WingLength = rnorm(10, 27, 4),
  Weight = rnorm(10, 14, 3)
)
lm(Weight ~ WingLength, data = null_sparrow_data) %>% tidy()
## # A tibble: 2 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)  11.2        5.96      1.88   0.0968
## 2 WingLength    0.0840     0.206     0.407  0.695
null_sparrow_data <- data.frame(
  WingLength = rnorm(10, 27, 4),
  Weight = rnorm(10, 14, 3)
)
lm(Weight ~ WingLength, data = null_sparrow_data) %>% tidy()
## # A tibble: 2 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)   11.7       5.94      1.97   0.0846
## 2 WingLength     0.121     0.223     0.543  0.602
gen_null_stat <- function() {
  null_sparrow_data <- data.frame(
    WingLength = rnorm(10, 27, 4),
    Weight = rnorm(10, 14, 3)
  )
  lm(Weight ~ WingLength, data = null_sparrow_data) %>%
    tidy() %>%
    filter(term == "WingLength") %>%
    select("statistic")
}
gen_null_stat()
## # A tibble: 1 x 1
##   statistic
##       <dbl>
## 1    -0.661
gen_null_stat()
## # A tibble: 1 x 1
##   statistic
##       <dbl>
## 1    -0.422
gen_null_stat()
## # A tibble: 1 x 1
##   statistic
##       <dbl>
## 1     0.536
gen_null_stat()
## # A tibble: 1 x 1
##   statistic
##       <dbl>
## 1     -2.19
null_stats <- map_df(1:10000, ~ gen_null_stat())
What distribution does this look like?
If the null hypothesis is true ($\beta_1 = 0$), the test statistics we would expect follow a t-distribution with $n - 2$ degrees of freedom.
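A rough way to check this claim by simulation is sketched below in base R, so it runs without the tidyverse or broom. The helper name `gen_null_t`, the seed, and the number of replications are illustrative choices, not part of the original slides. The idea: if the t-distribution with n - 2 degrees of freedom is the right reference, about 5% of the simulated null statistics should land beyond its 2.5% and 97.5% cutoffs.

```r
# Base-R sketch: simulate slope t-statistics when the true slope is zero.
set.seed(1)   # arbitrary seed, chosen only for reproducibility

n <- 10       # sample size used for the simulated null data sets

# Hypothetical helper (not from the slides): one null t-statistic
gen_null_t <- function(n) {
  # Weight is generated independently of WingLength, so beta_1 = 0
  null_data <- data.frame(
    WingLength = rnorm(n, 27, 4),
    Weight     = rnorm(n, 14, 3)
  )
  fit <- lm(Weight ~ WingLength, data = null_data)
  # Row "WingLength", column "t value" of the coefficient table
  summary(fit)$coefficients["WingLength", "t value"]
}

null_t <- replicate(5000, gen_null_t(n))

# Share of simulated statistics beyond the t cutoffs with n - 2 df
cutoff <- qt(0.975, df = n - 2)
tail_share <- mean(abs(null_t) > cutoff)
tail_share
```

A value of `tail_share` close to 0.05 is consistent with the t-distribution with n - 2 degrees of freedom; it varies a little from seed to seed.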
How can we compare this line to the distribution under the null?
p-value: the probability of getting a statistic as extreme as or more extreme than the observed test statistic, given that the null hypothesis is true.
The proportion of area less than 1.5
pt(1.5, df = 18)
## [1] 0.9245248
The proportion of area greater than 1.5
pt(1.5, df = 18, lower.tail = FALSE)
## [1] 0.07547523
The proportion of area greater than 1.5 or less than -1.5.
pt(1.5, df = 18, lower.tail = FALSE) * 2
## [1] 0.1509505
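The same two-sided calculation can be applied to the sparrow slope. The sketch below assumes the statistic computed earlier from the rounded table values (0.467 / 0.0347) and 116 - 2 = 114 degrees of freedom, so the result should be close to, but not exactly, the p.value that `tidy()` reports from the unrounded fit.

```r
# Observed slope statistic, computed from the rounded table values
t_obs <- 0.467 / 0.0347

# 116 sparrows and two estimated coefficients give 114 degrees of freedom
df <- 116 - 2

# Two-sided p-value: area beyond |t_obs| in both tails of the t-distribution
p_value <- 2 * pt(abs(t_obs), df = df, lower.tail = FALSE)
p_value
```

Using `abs()` keeps the calculation correct for negative statistics as well, since the upper-tail area is then always taken beyond the magnitude of the statistic.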
If we used the same sampling method to select different samples and computed an interval estimate for each sample, we would expect the true population parameter ($\beta_1$) to fall within the interval estimates 95% of the time.
$\hat{\beta}_1 \pm t^* \times SE_{\hat{\beta}_1}$
lm(Weight ~ WingLength, data = Sparrows) %>% tidy(conf.int = TRUE)
## # A tibble: 2 x 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)    1.37     0.957       1.43 1.56e- 1   -0.531     3.26
## 2 WingLength     0.467    0.0347     13.5  2.62e-25    0.399     0.536
qt(0.025, df = nrow(Sparrows) - 2, lower.tail = FALSE)
## [1] 1.980992
Why 0.025?
Why lower.tail = FALSE?
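A short sketch of both answers, assuming the 116 sparrows mentioned earlier (so 114 degrees of freedom): a 95% interval leaves 5% of the area outside, split evenly between the two tails, which gives 0.025 per tail. With `lower.tail = FALSE`, `qt()` returns the value with that 2.5% to its right, which is the positive multiplier the interval formula needs.

```r
df <- 116 - 2   # n - 2 for the 116 sparrows

# Upper-tail form used above: the value with 2.5% of the area to its right
t_upper <- qt(0.025, df = df, lower.tail = FALSE)

# Equivalent lower-tail form: the value with 97.5% of the area to its left
t_upper_alt <- qt(0.975, df = df)

# The default (lower.tail = TRUE) with 0.025 gives the negative cutoff
t_lower <- qt(0.025, df = df)

c(t_upper, t_upper_alt, t_lower)
```

The first two values agree, and the third is the same number with the opposite sign, because the t-distribution is symmetric around 0.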
t_star <- qt(0.025, df = nrow(Sparrows) - 2, lower.tail = FALSE)
0.467 + t_star * 0.0347
## [1] 0.536
0.467 - t_star * 0.0347
## [1] 0.398
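The repeated-sampling interpretation of a 95% interval can be illustrated with a small simulation. Everything about the population below is an assumption made for the sketch: the intercept and slope are borrowed from the fitted sparrow line, while the error standard deviation, the wing-length distribution, the seed, and the helper name `covers_once` are invented for illustration. The code draws many samples from that hypothetical population and records how often the interval for the slope contains the assumed true slope.

```r
set.seed(2)            # arbitrary seed for reproducibility

# Hypothetical population, loosely based on the fitted sparrow line
beta0 <- 1.37          # assumed true intercept
beta1 <- 0.467         # assumed true slope
sigma <- 1.4           # assumed error standard deviation (made up)
n     <- 116           # same sample size as the sparrow data

# Hypothetical helper: draw one sample, fit the model, check the interval
covers_once <- function() {
  wing   <- rnorm(n, 27, 4)
  weight <- beta0 + beta1 * wing + rnorm(n, 0, sigma)
  ci <- confint(lm(weight ~ wing), "wing", level = 0.95)
  # TRUE when the interval contains the assumed true slope
  ci[1, 1] <= beta1 && beta1 <= ci[1, 2]
}

coverage <- mean(replicate(2000, covers_once()))
coverage
```

If the model assumptions hold, `coverage` should land near 0.95, with some simulation noise.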