class: center, middle, inverse, title-slide

# Prediction intervals

---
layout: true

<div class="my-footer">
<span>
by Dr. Lucy D'Agostino McGowan
</span>
</div>

---
class: middle

# confidence intervals

If we used the same sampling method to select different samples and computed an interval estimate for each sample, we would expect the true population parameter (`\(\beta_1\)`) to fall within the interval estimates 95% of the time.

---

# Confidence interval for `\(\beta_1\)`

.question[
How do we calculate the confidence interval for the slope?
]

--

.center[
`\(\Huge\hat\beta_1\pm t^*SE_{\hat\beta_1}\)`
]

---

## How do we calculate it in R?

* using the **broom** package

.small[

```r
lm(Weight ~ WingLength, Sparrows) %>%
  tidy(conf.int = TRUE)
```

```
## # A tibble: 2 x 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)    1.37     0.957       1.43 1.56e- 1   -0.531     3.26
*## 2 WingLength     0.467    0.0347     13.5  2.62e-25    0.399     0.536
```
]

--

* "by hand"

.small[

```r
t_star <- qt(0.025, df = 116 - 2, lower.tail = FALSE)
# or
t_star <- qt(0.975, df = 116 - 2)
```

.pull-left[

```r
0.467 - t_star * 0.0347
```

```
## [1] 0.3982596
```
]

.pull-right[

```r
0.467 + t_star * 0.0347
```

```
## [1] 0.5357404
```
]
]

---

## Confidence intervals

There are ✌️ other types of confidence intervals we may want to calculate

--

* The confidence interval for the **mean response** in `\(y\)` for a given `\(x^*\)` value

--

* The confidence interval for an **individual response** `\(y\)` for a given `\(x^*\)` value

---

## Confidence intervals

There are ✌️ other types of confidence intervals we may want to calculate

* The confidence interval for the **mean response** in `\(y\)` for a given `\(x^*\)` value: **confidence interval for** `\(\mu_y\)`
* The confidence interval for an **individual response** `\(y\)` for a given `\(x^*\)` value: **prediction interval**

--

* Why are these different? Which do you think is easier to estimate?
--

* It is **harder** to predict one response than to predict a mean response. What does this mean in terms of the standard error?

--

* The SE of the prediction interval is going to be **larger**

---

## Confidence intervals

**confidence interval for** `\(\mu_y\)` and **prediction interval**

.center[
`\(\Huge \hat{y}\pm t^* SE\)`
]

* `\(\hat{y}\)` is the predicted `\(y\)` for a given `\(x^*\)`
* `\(t^*\)` is the critical value for the `\(t_{n-2}\)` density curve
* `\(SE\)` takes ✌️ different values depending on which interval you're interested in

--

    * `\(SE_{\hat\mu}\)`

--

    * `\(SE_{\hat{y}}\)`

--

* Which will be larger?

---

## Confidence intervals

**confidence interval for** `\(\mu_y\)` and **prediction interval**

.center[
`\(\Huge \hat{y}\pm t^* SE\)`
]

* `\(\hat{y}\)` is the predicted `\(y\)` for a given `\(x^*\)`
* `\(t^*\)` is the critical value for the `\(t_{n-2}\)` density curve
* `\(SE\)` takes ✌️ different values depending on which interval you're interested in
    * `\(SE_{\hat\mu} = \hat{\sigma}_\epsilon\sqrt{\frac{1}{n}+\frac{(x^*-\bar{x})^2}{\Sigma(x-\bar{x})^2}}\)`
    * `\(SE_{\hat{y}}=\hat{\sigma}_\epsilon\sqrt{1 + \frac{1}{n}+\frac{(x^*-\bar{x})^2}{\Sigma(x-\bar{x})^2}}\)`
* Which will be larger?

--

* What is the difference between these two equations?
---

## Confidence intervals

**confidence interval for** `\(\mu_y\)` and **prediction interval**

.center[
`\(\Huge \hat{y}\pm t^* SE\)`
]

* `\(\hat{y}\)` is the predicted `\(y\)` for a given `\(x^*\)`
* `\(t^*\)` is the critical value for the `\(t_{n-2}\)` density curve
* `\(SE\)` takes ✌️ different values depending on which interval you're interested in
    * `\(SE_{\hat\mu} = \hat{\sigma}_\epsilon\sqrt{\frac{1}{n}+\frac{(x^*-\bar{x})^2}{\Sigma(x-\bar{x})^2}}\)`
    * `\(SE_{\hat{y}}=\hat{\sigma}_\epsilon\sqrt{\color{red}1 + \frac{1}{n}+\frac{(x^*-\bar{x})^2}{\Sigma(x-\bar{x})^2}}\)`

--

* an individual response will vary from the mean response `\(\mu_y\)` with a standard deviation of `\(\sigma_\epsilon\)`

---

## Let's do it in R!

.small[

```r
lm(Weight ~ WingLength, data = Sparrows) %>%
  predict()
```

```
##        1        2        3
## 14.92020 15.85501 13.05059
```
]

--

.small[

```r
lm(Weight ~ WingLength, data = Sparrows) %>%
* predict(interval = "confidence")
```

```
##        fit      lwr      upr
## 1 14.92020 14.63801 15.20240
## 2 15.85501 15.49396 16.21607
## 3 13.05059 12.74776 13.35342
```
]

--

.small[

```r
lm(Weight ~ WingLength, data = Sparrows) %>%
* predict(interval = "prediction")
```

`## WARNING predictions on current data refer to _future_ responses`

```
##        fit      lwr      upr
## 1 14.92020 12.13329 17.70712
## 2 15.85501 13.05902 18.65101
## 3 13.05059 10.26151 15.83966
```
]

---

## Let's do it in R!

What if we have new data?

.small[

```r
new_sparrows <- data.frame(
  WingLength = c(30, 28, 25)
)
new_sparrows
```

```
##   WingLength
## 1         30
## 2         28
## 3         25
```
]

--

.small[

```r
lm(Weight ~ WingLength, data = Sparrows) %>%
  predict(newdata = new_sparrows, interval = "prediction")
```

```
##        fit      lwr      upr
## 1 15.38761 12.59700 18.17822
## 2 14.45280 11.66790 17.23771
## 3 13.05059 10.26151 15.83966
```
]
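
---

## Checking the formulas "by hand"

A minimal sketch of how the two `\(SE\)` formulas could be checked against `predict()`. It uses **simulated** data as a stand-in for `Sparrows`: `sim_sparrows`, its coefficients, and `x_star = 30` are made up for illustration and are not values from the real data.

.small[

```r
# Simulated stand-in for the Sparrows data (hypothetical values, NOT the real data)
set.seed(1)
n <- 116
sim_sparrows <- data.frame(WingLength = runif(n, min = 20, max = 35))
sim_sparrows$Weight <- 1.4 + 0.47 * sim_sparrows$WingLength + rnorm(n, sd = 1.4)

fit <- lm(Weight ~ WingLength, data = sim_sparrows)

x_star    <- 30                       # the x* value we predict at (assumed)
x         <- sim_sparrows$WingLength
sigma_hat <- sigma(fit)               # estimate of sigma_epsilon
y_hat     <- unname(coef(fit)[1] + coef(fit)[2] * x_star)
t_star    <- qt(0.975, df = n - 2)

# Shared piece under the square root: 1/n + (x* - xbar)^2 / sum((x - xbar)^2)
core <- 1 / n + (x_star - mean(x))^2 / sum((x - mean(x))^2)

se_mu <- sigma_hat * sqrt(core)       # SE for the mean response
se_y  <- sigma_hat * sqrt(1 + core)   # SE for an individual response

ci_by_hand <- c(fit = y_hat, lwr = y_hat - t_star * se_mu, upr = y_hat + t_star * se_mu)
pi_by_hand <- c(fit = y_hat, lwr = y_hat - t_star * se_y,  upr = y_hat + t_star * se_y)

# Compare with predict()
new_x      <- data.frame(WingLength = x_star)
ci_predict <- predict(fit, newdata = new_x, interval = "confidence")
pi_predict <- predict(fit, newdata = new_x, interval = "prediction")

all.equal(ci_by_hand, ci_predict[1, ])
all.equal(pi_by_hand, pi_predict[1, ])
```
]

The only difference between `se_mu` and `se_y` is the extra `1` under the square root, which is why the prediction interval is always the wider of the two.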