class: center, middle, inverse, title-slide

# Model Comparisons

---
layout: true

<div class="my-footer">
<span>
by Dr. Lucy D'Agostino McGowan
</span>
</div>

---

# <i class="fas fa-laptop"></i> `First Year GPA`

- Go to RStudio Cloud and open `First Year GPA`

---

## 🛠toolkit for comparing models

--

### 👉 F-test

--

### 👉 `\(\Large R^2\)`

---

## 🛠F-test for Multiple Linear Regression

* Comparing the full model to the intercept-only model

--

* `\(\Large H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0\)`

--

* `\(\Large H_A: \textrm{at least one } \beta_i \neq 0\)`

---

## 🛠F-test for Multiple Linear Regression

* `\(\Large F = \frac{MSModel}{MSE}\)`

--

* df for the Model?

--

* k

--

* df for the errors?

--

* n - k - 1

---

## 🛠Nested F-test for Multiple Linear Regression

* What does "nested" mean?
  * You have a "small" model and a "large" model, where the "small" model is completely contained in the "large" model

--

* The F-test we have learned so far is one example of this, comparing:
  * `\(y = \beta_0 + \epsilon\)` (**small**)
  * `\(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \epsilon\)` (**large**)

--

* The full (**large**) model has `\(k\)` predictors, the reduced (**small**) model has `\(k - p\)` predictors

---

## 🛠Nested F-test for Multiple Linear Regression

* The full (**large**) model has `\(k\)` predictors, the reduced (**small**) model has `\(k - p\)` predictors

--

* What is `\(H_0\)`?

--

* `\(H_0:\)` `\(\beta_i = 0\)` for all `\(p\)` predictors being dropped from the full model

--

* What is `\(H_A\)`?

--

* `\(H_A:\)` `\(\beta_i \neq 0\)` for at least one of the `\(p\)` predictors dropped from the full model

--

* Does the full model do a (statistically significantly) better job of explaining the variability in the response than the reduced model?

---

## 🛠Nested F-test for Multiple Linear Regression

* The full (**large**) model has `\(k\)` predictors, the reduced (**small**) model has `\(k - p\)` predictors

* `\(F = \frac{(SSModel_{Full} - SSModel_{Reduced})/p}{SSE_{Full}/(n-k-1)}\)`

---

## 🛠Nested F-test for Multiple Linear Regression

* Which of these are nested models?

(1) `\(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon\)`

(2) `\(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 * x_2 + \epsilon\)`

(3) `\(y = \beta_0 + \beta_1 x_3 + \epsilon\)`

(4) `\(y = \beta_0 + \beta_2 x_2 + \epsilon\)`

(5) `\(y = \beta_0 + \beta_1 x_4 + \epsilon\)`

--

* (4) is nested in (1), which is nested in (2)

---

## 🛠Nested F-test for Multiple Linear Regression

* Comparing these two models, what is `\(p\)`?

(1) `\(y = \beta_0 + \beta_2 x_2 + \epsilon\)`

(2) `\(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 * x_2 + \epsilon\)`

--

* `\(p = 2\)`

--

* What is `\(k\)`?

--

* `\(k = 3\)`

---

## 🛠Nested F-test for Multiple Linear Regression

* Goal: Trying to predict the weight of fish based on their length and width

.small[

```r
data("Perch")

model1 <- lm(
  Weight ~ Length + Width + Length * Width,
  data = Perch
)

model2 <- lm(
  Weight ~ Length + Width + I(Length ^ 2) + I(Width ^ 2) + Length * Width,
  data = Perch
)
```
]

--

* What is the equation for `model1`?

--

* What is the equation for `model2`?
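--

* As a reference, one way to write them out, numbering the coefficients in the order the terms enter each model (this numbering matches the `\(H_0\)` on the next slide):
  * `model1`: `\(Weight = \beta_0 + \beta_1 Length + \beta_2 Width + \beta_3 Length \times Width + \epsilon\)`
  * `model2`: `\(Weight = \beta_0 + \beta_1 Length + \beta_2 Width + \beta_3 Length^2 + \beta_4 Width^2 + \beta_5 Length \times Width + \epsilon\)`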
---

## 🛠Nested F-test for Multiple Linear Regression

.small[

```r
data("Perch")

model1 <- lm(
  Weight ~ Length + Width + Length * Width,
  data = Perch
)

model2 <- lm(
  Weight ~ Length + Width + I(Length ^ 2) + I(Width ^ 2) + Length * Width,
  data = Perch
)
```
]

--

* If we want to do a _nested F-test_, what is `\(H_0\)`?

--

* `\(H_0: \beta_3 = \beta_4 = 0\)`

--

* What is `\(H_A\)`?

--

* `\(H_A: \beta_3 \neq 0\)` or `\(\beta_4 \neq 0\)`

--

* What are the degrees of freedom of this test? (n = 56)

--

* 2, 50

---

## 🛠Nested F-test for Multiple Linear Regression

.small[

```r
anova(model1)
```

```
## Analysis of Variance Table
## 
## Response: Weight
##              Df  Sum Sq Mean Sq F value  Pr(>F)
## Length        1 6118739 6118739  3126.6 < 2e-16
## Width         1  110593  110593    56.5 7.4e-10
## Length:Width  1  314997  314997   161.0 < 2e-16
## Residuals    52  101765    1957
```
]

.small[

```r
(SSModel1 <- 6118739 + 110593 + 314997)
```

```
## [1] 6544329
```
]

---

## 🛠Nested F-test for Multiple Linear Regression

.small[

```r
anova(model2)
```

```
## Analysis of Variance Table
## 
## Response: Weight
##              Df  Sum Sq Mean Sq F value  Pr(>F)
## Length        1 6118739 6118739 3289.64 < 2e-16
## Width         1  110593  110593   59.46 4.7e-10
## I(Length^2)   1  314899  314899  169.30 < 2e-16
## I(Width^2)    1    5381    5381    2.89   0.095
## Length:Width  1    3482    3482    1.87   0.177
## Residuals    50   93000    1860
```
]

.small[

```r
(SSModel1 <- 6118739 + 110593 + 314997)
```

```
## [1] 6544329
```

```r
(SSModel2 <- 6118739 + 110593 + 314899 + 5381 + 3482)
```

```
## [1] 6553094
```
]

---

## 🛠Nested F-test for Multiple Linear Regression

* `\(F = \frac{(SSModel_{Full} - SSModel_{Reduced})/p}{SSE_{Full}/(n-k-1)}\)`

--

* `\(SSModel_{Full} - SSModel_{Reduced}\)`:

```r
SSModel2 - SSModel1
```

```
## [1] 8765
```

--

* What is `\(p\)`?

---

## 🛠Nested F-test for Multiple Linear Regression

* `\(F = \frac{(SSModel_{Full} - SSModel_{Reduced})/p}{SSE_{Full}/(n-k-1)}\)`

--

* `\((SSModel_{Full} - SSModel_{Reduced})/p\)`:

```r
(SSModel2 - SSModel1) / 2
```

```
## [1] 4382
```

---

## 🛠Nested F-test for Multiple Linear Regression

* `\(F = \frac{(SSModel_{Full} - SSModel_{Reduced})/p}{SSE_{Full}/(n-k-1)}\)`

* `\(SSE_{Full}/(n-k-1)\)`

.small[

```r
anova(model2)
```

```
## Analysis of Variance Table
## 
## Response: Weight
##              Df  Sum Sq Mean Sq F value  Pr(>F)
## Length        1 6118739 6118739 3289.64 < 2e-16
## Width         1  110593  110593   59.46 4.7e-10
## I(Length^2)   1  314899  314899  169.30 < 2e-16
## I(Width^2)    1    5381    5381    2.89   0.095
## Length:Width  1    3482    3482    1.87   0.177
*## Residuals    50   93000    1860
```
]

---

## 🛠Nested F-test for Multiple Linear Regression

* `\(F = \frac{(SSModel_{Full} - SSModel_{Reduced})/p}{SSE_{Full}/(n-k-1)}\)`

```r
((SSModel2 - SSModel1) / 2) / 1860
```

```
## [1] 2.36
```

--

* What are the degrees of freedom for this test?

--

* 2, 50

--

```r
pf(2.356183, 2, 50, lower.tail = FALSE)
```

```
## [1] 0.105
```

---

## 🛠Nested F-test for Multiple Linear Regression

_An easier way_

.small[

```r
anova(model1, model2)
```

```
## Analysis of Variance Table
## 
## Model 1: Weight ~ Length + Width + Length * Width
## Model 2: Weight ~ Length + Width + I(Length^2) + I(Width^2) + Length *
##     Width
##   Res.Df    RSS Df Sum of Sq    F Pr(>F)
## 1     52 101765                         
*## 2     50  93000  2      8765 2.36   0.11
```
]

---

## 🛠`\(R^2\)` for Multiple Linear Regression

--

* `\(\Large R^2 = \frac{SSModel}{SSTotal}\)`

--

* `\(\Large R^2 = 1 - \frac{SSE}{SSTotal}\)`

--

* As is, `\(R^2\)` can never decrease when you add a predictor (and it will almost always increase). Therefore, we have `\(R^2_{adj}\)`, which adds a small "penalty" for each predictor

--

* `\(\Large R^2_{adj} = 1 - \frac{SSE/(n-k-1)}{SSTotal / (n-1)}\)`

--

* `\(\Large \frac{SSTotal}{n-1} = \frac{\sum(y - \bar{y})^2}{n-1}\)` What is this?

--

* Sample variance! `\(S_Y^2\)`

--

* `\(\Large R^2_{adj} = 1 - \frac{\hat\sigma^2_\epsilon}{S_Y^2}\)`
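---

## 🛠`\(R^2_{adj}\)` for Multiple Linear Regression

As a quick check, a minimal sketch computing `\(R^2_{adj}\)` for `model1` by hand (the sums of squares below are copied from the `anova(model1)` output a few slides back):

```r
n <- 56  # number of perch
k <- 3   # predictors in model1: Length, Width, Length:Width
SSE <- 101765                # Residuals Sum Sq from anova(model1)
SSTotal <- 6544329 + 101765  # SSModel1 + SSE

# penalize SSE by the error df, relative to the sample variance of Weight
1 - (SSE / (n - k - 1)) / (SSTotal / (n - 1))
```

--

* This should come out to about 0.984, matching the `adj.r.squared` column from `glance(model1)` coming up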
---

## 🛠`\(R^2_{adj}\)` for Multiple Linear Regression

* `\(\Large R^2_{adj} = 1 - \frac{SSE/(n-k-1)}{SSTotal / (n-1)}\)`

* The denominator stays the same for all models fit to the same response variable and data

* The numerator can actually increase when a new predictor is added to a model, if the decrease in the SSE is not sufficient to offset the decrease in the error degrees of freedom

* So `\(R^2_{adj}\)` can 👇 when a weak predictor is added to a model

---

## 🛠`\(R^2_{adj}\)` for Multiple Linear Regression

.small[

```r
glance(model1)
```

```
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl> <dbl>
## 1     0.985         0.984  44.2     1115. 3.75e-47     4  -290.  589.  599.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
```

```r
glance(model2)
```

```
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl> <dbl>
## 1     0.986         0.985  43.1      705. 4.41e-45     6  -287.  588.  602.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
```
]

--

* So far we know what the first 6 columns are

---

## Model Comparison criteria

* We are looking for reasonable ways to balance "goodness of fit" (how well the model fits the data) with "parsimony"

--

* `\(R^2_{adj}\)` gets at this by adding a **penalty** for adding variables

--

* AIC and BIC are two more methods that balance goodness of fit and parsimony

---

## Log Likelihood

* Both AIC and BIC are calculated using the **log likelihood**

`\(\Large\log(\mathcal{L}) = -\frac{n}{2}\left[\log(2\pi) + \log(SSE/n) + 1\right]\)`

--

* `\(\log = \log_e\)`, `log()` in `R`

--

.small[

```r
glance(model1)
```

```
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl> <dbl>
## 1     0.985         0.984  44.2     1115. 3.75e-47     4  -290.  589.  599.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
```

```r
-56 / 2 * (log(2 * pi) + log(101765 / 56) + 1)
```

```
## [1] -290
```
]

--

* "goodness of fit" measure

* **higher** log likelihood is better

---

## Log Likelihood

**What I want you to remember**

* Both AIC and BIC are calculated using the **log likelihood**

`\(\Large\log(\mathcal{L}) = -\frac{n}{2}\left[\log(SSE/n)\right] + \textrm{some constant}\)`

* `\(\log = \log_e\)`, `log()` in `R`

* "goodness of fit" measure

* **higher** log likelihood is better

---

## AIC

* Akaike's Information Criterion

* `\(AIC = 2(k+1) - 2\log(\mathcal{L})\)`

* `\(k\)` is the number of predictors in the model

* **lower** AIC values are better

--

.small[

```r
glance(model1)
```

```
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl> <dbl>
## 1     0.985         0.984  44.2     1115. 3.75e-47     4  -290.  589.  599.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
```

```r
glance(model2)
```

```
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl> <dbl>
## 1     0.986         0.985  43.1      705. 4.41e-45     6  -287.  588.  602.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
```
]
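---

## AIC

A minimal sketch checking the `AIC` column by hand. Note that R's `AIC()` (which `glance()` reports) also counts the estimated error variance as a parameter, so its penalty uses `\(k + 2\)` rather than the `\(k + 1\)` in the formula on the previous slide:

```r
k <- 3                            # predictors in model1
ll <- as.numeric(logLik(model1))  # log likelihood, about -290

2 * (k + 2) - 2 * ll
```

--

* This should be about 589, matching the `AIC` column from `glance(model1)`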
---

## BIC

* Bayesian Information Criterion

* `\(BIC = \log(n)(k+1) - 2\log(\mathcal{L})\)`

* `\(k\)` is the number of predictors in the model

* **lower** BIC values are better

--

.small[

```r
glance(model1)
```

```
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl> <dbl>
## 1     0.985         0.984  44.2     1115. 3.75e-47     4  -290.  589.  599.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
```

```r
glance(model2)
```

```
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl> <dbl>
## 1     0.986         0.985  43.1      705. 4.41e-45     6  -287.  588.  602.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
```
]

---

## AIC and BIC can disagree!

.small[

```r
glance(model1)
```

```
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl> <dbl>
## 1     0.985         0.984  44.2     1115. 3.75e-47     4  -290.  589.  599.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
```

```r
glance(model2)
```

```
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl> <dbl>
## 1     0.986         0.985  43.1      705. 4.41e-45     6  -287.  588.  602.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
```
]

* The penalty term is larger in BIC than in AIC (here, `\(\log(56) \approx 4 > 2\)`)

--

* What to do? Both are valid; **pre-specify** which you are going to use in the **methods** section of your analysis, before running your models

---

## 🛠toolkit for comparing models

### 👉 F-test

### 👉 `\(\Large R^2\)`

### 👉 `\(AIC\)`

### 👉 `\(BIC\)`

---

# <i class="fas fa-laptop"></i> `First Year GPA`

- Go to RStudio Cloud and open `First Year GPA`

---
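## AIC and BIC, directly

If you just want the criteria without the rest of the `glance()` output, base R's `AIC()` and `BIC()` accept several fitted models at once; a minimal sketch:

```r
AIC(model1, model2)  # data frame with df and AIC for each model
BIC(model1, model2)  # data frame with df and BIC for each model
```

* The values should match the `AIC` and `BIC` columns from `glance()` above; as before, **lower** is better for both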