class: center, middle, inverse, title-slide

# Confounding and Variable Transformations

---
layout: true

<div class="my-footer">
<span>
by Dr. Lucy D'Agostino McGowan
</span>
</div>

---

## Adjusting for confounders

* What is the relationship between average SAT scores and average teacher salaries?

.small[
]

--

* Are we doing inference or prediction?

---

## Adjusting for confounders

* I fit a linear model for `\(\hat{sat} = \hat\beta_0 + \hat\beta_1 salary\)`

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)  1159.       57.7      20.1  5.13e-25
## 2 salary         -5.54      1.63     -3.39 1.39e- 3
```

--

* How do we interpret this result?

---

## Adjusting for confounders

* There is a **third variable**, the fraction of students that took the SAT in that state. It is grouped as "Low", "Medium", and "High".

```
## # A tibble: 4 x 5
##   term          estimate std.error statistic  p.value
##   <chr>            <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)     852.      38.9       21.9  5.56e-26
## 2 salary            1.09     0.988      1.10 2.76e- 1
## 3 frac_groupLOW   150.      12.8       11.7  2.09e-15
## 4 frac_groupMED    38.6     14.1        2.75 8.59e- 3
```

--

* What is the referent category?

--

* How do you interpret the `\(\hat{\beta}\)` for `frac_groupLOW`?

--

* How do you interpret the `\(\hat{\beta}\)` for `salary` now?

---
class: middle

## `\(\hat\beta\)` interpretation in multiple linear regression

The coefficient for `\(x\)` is `\(\hat\beta\)` (95% CI: `\(LB_\hat\beta, UB_\hat\beta\)`). A one-unit increase in `\(x\)` yields an expected increase in `\(y\)` of `\(\hat\beta\)`, **holding all other variables constant**.

---
class: middle

## `\(\hat\beta\)` interpretation in multiple linear regression

The coefficient for average salary is 1.09 (95% CI: -0.90, 3.08). A one-unit increase in average salary yields an expected increase in average SAT score of 1.09, **holding the fraction of students that took the SAT constant**.

---

## Adjusting for confounders

![](09-variable-transformations_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

---

## Adjusting for confounders

![](09-variable-transformations_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

--

* What is it called when the direction of the relationship reverses?

--

* Notice that the lines here are **parallel**, so the slope is the effect of salary we see when holding the group constant.
--

* 😱 What if the lines aren't parallel?

---

## Interactions

* Data looking at the growth rate for kids

.small[
]

---

## Interactions

![](09-variable-transformations_files/figure-html/unnamed-chunk-8-1.png)<!-- -->

--

* Will `\(\hat\beta_{age}\)` be positive or negative?

---

## Interactions

* Let's look at this relationship split by `sex` (blue: Girl, black: Boy)

![](09-variable-transformations_files/figure-html/unnamed-chunk-9-1.png)<!-- -->

---

## Interactions

* Let's look at this relationship split by `sex` (blue: Girl, black: Boy)

![](09-variable-transformations_files/figure-html/unnamed-chunk-10-1.png)<!-- -->

--

* 😱 The lines cross! That means there is an **interaction**; that is, the slopes differ by group.

---

## Interactions

* Let's look at this relationship split by `sex` (blue: Girl, black: Boy)

![](09-variable-transformations_files/figure-html/unnamed-chunk-11-1.png)<!-- -->

--

* What is the equation for this relationship?

---

## Interactions

`\(Weight = \beta_0 + \beta_1 Age + \beta_2 Girl + \beta_3 Age \times Girl + \epsilon\)`

.small[
```r
lm(Weight ~ Age + Sex + Age * Sex, data = Kids198)
```

```
##
## Call:
## lm(formula = Weight ~ Age + Sex + Age * Sex, data = Kids198)
##
## Coefficients:
## (Intercept)          Age          Sex      Age:Sex
##    -33.6925       0.9087      31.8506      -0.2812
```
]

--

* What does this model become for **boys** (when `Sex = 0`)?

--

* `\(Weight = \beta_0 + \beta_1 Age + \epsilon\)`

--

* What does this model become for **girls** (when `Sex = 1`)?

--

* `\(Weight = \beta_0 + \beta_1 Age + \beta_2 \times 1 + \beta_3 Age \times 1 + \epsilon\)`

--

* `\(Weight = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) Age + \epsilon\)`

--

* How do you interpret `\(\hat\beta_0\)` now?
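---

## Interactions

The boys and girls equations can be checked numerically. This is a minimal sketch that plugs the rounded coefficients from the printed `lm()` output into those two equations; the names `b0` through `b3`, `boys`, and `girls` are illustrative and do not appear in the original code.

```r
# Rounded estimates copied from the printed lm() output
b0 <- -33.6925  # (Intercept)
b1 <- 0.9087    # Age
b2 <- 31.8506   # Sex
b3 <- -0.2812   # Age:Sex

# Boys (Sex = 0): Weight = b0 + b1 * Age
boys <- c(intercept = b0, slope = b1)

# Girls (Sex = 1): Weight = (b0 + b2) + (b1 + b3) * Age
girls <- c(intercept = b0 + b2, slope = b1 + b3)

# One row per group, one column per line parameter
rbind(boys, girls)
```

With these rounded values the girls' line has an intercept of about -1.84 and a slope of about 0.63, compared with -33.69 and 0.91 for boys.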
---

## Interactions

.small[
```r
lm(Weight ~ Age + Sex + Age * Sex, data = Kids198)
```

```
##
## Call:
## lm(formula = Weight ~ Age + Sex + Age * Sex, data = Kids198)
##
## Coefficients:
## (Intercept)          Age          Sex      Age:Sex
##    -33.6925       0.9087      31.8506      -0.2812
```
]

* What does this model become for **boys** (when `Sex = 0`)?
* `\(Weight = \beta_0 + \beta_1 Age + \epsilon\)`
* What does this model become for **girls** (when `Sex = 1`)?
* `\(Weight = \beta_0 + \beta_1 Age + \beta_2 \times 1 + \beta_3 Age \times 1 + \epsilon\)`
* `\(Weight = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) Age + \epsilon\)`
* How do you interpret `\(\hat\beta_{2}\)` now?

--

* The difference in intercepts between boys and girls

---

## Interactions

.small[
```r
lm(Weight ~ Age + Sex + Age * Sex, data = Kids198)
```

```
##
## Call:
## lm(formula = Weight ~ Age + Sex + Age * Sex, data = Kids198)
##
## Coefficients:
## (Intercept)          Age          Sex      Age:Sex
##    -33.6925       0.9087      31.8506      -0.2812
```
]

* What does this model become for **boys** (when `Sex = 0`)?
* `\(Weight = \beta_0 + \beta_1 Age + \epsilon\)`
* What does this model become for **girls** (when `Sex = 1`)?
* `\(Weight = \beta_0 + \beta_1 Age + \beta_2 \times 1 + \beta_3 Age \times 1 + \epsilon\)`
* `\(Weight = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) Age + \epsilon\)`
* How do you interpret `\(\hat\beta_{3}\)` now?

--

* How much the slope changes as we move from the regression line for boys to that for girls

---

## Interactions

`\(Weight = \beta_0 + \beta_1 Age + \beta_2 Girl + \beta_3 Age \times Girl + \epsilon\)`

* Hypothesis testing: What if you want to test whether the slope is different between groups?
* Is the growth rate different for boys and girls?
* What is `\(H_0\)`?

--

* `\(H_0: \beta_3 = 0\)`

--

* What is `\(H_A\)`?
--

* `\(H_A:\beta_3 \neq 0\)`

---

## Interactions

.small[
```r
lm(Weight ~ Age + Sex + Age * Sex, data = Kids198) %>%
  tidy(conf.int = TRUE)
```

```
## # A tibble: 4 x 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)  -33.7     10.0        -3.37 9.17e- 4  -53.4     -14.0
## 2 Age            0.909    0.0611     14.9  6.47e-34    0.788     1.03
## 3 Sex           31.9     13.2         2.41 1.71e- 2    5.73     58.0
*## 4 Age:Sex       -0.281    0.0816     -3.44 7.00e- 4   -0.442    -0.120
```
]

--

* What is the result of our hypothesis test?

---
class: middle

## `\(\hat\beta\)` interpretation for interactions between `\(x\)` and a binary indicator `\(I\)`

The coefficient for the interaction between `\(x\)` and `\(I\)` is `\(\hat\beta\)` (95% CI: `\(LB_\hat\beta, UB_\hat\beta\)`). This means that the effect of `\(x\)` on `\(y\)` differs by `\(\hat\beta\)` when `\(I = 1\)` compared to `\(I = 0\)` **holding all other variables constant***.

--

* You must include this line if there are **additional variables in your model**.

---
class: middle

## `\(\hat\beta\)` interpretation for interactions between `\(x\)` and a binary indicator `\(I\)`

The coefficient for the interaction between `Age` and `Sex` is -0.28 (95% CI: -0.44, -0.12). This means that the effect of `Age` on `Weight` is lower by 0.28 among girls compared to boys.

---

## Non-linear relationships

* Sometimes the relationships between the outcome `\(y\)` and `\(x\)` variables are _nonlinear_.
* We can use _polynomials_ to address this!
* Returning to the **Diamonds** data, let's say we are interested in predicting `TotalPrice` from `Carat`.

--

* Is this an example of inference or prediction?
---

## Non-linear relationships

![](09-variable-transformations_files/figure-html/unnamed-chunk-16-1.png)<!-- -->

---

## Non-linear relationships

```r
lm(TotalPrice ~ Carat, data = Diamonds)
```

![](09-variable-transformations_files/figure-html/unnamed-chunk-18-1.png)<!-- -->

---

## Non-linear relationships

```r
lm(TotalPrice ~ Carat + I(Carat^2), data = Diamonds)
```

![](09-variable-transformations_files/figure-html/unnamed-chunk-20-1.png)<!-- -->

--

* What is the equation for this relationship?

---

## Interpreting `\(\hat\beta\)`s in the presence of polynomials

`\(Total Price = \beta_0 + \beta_1 Carat + \beta_2 Carat^2 + \epsilon\)`

* What is the interpretation of `\(\hat\beta_1\)`?

--

* Typically, in multiple linear regression, the interpretation of `\(\hat\beta_i\)` is: a one-unit change in `\(x\)` yields an expected change in `\(y\)` of `\(\hat\beta_i\)` **holding all other variables constant**.

--

* What does it mean to see a change in `Carat` holding `\(Carat^2\)` constant?

--

* When you have a polynomial term, you need to **specify the values you are changing between**, since the change is no longer constant across all values of `\(x\)`.

---

## Interpreting `\(\hat\beta\)` in the presence of polynomials

.small[
```r
lm(TotalPrice ~ Carat + I(Carat^2), data = Diamonds) %>%
  tidy()
```

```
## # A tibble: 3 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    -523.      466.     -1.12 2.63e- 1
## 2 Carat          2386.      753.      3.17 1.66e- 3
## 3 I(Carat^2)     4498.      263.     17.1  5.09e-48
```
]

What is the expected change in `TotalPrice` for a one-unit change in `Carat`, changing from 0.8 to 1.8?
--

.pull-left[
.small[
```r
(-522.7 + 2386 * 1.8 + 4498.2 * 1.8^2) -
  (-522.7 + 2386 * 0.8 + 4498.2 * 0.8^2)
```

```
## [1] 14081.32
```
]
]

--

.pull-right[
.small[
```r
2386 * (1.8 - 0.8) + 4498.2 * (1.8^2 - 0.8^2)
```

```
## [1] 14081.32
```
]
]

---

## Interpreting `\(\hat\beta\)` in the presence of polynomials

.small[
```r
lm(TotalPrice ~ Carat + I(Carat^2), data = Diamonds) %>%
  tidy()
```

```
## # A tibble: 3 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    -523.      466.     -1.12 2.63e- 1
## 2 Carat          2386.      753.      3.17 1.66e- 3
## 3 I(Carat^2)     4498.      263.     17.1  5.09e-48
```
]

What is the expected change in `TotalPrice` for a one-unit change in `Carat`, changing from 1.8 to 2.8?

--

.small[
```r
2386 * (2.8 - 1.8) + 4498.2 * (2.8^2 - 1.8^2)
```

```
## [1] 23077.72
```
]

--

* Can we talk about `\(\hat\beta_1\)` and `\(\hat\beta_2\)` in the context of a one-unit change in `Carat`?

---

## Interpreting `\(\hat\beta\)` in the presence of polynomials

* `\(\hat\beta\)` coefficients for terms that are transformations of the same `\(x\)` variable **must** be interpreted together

--

* You must first choose two values of `\(x\)` to change between, and then report the change.

--

* A sensible choice for the two `\(x\)` values can be the 0.25 quantile and the 0.75 quantile.

---
class: middle

## General `\(\hat\beta\)` interpretation with quadratic terms

The linear term in the model for `\(x\)` has a coefficient of `\(\hat\beta_1\)` (95% CI: `\((LB_{\hat\beta_1}, UB_{\hat\beta_1})\)`). The quadratic term in the model for `\(x\)` has a coefficient of `\(\hat\beta_2\)` (95% CI: `\((LB_{\hat\beta_2}, UB_{\hat\beta_2})\)`). A change in `\(x\)` from `\(a\)` to `\(b\)` yields an expected change in `\(y\)` of `\(\hat\beta_1 (b - a) + \hat\beta_2 (b^2 - a^2)\)` **holding all other variables constant***.

--

* You must include this line if there are **additional variables in your model**.
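---

## General `\(\hat\beta\)` interpretation with quadratic terms

The expression `\(\hat\beta_1 (b - a) + \hat\beta_2 (b^2 - a^2)\)` can be wrapped in a small helper so the same calculation is reused for any pair of values. This is a minimal sketch: the function name `quad_change()` and the variable names are hypothetical, and the coefficients are the rounded estimates 2386 and 4498.2 used in the earlier calculations.

```r
# Hypothetical helper: expected change in y when x moves from a to b
# in a model with a linear term (b1) and a quadratic term (b2)
quad_change <- function(b1, b2, a, b) {
  b1 * (b - a) + b2 * (b^2 - a^2)
}

# Rounded estimates for Carat and I(Carat^2) from the tidy() output
change_low  <- quad_change(2386, 4498.2, a = 0.8, b = 1.8)
change_high <- quad_change(2386, 4498.2, a = 1.8, b = 2.8)

c(change_low, change_high)
```

The two values reproduce the 14081.32 and 23077.72 computed by hand on the earlier slides, which shows that the same one-unit change in `Carat` gives a different expected change in `TotalPrice` depending on where it starts.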
---
class: middle

## Specific `\(\hat\beta\)` interpretation for `\(y = \beta_0 + \beta_1 Carat + \beta_2 Carat^2 + \epsilon\)` model

The linear term in the model for `Carat` has a coefficient of `\(2386\)` (95% CI: `\((906, 3866)\)`). The quadratic term in the model for `Carat` has a coefficient of `\(4498\)` (95% CI: `\((3981, 5016)\)`). A change in `Carat` from `\(0.7\)` to `\(1.24\)` yields an expected change in `TotalPrice` of `\(6000.5\)`.

--

* Why didn't I say **holding all other variables constant**?

---

## Takeaways

* The interpretation of `\(\hat\beta\)` in multiple linear regression
  * A one-unit change in `\(x\)` yields an expected change in `\(y\)` of `\(\hat\beta\)` **holding all other included variables constant**

--

* If the _slope differs_ between groups (the lines are not parallel, and may cross, in a scatterplot), an **interaction** is present

--

* You can include polynomial terms to address **non-linear** relationships

--

* The coefficients for a polynomial must be interpreted together

---

## <i class="fas fa-laptop"></i> `Diamonds`

- Go to RStudio Cloud and open `Diamonds`
- Fit the model `\(TotalPrice = \beta_0 + \beta_1 Carat + \beta_2 Carat^2 + \beta_3 Color + \epsilon\)`
- Find the 0.25 quantile and 0.75 quantile of `Carat`
- What is the interpretation of `\(\hat\beta_1\)`, `\(\hat\beta_2\)`, and `\(\hat\beta_3\)`?