class: center, middle, inverse, title-slide

# Data transformations

---
layout: true

<div class="my-footer">
<span>
Dr. Lucy D'Agostino McGowan
</span>
</div>

---

## <i class="fas fa-laptop"></i> `Species Area`

- Go to RStudio Cloud and open `Species Area`

---

# Steps for modeling

![](img/03/flowchart.png)

---

# Steps for modeling

![](img/03/flowchart-arrow.png)

---

# Conditions for simple linear regression

* Linearity
* Zero Mean
* Constant Variance
* Independence
* Random
* Normality

---

# Conditions for simple linear regression

.question[
What can we do when these conditions aren't met?
]

* Linearity
* Zero Mean
* Constant Variance
* Independence
* Random
* Normality

---
class: center, middle

# transformations!

---

## Example

Number of MDs and community hospitals for a sample of 83 metropolitan areas

.small[

|City                          | NumMDs| NumHospitals|
|:-----------------------------|------:|------------:|
|Holland-Grand Haven, MI       |    349|            3|
|Louisville, KY-IN             |   4042|           18|
|Battle Creek, MI              |    256|            3|
|Madison, WI                   |   2679|            7|
|Fort Smith, AR-OK             |    502|            8|
|Sarasota-Bradenton-Venice, FL |   2352|            7|
|Anderson, IN                  |    200|            2|
|Honolulu, HI                  |   3478|           13|
|Asheville, NC                 |   1489|            5|
|Winston-Salem, NC             |   2018|            6|

]

---

## Example

Number of MDs and community hospitals for a sample of 83 metropolitan areas

![](05-data-transformations_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

---

## Choose

Number of MDs and community hospitals for a sample of 83 metropolitan areas

### `\(\hat{\textrm{Number of MDs}} = \hat{\beta}_0 + \hat{\beta}_1 \textrm{Number of hospitals}\)`

---

## Fit

Number of MDs and community hospitals for a sample of 83 metropolitan areas

```r
lm(NumMDs ~ NumHospitals, data = MetroHealth83)
```

```
## 
## Call:
## lm(formula = NumMDs ~ NumHospitals, data = MetroHealth83)
## 
## Coefficients:
##  (Intercept)  NumHospitals  
##       -385.1         282.0  
```

---

## Fit

Number of MDs and community hospitals for a sample of 83 metropolitan areas

.question[
Refresher: What is `\(\hat{\beta}_0\)` and what does it
mean?
]

```r
lm(NumMDs ~ NumHospitals, data = MetroHealth83)
```

```
## 
## Call:
## lm(formula = NumMDs ~ NumHospitals, data = MetroHealth83)
## 
## Coefficients:
##  (Intercept)  NumHospitals  
##       -385.1         282.0  
```

---

## Fit

Number of MDs and community hospitals for a sample of 83 metropolitan areas

.question[
Refresher: What is `\(\hat{\beta}_1\)` and what does it mean?
]

```r
lm(NumMDs ~ NumHospitals, data = MetroHealth83)
```

```
## 
## Call:
## lm(formula = NumMDs ~ NumHospitals, data = MetroHealth83)
## 
## Coefficients:
##  (Intercept)  NumHospitals  
##       -385.1         282.0  
```

---

## Assess

Number of MDs and community hospitals for a sample of 83 metropolitan areas

.question[
What can I use to assess the linearity and constant variance assumptions?
]

---

## Assess

Number of MDs and community hospitals for a sample of 83 metropolitan areas

.question[
What can I use to assess the linearity and constant variance assumptions?
]

![](05-data-transformations_files/figure-html/unnamed-chunk-9-1.png)<!-- -->

---

## Assess

Number of MDs and community hospitals for a sample of 83 metropolitan areas

.question[
What do you think?
]

![](05-data-transformations_files/figure-html/unnamed-chunk-10-1.png)<!-- -->

---

## Assess

Number of MDs and community hospitals for a sample of 83 metropolitan areas

.question[
What can I use to assess the normality assumption?
]

---

## Assess

Number of MDs and community hospitals for a sample of 83 metropolitan areas

.question[
What do you think?
]

![](05-data-transformations_files/figure-html/unnamed-chunk-11-1.png)<!-- -->

---

## Assess

Number of MDs and community hospitals for a sample of 83 metropolitan areas

.question[
What do you think?
]

![](05-data-transformations_files/figure-html/unnamed-chunk-12-1.png)<!-- -->

---

## Choose

* To **stabilize** the variance of the response (`\(y\)`, in this case `NumMDs`) across different values of the predictor (`\(x\)`, in this case `NumHospitals`), we can **transform** `\(y\)` or `\(x\)`

--

* Typical transformations:
    * `\(\sqrt{y}\)`
    * `\(\log{y}\)`
    * `\(x^2\)`
    * `\(1/x\)`

--

* For count data, such as the number of doctors or hospitals, where the variability increases along with the magnitudes of the variables, a **square root transformation** is often helpful

---

## Choose

Number of MDs and community hospitals for a sample of 83 metropolitan areas

### `\(\hat{\sqrt{\textrm{Number of MDs}}} = \hat{\beta}_0 + \hat{\beta}_1 \textrm{Number of hospitals}\)`

---

## Fit

Number of MDs and community hospitals for a sample of 83 metropolitan areas

```r
lm(sqrt(NumMDs) ~ NumHospitals, data = MetroHealth83)
```

```
## 
## Call:
## lm(formula = sqrt(NumMDs) ~ NumHospitals, data = MetroHealth83)
## 
## Coefficients:
##  (Intercept)  NumHospitals  
##       14.033         2.915  
```

---

## Assess

![](05-data-transformations_files/figure-html/unnamed-chunk-14-1.png)<!-- -->

---

## Assess

![](05-data-transformations_files/figure-html/unnamed-chunk-15-1.png)<!-- -->

![](05-data-transformations_files/figure-html/unnamed-chunk-16-1.png)<!-- -->

---

## Use

Number of MDs and community hospitals for a sample of 83 metropolitan areas

### `\(\hat{\sqrt{\textrm{Number of MDs}}} = \hat{\beta}_0 + \hat{\beta}_1 \textrm{Number of hospitals}\)`

### `\(\hat{\textrm{Number of MDs}} = (\hat{\beta}_0 + \hat{\beta}_1 \textrm{Number of hospitals})^2\)`

---

## Use

### `\(\hat{\textrm{Number of MDs}} = (\hat{\beta}_0 + \hat{\beta}_1 \textrm{Number of hospitals})^2\)`

```
## 
## Call:
## lm(formula = sqrt(NumMDs) ~ NumHospitals, data = MetroHealth83)
## 
## Coefficients:
##  (Intercept)  NumHospitals  
*##       14.033         2.915  
```

```
##                City NumMDs NumHospitals
*## 1 Louisville, KY-IN   4042           18
```

---

## Use

### `\(\hat{\textrm{Number of MDs}} = (\hat{\beta}_0 + \hat{\beta}_1 \textrm{Number of hospitals})^2\)`

```
## 
## Call:
## lm(formula = sqrt(NumMDs) ~ NumHospitals, data = MetroHealth83)
## 
## Coefficients:
##  (Intercept)  NumHospitals  
*##       14.033         2.915  
```

```
##                City NumMDs NumHospitals
*## 1 Louisville, KY-IN   4042           18
```

```r
(14.033 + 2.915 * 18)^2
```

```
## [1] 4422.649
```

---

## Use

![](05-data-transformations_files/figure-html/unnamed-chunk-22-1.png)<!-- -->

---

## <i class="fas fa-laptop"></i> `Species Area`

- Go to RStudio Cloud and open `Species Area`
- For each question you work on, set the `eval` chunk option to `TRUE` and knit
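---

## Use

The fit and back-transformation steps from the earlier slides can be sketched with `predict()` instead of hand arithmetic. This is only a sketch under stated assumptions: `metro_sample` is a hypothetical data frame holding just the ten areas listed in the example table, so its coefficients will differ from the ones reported for the full `MetroHealth83` data.

```r
# Ten of the 83 metropolitan areas, copied from the example table.
# `metro_sample` is a hypothetical name used only for this sketch.
metro_sample <- data.frame(
  City = c("Holland-Grand Haven, MI", "Louisville, KY-IN", "Battle Creek, MI",
           "Madison, WI", "Fort Smith, AR-OK", "Sarasota-Bradenton-Venice, FL",
           "Anderson, IN", "Honolulu, HI", "Asheville, NC", "Winston-Salem, NC"),
  NumMDs = c(349, 4042, 256, 2679, 502, 2352, 200, 3478, 1489, 2018),
  NumHospitals = c(3, 18, 3, 7, 8, 7, 2, 13, 5, 6)
)

# Fit: regress the square root of the response on the predictor.
sqrt_fit <- lm(sqrt(NumMDs) ~ NumHospitals, data = metro_sample)

# Use: predict on the square-root scale for an area with 18 hospitals,
# then square the prediction to return to the number of MDs.
new_area <- data.frame(NumHospitals = 18)
pred_sqrt <- unname(predict(sqrt_fit, newdata = new_area))
pred_mds <- pred_sqrt^2

# The same back-transformation using the coefficients reported
# for the full 83-area fit.
slide_pred <- (14.033 + 2.915 * 18)^2
slide_pred
```

`predict()` returns values on the scale the model was fit on, here the square-root scale, so the squaring step has to be done explicitly.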