Data transformations

Species Area

  • Go to RStudio Cloud and open Species Area

Steps for modeling

  • Choose
  • Fit
  • Assess
  • Use

Conditions for simple linear regression

  • Linearity
  • Zero Mean
  • Constant Variance
  • Independence
  • Random
  • Normality

What can we do when these conditions aren't met?

transformations!

Example

Number of MDs and community hospitals for sample of 83 metropolitan areas

City                           NumMDs  NumHospitals
Holland-Grand Haven, MI           349             3
Louisville, KY-IN                4042            18
Battle Creek, MI                  256             3
Madison, WI                      2679             7
Fort Smith, AR-OK                 502             8
Sarasota-Bradenton-Venice, FL    2352             7
Anderson, IN                      200             2
Honolulu, HI                     3478            13
Asheville, NC                    1489             5
Winston-Salem, NC                2018             6

Choose

Number of MDs and community hospitals for sample of 83 metropolitan areas

$\widehat{\text{Number of MDs}} = \hat{\beta}_0 + \hat{\beta}_1 \, \text{Number of hospitals}$

Fit

Number of MDs and community hospitals for sample of 83 metropolitan areas

lm(NumMDs ~ NumHospitals, data = MetroHealth83)
##
## Call:
## lm(formula = NumMDs ~ NumHospitals, data = MetroHealth83)
##
## Coefficients:
## (Intercept)  NumHospitals
##      -385.1         282.0

Refresher: What is $\hat{\beta}_0$ and what does it mean?

Refresher: What is $\hat{\beta}_1$ and what does it mean?
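
A minimal sketch of one way to read the two coefficients, using only the rounded estimates printed in the `lm()` output. The helper `predict_mds_linear()` is a hypothetical name introduced here for illustration; it is not part of the slides.

```r
# Rounded estimates copied from the lm() output for NumMDs ~ NumHospitals
b0 <- -385.1  # intercept
b1 <- 282.0   # slope for NumHospitals

# Hypothetical helper: predicted number of MDs for a given number of hospitals
predict_mds_linear <- function(num_hospitals) b0 + b1 * num_hospitals

# Intercept: the predicted number of MDs when NumHospitals is 0
at_zero <- predict_mds_linear(0)

# Slope: the change in predicted MDs for one additional hospital
per_hospital <- predict_mds_linear(6) - predict_mds_linear(5)

c(at_zero = at_zero, per_hospital = per_hospital)
```

A negative count of MDs is not possible, so the intercept is best read as anchoring the line rather than as a realistic prediction.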

Assess

Number of MDs and community hospitals for sample of 83 metropolitan areas

What can I use to assess the linearity and constant variance assumptions?
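
One common choice is a plot of residuals against fitted values: curvature suggests a problem with linearity, and a fan shape suggests non-constant variance. The sketch below assumes only the ten metropolitan areas listed in the example table (not the full sample of 83), so its coefficients and picture will differ from the full fit. The names `metro10` and `fit10` are made up for this illustration.

```r
# Ten rows copied from the example table; NOT the full 83-area data set
metro10 <- data.frame(
  NumMDs       = c(349, 4042, 256, 2679, 502, 2352, 200, 3478, 1489, 2018),
  NumHospitals = c(3, 18, 3, 7, 8, 7, 2, 13, 5, 6)
)

# Same model formula as on the Fit slide, fit to the ten-row subset
fit10 <- lm(NumMDs ~ NumHospitals, data = metro10)

# Residuals versus fitted values, with a reference line at zero
plot(fitted(fit10), resid(fit10),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```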

Assess

Number of MDs and community hospitals for sample of 83 metropolitan areas

What do you think?

Assess

Number of MDs and community hospitals for sample of 83 metropolitan areas

What can I use to assess the normality assumption?
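
Two common choices are a histogram of the residuals and a normal quantile-quantile plot. The sketch below again assumes only the ten areas listed in the example table, so it illustrates the mechanics rather than reproducing the plots for all 83 areas. The object names are made up for this illustration.

```r
# Ten rows copied from the example table; NOT the full 83-area data set
num_mds       <- c(349, 4042, 256, 2679, 502, 2352, 200, 3478, 1489, 2018)
num_hospitals <- c(3, 18, 3, 7, 8, 7, 2, 13, 5, 6)

fit10_resid <- resid(lm(num_mds ~ num_hospitals))

# Histogram: look for a roughly symmetric, bell-shaped distribution
hist(fit10_resid, main = "Residuals", xlab = "Residual")

# Normal QQ plot: points close to the line are consistent with normality
qqnorm(fit10_resid)
qqline(fit10_resid)
```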

Assess

Number of MDs and community hospitals for sample of 83 metropolitan areas

What do you think?

Choose

  • To stabilize the variance of the response (y, in this case NumMDs) across different values of the predictor (x, in this case NumHospitals), we can transform y or x
  • Typical transformations:
    • $\sqrt{y}$
    • $\log(y)$
    • $x^2$
    • $1/x$
  • For count data, such as the number of doctors or hospitals, where the variability increases along with the magnitude of the variable, a square root transformation is often helpful
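
A small simulation can illustrate why the square root helps with counts. It assumes the counts are Poisson distributed, which the slides do not claim for the MD data; under that assumption the spread of the raw counts grows with the mean, while the spread of the square roots stays close to 0.5.

```r
set.seed(1)

# Three hypothetical count distributions with increasing means
means <- c(25, 100, 400)
sims  <- lapply(means, function(m) rpois(10000, lambda = m))

# Standard deviation on the raw scale and on the square-root scale
sd_raw  <- sapply(sims, sd)
sd_sqrt <- sapply(sims, function(x) sd(sqrt(x)))

round(rbind(mean = means, sd_raw = sd_raw, sd_sqrt = sd_sqrt), 2)
```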

Choose

Number of MDs and community hospitals for sample of 83 metropolitan areas

$\widehat{\sqrt{\text{Number of MDs}}} = \hat{\beta}_0 + \hat{\beta}_1 \, \text{Number of hospitals}$

Fit

Number of MDs and community hospitals for sample of 83 metropolitan areas

lm(sqrt(NumMDs) ~ NumHospitals, data = MetroHealth83)
##
## Call:
## lm(formula = sqrt(NumMDs) ~ NumHospitals, data = MetroHealth83)
##
## Coefficients:
## (Intercept)  NumHospitals
##      14.033         2.915

Assess

Use

Number of MDs and community hospitals for sample of 83 metropolitan areas

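$\widehat{\sqrt{\text{Number of MDs}}} = \hat{\beta}_0 + \hat{\beta}_1 \, \text{Number of hospitals}$

$\widehat{\text{Number of MDs}} = \left(\hat{\beta}_0 + \hat{\beta}_1 \, \text{Number of hospitals}\right)^2$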

##
## Call:
## lm(formula = sqrt(NumMDs) ~ NumHospitals, data = MetroHealth83)
##
## Coefficients:
## (Intercept)  NumHospitals
##      14.033         2.915
##                City NumMDs NumHospitals
## 1 Louisville, KY-IN   4042           18
(14.033 + 2.915 * 18)^2
## [1] 4422.649
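
The same back-transformation can be wrapped in a small function and compared with the observed value for Louisville. This is a sketch that uses the rounded coefficients printed above; `predict_mds_sqrt()` is a hypothetical helper name, not part of the slides.

```r
# Rounded estimates copied from the lm() output for sqrt(NumMDs) ~ NumHospitals
b0 <- 14.033
b1 <- 2.915

# Hypothetical helper: predict on the square-root scale, then square
# to return to the original scale (number of MDs)
predict_mds_sqrt <- function(num_hospitals) (b0 + b1 * num_hospitals)^2

# Louisville, KY-IN has 18 hospitals and 4042 observed MDs
louisville_pred  <- predict_mds_sqrt(18)
louisville_resid <- 4042 - louisville_pred

round(c(predicted = louisville_pred, residual = louisville_resid), 1)
```

For Louisville the model predicts roughly 381 more MDs than were observed.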

Species Area

  • Go to RStudio Cloud and open Species Area
  • For each question you work on, set the eval chunk option to TRUE and knit