Lab 05: Multiple regression

Agenda

  • Loading a .csv file
  • (Re)coding factor variables
  • Lab 05: on your own

Reading in a .csv file

  • So far, the data we've been using has been included in an R package
  • To access this data we just run data("data set")
  • What if we want to read in other data, for example from a .csv file?
  • Enter: read_csv()
  • read_csv() is a function from the readr package, which is loaded when you load the tidyverse
  • It works like this:
df <- read_csv("the-path-to-your-file.csv")

Here, df can be whatever you'd like to call your new dataset.
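
As a minimal sketch of the call above, the example below first writes a tiny made-up .csv to a temporary file so that it runs on its own, then reads it back in. The column names (id, weight_oz, smoker) and the object name births are hypothetical; in the lab you would pass the path to your own file instead.

```r
library(readr)

# Write a tiny made-up CSV to a temporary file so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("id,weight_oz,smoker", "1,111,0", "2,116,0", "3,138,1"), path)

# Read it back in; `births` is just the name chosen for the new data frame
births <- read_csv(path)

# read_csv() returns a tibble and guesses each column's type
births
```

read_csv() prints a short message describing the column types it guessed, which is a quick way to spot a variable that came in as numeric when you wanted a factor.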

Recoding factor variables

  • Sometimes variables come in as numeric, but we want them to be factors

Recoding factor variables

  • The NCbirths dataset contains a sample of 1,450 birth records that statistician John Holcomb selected from the North Carolina State Center for Health and Environmental Statistics.
glimpse(NCbirths)
## Observations: 1,450
## Variables: 15
## $ ID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,…
## $ Plural <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ Sex <int> 1, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1,…
## $ MomAge <int> 32, 32, 27, 27, 25, 28, 25, 15, 21, 27, 26, 20, 19…
## $ Weeks <int> 40, 37, 39, 39, 39, 43, 39, 42, 39, 40, 41, 41, 40…
## $ Marital <int> 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1,…
## $ RaceMom <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 5, 1,…
## $ HispMom <fct> N, N, N, N, N, N, N, N, N, N, N, N, N, P, N, M, N,…
## $ Gained <int> 38, 34, 12, 15, 32, 32, 75, 25, 28, 37, 45, 52, 26…
## $ Smoke <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1,…
## $ BirthWeightOz <int> 111, 116, 138, 136, 121, 117, 143, 113, 120, 124, …
## $ BirthWeightGm <dbl> 3146.85, 3288.60, 3912.30, 3855.60, 3430.35, 3316.…
## $ Low <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,…
## $ Premie <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,…
## $ MomRace <fct> white, white, white, white, white, white, white, w…

Recoding factor variables

lm(BirthWeightOz ~ RaceMom, data = NCbirths)
##
## Call:
## lm(formula = BirthWeightOz ~ RaceMom, data = NCbirths)
##
## Coefficients:
## (Intercept)      RaceMom
##   116.27732     -0.01624
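
Because RaceMom is stored as an integer here, lm() fits a single slope and treats the codes as if they were evenly spaced quantities. The sketch below uses made-up data (the toy data frame and its group and y columns are hypothetical, not part of NCbirths) to show how the set of coefficients changes once the same variable is treated as a factor.

```r
# Made-up data: three groups coded 1, 2, 3
toy <- data.frame(
  group = c(1, 1, 2, 2, 3, 3),
  y     = c(10, 12, 20, 22, 14, 16)
)

# Numeric coding: one intercept and one slope for `group`
fit_numeric <- lm(y ~ group, data = toy)
coef(fit_numeric)

# Factor coding: an intercept plus one coefficient per non-reference level
fit_factor <- lm(y ~ factor(group), data = toy)
coef(fit_factor)
```

In the factor version, the intercept is the mean of the first group and each remaining coefficient is that group's difference from it.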

Recoding factor variables

NCbirths <- NCbirths %>%
  mutate(
    RaceMom = recode_factor(
      RaceMom,
      `1` = "white",
      `2` = "black",
      `3` = "American Indian",
      `4` = "Chinese",
      `5` = "Japanese",
      `6` = "Hawaiian",
      `7` = "Filipino",
      `8` = "Other Asian or Pacific Islander"
    )
  )
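
To see what recode_factor() returns without touching NCbirths, here is a small sketch on a made-up vector of integer codes (the codes and race objects are hypothetical). It assumes dplyr is installed; the backticks are needed because the names on the left are numbers.

```r
library(dplyr)

# Made-up integer codes standing in for a column like RaceMom
codes <- c(1L, 2L, 1L, 5L)

race <- recode_factor(
  codes,
  `1` = "white",
  `2` = "black",
  `5` = "Japanese"
)

levels(race)  # level order follows the order of the replacements
table(race)   # counts per label
```

The order in which you list the replacements matters, because it sets the order of the factor levels.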

Recoding factor variables

lm(BirthWeightOz ~ RaceMom, data = NCbirths)
##
## Call:
## lm(formula = BirthWeightOz ~ RaceMom, data = NCbirths)
##
## Coefficients:
## (Intercept)                              117.8720
## RaceMomblack                              -7.3087
## RaceMomAmerican Indian                    -2.5538
## RaceMomChinese                             8.1280
## RaceMomJapanese                            0.6463
## RaceMomFilipino                          -20.8720
## RaceMomOther Asian or Pacific Islander     1.1280
  • What is the referent category?
  • What if I wanted to change that?
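
With R's default treatment contrasts, the reference category is the first level of the factor, and it is the one level that does not get a coefficient of its own in the lm() output. A minimal sketch for checking this, using a made-up factor (the race object is hypothetical):

```r
# Made-up factor; levels are given explicitly so the order is known
race <- factor(
  c("white", "black", "Japanese", "white"),
  levels = c("white", "black", "Japanese")
)

levels(race)     # all levels, in order
levels(race)[1]  # the first level is the reference category
```

Running levels(NCbirths$RaceMom) after the recode applies the same check to the lab data.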

Recoding factor variables

new_levels <- c(
  "American Indian", "white", "black", "Chinese", "Japanese",
  "Filipino", "Other Asian or Pacific Islander"
)
NCbirths <- NCbirths %>%
  mutate(
    RaceMom = fct_relevel(RaceMom, new_levels)
  )
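
If the only goal is to change the reference category, it should be enough to name just the level you want first; the remaining levels keep their existing order. The sketch below shows this on a made-up factor (race, race2, and race3 are hypothetical names), with base R's relevel() as an alternative that does not need forcats.

```r
library(forcats)

# Made-up factor with "white" as the first (reference) level
race <- factor(
  c("white", "black", "American Indian", "white"),
  levels = c("white", "black", "American Indian")
)

# Naming a single level moves it to the front; the rest keep their order
race2 <- fct_relevel(race, "American Indian")
levels(race2)

# Base R equivalent
race3 <- relevel(race, ref = "American Indian")
levels(race3)
```

After releveling, refit the model: every coefficient is then a difference from the new reference category.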

Open Lab 05: multiple regression in RStudio
