class: center, middle, inverse, title-slide

# Lab 05: Multiple regression

---
layout: true

<div class="my-footer">
<span>
Dr. Lucy D'Agostino McGowan
</span>
</div>

---

# Agenda

* **Loading a .csv file**
* **(Re)coding factor variables**
* **Lab 05:** on your own

---

## Reading in a .csv file

* So far, the data we've been using has been included in an **R package**
* To access this data we just run `data("data set")`
* What if we want to read in other data, for example from a `.csv` file?

--

* enter: `read_csv()`

--

* `read_csv()` is a function from the **readr** package, which is included when you load the **tidyverse**

--

* it works like this:

```r
df <- read_csv("the-path-to-your-file.csv")
```

Where `df` can be whatever you'd like to call your new dataset

---

## Recoding factor variables

* Sometimes variables come in as _numeric_, but we want them to be factors

![](img/05/bad-code.jpg)

---

## Recoding factor variables

* This dataset contains data on a sample of 1450 birth records that statistician John Holcomb selected from the North Carolina State Center for Health and Environmental Statistics.

```r
glimpse(NCbirths)
```

```
## Observations: 1,450
## Variables: 15
## $ ID            <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,…
## $ Plural        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ Sex           <int> 1, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1,…
## $ MomAge        <int> 32, 32, 27, 27, 25, 28, 25, 15, 21, 27, 26, 20, 19…
## $ Weeks         <int> 40, 37, 39, 39, 39, 43, 39, 42, 39, 40, 41, 41, 40…
## $ Marital       <int> 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1,…
*## $ RaceMom       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 5, 1,…
## $ HispMom       <fct> N, N, N, N, N, N, N, N, N, N, N, N, N, P, N, M, N,…
## $ Gained        <int> 38, 34, 12, 15, 32, 32, 75, 25, 28, 37, 45, 52, 26…
## $ Smoke         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1,…
## $ BirthWeightOz <int> 111, 116, 138, 136, 121, 117, 143, 113, 120, 124, …
## $ BirthWeightGm <dbl> 3146.85, 3288.60, 3912.30, 3855.60, 3430.35, 3316.…
## $ Low           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,…
## $ Premie        <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,…
## $ MomRace       <fct> white, white, white, white, white, white, white, w…
```

---

## Recoding factor variables

```r
lm(BirthWeightOz ~ RaceMom, data = NCbirths)
```

```
## 
## Call:
## lm(formula = BirthWeightOz ~ RaceMom, data = NCbirths)
## 
## Coefficients:
## (Intercept)      RaceMom  
##   116.27732     -0.01624
```

---

## Recoding factor variables

```r
NCbirths <- NCbirths %>%
  mutate(
    RaceMom = recode_factor(
      RaceMom,
      `1` = "white",
      `2` = "black",
      `3` = "American Indian",
      `4` = "Chinese",
      `5` = "Japanese",
      `6` = "Hawaiian",
      `7` = "Filipino",
      `8` = "Other Asian or Pacific Islander"
    )
  )
```

---

## Recoding factor variables

.small[

```r
lm(BirthWeightOz ~ RaceMom, data = NCbirths)
```

```
## 
## Call:
## lm(formula = BirthWeightOz ~ RaceMom, data = NCbirths)
## 
## Coefficients:
##                            (Intercept)  
##                               117.8720  
##                           RaceMomblack  
##                                -7.3087  
##                 RaceMomAmerican Indian  
##                                -2.5538  
##                         RaceMomChinese  
##                                 8.1280  
##                        RaceMomJapanese  
##                                 0.6463  
##                        RaceMomFilipino  
##                               -20.8720  
## RaceMomOther Asian or Pacific Islander  
##                                 1.1280
```

]

--

* What is the referent category?

--

* What if I wanted to change that?

---

## Recoding factor variables

```r
new_levels <- c("American Indian", "white", "black", "Chinese",
                "Japanese", "Filipino", "Other Asian or Pacific Islander")

NCbirths <- NCbirths %>%
  mutate(
    RaceMom = fct_relevel(RaceMom, new_levels)
  )
```

---

## Recoding factor variables

* Sometimes variables come in as _numeric_, but we want them to be factors

![](img/05/bad-jama.png)

---

## Open `Lab 05: multiple regression` in RStudio
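
---

## Putting it together: a small sketch

The steps from these slides (read a `.csv`, recode a numeric column as a factor, then change the referent category) can be tried end to end on a toy dataset. This is only a sketch under stated assumptions: the file, the column names `group` and `y`, the labels, and the values are all made up for illustration and are not part of `NCbirths`.

```r
library(readr)
library(dplyr)
library(forcats)

# Hypothetical toy data written to a temporary .csv file
toy_path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(
    group = c(1, 2, 3, 1, 2, 3),
    y     = c(10, 12, 15, 11, 13, 16)
  ),
  toy_path
)

# Read the file back in; `group` arrives as a numeric column
toy <- read_csv(toy_path)

# Recode the numeric codes as factor labels (labels are made up)
toy <- toy %>%
  mutate(
    group = recode_factor(group, `1` = "a", `2` = "b", `3` = "c")
  )
levels(toy$group)  # "a" "b" "c": the first level is the referent category

# Move "b" to the front so that it becomes the referent category
toy <- toy %>%
  mutate(group = fct_relevel(group, "b"))
levels(toy$group)  # "b" "a" "c"

# The model now reports coefficients for "a" and "c" relative to "b"
lm(y ~ group, data = toy)
```

Checking `levels()` before fitting the model is a quick way to confirm which category `lm()` will treat as the referent.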