Types of variables

by Dr. Lucy D'Agostino McGowan

Diamonds

Go to RStudio Cloud and open Diamonds.

Variable types

There are two major classes of variables:

- numeric (quantitative)
- categorical

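Variable types

Recall from the first week of class that you can use the glimpse() function to see all of your variables and their types.

library(tidyverse)  # glimpse(), mutate(), %>%, and forcats
library(Stat2Data)  # assumed source of the PorschePrice, Diamonds, and ICU data
data("PorschePrice")
glimpse(PorschePrice)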

## Observations: 30
## Variables: 3
## $ Price   <dbl> 69.4, 56.9, 49.9, 47.4, 42.9, 36.9, 83.0, 72.9, 69.9, 67…
## $ Age     <int> 3, 3, 2, 4, 4, 6, 0, 0, 2, 0, 2, 2, 4, 3, 10, 11, 4, 4, …
## $ Mileage <dbl> 21.50, 43.00, 19.90, 36.00, 44.00, 49.80, 1.30, 0.67, 13…
What are the variables here?
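
If you prefer base R, one way to check variable types is to ask for the class of each column. This is a minimal sketch that rebuilds only the first three rows shown in the glimpse(PorschePrice) output above; the object name porsche_head is hypothetical.

```r
# The first three rows from the glimpse(PorschePrice) output above
porsche_head <- data.frame(
  Price   = c(69.4, 56.9, 49.9),    # <dbl>
  Age     = c(3L, 3L, 2L),          # <int>
  Mileage = c(21.50, 43.00, 19.90)  # <dbl>
)

# class() of every column: "numeric" for <dbl>, "integer" for <int>
col_types <- sapply(porsche_head, class)
print(col_types)

# Doubles and integers are both numeric (quantitative) variables
print(sapply(porsche_head, is.numeric))
```

Both <dbl> and <int> columns are quantitative; the difference is only in how R stores the numbers.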

Variable types

data("Diamonds")
glimpse(Diamonds)

## Observations: 351
## Variables: 6
## $ Carat      <dbl> 1.08, 0.31, 0.31, 0.32, 0.33, 0.33, 0.35, 0.35, 0.37,…
## $ Color      <fct> E, F, H, F, D, G, F, F, F, D, E, F, D, D, F, F, D, D,…
## $ Clarity    <fct> VS1, VVS1, VS1, VVS1, IF, VVS1, VS1, VS1, VVS1, IF, V…
## $ Depth      <dbl> 68.6, 61.9, 62.1, 60.8, 60.8, 61.5, 62.5, 62.3, 61.4,…
## $ PricePerCt <dbl> 6693.3, 3159.0, 1755.0, 3159.0, 4758.8, 2895.8, 2457.…
## $ TotalPrice <dbl> 7228.8, 979.3, 544.1, 1010.9, 1570.4, 955.6, 860.0, 8…
What are the variables here?
fct: "factor", a type of categorical variable
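
A factor stores a categorical variable together with its fixed set of possible values (its levels). A minimal base R sketch, using the first six Color values from the glimpse(Diamonds) output above as a toy vector:

```r
# The first six Color values from the glimpse(Diamonds) output above
color <- factor(c("E", "F", "H", "F", "D", "G"))

levels(color)    # the set of categories; sorted alphabetically by default
table(color)     # number of cases in each category

# factor() only knows about the values it sees, unless you list the levels
color_all <- factor(c("E", "F", "H"),
                    levels = c("D", "E", "F", "G", "H", "I", "J"))
nlevels(color_all)  # 7 levels, even though only 3 values appear
```

The level order matters later: when a factor is used in a model, R treats its first level as the reference category.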

Variable types

starwars %>%
  select(name:skin_color) %>%  # keep the five columns shown in the output
  glimpse()

## Observations: 87
## Variables: 5
## $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "L…
## $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, …
## $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.…
## $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "bro…
## $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "lig…
chr: "character", text values that R can also treat as a categorical variable
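
A character column stores plain text without a fixed set of levels, but R can still use it as a categorical variable. This minimal sketch uses a few hair_color values from the starwars output above (NAs left out); the response y is made up purely to show which coefficients lm() creates.

```r
# A few hair_color values from the starwars output above (NAs left out)
hair <- c("blond", "none", "brown", "brown, grey", "brown")
is.character(hair)   # TRUE: plain text, no fixed set of levels stored

hair_f <- factor(hair)
levels(hair_f)       # levels are created in alphabetical order

# lm() converts a character predictor to a factor behind the scenes;
# y is a made-up response used only to show the coefficient names
y <- c(1, 2, 3, 4, 5)
fit <- lm(y ~ hair)
names(coef(fit))
```

There is no "hairblond" coefficient: "blond" is the first level alphabetically, so it becomes the reference category.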

Variable types

So far, our models have only included numeric (quantitative) variables.

- What would the equation be for predicting $y$ from $x$ when $x$ is numeric?
- What would happen if $x$ is categorical?
- What would the equation be for predicting $y$ from $x$ if $x$ is categorical with 2 levels?
- What would the equation be for predicting $y$ from $x$ if $x$ is categorical with 3 levels?
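
One way to write these models, as a sketch in the same $\beta$ notation used later for the ICU example. The level names $A$, $B$, $C$ and the indicators $x_B$, $x_C$ are placeholders: $x_B = 1$ if a case is in level $B$ and $0$ otherwise (likewise $x_C$), and level $A$ is the reference level absorbed into $\beta_0$.

$$
\begin{aligned}
x \text{ numeric:}\quad & y = \beta_0 + \beta_1 x + \epsilon\\
x \text{ with 2 levels } (A, B)\text{:}\quad & y = \beta_0 + \beta_1 x_B + \epsilon\\
x \text{ with 3 levels } (A, B, C)\text{:}\quad & y = \beta_0 + \beta_1 x_B + \beta_2 x_C + \epsilon
\end{aligned}
$$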

Indicator variable

An indicator variable uses two values, usually 0 and 1, to indicate whether a data case does (1) or does not (0) belong to a specific category.
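
Written out for one category, an indicator for Color D in the Diamonds data is defined case by case. The example values come from rows 1 and 5 of the Diamonds table that follows (colors E and D); the name $\text{ColorD}_i$ matches the column created later with mutate().

$$
\text{ColorD}_i =
\begin{cases}
1 & \text{if Color}_i = \text{D}\\
0 & \text{otherwise}
\end{cases}
\qquad \text{e.g. } \text{ColorD}_1 = 0 \ (\text{Color E}), \quad \text{ColorD}_5 = 1 \ (\text{Color D})
$$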

data("Diamonds")

	TotalPrice	Color	Carat
1	7228.8	E	1.08
2	979.3	F	0.31
3	544.1	H	0.31
4	1010.9	F	0.32
5	1570.4	D	0.33
6	955.6	G	0.33
7	860	F	0.35
8	860	F	0.35
9	1258.7	F	0.37
10	1923.8	D	0.38

(Showing the first 10 of 351 rows.)

Indicator variables

What does this code do?

Diamonds <- Diamonds %>%
  mutate(
    ColorD = ifelse(Color == "D", 1, 0),
    ColorE = ifelse(Color == "E", 1, 0),
    ColorF = ifelse(Color == "F", 1, 0),
    ColorG = ifelse(Color == "G", 1, 0),
    ColorH = ifelse(Color == "H", 1, 0),
    ColorI = ifelse(Color == "I", 1, 0),
    ColorJ = ifelse(Color == "J", 1, 0)
  )
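
The mutate() call builds one 0/1 column per level by hand. As a base R sketch of the same idea, model.matrix() can produce all of the indicator columns at once; removing the intercept with `- 1` keeps one column per level. The toy Color vector reuses the first six values from the table above.

```r
# The first six Color values from the Diamonds table above
Color <- factor(c("E", "F", "H", "F", "D", "G"))

# "~ Color - 1": no intercept, so every observed level gets its own column
indicators <- model.matrix(~ Color - 1)
print(indicators)

# Each column matches the corresponding ifelse() indicator
print(all(indicators[, "ColorD"] == ifelse(Color == "D", 1, 0)))

# Every row has exactly one 1, because each diamond has exactly one color
print(rowSums(indicators))
```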

Indicator variables

	TotalPrice	Carat	Color	ColorD	ColorE	ColorF	ColorG	ColorH
1	7228.8	1.08	E	0	1	0	0	0
2	979.3	0.31	F	0	0	1	0	0
3	544.1	0.31	H	0	0	0	0	1
4	1010.9	0.32	F	0	0	1	0	0
5	1570.4	0.33	D	1	0	0	0	0
6	955.6	0.33	G	0	0	0	1	0
7	860	0.35	F	0	0	1	0	0
8	860	0.35	F	0	0	1	0	0
9	1258.7	0.37	F	0	0	1	0	0
10	1923.8	0.38	D	1	0	0	0	0

(Showing the first 10 of 351 rows.)

Indicator variables

What if I wanted to model the relationship between TotalPrice and Color?

	TotalPrice	Carat	Color	ColorD	ColorE	ColorF	ColorH
1	7228.8	1.08	E	0	1	0	0
2	979.3	0.31	F	0	0	1	0
3	544.1	0.31	H	0	0	0	1
4	1010.9	0.32	F	0	0	1	0
5	1570.4	0.33	D	1	0	0	0

(Showing the first 5 of 351 rows.)

Indicator variables

Why is ColorJ NA?

lm(TotalPrice ~ ColorD + ColorE + ColorF + ColorG + ColorH + ColorI + ColorJ,
   data = Diamonds)

## 
## Call:
## lm(formula = TotalPrice ~ ColorD + ColorE + ColorF + ColorG + 
##     ColorH + ColorI + ColorJ, data = Diamonds)
## 
## Coefficients:
## (Intercept)       ColorD       ColorE       ColorF       ColorG  
##        1936         3632         2423         7224         7623  
##      ColorH       ColorI       ColorJ  
##        6732         5704           NA

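When including indicator variables in a model for a categorical variable with k categories, include only k-1 of them. With an intercept in the model, the k indicators always add up to 1, so the last one is an exact linear combination of the intercept and the other indicators, and lm() reports its coefficient as NA.
The one that is left out is the "reference" category.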
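
A minimal sketch of the NA, using made-up data with three categories: fitting all k indicators plus an intercept reproduces the problem, and dropping one indicator fixes it.

```r
# Hypothetical toy data: 3 categories, 2 cases each
group <- c("a", "a", "b", "b", "c", "c")
y     <- c(10, 12, 20, 22, 30, 34)
ind_a <- ifelse(group == "a", 1, 0)
ind_b <- ifelse(group == "b", 1, 0)
ind_c <- ifelse(group == "c", 1, 0)

# The indicators sum to 1 for every case, exactly like the intercept column
print(all(ind_a + ind_b + ind_c == 1))

# All k = 3 indicators: the last one is redundant, so lm() reports NA for it
fit_all <- lm(y ~ ind_a + ind_b + ind_c)
print(coef(fit_all))

# k - 1 = 2 indicators: every coefficient is estimable; "c" is the reference
fit_k1 <- lm(y ~ ind_a + ind_b)
print(coef(fit_k1))
```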

Indicator variables

What is the reference category?

lm(TotalPrice ~ ColorD + ColorE + ColorF + ColorG + ColorH + ColorI,
   data = Diamonds)

## 
## Call:
## lm(formula = TotalPrice ~ ColorD + ColorE + ColorF + ColorG + 
##     ColorH + ColorI, data = Diamonds)
## 
## Coefficients:
## (Intercept)       ColorD       ColorE       ColorF       ColorG  
##        1936         3632         2423         7224         7623  
##      ColorH       ColorI  
##        6732         5704

Interpretation: Compared with a Color J diamond, a Color D diamond has an expected total price that is 3632 higher.
Interpretation: Compared with a Color J diamond, a Color E diamond has an expected total price that is 2423 higher.
What is the interpretation for a diamond with Color F?
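
Why these interpretations work: when the only predictors are the indicators for one categorical variable, the intercept is the mean response in the reference group, and each coefficient is that group's mean minus the reference mean. So in the output above, 1936 is the average TotalPrice of the Color J diamonds in the sample. A minimal sketch with made-up prices:

```r
# Hypothetical toy data; "J" plays the role of the reference category
Color  <- c("J", "J", "D", "D", "E", "E")
Price  <- c(1000, 1200, 4000, 4600, 3000, 3400)
ColorD <- ifelse(Color == "D", 1, 0)
ColorE <- ifelse(Color == "E", 1, 0)

fit <- lm(Price ~ ColorD + ColorE)
print(coef(fit))

# Group means
means <- tapply(Price, Color, mean)
print(means["J"])                # equals the intercept
print(means["D"] - means["J"])   # equals the ColorD coefficient
print(means["E"] - means["J"])   # equals the ColorE coefficient
```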

R is smart

What is the reference category?

lm(TotalPrice ~ Color, data = Diamonds)

## 
## Call:
## lm(formula = TotalPrice ~ Color, data = Diamonds)
## 
## Coefficients:
## (Intercept)       ColorE       ColorF       ColorG       ColorH  
##        5569        -1209         3592         3990         3100  
##      ColorI       ColorJ  
##        2071        -3632

What is the interpretation for Color E now?
What if we wanted a different reference category?
- We could code the indicators ourselves
- We could use the forcats package
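
In the output above there is no ColorD coefficient, because lm() uses the first factor level (D) as the reference. A minimal sketch with made-up data of both the default behavior and the "code the indicators ourselves" option, where leaving out J makes J the reference:

```r
# Hypothetical toy data with levels D, E, J ("D" is first, so it is the default reference)
Color <- factor(c("D", "D", "E", "E", "J", "J"))
Price <- c(4000, 4600, 3000, 3400, 1000, 1200)

# Default: R builds indicators for every level except the first ("D")
fit_default <- lm(Price ~ Color)
print(coef(fit_default))

# Coding the indicators ourselves, leaving out "J" to make it the reference
ColorD <- ifelse(Color == "D", 1, 0)
ColorE <- ifelse(Color == "E", 1, 0)
fit_manual <- lm(Price ~ ColorD + ColorE)
print(coef(fit_manual))
```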

forcats

R uses factors to handle categorical variables, variables that have a fixed and known set of possible values.
The forcats package is loaded with the tidyverse; it helps you do things like reorder the levels of your factors.

Source: forcats.tidyverse.org

forcats

levels(Diamonds$Color)

## [1] "D" "E" "F" "G" "H" "I" "J"

new_levels <- c("J", "D", "E", "F", "G", "H", "I")
Diamonds <- Diamonds %>%
  mutate(Color = fct_relevel(Color, new_levels))

levels(Diamonds$Color)

## [1] "J" "D" "E" "F" "G" "H" "I"
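
Only the first level matters for choosing the reference, so listing all seven levels is not required: forcats' fct_relevel() also accepts just the level(s) to move to the front, e.g. fct_relevel(Color, "J"). Without forcats, base R's relevel() does the same for a single level. A minimal sketch on a toy factor with the seven Diamonds color levels:

```r
# Toy factor with the seven Diamonds color levels; "D" is first by default
Color <- factor(c("D", "E", "F", "G", "H", "I", "J"))
print(levels(Color))

# relevel() makes "J" the first (reference) level; the others keep their order
Color_j <- relevel(Color, ref = "J")
print(levels(Color_j))

# Equivalent: list every level in the order you want
Color_j2 <- factor(Color, levels = c("J", "D", "E", "F", "G", "H", "I"))
print(identical(levels(Color_j), levels(Color_j2)))
```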

R is smart

What is the reference category?

lm(TotalPrice ~ Color, data = Diamonds)

## 
## Call:
## lm(formula = TotalPrice ~ Color, data = Diamonds)
## 
## Coefficients:
## (Intercept)       ColorD       ColorE       ColorF       ColorG  
##        1936         3632         2423         7224         7623  
##      ColorH       ColorI  
##        6732         5704
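
Changing the reference category re-expresses the same model rather than changing it. For example, 1936 + 3632 = 5568, which matches the Color D intercept of 5569 from the earlier fit up to rounding, and the earlier ColorJ coefficient (-3632) is this ColorD coefficient with its sign flipped. A minimal sketch with made-up data showing that the fitted values do not depend on the reference:

```r
# Hypothetical toy data with three color levels
d <- data.frame(
  Color = factor(c("D", "D", "E", "E", "J", "J")),
  Price = c(4000, 4600, 3000, 3400, 1000, 1200)
)

fit_d <- lm(Price ~ Color, data = d)     # reference: D, the first level
d$Color <- relevel(d$Color, ref = "J")
fit_j <- lm(Price ~ Color, data = d)     # reference: J

print(coef(fit_d))
print(coef(fit_j))

# Same predictions either way: each case gets its group's mean
print(all.equal(fitted(fit_d), fitted(fit_j)))
```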

What if the variable is binary?

A binary variable is a special type of categorical variable with two levels.
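
A binary predictor can be stored either as a 0/1 number or as a two-level factor, and lm() gives the same fit either way as long as the factor's reference level is the one coded 0. A minimal sketch with made-up pulse values (not the ICU data); the labels "No" and "Yes" are hypothetical:

```r
# Hypothetical toy data: 1 = admitted via the emergency room, 0 = not
emergency <- c(0, 0, 0, 1, 1, 1)
pulse     <- c(70, 72, 74, 80, 84, 82)

# 0/1 numeric predictor
fit_num <- lm(pulse ~ emergency)

# The same information as a two-level factor; "No" (coded 0) is the reference
emergency_f <- factor(emergency, levels = c(0, 1), labels = c("No", "Yes"))
fit_fct <- lm(pulse ~ emergency_f)

print(coef(fit_num))  # intercept = mean of the 0 group, slope = difference in means
print(coef(fit_fct))
```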

ICU example

A sample of 200 patients in an intensive care unit (ICU). We want to see whether a patient's heart rate is related to whether they were admitted via the emergency room.

- y: heart rate (beats per minute)
- x: indicator for emergency room admission

Aside: Is this inference or prediction?

Binary x variable

data("ICU")
lm(Pulse ~ Emergency, data = ICU)

## 
## Call:
## lm(formula = Pulse ~ Emergency, data = ICU)
## 
## Coefficients:
## (Intercept)    Emergency  
##       91.11        10.63

How can we interpret $\hat{\beta}_0$ now?
How can we interpret $\hat{\beta}_1$?
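
As a worked check, assuming Emergency is coded 1 for patients admitted via the emergency room and 0 otherwise (as the x definition above describes), plugging both values into the fitted line gives:

$$
\begin{aligned}
\widehat{\text{Pulse}} &= 91.11 + 10.63 \times \text{Emergency}\\
\text{Emergency} = 0:\quad \widehat{\text{Pulse}} &= 91.11\\
\text{Emergency} = 1:\quad \widehat{\text{Pulse}} &= 91.11 + 10.63 = 101.74
\end{aligned}
$$

Under that coding, $\hat{\beta}_0 = 91.11$ is the estimated mean heart rate (beats per minute) for patients not admitted via the emergency room, and $\hat{\beta}_1 = 10.63$ is the estimated difference in mean heart rate between emergency and non-emergency admissions.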

Diamonds

Go to RStudio Cloud and open Diamonds.