"The simple graph has brought more information to the data analyst’s mind than any other device." - John Tukey
The "gg" in "ggplot2" stands for Grammar of Graphics.
Which function creates the plot?
ggplot(data = starwars, mapping = aes(x = height, y = mass)) +
  geom_point() +
  labs(title = "Mass vs. height of Starwars characters",
       x = "Height (cm)", y = "Weight (kg)")
## Warning: Removed 28 rows containing missing values (geom_point).
What is the dataset being plotted?
Which variables are on the x-axis and y-axis?
What about that warning?
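One way to investigate the warning is to count the missing values directly. The sketch below assumes the `starwars` data that ships with dplyr; the object name `missing_counts` and its column names are chosen here for illustration only. A point can only be drawn when both `height` and `mass` are present, so the rows dropped by `geom_point()` should correspond to `missing_either`.

```r
library(dplyr)  # provides the starwars data and summarise()

# Count rows with missing height, missing mass, or either one missing.
# geom_point() cannot draw a point unless both coordinates are present.
missing_counts <- starwars %>%
  summarise(
    missing_height = sum(is.na(height)),
    missing_mass   = sum(is.na(mass)),
    missing_either = sum(is.na(height) | is.na(mass))
  )

missing_counts
```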
What does geom_smooth() do? What else changed between the previous plot and this one?
ggplot(data = starwars, mapping = aes(x = height, y = mass)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Mass vs. height of Starwars characters",
       x = "Height (cm)", y = "Weight (kg)")
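To explore what `geom_smooth()` contributes, it can help to compare two variants side by side. This is a minimal sketch rather than something from the slides: `method = "lm"` and `se = FALSE` are standard `geom_smooth()` arguments, but using them here, and the object names `p_default` and `p_linear`, are assumptions made for illustration.

```r
library(ggplot2)
library(dplyr)  # only needed for the starwars data

# Default smoother: a flexible curve with a shaded confidence band
p_default <- ggplot(data = starwars, mapping = aes(x = height, y = mass)) +
  geom_point() +
  geom_smooth()

# Straight-line fit with the confidence band switched off
p_linear <- ggplot(data = starwars, mapping = aes(x = height, y = mass)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# Print either object in the Console to draw it, e.g. p_linear
```

Comparing the two plots shows that the smoother is just another layer added on top of the points.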
ggplot() is the main function in ggplot2, and plots are constructed in layers:
ggplot + geom_xxx
or, more precisely
ggplot(data = [dataset], mapping = aes(x = [x-variable], y = [y-variable])) + geom_xxx() + other options
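The layered structure can be made explicit by building a plot one piece at a time and storing each step. The following is a sketch under the assumption that the `starwars` data from dplyr is available; the object names `base_plot`, `scatter`, and `labelled` are hypothetical.

```r
library(ggplot2)
library(dplyr)  # only needed for the starwars data

# Step 1: data and aesthetic mappings only; nothing is drawn yet
base_plot <- ggplot(data = starwars, mapping = aes(x = height, y = mass))

# Step 2: add a geom layer with `+`
scatter <- base_plot + geom_point()

# Step 3: add other options, such as labels
labelled <- scatter +
  labs(title = "Mass vs. height of Starwars characters",
       x = "Height (cm)", y = "Weight (kg)")

# Print an object in the Console to draw it, e.g. labelled
```

Printing `base_plot` on its own gives an empty panel with axes, which is a quick way to see that the geom is what actually draws the data.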
library(tidyverse)
What does each row represent? What does each column represent?
starwars
## # A tibble: 5 x 13
##   name  height  mass hair_color skin_color eye_color birth_year gender
##   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
## 1 Luke…    172    77 blond      fair       blue            19   male
## 2 C-3PO    167    75 <NA>       gold       yellow         112   <NA>
## 3 R2-D2     96    32 <NA>       white, bl… red             33   <NA>
## 4 Dart…    202   136 none       white      yellow          41.9 male
## 5 Leia…    150    49 brown      light      brown           19   female
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
Take a glimpse at the data:
glimpse(starwars)
## Observations: 87
## Variables: 13
## $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "L…
## $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, …
## $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.…
## $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "bro…
## $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "lig…
## $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "…
## $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, …
## $ gender     <chr> "male", NA, NA, "male", "female", "male", "female", N…
## $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaa…
## $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human",…
## $ films      <list> [<"Revenge of the Sith", "Return of the Jedi", "The …
## $ vehicles   <list> [<"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <…
## $ starships  <list> [<"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanc…
How many rows and columns does this dataset have? What does each row represent? What does each column represent?
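Besides reading the counts off the `glimpse()` output, the dimensions can be queried directly. This is a small sketch using base R functions on the `starwars` data; the variable names `n_rows` and `n_cols` are chosen here for illustration. The exact column count may differ from the output shown in these slides depending on the installed version of dplyr.

```r
library(dplyr)  # provides the starwars data

# Number of rows (one per character) and columns (one per variable)
n_rows <- nrow(starwars)
n_cols <- ncol(starwars)

# Both at once, plus the column names
dim(starwars)
names(starwars)
```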
Run the following in the Console to view the help
?starwars
ggplot(data = starwars, mapping = aes(x = height, y = mass)) + geom_point()
## Warning: Removed 28 rows containing missing values (geom_point).
How would you describe this relationship? What other variables would help us understand data points that don't follow the overall trend? Which character is not very tall but unusually heavy?
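One possible way to identify points that sit far from the overall trend is to sort the data by `mass` and look at the top few rows. This is a sketch, not the approach prescribed by the slides; the object name `heaviest` and the choice of three rows are arbitrary.

```r
library(dplyr)  # provides the starwars data and the verbs used below

# Sort from heaviest to lightest (missing masses go last),
# keep a few identifying columns, and show the top three rows
heaviest <- starwars %>%
  arrange(desc(mass)) %>%
  select(name, height, mass, species) %>%
  head(3)

heaviest
```

Including `species` in the output hints at how an additional variable can help explain an unusual point.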
ggplot(data = starwars, mapping = aes(x = height, y = mass)) +
  geom_point() +
  labs(title = "Mass vs. height of Starwars characters",
       x = "Height (cm)", y = "Weight (kg)")
We can map additional variables to various features of the plot:
Visual characteristics of plotting characters that can be mapped to a specific variable in the data are
color
size
shape
alpha (transparency)
ggplot(data = starwars, mapping = aes(x = height, y = mass, color = gender)) + geom_point()
Let's map the size to birth_year:
ggplot(data = starwars,
       mapping = aes(x = height, y = mass, color = gender, size = birth_year)) +
  geom_point()
Now let's increase the size of all points by setting it to a fixed value, rather than mapping it to a variable in the data:
ggplot(data = starwars, mapping = aes(x = height, y = mass, color = gender)) + geom_point(size = 2)
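The distinction between mapping an aesthetic to a variable and setting it to a fixed value can be sketched as follows. Arguments inside `aes()` are mapped to columns of the data, while arguments given directly to the geom apply the same value to every point. The object names `p_mapped` and `p_set` and the color `"steelblue"` are illustrative choices, not taken from the slides.

```r
library(ggplot2)
library(dplyr)  # only needed for the starwars data

# Mapping: color varies with the gender variable, so it goes inside aes()
p_mapped <- ggplot(data = starwars,
                   mapping = aes(x = height, y = mass, color = gender)) +
  geom_point()

# Setting: every point gets the same color and size, so these go
# outside aes(), as plain arguments to the geom
p_set <- ggplot(data = starwars, mapping = aes(x = height, y = mass)) +
  geom_point(color = "steelblue", size = 2)

# Print either object in the Console to draw it, e.g. p_set
```

A mapped aesthetic produces a legend; a fixed one does not, because there is nothing to decode.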
| aesthetics | discrete | continuous |
|---|---|---|
| color | rainbow of colors | gradient |
| size | discrete steps | linear mapping between area and value |
| shape | different shape for each | shouldn't (and doesn't) work |
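The color row of the table can be checked with a short experiment: map color once to a discrete variable and once to a continuous one. This sketch assumes the `starwars` data, where `gender` is a character column and `birth_year` is numeric; the object names are hypothetical.

```r
library(ggplot2)
library(dplyr)  # only needed for the starwars data

# Discrete variable: each category gets its own distinct color
p_discrete <- ggplot(data = starwars,
                     mapping = aes(x = height, y = mass, color = gender)) +
  geom_point()

# Continuous variable: values are shown along a color gradient
p_continuous <- ggplot(data = starwars,
                       mapping = aes(x = height, y = mass, color = birth_year)) +
  geom_point()

# Print each object in the Console and compare the legends
```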
ggplot(data = starwars, mapping = aes(x = height, y = mass)) +
  facet_grid(. ~ gender) +
  geom_point() +
  labs(title = "Mass vs. height of Starwars characters",
       subtitle = "Faceted by gender",
       x = "Height (cm)", y = "Weight (kg)")
In the next few slides describe what each plot displays. Think about how the code relates to the output.
The plots in the next few slides do not have proper titles, axis labels, etc. because we want you to figure out what's happening in the plots. But you should always label your plots!
ggplot(data = starwars, mapping = aes(x = height, y = mass)) + geom_point() + facet_grid(gender ~ .)
ggplot(data = starwars, mapping = aes(x = height, y = mass)) + geom_point() + facet_grid(. ~ gender)
ggplot(data = starwars, mapping = aes(x = height, y = mass)) + geom_point() + facet_wrap(~ eye_color)
facet_grid(): rows ~ cols; use . for no split
facet_wrap(): 1d ribbon wrapped into 2d
Center: mean (mean), median (median), mode (not always useful)
Spread: range (range), standard deviation (sd), inter-quartile range (IQR)
ggplot(data = starwars, mapping = aes(x = height)) + geom_histogram(binwidth = 10)
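The summary functions named earlier (mean, median, range, sd, IQR) can be applied to `height` to put numbers next to the histogram. This is a minimal sketch: the object name `height_summary` and its column names are chosen for illustration, and `na.rm = TRUE` is needed because `height` contains missing values.

```r
library(dplyr)  # provides the starwars data and summarise()

# Measures of center and spread for height, ignoring missing values
height_summary <- starwars %>%
  summarise(
    mean_height   = mean(height, na.rm = TRUE),
    median_height = median(height, na.rm = TRUE),
    min_height    = min(height, na.rm = TRUE),
    max_height    = max(height, na.rm = TRUE),
    sd_height     = sd(height, na.rm = TRUE),
    iqr_height    = IQR(height, na.rm = TRUE)
  )

height_summary
```

Without `na.rm = TRUE`, each of these functions would either return `NA` or stop with an error, for the same reason the plots emit a warning about removed rows.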
ggplot(data = starwars, mapping = aes(x = height)) + geom_density()
ggplot(data = starwars, mapping = aes(y = height, x = gender)) + geom_boxplot()
ggplot(data = starwars, mapping = aes(y = height, x = gender)) + geom_boxplot(outlier.shape = NA) + geom_jitter()
ggplot(data = starwars, mapping = aes(x = gender)) + geom_bar()
ggplot(data = starwars, mapping = aes(x = gender, fill = hair_color)) + geom_bar()
ggplot(data = starwars, mapping = aes(x = gender, fill = hair_color)) + geom_bar(position = "fill") + labs(y = "proportion")
Which plot is a more useful representation for visualizing the relationship between gender and height?