NC bike crashes
Happy families are all alike; every unhappy family is unhappy in its own way.
Leo Tolstoy
Characteristics of tidy data:
Characteristics of untidy data:
!@#$%^&*()
Is each of the following a dataset or a summary table?
## # A tibble: 87 x 3
##    name               height  mass
##    <chr>               <int> <dbl>
##  1 Luke Skywalker        172    77
##  2 C-3PO                 167    75
##  3 R2-D2                  96    32
##  4 Darth Vader           202   136
##  5 Leia Organa           150    49
##  6 Owen Lars             178   120
##  7 Beru Whitesun lars    165    75
##  8 R5-D4                  97    32
##  9 Biggs Darklighter     183    84
## 10 Obi-Wan Kenobi        182    77
## # … with 77 more rows
## # A tibble: 5 x 2
##   gender        avg_height
##   <chr>              <dbl>
## 1 <NA>                120 
## 2 female              165.
## 3 hermaphrodite       175 
## 4 male                179.
## 5 none                200
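One way to tell the two apart: a dataset has one row per observation, while a summary table is computed from a dataset and has one row per group. The sketch below assumes the dplyr package is installed; the tibble `characters` and its values are made up for illustration and are not the starwars data.

```r
library(dplyr)

# Hypothetical toy dataset: one row per character (values are made up)
characters <- tibble(
  name   = c("A", "B", "C", "D"),
  gender = c("female", "male", "male", "female"),
  height = c(150, 180, 170, 160)
)

# Summary table: one row per group, computed from the dataset
height_summary <- characters %>%
  group_by(gender) %>%
  summarise(avg_height = mean(height))

height_summary
```

The first object can be analysed further in many ways; the second is already the result of one particular analysis.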
The pipe operator, %>%, is implemented in the magrittr package and is pronounced "and then".
park(drive(start_car(find("keys")), to = "campus"))
find("keys") %>% start_car() %>% drive(to = "campus") %>% park()
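The car functions above are illustrative and do not exist, so here is a runnable sketch of the same idea with base R functions, assuming the magrittr package is installed. The vector `x` is a made-up example.

```r
library(magrittr)

x <- c(1, 4, 9, 16)

# Nested calls: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped calls: read left to right, "and then"
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)
```

Both versions compute the same value; the piped version lists the steps in the order they happen.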
To send results to a function argument other than the first one, or to use the previous result for multiple arguments, use the dot placeholder (.):
starwars %>% filter(species == "Human") %>% lm(mass ~ height, data = .)
## 
## Call:
## lm(formula = mass ~ height, data = .)
## 
## Coefficients:
## (Intercept)       height  
##     -116.58         1.11
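The same pattern can be tried on a data frame that ships with base R. This is a minimal sketch assuming dplyr is installed; it uses the built-in mtcars data rather than starwars, and the name `fit` is only for illustration.

```r
library(dplyr)

# lm() takes the formula first, so the piped data frame
# has to be routed to the data argument with the dot
fit <- mtcars %>%
  filter(cyl == 4) %>%
  lm(mpg ~ wt, data = .)

coef(fit)
```

Because the dot appears as an argument, the piped data frame is passed to data and is not also inserted as the first argument.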
The dataset is in the dsbox package:
library(dsbox)
ncbikecrash
View the names of the variables with names:
names(ncbikecrash)
##  [1] "object_id"            "city"                 "county"              
##  [4] "region"               "development"          "locality"            
##  [7] "on_road"              "rural_urban"          "speed_limit"         
## [10] "traffic_control"      "weather"              "workzone"            
## [13] "bike_age"             "bike_age_group"       "bike_alcohol"        
## [16] "bike_alcohol_drugs"   "bike_direction"       "bike_injury"         
## [19] "bike_position"        "bike_race"            "bike_sex"            
## [22] "driver_age"           "driver_age_group"     "driver_alcohol"      
## [25] "driver_alcohol_drugs" "driver_est_speed"     "driver_injury"       
## [28] "driver_race"          "driver_sex"           "driver_vehicle_type" 
## [31] "crash_alcohol"        "crash_date"           "crash_day"           
## [34] "crash_group"          "crash_hour"           "crash_location"      
## [37] "crash_month"          "crash_severity"       "crash_time"          
## [40] "crash_type"           "crash_year"           "ambulance_req"       
## [43] "hit_run"              "light_condition"      "road_character"      
## [46] "road_class"           "road_condition"       "road_configuration"  
## [49] "road_defects"         "road_feature"         "road_surface"        
## [52] "num_bikes_ai"         "num_bikes_bi"         "num_bikes_ci"        
## [55] "num_bikes_ki"         "num_bikes_no"         "num_bikes_to"        
## [58] "num_bikes_ui"         "num_lanes"            "num_units"           
## [61] "distance_mi_from"     "frm_road"             "rte_invd_cd"         
## [64] "towrd_road"           "geo_point"            "geo_shape"
See detailed descriptions with ?ncbikecrash.
After loading the data with data(ncbikecrash), click on the name of the data frame to view it in the data viewer.
Use the glimpse function to take a peek:
glimpse(ncbikecrash)
## Observations: 7,467
## Variables: 66
## $ object_id            <int> 1686, 1674, 1673, 1687, 1653, 1665, 1642, 1…
## $ city                 <chr> "None - Rural Crash", "Henderson", "None - …
## $ county               <chr> "Wayne", "Vance", "Lincoln", "Columbus", "N…
## $ region               <chr> "Coastal", "Piedmont", "Piedmont", "Coastal…
## $ development          <chr> "Farms, Woods, Pastures", "Residential", "F…
## $ locality             <chr> "Rural (<30% Developed)", "Mixed (30% To 70…
## $ on_road              <chr> "SR 1915", "NICHOLAS ST", "US 321", "W BURK…
## $ rural_urban          <chr> "Rural", "Urban", "Rural", "Urban", "Urban"…
## $ speed_limit          <chr> "50 - 55 MPH", "30 - 35 MPH", "50 - 55 M…
## $ traffic_control      <chr> "No Control Present", "Stop Sign", "Double …
## $ weather              <chr> "Clear", "Clear", "Clear", "Rain", "Clear",…
## $ workzone             <chr> "No", "No", "No", "No", "No", "No", "No", "…
## $ bike_age             <chr> "52", "66", "33", "52", "22", "15", "41", "…
## $ bike_age_group       <chr> "50-59", "60-69", "30-39", "50-59", "20-24"…
## $ bike_alcohol         <chr> "No", "No", "No", "Yes", "No", "No", "No", …
## $ bike_alcohol_drugs   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ bike_direction       <chr> "With Traffic", "With Traffic", "With Traff…
## $ bike_injury          <chr> "B: Evident Injury", "C: Possible Injury", …
## $ bike_position        <chr> "Bike Lane / Paved Shoulder", "Travel Lane"…
## $ bike_race            <chr> "Black", "Black", "White", "Black", "White"…
## $ bike_sex             <chr> "Male", "Male", "Male", "Male", "Female", "…
## $ driver_age           <chr> "34", NA, "37", "55", "25", "17", NA, "50",…
## $ driver_age_group     <chr> "30-39", NA, "30-39", "50-59", "25-29", "0-…
## $ driver_alcohol       <chr> "No", "Missing", "No", "No", "No", "No", "M…
## $ driver_alcohol_drugs <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ driver_est_speed     <chr> "51-55 mph", "6-10 mph", "41-45 mph", "11-1…
## $ driver_injury        <chr> "O: No Injury", "Unknown Injury", "O: No In…
## $ driver_race          <chr> "White", "Unknown/Missing", "Hispanic", "Bl…
## $ driver_sex           <chr> "Male", NA, "Female", "Male", "Male", "Fema…
## $ driver_vehicle_type  <chr> "Single Unit Truck (2-Axle, 6-Tire)", NA, "…
## $ crash_alcohol        <chr> "No", "No", "No", "Yes", "No", "No", "No", …
## $ crash_date           <chr> "11DEC2013", "20NOV2013", "03NOV2013", "14D…
## $ crash_day            <chr> "Wednesday", "Wednesday", "Sunday", "Saturd…
## $ crash_group          <chr> "Motorist Overtaking Bicyclist", "Bicyclist…
## $ crash_hour           <int> 6, 20, 18, 18, 13, 17, 17, 7, 15, 2, 12, 22…
## $ crash_location       <chr> "Non-Intersection", "Intersection", "Non-In…
## $ crash_month          <chr> "December", "November", "November", "Decemb…
## $ crash_severity       <chr> "B: Evident Injury", "C: Possible Injury", …
## $ crash_time           <drtn> 06:10:00, 20:41:00, 18:05:00, 18:34:00, 13…
## $ crash_type           <chr> "Motorist Overtaking - Undetected Bicyclist…
## $ crash_year           <int> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2…
## $ ambulance_req        <chr> "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Y…
## $ hit_run              <chr> "No", "Yes", "No", "No", "No", "No", "Yes",…
## $ light_condition      <chr> "Dark - Roadway Not Lighted", NA, "Dark - R…
## $ road_character       <chr> "Straight - Level", "Straight - Level", "St…
## $ road_class           <chr> "State Secondary Route", "Local Street", "U…
## $ road_condition       <chr> "Dry", "Dry", "Dry", "Water (Standing, Movi…
## $ road_configuration   <chr> "Two-Way, Not Divided", "Two-Way, Divided, …
## $ road_defects         <chr> "None", NA, "None", "None", "None", "None",…
## $ road_feature         <chr> "No Special Feature", "T-Intersection", "No…
## $ road_surface         <chr> "Coarse Asphalt", "Smooth Asphalt", "Smooth…
## $ num_bikes_ai         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ num_bikes_bi         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ num_bikes_ci         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ num_bikes_ki         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ num_bikes_no         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ num_bikes_to         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ num_bikes_ui         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ num_lanes            <chr> "2 lanes", "2 lanes", "2 lanes", "1 lane", …
## $ num_units            <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ distance_mi_from     <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0"…
## $ frm_road             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ rte_invd_cd          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ towrd_road           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ geo_point            <chr> "35.3336070056, -77.9955023901", "36.315187…
## $ geo_shape            <chr> "{\"type\": \"Point\", \"coordinates\": [-7…
dplyr is based on the concepts of functions as verbs that manipulate data frames.
filter: pick rows matching criteria
slice: pick rows using index(es)
select: pick columns by name
pull: grab a column as a vector
arrange: reorder rows
mutate: add new variables
distinct: filter for unique rows
sample_n / sample_frac: randomly sample rows
summarise: reduce variables to values
The %>% operator in dplyr functions is called the pipe operator. This means you "pipe" the output of the previous line of code as the first input of the next line of code.
The + operator in ggplot2 functions is used for "layering". This means you create the plot in layers, separated by +.
filter to select a subset of rows
for crashes in Durham County
ncbikecrash %>% filter(county == "Durham")
## # A tibble: 340 x 66
##    object_id city  county region development locality on_road rural_urban
##        <int> <chr> <chr>  <chr>  <chr>       <chr>    <chr>   <chr>      
##  1      2452 Durh… Durham Piedm… Residential Urban (… <NA>    Urban      
##  2      2441 Durh… Durham Piedm… Commercial  Urban (… <NA>    Urban      
##  3      2466 Durh… Durham Piedm… Commercial  Urban (… <NA>    Urban      
##  4       549 Durh… Durham Piedm… Residential Urban (… PARK A… Urban      
##  5       598 Durh… Durham Piedm… Residential Urban (… BELT S… Urban      
##  6       603 Durh… Durham Piedm… Residential Urban (… HINSON… Urban      
##  7      3974 Durh… Durham Piedm… Commercial  Urban (… <NA>    Urban      
##  8      7134 Durh… Durham Piedm… Commercial  Urban (… <NA>    Urban      
##  9      1670 Durh… Durham Piedm… Commercial  Urban (… INFINI… Urban      
## 10      1773 Durh… Durham Piedm… Residential Urban (… <NA>    Urban      
## # … with 330 more rows, and 58 more variables: speed_limit <chr>,
## #   traffic_control <chr>, weather <chr>, workzone <chr>, bike_age <chr>,
## #   bike_age_group <chr>, bike_alcohol <chr>, bike_alcohol_drugs <chr>,
## #   bike_direction <chr>, bike_injury <chr>, bike_position <chr>,
## #   bike_race <chr>, bike_sex <chr>, driver_age <chr>,
## #   driver_age_group <chr>, driver_alcohol <chr>,
## #   driver_alcohol_drugs <chr>, driver_est_speed <chr>,
## #   driver_injury <chr>, driver_race <chr>, driver_sex <chr>,
## #   driver_vehicle_type <chr>, crash_alcohol <chr>, crash_date <chr>,
## #   crash_day <chr>, crash_group <chr>, crash_hour <int>,
## #   crash_location <chr>, crash_month <chr>, crash_severity <chr>,
## #   crash_time <drtn>, crash_type <chr>, crash_year <int>,
## #   ambulance_req <chr>, hit_run <chr>, light_condition <chr>,
## #   road_character <chr>, road_class <chr>, road_condition <chr>,
## #   road_configuration <chr>, road_defects <chr>, road_feature <chr>,
## #   road_surface <chr>, num_bikes_ai <int>, num_bikes_bi <int>,
## #   num_bikes_ci <int>, num_bikes_ki <int>, num_bikes_no <int>,
## #   num_bikes_to <int>, num_bikes_ui <int>, num_lanes <chr>,
## #   num_units <int>, distance_mi_from <chr>, frm_road <chr>,
## #   rte_invd_cd <int>, towrd_road <chr>, geo_point <chr>, geo_shape <chr>
filter for many conditions at once
for crashes in Durham County where biker was 0-5 years old
ncbikecrash %>% filter(county == "Durham", bike_age_group == "0-5")
## # A tibble: 4 x 66
##   object_id city  county region development locality on_road rural_urban
##       <int> <chr> <chr>  <chr>  <chr>       <chr>    <chr>   <chr>      
## 1      4062 Durh… Durham Piedm… Residential Urban (… <NA>    Urban      
## 2       414 Durh… Durham Piedm… Residential Urban (… PVA 90… Urban      
## 3      3016 Durh… Durham Piedm… Residential Urban (… <NA>    Urban      
## 4      1383 Durh… Durham Piedm… Residential Urban (… PVA 62… Urban      
## # … with 58 more variables: speed_limit <chr>, traffic_control <chr>,
## #   weather <chr>, workzone <chr>, bike_age <chr>, bike_age_group <chr>,
## #   bike_alcohol <chr>, bike_alcohol_drugs <chr>, bike_direction <chr>,
## #   bike_injury <chr>, bike_position <chr>, bike_race <chr>,
## #   bike_sex <chr>, driver_age <chr>, driver_age_group <chr>,
## #   driver_alcohol <chr>, driver_alcohol_drugs <chr>,
## #   driver_est_speed <chr>, driver_injury <chr>, driver_race <chr>,
## #   driver_sex <chr>, driver_vehicle_type <chr>, crash_alcohol <chr>,
## #   crash_date <chr>, crash_day <chr>, crash_group <chr>,
## #   crash_hour <int>, crash_location <chr>, crash_month <chr>,
## #   crash_severity <chr>, crash_time <drtn>, crash_type <chr>,
## #   crash_year <int>, ambulance_req <chr>, hit_run <chr>,
## #   light_condition <chr>, road_character <chr>, road_class <chr>,
## #   road_condition <chr>, road_configuration <chr>, road_defects <chr>,
## #   road_feature <chr>, road_surface <chr>, num_bikes_ai <int>,
## #   num_bikes_bi <int>, num_bikes_ci <int>, num_bikes_ki <int>,
## #   num_bikes_no <int>, num_bikes_to <int>, num_bikes_ui <int>,
## #   num_lanes <chr>, num_units <int>, distance_mi_from <chr>,
## #   frm_road <chr>, rte_invd_cd <int>, towrd_road <chr>, geo_point <chr>,
## #   geo_shape <chr>
operator | definition |
---|---|
< | less than |
<= | less than or equal to |
> | greater than |
>= | greater than or equal to |
== | exactly equal to |
!= | not equal to |
x & y | x AND y |
x \| y | x OR y |
is.na(x) | test if x is NA |
!is.na(x) | test if x is not NA |
x %in% y | test if x is in y |
!(x %in% y) | test if x is not in y |
!x | not x |
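Several of these operators can be combined inside filter. The sketch below assumes dplyr is installed; `crashes` is a small made-up tibble that only borrows a few column names from ncbikecrash.

```r
library(dplyr)

# Hypothetical toy data, loosely modelled on ncbikecrash columns
crashes <- tibble(
  county     = c("Durham", "Wake", "Orange", "Durham", "Wake"),
  crash_hour = c(8, 17, 22, NA, 13),
  hit_run    = c("No", "Yes", "No", "No", "Yes")
)

# %in%: county is one of several values
durham_orange <- crashes %>% filter(county %in% c("Durham", "Orange"))

# | (OR) with < and >=: crashes before 9 or from 17 onwards
off_hours <- crashes %>% filter(crash_hour < 9 | crash_hour >= 17)

# !is.na(): drop rows where the hour is missing
known_hour <- crashes %>% filter(!is.na(crash_hour))

# & (AND): same effect as separating conditions with a comma
wake_hit_run <- crashes %>% filter(county == "Wake" & hit_run == "Yes")
```

Note that filter drops rows where the condition evaluates to NA, which is why the row with a missing crash_hour does not appear in off_hours.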
select to keep variables
ncbikecrash %>%
  filter(county == "Durham", bike_age_group == "0-5") %>%
  select(locality, speed_limit)
## # A tibble: 4 x 2
##   locality               speed_limit
##   <chr>                  <chr>      
## 1 Urban (>70% Developed) 30 - 35 MPH
## 2 Urban (>70% Developed) 5 - 15 MPH 
## 3 Urban (>70% Developed) 20 - 25 MPH
## 4 Urban (>70% Developed) 20 - 25 MPH
select to exclude variables
ncbikecrash %>% select(-object_id)
## # A tibble: 7,467 x 65
##    city  county region development locality on_road rural_urban speed_limit
##    <chr> <chr>  <chr>  <chr>       <chr>    <chr>   <chr>       <chr>      
##  1 None… Wayne  Coast… Farms, Woo… Rural (… SR 1915 Rural       50 - 55 M…
##  2 Hend… Vance  Piedm… Residential Mixed (… NICHOL… Urban       30 - 35 M…
##  3 None… Linco… Piedm… Farms, Woo… Rural (… US 321  Rural       50 - 55 M…
##  4 Whit… Colum… Coast… Commercial  Urban (… W BURK… Urban       30 - 35 M…
##  5 Wilm… New H… Coast… Residential Urban (… RACINE… Urban       <NA>       
##  6 None… Robes… Coast… Farms, Woo… Rural (… SR 1513 Rural       50 - 55 M…
##  7 None… Richm… Piedm… Residential Mixed (… SR 1903 Rural       30 - 35 M…
##  8 Rale… Wake   Piedm… Commercial  Urban (… PERSON… Urban       30 - 35 M…
##  9 Whit… Colum… Coast… Residential Rural (… FLOWER… Urban       30 - 35 M…
## 10 New … Craven Coast… Residential Urban (… SUTTON… Urban       20 - 25 M…
## # … with 7,457 more rows, and 57 more variables: traffic_control <chr>,
## #   weather <chr>, workzone <chr>, bike_age <chr>, bike_age_group <chr>,
## #   bike_alcohol <chr>, bike_alcohol_drugs <chr>, bike_direction <chr>,
## #   bike_injury <chr>, bike_position <chr>, bike_race <chr>,
## #   bike_sex <chr>, driver_age <chr>, driver_age_group <chr>,
## #   driver_alcohol <chr>, driver_alcohol_drugs <chr>,
## #   driver_est_speed <chr>, driver_injury <chr>, driver_race <chr>,
## #   driver_sex <chr>, driver_vehicle_type <chr>, crash_alcohol <chr>,
## #   crash_date <chr>, crash_day <chr>, crash_group <chr>,
## #   crash_hour <int>, crash_location <chr>, crash_month <chr>,
## #   crash_severity <chr>, crash_time <drtn>, crash_type <chr>,
## #   crash_year <int>, ambulance_req <chr>, hit_run <chr>,
## #   light_condition <chr>, road_character <chr>, road_class <chr>,
## #   road_condition <chr>, road_configuration <chr>, road_defects <chr>,
## #   road_feature <chr>, road_surface <chr>, num_bikes_ai <int>,
## #   num_bikes_bi <int>, num_bikes_ci <int>, num_bikes_ki <int>,
## #   num_bikes_no <int>, num_bikes_to <int>, num_bikes_ui <int>,
## #   num_lanes <chr>, num_units <int>, distance_mi_from <chr>,
## #   frm_road <chr>, rte_invd_cd <int>, towrd_road <chr>, geo_point <chr>,
## #   geo_shape <chr>
select a range of variables
ncbikecrash %>% select(city:locality)
## # A tibble: 7,467 x 5
##    city           county     region  development       locality            
##    <chr>          <chr>      <chr>   <chr>             <chr>               
##  1 None - Rural … Wayne      Coastal Farms, Woods, Pa… Rural (<30% Develop…
##  2 Henderson      Vance      Piedmo… Residential       Mixed (30% To 70% D…
##  3 None - Rural … Lincoln    Piedmo… Farms, Woods, Pa… Rural (<30% Develop…
##  4 Whiteville     Columbus   Coastal Commercial        Urban (>70% Develop…
##  5 Wilmington     New Hanov… Coastal Residential       Urban (>70% Develop…
##  6 None - Rural … Robeson    Coastal Farms, Woods, Pa… Rural (<30% Develop…
##  7 None - Rural … Richmond   Piedmo… Residential       Mixed (30% To 70% D…
##  8 Raleigh        Wake       Piedmo… Commercial        Urban (>70% Develop…
##  9 Whiteville     Columbus   Coastal Residential       Rural (<30% Develop…
## 10 New Bern       Craven     Coastal Residential       Urban (>70% Develop…
## # … with 7,457 more rows
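The three forms of select shown so far (keep by name, exclude with a minus sign, keep a range with a colon) can be compared side by side on a small table. This is a sketch assuming dplyr is installed; `crashes` and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data with a few ncbikecrash-style columns
crashes <- tibble(
  object_id = 1:3,
  city      = c("Durham", "Raleigh", "Chapel Hill"),
  county    = c("Durham", "Wake", "Orange"),
  region    = c("Piedmont", "Piedmont", "Piedmont"),
  weather   = c("Clear", "Rain", "Clear")
)

kept    <- crashes %>% select(city, weather)  # keep columns by name
dropped <- crashes %>% select(-object_id)     # drop one column
ranged  <- crashes %>% select(city:region)    # keep a contiguous range

names(kept)
names(dropped)
names(ranged)
```

The range form depends on column order in the data frame, so it is worth checking names first when the layout is unfamiliar.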
slice for certain row numbers
First five
ncbikecrash %>% slice(1:5)
## # A tibble: 5 x 66
##   object_id city  county region development locality on_road rural_urban
##       <int> <chr> <chr>  <chr>  <chr>       <chr>    <chr>   <chr>      
## 1      1686 None… Wayne  Coast… Farms, Woo… Rural (… SR 1915 Rural      
## 2      1674 Hend… Vance  Piedm… Residential Mixed (… NICHOL… Urban      
## 3      1673 None… Linco… Piedm… Farms, Woo… Rural (… US 321  Rural      
## 4      1687 Whit… Colum… Coast… Commercial  Urban (… W BURK… Urban      
## 5      1653 Wilm… New H… Coast… Residential Urban (… RACINE… Urban      
## # … with 58 more variables: speed_limit <chr>, traffic_control <chr>,
## #   weather <chr>, workzone <chr>, bike_age <chr>, bike_age_group <chr>,
## #   bike_alcohol <chr>, bike_alcohol_drugs <chr>, bike_direction <chr>,
## #   bike_injury <chr>, bike_position <chr>, bike_race <chr>,
## #   bike_sex <chr>, driver_age <chr>, driver_age_group <chr>,
## #   driver_alcohol <chr>, driver_alcohol_drugs <chr>,
## #   driver_est_speed <chr>, driver_injury <chr>, driver_race <chr>,
## #   driver_sex <chr>, driver_vehicle_type <chr>, crash_alcohol <chr>,
## #   crash_date <chr>, crash_day <chr>, crash_group <chr>,
## #   crash_hour <int>, crash_location <chr>, crash_month <chr>,
## #   crash_severity <chr>, crash_time <drtn>, crash_type <chr>,
## #   crash_year <int>, ambulance_req <chr>, hit_run <chr>,
## #   light_condition <chr>, road_character <chr>, road_class <chr>,
## #   road_condition <chr>, road_configuration <chr>, road_defects <chr>,
## #   road_feature <chr>, road_surface <chr>, num_bikes_ai <int>,
## #   num_bikes_bi <int>, num_bikes_ci <int>, num_bikes_ki <int>,
## #   num_bikes_no <int>, num_bikes_to <int>, num_bikes_ui <int>,
## #   num_lanes <chr>, num_units <int>, distance_mi_from <chr>,
## #   frm_road <chr>, rte_invd_cd <int>, towrd_road <chr>, geo_point <chr>,
## #   geo_shape <chr>
slice for certain row numbers
Last five
last_row <- nrow(ncbikecrash)
ncbikecrash %>% slice((last_row - 4):last_row)
## # A tibble: 5 x 66
##   object_id city  county region development locality on_road rural_urban
##       <int> <chr> <chr>  <chr>  <chr>       <chr>    <chr>   <chr>      
## 1      6989 High… Guilf… Piedm… Residential Urban (… <NA>    Urban      
## 2      6991 Wilm… New H… Coast… Residential Urban (… <NA>    Urban      
## 3      6995 Kins… Lenoir Coast… Commercial  Urban (… <NA>    Urban      
## 4      6998 Faye… Cumbe… Coast… Residential Urban (… <NA>    Urban      
## 5      7000 None… Onslow Coast… Farms, Woo… Rural (… <NA>    Rural      
## # … with 58 more variables: speed_limit <chr>, traffic_control <chr>,
## #   weather <chr>, workzone <chr>, bike_age <chr>, bike_age_group <chr>,
## #   bike_alcohol <chr>, bike_alcohol_drugs <chr>, bike_direction <chr>,
## #   bike_injury <chr>, bike_position <chr>, bike_race <chr>,
## #   bike_sex <chr>, driver_age <chr>, driver_age_group <chr>,
## #   driver_alcohol <chr>, driver_alcohol_drugs <chr>,
## #   driver_est_speed <chr>, driver_injury <chr>, driver_race <chr>,
## #   driver_sex <chr>, driver_vehicle_type <chr>, crash_alcohol <chr>,
## #   crash_date <chr>, crash_day <chr>, crash_group <chr>,
## #   crash_hour <int>, crash_location <chr>, crash_month <chr>,
## #   crash_severity <chr>, crash_time <drtn>, crash_type <chr>,
## #   crash_year <int>, ambulance_req <chr>, hit_run <chr>,
## #   light_condition <chr>, road_character <chr>, road_class <chr>,
## #   road_condition <chr>, road_configuration <chr>, road_defects <chr>,
## #   road_feature <chr>, road_surface <chr>, num_bikes_ai <int>,
## #   num_bikes_bi <int>, num_bikes_ci <int>, num_bikes_ki <int>,
## #   num_bikes_no <int>, num_bikes_to <int>, num_bikes_ui <int>,
## #   num_lanes <chr>, num_units <int>, distance_mi_from <chr>,
## #   frm_road <chr>, rte_invd_cd <int>, towrd_road <chr>, geo_point <chr>,
## #   geo_shape <chr>
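As a possible alternative to storing the row count first, the dplyr helper n() returns the number of rows and can be used directly inside slice. The sketch below assumes dplyr is installed; `crashes` is a made-up ten-row tibble.

```r
library(dplyr)

# Hypothetical toy data: ten rows with ids 101 to 110
crashes <- tibble(object_id = 101:110)

# n() is the number of rows, so no separate last_row variable is needed
last_three <- crashes %>% slice((n() - 2):n())

last_three$object_id
```

dplyr 1.0.0 and later also provide slice_tail() for the same purpose.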
pull to extract a column as a vector
ncbikecrash %>%
  slice(1:6) %>%
  pull(locality)
## [1] "Rural (<30% Developed)"       "Mixed (30% To 70% Developed)"
## [3] "Rural (<30% Developed)"       "Urban (>70% Developed)"      
## [5] "Urban (>70% Developed)"       "Rural (<30% Developed)"
vs.
ncbikecrash %>% slice(1:6) %>% select(locality)
## # A tibble: 6 x 1
##   locality                    
##   <chr>                       
## 1 Rural (<30% Developed)      
## 2 Mixed (30% To 70% Developed)
## 3 Rural (<30% Developed)      
## 4 Urban (>70% Developed)      
## 5 Urban (>70% Developed)      
## 6 Rural (<30% Developed)
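The difference between the two results is their type, which can be checked directly. A minimal sketch, assuming dplyr is installed; `crashes`, `loc_vec` and `loc_tbl` are illustrative names.

```r
library(dplyr)

# Hypothetical toy data
crashes <- tibble(
  locality = c("Rural", "Urban", "Mixed"),
  hour     = c(6, 20, 18)
)

loc_vec <- crashes %>% pull(locality)    # plain character vector
loc_tbl <- crashes %>% select(locality)  # one-column data frame (tibble)

is.character(loc_vec)
is.data.frame(loc_tbl)
```

Functions that expect a vector, such as mean or unique, usually need pull; functions that expect a data frame need select.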
sample_n / sample_frac for a random sample
sample_n: randomly sample 5 observations
ncbikecrash_n5 <- ncbikecrash %>%
  sample_n(5, replace = FALSE)
dim(ncbikecrash_n5)
## [1] 5 66
sample_frac: randomly sample 20% of observations
ncbikecrash_perc20 <- ncbikecrash %>%
  sample_frac(0.2, replace = FALSE)
dim(ncbikecrash_perc20)
## [1] 1493 66
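Random samples differ from run to run unless the random seed is fixed. The sketch below shows one way to make a sample reproducible, assuming dplyr is installed; the seed value 1234 and the tibble `crashes` are arbitrary choices for illustration.

```r
library(dplyr)

# Hypothetical toy data: 100 rows
crashes <- tibble(object_id = 1:100)

# Fixing the seed makes the sampled rows the same on every run
set.seed(1234)
first_draw <- crashes %>% sample_n(5, replace = FALSE)

set.seed(1234)
second_draw <- crashes %>% sample_n(5, replace = FALSE)

identical(first_draw$object_id, second_draw$object_id)

# sample_frac takes a proportion instead of a count
fifth <- crashes %>% sample_frac(0.2)
nrow(fifth)
```

Setting the seed once at the top of a document is usually enough; it is reset twice here only to show that the same seed gives the same rows.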
distinct to filter for unique rows
And arrange to order alphabetically
ncbikecrash %>% select(county, city) %>% distinct() %>% arrange(county, city)
## # A tibble: 391 x 2
##    county    city              
##    <chr>     <chr>             
##  1 Alamance  Alamance          
##  2 Alamance  Burlington        
##  3 Alamance  Elon              
##  4 Alamance  Elon College      
##  5 Alamance  Gibsonville       
##  6 Alamance  Graham            
##  7 Alamance  Green Level       
##  8 Alamance  Mebane            
##  9 Alamance  None - Rural Crash
## 10 Alexander None - Rural Crash
## # … with 381 more rows
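arrange sorts in ascending order by default; wrapping a column in desc() reverses it, and distinct can take the columns directly. A minimal sketch, assuming dplyr is installed, with a made-up `crashes` tibble:

```r
library(dplyr)

# Hypothetical toy data with one repeated county/city pair
crashes <- tibble(
  county = c("Wake", "Durham", "Wake", "Durham", "Alamance"),
  city   = c("Raleigh", "Durham", "Cary", "Durham", "Elon")
)

county_city <- crashes %>%
  distinct(county, city) %>%    # one row per unique county/city pair
  arrange(desc(county), city)   # county Z to A, then city A to Z

county_city
```

Passing column names to distinct keeps only those columns, so the separate select step is optional.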
summarise to reduce variables to values
ncbikecrash %>%
  summarise(avg_hr = mean(crash_hour))
## # A tibble: 1 x 1
##   avg_hr
##    <dbl>
## 1   14.7
group_by to do calculations on groups
ncbikecrash %>%
  group_by(hit_run) %>%
  summarise(avg_hr = mean(crash_hour))
## # A tibble: 2 x 2
##   hit_run avg_hr
##   <chr>    <dbl>
## 1 No        14.6
## 2 Yes       15.0
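summarise can compute several values per group at once, and mean returns NA if any value in the group is missing unless na.rm = TRUE is set. The sketch below assumes dplyr is installed; `crashes` is made-up data that includes one missing hour.

```r
library(dplyr)

# Hypothetical toy data with one missing crash_hour
crashes <- tibble(
  hit_run    = c("No", "No", "Yes", "Yes", "Yes"),
  crash_hour = c(10, 14, 20, NA, 22)
)

by_hit_run <- crashes %>%
  group_by(hit_run) %>%
  summarise(
    n_crashes = n(),                            # rows per group
    avg_hr    = mean(crash_hour, na.rm = TRUE)  # ignore missing hours
  )

by_hit_run
```

Without na.rm = TRUE the average for the "Yes" group would be NA.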
count observations in groups
ncbikecrash %>%
  count(driver_alcohol_drugs)
## # A tibble: 6 x 2
##   driver_alcohol_drugs                    n
##   <chr>                               <int>
## 1 <NA>                                 6654
## 2 Missing                                99
## 3 No                                    695
## 4 Yes-Alcohol, impairment suspected      12
## 5 Yes-Alcohol, no impairment detected     3
## 6 Yes-Drugs, impairment suspected         4
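count is a shortcut for grouping and then counting rows with n(). The sketch below shows the two forms giving the same counts, assuming dplyr is installed; `crashes` is a made-up tibble.

```r
library(dplyr)

# Hypothetical toy data
crashes <- tibble(
  weather = c("Clear", "Rain", "Clear", "Cloudy", "Clear")
)

counted <- crashes %>% count(weather)

by_hand <- crashes %>%
  group_by(weather) %>%
  summarise(n = n())

identical(counted$n, by_hand$n)

# sort = TRUE puts the most frequent group first
crashes %>% count(weather, sort = TRUE)
```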
mutate to add new variables
ncbikecrash %>%
  mutate(driver_alcohol_drugs_simplified = case_when(
    # NA_character_ keeps every outcome a character value
    driver_alcohol_drugs == "Missing" ~ NA_character_,
    str_detect(driver_alcohol_drugs, "Yes") ~ "Yes",
    TRUE ~ "No"
  ))
mutate
Most often when you define a new variable with mutate you'll also want to save the resulting data frame, often by writing over the original data frame.
ncbikecrash <- ncbikecrash %>%
  mutate(driver_alcohol_drugs_simplified = case_when(
    str_detect(driver_alcohol_drugs, "Yes") ~ "Yes",
    TRUE ~ driver_alcohol_drugs
  ))
The same result can be saved with the right-assignment operator -> at the end of the pipeline:
ncbikecrash %>%
  mutate(driver_alcohol_drugs_simplified = case_when(
    str_detect(driver_alcohol_drugs, "Yes") ~ "Yes",
    TRUE ~ driver_alcohol_drugs
  )) -> ncbikecrash
ncbikecrash %>% count(driver_alcohol_drugs, driver_alcohol_drugs_simplified)
## # A tibble: 6 x 3
##   driver_alcohol_drugs                driver_alcohol_drugs_simplified     n
##   <chr>                               <chr>                           <int>
## 1 <NA>                                <NA>                             6654
## 2 Missing                             Missing                            99
## 3 No                                  No                                695
## 4 Yes-Alcohol, impairment suspected   Yes                                12
## 5 Yes-Alcohol, no impairment detected Yes                                 3
## 6 Yes-Drugs, impairment suspected     Yes                                 4
ncbikecrash %>% count(driver_alcohol_drugs_simplified)
## # A tibble: 4 x 2
##   driver_alcohol_drugs_simplified     n
##   <chr>                           <int>
## 1 <NA>                             6654
## 2 Missing                            99
## 3 No                                695
## 4 Yes                                19
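case_when checks its conditions in order and uses the first one that is TRUE, so missing values need a condition of their own if they should stay missing. The sketch below is one possible recoding, not the one used above: it assumes dplyr is installed, uses a made-up `crashes` tibble, and swaps str_detect for base R's startsWith to avoid an extra package.

```r
library(dplyr)

# Hypothetical toy data covering each kind of value
crashes <- tibble(
  driver_alcohol_drugs = c(NA, "Missing", "No",
                           "Yes-Alcohol, impairment suspected",
                           "Yes-Drugs, impairment suspected")
)

recoded <- crashes %>%
  mutate(simplified = case_when(
    is.na(driver_alcohol_drugs)             ~ NA_character_,  # keep true NAs as NA
    driver_alcohol_drugs == "Missing"       ~ NA_character_,  # treat "Missing" as NA
    startsWith(driver_alcohol_drugs, "Yes") ~ "Yes",          # collapse all "Yes-..." values
    TRUE                                    ~ "No"            # everything else
  ))

recoded$simplified
```

Without the is.na() line, rows with NA would fall through to the final TRUE branch and be labelled "No".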
Set the eval chunk option to TRUE and knit.