Annotated code to create racing bar charts using the nycflights13 dataset
Data Visualization
Author
Aditya Dahiya
Published
October 17, 2023
Background
We’re about to embark on a thrilling journey through the world of animated racing bar charts in R - dynamic, action-packed data visualization that showcases the ebb and flow of information over time.
Inspired by Deepsha Meghnani’s ingenious article on TidyTuesday and drawing creative insights from the brilliant minds at datacornering.com, we’ll be crafting our very own data-driven racing bar chart masterpiece.
Our canvas is the nycflights13 dataset, with details on flights departing from New York City’s three iconic airports, courtesy of various carriers, all throughout the year 2013.
But that’s not all. We won’t stop at just displaying the numbers. We’ll also throw in some flair by illustrating the average delays associated with each of these carriers, injecting a dose of character into the aviation landscape of the Big Apple. We’re going to unravel the secrets of creating animated racing bar charts using the formidable ggplot2 and gganimate packages in R.
Code
library(tidyverse)    # Loading Tidyverse for data wrangling
library(gt)           # Loading gt package for beautiful tables
library(gganimate)    # For animations
library(nycflights13) # for the flights data-set
library(lubridate)    # to handle dates in tidyverse
Code
# Loading the flights dataset
data("flights")

# Pick out the top nine airline carriers only, to avoid crowding the
# upcoming animated plot
carriers_to_plot <- flights |>
  # Count the number of flights for each carrier and sort them in descending order
  count(carrier, sort = TRUE) |>
  # Select the top 9 carriers based on flight count
  slice_head(n = 9) |>
  # Extract the 'carrier' column from the result
  pull(carrier)

df <- flights |>
  # Filter the flights dataset to include only the top 9 carriers
  filter(carrier %in% carriers_to_plot) |>
  # Create a new 'date' column by combining year, month, and day
  # This allows us to make a single date variable, that nicely evolves
  # over time in an animated plot
  mutate(date = make_date(year = year, month = month, day = day)) |>
  # Select only the 'date' and 'carrier' columns
  select(date, carrier) |>
  # Joining the full names of airlines for the annotations in animated plot
  left_join(nycflights13::airlines, by = join_by(carrier)) |>
  # Remove the 'carrier' column after joining
  select(-carrier) |>
  # Rename the 'name' column to 'carrier'
  rename(carrier = name) |>
  # Count the number of flights for each date and carrier combination
  count(date, carrier)
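To see what the carrier selection does in isolation, here is a minimal sketch on made-up data. The `toy_flights` tibble and the `top_two` name are hypothetical and not part of the original post; only the `count()`, `slice_head()` and `pull()` chain mirrors the code above.

```r
library(dplyr)

# Hypothetical toy data: one row per flight, carrier codes only
toy_flights <- tibble(
  carrier = c("UA", "UA", "UA", "DL", "DL", "AA", "B6", "B6", "B6", "B6")
)

top_two <- toy_flights |>
  # Tally rows per carrier, largest count first
  count(carrier, sort = TRUE) |>
  # Keep only the two busiest carriers
  slice_head(n = 2) |>
  # Return the carrier codes as a plain character vector
  pull(carrier)

top_two
```

Because the result is a plain character vector, it can be passed straight to `filter(carrier %in% ...)`, which is how `carriers_to_plot` is used above.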
Example 1
The visualization below captures the total number of flights operated by each carrier each month, spanning the entire year from January to December 2013. This is an animated bar chart, evolving over time, rather than a truly “racing” bar chart.
Code
gganim <- df |>
  # Create two new columns, 'month' and 'month_anim'
  mutate(month = month(date, label = TRUE, abbr = FALSE),
         month_anim = month(date)) |>
  # Group the data by 'month' and 'month_anim', and count the number
  # of flights for each 'carrier'
  group_by(month, month_anim) |>
  count(carrier, wt = n) |>
  # Calculate the rank of each 'carrier' based on the flight count
  mutate(rank_car = rank(n)) |>
  # Remove grouping information
  ungroup() |>
  # Create a ggplot object with specific aesthetics for rectangles
  ggplot(aes(xmin = 0,
             xmax = n,
             y = rank_car,
             ymin = rank_car - 0.45,
             ymax = rank_car + 0.45,
             fill = carrier,
             label = round(n, 0))) +
  # Add filled rectangles with transparency
  geom_rect(alpha = 0.5) +
  # Add text labels for flight counts
  geom_text(aes(x = n, label = as.character(n)), hjust = "left") +
  # Add text labels for carriers
  geom_text(aes(x = 0, label = carrier), hjust = "left") +
  # Adjust the x-axis scale limits
  scale_x_continuous(limits = c(0, 5500)) +
  # Customize labels and titles
  labs(x = NULL, y = NULL, title = "Number of flights each month") +
  # Add a label indicating the month
  geom_label(aes(label = month), x = 4500, y = 1,
             fill = "white", col = "black",
             size = 10, label.padding = unit(0.5, "lines")) +
  # Apply a classic theme
  theme_classic() +
  # Customize plot appearance
  theme(legend.position = "none",
        axis.line.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.line.x = element_blank(),
        title = element_text(size = 20, hjust = 0.5)) +
  # Create multiple subplots for each month
  facet_wrap(~month_anim) +
  # Remove facet labels: this allows us to overlay the facets on top of each other
  # and then animate the facets
  facet_null() +
  # Create a time-based animation based on 'month_anim'
  transition_time(month_anim)
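The vertical position of each bar in Example 1 comes from ranking carriers within each month. A small sketch on invented numbers shows the idea; `toy_counts` and `ranked` are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical monthly flight counts for three made-up carriers
toy_counts <- tibble(
  month   = c(1, 1, 1, 2, 2, 2),
  carrier = c("A", "B", "C", "A", "B", "C"),
  n       = c(10, 30, 20, 25, 5, 15)
)

ranked <- toy_counts |>
  # Rank within each month, so every month gets its own 1..3 ordering
  group_by(month) |>
  # rank() gives the smallest count rank 1 and the largest the highest rank
  mutate(rank_car = rank(n)) |>
  ungroup()

ranked
```

Since the rank is used as the `y` position, the carrier with the most flights in a month sits at the top of the chart. In this toy data, carrier B is on top in month 1 and at the bottom in month 2, which is the kind of reordering the animation interpolates between.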
Example 2
The visualization below offers a unique perspective, showcasing the cumulative total of flights operated by each carrier from January to December 2013, steadily building the story month by month. A truly “racing” bar chart.
Code
df1 <- df |>
  # Group the data by 'carrier'
  group_by(carrier) |>
  # Calculate the cumulative sum of 'n' within each carrier group
  # This allows us to have a cumulative number of flights over time in a
  # truly "racing" bar chart over time
  mutate(cum_n = cumsum(n)) |>
  # Remove grouping information
  ungroup() |>
  # Group the data by 'date'
  group_by(date) |>
  # Calculate the rank of 'cum_n' within each date group
  mutate(day_rank = rank(cum_n, ties.method = "first"))

gganim <- df1 |>
  # Create a ggplot object with specific aesthetics for rectangles
  ggplot(aes(xmin = 0,
             xmax = cum_n,
             y = day_rank,
             ymin = day_rank - 0.45,
             ymax = day_rank + 0.45,
             fill = carrier,
             label = cum_n)) +
  # Add filled rectangles with transparency
  geom_rect(alpha = 0.5) +
  # Add text labels for cumulative flight counts
  geom_text(aes(x = cum_n, label = as.character(cum_n)), hjust = "left") +
  # Add text labels for carriers
  geom_text(aes(x = 0, label = carrier), hjust = "left") +
  # Customize labels and titles. Adding {closest_state} adds the transition
  # variable value to the plot title
  labs(x = NULL, y = NULL,
       title = "Number of total flights operated up to {closest_state}") +
  # Apply a classic theme
  theme_classic() +
  # Customize plot appearance
  theme(legend.position = "none",
        axis.line.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.line.x = element_blank(),
        title = element_text(size = 20, hjust = 0.5)) +
  # Create multiple subplots for each date
  facet_wrap(~ date) +
  # Remove facet labels
  facet_null() +
  # Transition the plot by 'date'
  transition_states(date) +
  # Let the view follow the data: fixed_x = FALSE leaves the x-axis free to
  # rescale as the cumulative totals grow
  view_follow(fixed_x = FALSE)

# Animate the ggplot object with specified settings
animate(gganim,
        duration = 40,
        fps = 6,
        width = 800,
        height = 500,
        start_pause = 10,
        end_pause = 20)
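The "race" in Example 2 rests on two grouped steps: a running total per carrier, then a rank per date. The sketch below walks through both on a tiny invented dataset; `toy_daily` and `toy_race` are hypothetical names, and the numbers are made up so that the two carriers tie on the second day.

```r
library(dplyr)

# Hypothetical daily flight counts for two made-up carriers
toy_daily <- tibble(
  date    = as.Date(c("2013-01-01", "2013-01-01",
                      "2013-01-02", "2013-01-02",
                      "2013-01-03", "2013-01-03")),
  carrier = c("A", "B", "A", "B", "A", "B"),
  n       = c(5, 3, 1, 3, 1, 4)
)

toy_race <- toy_daily |>
  # Running total of flights per carrier (rows are already in date order)
  group_by(carrier) |>
  mutate(cum_n = cumsum(n)) |>
  # Regroup by date and rank the running totals on each day;
  # ties.method = "first" gives tied carriers distinct integer ranks
  group_by(date) |>
  mutate(day_rank = rank(cum_n, ties.method = "first")) |>
  ungroup()

toy_race
```

On the second day both carriers have a running total of 6. With the default `ties.method = "average"` they would both get rank 1.5 and their bars would be drawn on top of each other, which is presumably why the code above asks for `"first"`. Note also that `cumsum()` depends on row order, so this sketch assumes the rows are sorted by date within each carrier, as the output of `count(date, carrier)` is.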
Example 3
In the final visualization below, we delve into the average arrival delay of flights (in minutes) for each carrier, every month, over the course of the year. What makes this data dance even more exciting is how it ranks carriers from the highest delay to the lowest delay, and as we traverse the months, watch as these rankings twirl and pirouette.
Code
gganim2 <- flights |>
  # Filter the flights dataset to include only the top 9 carriers
  filter(carrier %in% carriers_to_plot) |>
  # Create new columns: 'date' by combining year, month, and day, and
  # 'month' to represent the month as a label
  mutate(date = make_date(year = year, month = month, day = day),
         month = month(date, label = TRUE, abbr = FALSE)) |>
  # Select specific columns for the subsequent analysis
  select(date, month, carrier, arr_delay) |>
  # Group the data by 'month' and 'carrier', and calculate the average arrival delay
  group_by(month, carrier) |>
  summarize(avg_delay = mean(arr_delay, na.rm = TRUE)) |>
  # Join the full names of airlines for the annotations in the animated plot
  left_join(nycflights13::airlines, by = join_by(carrier)) |>
  # Remove the 'carrier' column after joining and rename 'name' to 'carrier'
  select(-carrier) |>
  rename(carrier = name) |>
  # Calculate the rank of average delay within each month (the data are still
  # grouped by 'month' after summarize()), breaking ties by order of appearance
  mutate(delay_rank = rank(avg_delay, ties.method = "first")) |>
  # Create a ggplot object with specific aesthetics for the rectangles
  ggplot(aes(xmin = 0,
             xmax = avg_delay,
             y = delay_rank,
             ymin = delay_rank - 0.45,
             ymax = delay_rank + 0.45,
             fill = carrier)) +
  # Add filled rectangles with transparency
  geom_rect(alpha = 0.5) +
  # Add text labels for average delay values
  geom_text(aes(x = avg_delay, label = as.character(round(avg_delay, 1))),
            hjust = "left") +
  # Add text labels for carriers
  geom_text(aes(x = 0, label = carrier), hjust = "left") +
  # Add a label indicating the month
  # Note: x = 4500 is carried over from Example 1's flight-count scale; average
  # delays are far smaller, so this label likely falls outside the panel.
  # The month still appears in the title via {closest_state}.
  geom_label(aes(label = month),
             x = 4500, y = 1,
             fill = "white", col = "black",
             size = 10,
             label.padding = unit(0.5, "lines")) +
  # Customize labels and titles
  labs(x = NULL, y = NULL,
       title = "Average flight arrival delay (in minutes) during {closest_state}") +
  # Apply a classic theme
  theme_classic() +
  # Customize plot appearance
  theme(legend.position = "none",
        axis.line.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.line.x = element_blank(),
        title = element_text(size = 20, hjust = 0.5)) +
  # Create multiple subplots for each month
  facet_wrap(~ month) +
  # Remove facet labels
  facet_null() +
  # Transition the plot by 'month'
  transition_states(month)

# Animate the ggplot object with specified settings
animate(gganim2,
        duration = 40,
        fps = 10,
        width = 800,
        height = 500,
        start_pause = 10,
        end_pause = 20)
Notice that in some bad-weather months (such as June, July and December), almost every airline has considerable delays. Conversely, if you like being on time, the best months to fly seem to be September to November.
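The grouped average and per-month ranking behind Example 3 can be checked on a small invented dataset. This is only a sketch: `toy_delays` and `toy_avg` are hypothetical names, the delay values are made up, and the `NA` entries stand in for flights with no recorded arrival delay.

```r
library(dplyr)

# Hypothetical arrival delays (minutes); NA marks a missing value
toy_delays <- tibble(
  month     = c("Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Feb"),
  carrier   = c("A", "A", "B", "B", "A", "A", "B", "B"),
  arr_delay = c(10, NA, -4, 6, 2, 4, 20, NA)
)

toy_avg <- toy_delays |>
  group_by(month, carrier) |>
  # na.rm = TRUE drops missing delays; without it, any NA makes the mean NA
  # .groups = "drop_last" keeps the result grouped by month, stated explicitly
  summarize(avg_delay = mean(arr_delay, na.rm = TRUE), .groups = "drop_last") |>
  # Still grouped by month, so carriers are ranked within each month
  mutate(delay_rank = rank(avg_delay, ties.method = "first")) |>
  ungroup()

toy_avg
```

Spelling out `.groups = "drop_last"` is a design choice in this sketch rather than something the original code does. It matches the default behavior of `summarize()` after grouping by two variables, but makes it visible that the later `rank()` runs once per month rather than across the whole year.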