Dancing Through the Years: A Data-Driven Look at Taylor Swift’s Music
Looking at W. Jake Thompson’s curated dataset of Taylor Swift songs as part of #TidyTuesday (Oct 17, 2023)
#TidyTuesday
Author
Aditya Dahiya
Published
October 18, 2023
Step 1: Data Import
Getting the data from TidyTuesday. The data originally comes from the taylor R package by W. Jake Thompson, a curated data set of Taylor Swift songs that includes lyrics and audio characteristics. The underlying data comes from Genius and the Spotify API.
Code
library(tidyverse)   # Data wrangling and visualization
library(visdat)      # View data in exploratory data analysis
library(gganimate)   # For animation
library(transformr)  # To smoothly animate polygons and paths

# Using Option 2: Read data directly from GitHub
taylor_album_songs <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-10-17/taylor_album_songs.csv')
taylor_all_songs <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-10-17/taylor_all_songs.csv')
taylor_albums <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-10-17/taylor_albums.csv')
Step 2: Some Exploratory Data Analysis
Code
# Since all songs of Taylor Swift occur in taylor_all_songs, let us
# focus on that data set only for now
taylor_album_songs |>
  anti_join(taylor_all_songs)

# Seeing the number of distinct values for each variable
taylor_all_songs |>
  summarise(across(.cols = everything(),
                   .fns = n_distinct)) |>
  pivot_longer(cols = everything(),
               names_to = "Variable",
               values_to = "n_distinct")
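To make the anti_join() check above concrete, here is a minimal sketch on made-up toy tibbles. The names all_songs and album_songs and their values are hypothetical and do not come from the TidyTuesday files. anti_join() keeps the rows of the first table that have no match in the second, so an empty result suggests that the first table is fully contained in the second. The same sketch repeats the distinct-value count on the toy data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: album_songs is built as a subset of all_songs
all_songs <- tibble(
  album_name = c("A", "A", "B", NA),
  track_name = c("s1", "s2", "s3", "s4")
)
album_songs <- all_songs |>
  filter(!is.na(album_name))

# Rows of album_songs that have no match in all_songs
unmatched <- album_songs |>
  anti_join(all_songs, by = c("album_name", "track_name"))

# Number of distinct values per column, reshaped to one row per variable
# (n_distinct() counts NA as its own value by default)
distinct_counts <- all_songs |>
  summarise(across(.cols = everything(), .fns = n_distinct)) |>
  pivot_longer(cols = everything(),
               names_to = "Variable",
               values_to = "n_distinct")
```

Naming the join columns in by = makes the matching rule explicit; without it, anti_join() joins on all columns the two tables share and prints a message saying which ones it used.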
Using the popular vis_dat() function to see the structure of the data:
Code
# Vis_dat the data
taylor_all_songs |>
  vis_dat()
Next, we look at how different song characteristics change over time:
Code
# We see the patterns over time for different variables of her
# songs to see any distinct patterns
taylor_all_songs |>
  select(album_name, track_name, track_release, danceability:duration_ms) |>
  pivot_longer(cols = -c(album_name, track_name, track_release),
               names_to = "indicator",
               values_to = "value") |>
  ggplot(aes(x = track_release,
             y = value)) +
  geom_point() +
  geom_smooth() +
  facet_wrap(~ indicator, scales = "free") +
  theme_classic()
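The reshaping step above is what lets facet_wrap() draw one panel per audio feature. As a minimal sketch on a hypothetical two-track tibble (the name songs_wide and its values are made up for illustration), pivot_longer() turns one column per feature into one row per track-feature pair:

```r
library(tidyr)

# Hypothetical toy data: one row per track, one column per audio feature
songs_wide <- tibble(
  track_name   = c("s1", "s2"),
  danceability = c(0.5, 0.7),
  energy       = c(0.4, 0.9)
)

# Long format: feature names go to "indicator", feature values to "value"
songs_long <- songs_wide |>
  pivot_longer(cols = -track_name,
               names_to = "indicator",
               values_to = "value")

# 2 tracks x 2 features gives 4 rows
nrow(songs_long)
```

With the data in this shape, a single indicator column can drive the facets, and a single value column can be mapped to the y-axis.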
Creating a static graph, which we will animate later, and setting the span parameter for the loess smoother:
Code
# Define the span to use
span_taylor <- 0.75

# Take the taylor_all_songs data frame and select specific columns:
taylor_all_songs |>
  select(album_name, track_release, danceability, acousticness) |>
  # Pivot the selected columns into a longer format with "indicators" and "values" columns:
  pivot_longer(cols = -c(album_name, track_release),
               names_to = "indicators",
               values_to = "values") |>
  # Create a ggplot visualization, setting aesthetics and geometries:
  ggplot(aes(x = track_release,
             y = values,
             col = indicators,
             label = indicators)) +
  # Add jittered points to the plot with specified width, height, and alpha:
  geom_jitter(width = 20, height = 0.001, alpha = 0.2) +
  # Add a smoothed line to the plot with specified span, se, and alpha:
  geom_smooth(span = span_taylor,
              se = FALSE,
              alpha = 0.6,
              lwd = 1.2) +
  # Add text labels to the plot, referencing data from taylor_albums:
  geom_text(data = taylor_albums,
            mapping = aes(x = album_release,
                          y = 0,
                          label = album_name),
            col = "black",
            angle = 90,
            hjust = "left") +
  # Apply a minimal theme to the plot:
  theme_minimal() +
  # Customize the x-axis labels using breaks and formatted labels:
  scale_x_continuous(breaks = taylor_albums$album_release,
                     labels = format(taylor_albums$album_release, "%b %Y")) +
  # Using color palettes from the tayloRswift package for Taylor Swift's albums:
  tayloRswift::scale_color_taylor(palette = "lover") +
  # Add labels and customize the appearance of the plot:
  labs(x = NULL,
       y = "Spotify App Score for songs",
       color = NULL) +
  # Further customize the appearance of the plot using theme settings:
  theme(axis.text.x = element_text(angle = 90),
        panel.grid.minor.x = element_blank(),
        panel.grid.minor.y = element_blank(),
        legend.position = "bottom")
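The span value passed to geom_smooth() controls how much of the data each local loess fit uses: smaller values follow the points more closely, larger values give a smoother curve. The sketch below illustrates this with base R's loess() on simulated data. The data, the roughness() helper, and the comparison span of 0.2 are assumptions made for illustration and are not part of the original analysis.

```r
set.seed(42)

# Simulated noisy signal (hypothetical data, not from the song files)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)

# Fit the same data with a narrow span and with the span used above
fit_narrow <- loess(y ~ x, span = 0.2)
fit_wide   <- loess(y ~ x, span = 0.75)

# Hypothetical helper: roughness measured as the sum of squared
# second differences of the fitted curve (larger means wigglier)
roughness <- function(fit) sum(diff(fitted(fit), differences = 2)^2)

roughness_narrow <- roughness(fit_narrow)
roughness_wide   <- roughness(fit_wide)

# The narrow-span fit is expected to be the rougher of the two
roughness_narrow > roughness_wide
```

Keeping the span in a single variable such as span_taylor makes it easy to try a few values and rerun the plot before settling on one.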