The Simpsons Sweet Spot: Where Ratings and Viewership Meet
This visualization was created using {ggplot2} for plotting, {biscale} for the bivariate color scale, and {ggimage} to display each episode’s image at its corresponding IMDb rating and viewership position. The bivariate color-coded circumferences classify episodes into tertiles along both axes, blending blue, red, and violet to highlight different rating-viewership combinations. Data wrangling and processing were done in R, making use of the {tidyverse} ecosystem.
#TidyTuesday
{ggblend}
{magick}
Author
Aditya Dahiya
Published
February 4, 2025
About the Data
The Simpsons Dataset provides a comprehensive look into the iconic animated series, covering over 600 episodes. Curated by Prashant Banerjee on Kaggle, this dataset includes four key files: characters, locations, episodes, and script lines. Originally, the dataset was scraped by Todd W. Schneider for his analysis, with the scraper made available on GitHub. The dataset has since been rehosted and refined for broader accessibility.
The characters dataset (simpsons_characters.csv) provides details on individual characters, including gender and name normalization. The episodes dataset (simpsons_episodes.csv) captures IMDb ratings, viewership in millions, and air dates, enabling insights into episode popularity. The locations dataset (simpsons_locations.csv) records frequently featured places, while the script lines dataset (simpsons_script_lines.csv) documents spoken words, character dialogues, and word counts.
Figure 1: This graphic visualizes the relationship between IMDb ratings (X-axis) and U.S. viewership in millions (Y-axis) for 151 episodes of The Simpsons. Each episode is represented by its official image, placed according to its rating and viewership, with a bivariate color-coded circumference created using the {biscale} package. The colors reflect a 3×3 tertile classification, blending blue, red, and violet to indicate different combinations of high, medium, and low ratings and viewership. Data was processed and visualized in R using {ggplot2}, {biscale}, and {ggimage} for image integration.
How I made this graphic
Key Learnings
Getting a specialized Simpsons palette from {ggsci} (Xiao 2024)
Making bivariate colour scales using {biscale} - inspiration article.
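As a rough illustration of the idea behind a bivariate colour scale, the sketch below builds a 3×3 grid by multiply-blending a red ramp with a blue ramp, so the cell that is high on both variables comes out dark violet. This is not how {biscale} builds its "DkViolet" palette; the helper `blend_multiply()` and all hex codes here are made up for the example.

```r
# Hypothetical helper: multiply-blend two equal-length vectors of colours
blend_multiply <- function(col_a, col_b) {
  a <- grDevices::col2rgb(col_a) / 255   # 3 x n matrix of RGB values in [0, 1]
  b <- grDevices::col2rgb(col_b) / 255
  m <- a * b                             # darker wherever either input is dark
  grDevices::rgb(m[1, ], m[2, ], m[3, ])
}

# Illustrative single-hue ramps (assumed colours, not the DkViolet values)
x_ramp <- grDevices::colorRampPalette(c("#F0F0F0", "#C85A5A"))(3)  # light to red
y_ramp <- grDevices::colorRampPalette(c("#F0F0F0", "#64ACBE"))(3)  # light to blue

# Every combination of x class (1-3) and y class (1-3)
grid <- expand.grid(x = 1:3, y = 1:3)
pal <- blend_multiply(x_ramp[grid$x], y_ramp[grid$y])
names(pal) <- paste0(grid$x, "-", grid$y)  # same "x-y" key style as a bi_class column

pal
```

The names follow the "x-y" pattern so that a class label such as "3-1" (high on the first variable, low on the second) can be looked up directly in the vector.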
Loading required libraries, data import & creating custom functions.
Code
# Data Import and Wrangling Tools
library(tidyverse)            # All things tidy

# Final plot tools
library(scales)               # Nice Scales for ggplot2
library(fontawesome)          # Icons display in ggplot2
library(ggtext)               # Markdown text support for ggplot2
library(showtext)             # Display fonts in ggplot2
library(colorspace)           # Lighten and Darken colours
library(magick)               # Image manipulation
library(httr)                 # Downloading Google images
library(biscale)              # Bivariate Scales in R
library(patchwork)            # Composing ggplot2 plots

simpsons_episodes <- readr::read_csv(
  'https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-02-04/simpsons_episodes.csv'
)
Visualization Parameters
Code
# Font for titles
font_add_google("Rock Salt", family = "title_font")

# Font for the caption
font_add_google("Saira Extra Condensed", family = "caption_font")

# Font for plot text
font_add_google("Titillium Web", family = "body_font")

showtext_auto()

# Get a specialized Simpsons Palette from {ggsci}
library("scales")
mypal <- ggsci::pal_simpsons("springfield")(16)
seecolor::print_color(mypal)

# Using scales::show_col() to display a palette
mypal |> scales::show_col(ncol = 4)

# A base Colour
bg_col <- mypal[1]
seecolor::print_color(bg_col)

# Colour for highlighted text
text_hil <- mypal[13]
seecolor::print_color(text_hil)

# Colour for the text
text_col <- mypal[7]
seecolor::print_color(text_col)

blend_cols <- c(mypal[11], mypal[15])
seecolor::print_color(blend_cols)

# Define Base Text Size
bts <- 90

# Caption stuff for the plot
sysfonts::font_add(
  family = "Font Awesome 6 Brands",
  regular = here::here("docs", "Font Awesome 6 Brands-Regular-400.otf")
)
github <- ""
github_username <- "aditya-dahiya"
xtwitter <- ""
xtwitter_username <- "@adityadahiyaias"
social_caption_1 <- glue::glue("<span style='font-family:\"Font Awesome 6 Brands\";'>{github};</span> <span style='color: {text_hil}'>{github_username} </span>")
social_caption_2 <- glue::glue("<span style='font-family:\"Font Awesome 6 Brands\";'>{xtwitter};</span> <span style='color: {text_hil}'>{xtwitter_username}</span>")
plot_caption <- paste0(
  "**Data:** Todd W. Schneider & Prashant Banerjee",
  " | **Code:** ",
  social_caption_1,
  " | **Graphics:** ",
  social_caption_2
)
rm(
  github, github_username, xtwitter,
  xtwitter_username, social_caption_1,
  social_caption_2
)

# Add text to plot-------------------------------------------------
plot_title <- "The Simpsons:\nRatings vs. Viewership"

plot_subtitle <- "While some Simpsons episodes boast sky-high IMDb ratings, they aren’t necessarily the most-watched. Overall, there appears to be little to no correlation between viewership numbers and episode ratings."
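The subtitle's claim of little to no correlation can be checked numerically. The sketch below is one possible way to do it, not part of the original workflow: `rating_viewer_cor()` is a hypothetical helper, and the toy values are made up so that the example runs without downloading anything.

```r
# Hypothetical helper: Pearson and Spearman correlation between two columns,
# dropping rows where either value is missing
rating_viewer_cor <- function(df,
                              x = "imdb_rating",
                              y = "us_viewers_in_millions") {
  c(
    pearson  = cor(df[[x]], df[[y]], use = "complete.obs", method = "pearson"),
    spearman = cor(df[[x]], df[[y]], use = "complete.obs", method = "spearman")
  )
}

# Made-up values for illustration only (not taken from the dataset)
toy <- data.frame(
  imdb_rating            = c(6.5, 7.0, 7.5, 8.0, NA),
  us_viewers_in_millions = c(5, 6, 8, 20, 10)
)
res <- rating_viewer_cor(toy)
res

# With the real data loaded earlier:
# rating_viewer_cor(simpsons_episodes)
```

Values near zero from the real data would support the subtitle; reporting Spearman alongside Pearson guards against a few very high-viewership episodes dominating the result.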
Exploratory Data Analysis and Wrangling
Code
# Prepare the data for the bivariate scale
df1 <- simpsons_episodes |>
  arrange(number_in_season, number_in_series) |>
  mutate(id = row_number()) |>
  rename(viewers = us_viewers_in_millions) |>
  select(id, imdb_rating, viewers, number_in_season,
         image_url, title, season) |>
  biscale::bi_class(
    x = imdb_rating,
    y = viewers,
    dim = 3,
    style = "quantile"
  )

# Creating a bivariate palette: view the palette
biscale::bi_pal("DkViolet", dim = 3)

# Preview a legend with placeholder axis labels
biscale::bi_legend(
  pal = "DkViolet",
  xlab = "Higher % White ",
  ylab = "Higher Income ",
  size = 12,
  arrows = TRUE
)

# Draw the actual legend as a {ggplot2} object
g_legend <- biscale::bi_legend(
  pal = "DkViolet",
  xlab = "Higher Ratings on IMDb",
  ylab = "Higher viewership in USA",
  size = bts * 0.7,
  pad_color = bg_col,
  pad_width = 0.5,
  arrows = TRUE
) +
  theme(
    axis.title = element_text(
      family = "caption_font",
      colour = text_col,
      margin = margin(0, 0, 0, 0, "mm")
    ),
    panel.background = element_rect(
      fill = "transparent",
      colour = "transparent"
    ),
    plot.background = element_rect(
      fill = "transparent",
      colour = "transparent"
    )
  )
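To make the 3×3 tertile classification concrete, here is a minimal base-R sketch that approximates what `bi_class()` does with `dim = 3` and `style = "quantile"`. The helper `tertile_class()` is hypothetical, the values are made up, and {biscale}'s own break computation may differ in edge cases such as ties.

```r
# Hypothetical helper: assign each value to tertile 1 (low), 2 (middle) or 3 (high)
tertile_class <- function(v) {
  breaks <- stats::quantile(v, probs = c(0, 1 / 3, 2 / 3, 1), na.rm = TRUE)
  # include.lowest = TRUE keeps the minimum value inside the first bin
  as.integer(cut(v, breaks = breaks, include.lowest = TRUE))
}

# Made-up values for illustration only (not taken from the dataset)
toy <- data.frame(
  imdb_rating = c(6.1, 6.8, 7.0, 7.4, 7.9, 8.3, 8.6, 9.0, 9.2),
  viewers     = c(30.1, 4.2, 12.5, 8.8, 21.3, 5.0, 15.9, 26.4, 9.7)
)

# Combine the two tertiles into one "x-y" label per row
toy$bi_class <- paste0(
  tertile_class(toy$imdb_rating), "-", tertile_class(toy$viewers)
)
toy
```

A label such as "3-1" reads as a top-tertile rating paired with bottom-tertile viewership, which is the kind of episode the subtitle describes.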
Downloading images for the selected episodes
Code
# A few outlier episodes for which to display the photo
selected_ids <- c(60, 61, 57, 69, 65, 59, 55, 39, 36,
                  134, 146, 5, 102, 127, 115, 1, 41)

df2 <- df1 |>
  filter(id %in% selected_ids) |>
  mutate(
    image_path = paste0("data_vizs/temp_simpsons_", id, ".png")
  )

# Get a custom google search engine and API key
# Tutorial: https://developers.google.com/custom-search/v1/overview
# Tutorial 2: https://programmablesearchengine.google.com/

# From: https://developers.google.com/custom-search/v1/overview
# google_api_key <- "LOAD YOUR GOOGLE API KEY HERE"

# From: https://programmablesearchengine.google.com/controlpanel/all
# my_cx <- "GET YOUR CUSTOM SEARCH ENGINE ID HERE"

# Function to download and save an image for the i-th episode in df2
download_simpsons <- function(i) {

  api_key <- google_api_key
  cx <- my_cx

  # Build the API request URL with additional filters
  url <- paste0(
    "https://www.googleapis.com/customsearch/v1?q=",
    URLencode(paste0("Simpsons episode ", df2$title[i], " image")),
    "&cx=", cx,
    "&searchType=image",
    "&key=", api_key,
    "&num=1"  # Fetch only one result
  )

  # Make the request
  response <- GET(url)
  if (response$status_code != 200) {
    warning("Failed to fetch data for: ", df2$title[i])
    return(NULL)
  }

  # Parse the response
  result <- content(response, "parsed")

  # Extract the image URL
  if (!is.null(result$items)) {
    image_url <- result$items[[1]]$link
  } else {
    warning("No results found for: ", df2$title[i])
    return(NULL)
  }

  # Process the image
  im <- magick::image_read(image_url) |>
    # Crop the image into a circle
    # (Credits: https://github.com/doehm/cropcircles)
    cropcircles::circle_crop(
      border_colour = "black",
      border_size = 0.1
    ) |>
    image_read() |>
    image_background(color = bg_col) |>
    image_resize("x300") |>
    # Save the result
    image_write(
      here::here("data_vizs", paste0("temp_simpsons_", df2$id[i], ".png"))
    )
}

# Iterate through each selected episode and download its image
for (i in seq_len(nrow(df2))) {
  download_simpsons(i)
}
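One detail worth noting about the request URL above: `URLencode()` leaves reserved characters such as `&` unescaped by default, so an episode title containing one could cut the query short. The sketch below shows one possible way to build the query string with `reserved = TRUE`. The helper `build_search_url()`, the example title, and the placeholder credentials are all hypothetical.

```r
# Hypothetical helper: build a Custom Search request URL with every
# parameter value percent-encoded (including reserved characters)
build_search_url <- function(title, cx, api_key) {
  query <- c(
    q          = paste0("Simpsons episode ", title, " image"),
    cx         = cx,
    searchType = "image",
    key        = api_key,
    num        = "1"   # Fetch only one result
  )
  encoded <- vapply(query, utils::URLencode, character(1), reserved = TRUE)
  paste0(
    "https://www.googleapis.com/customsearch/v1?",
    paste0(names(encoded), "=", encoded, collapse = "&")
  )
}

# Made-up title and placeholder credentials, for illustration only
example_url <- build_search_url("Bart & Lisa", cx = "MY_CX", api_key = "MY_KEY")
example_url
```

The resulting string could be passed to `GET()` in place of the hand-pasted URL; the request logic itself would stay unchanged.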
# Saving a thumbnail
library(magick)

# Saving a thumbnail for the webpage
image_read(here::here("data_vizs", "tidy_simpsons_feb25.png")) |>
  image_resize(geometry = "x400") |>
  image_write(
    here::here(
      "data_vizs",
      "thumbnails",
      "tidy_simpsons_feb25.png"
    )
  )

# Define the folder path
folder_path <- file.path(getwd(), "data_vizs")

# List all files in the folder that start with "temp_simpsons_"
files_to_delete <- list.files(
  folder_path,
  pattern = "^temp_simpsons_",
  full.names = TRUE
)

# Delete the files
file.remove(files_to_delete)

# Print confirmation
cat(length(files_to_delete), "files deleted.\n")
Session Info
Code
# Data Import and Wrangling Tools
library(tidyverse)            # All things tidy

# Final plot tools
library(scales)               # Nice Scales for ggplot2
library(fontawesome)          # Icons display in ggplot2
library(ggtext)               # Markdown text support for ggplot2
library(showtext)             # Display fonts in ggplot2
library(colorspace)           # Lighten and Darken colours
library(magick)               # Image manipulation
library(httr)                 # Downloading Google images
library(biscale)              # Bivariate Scales in R
library(patchwork)            # Composing ggplot2 plots

sessioninfo::session_info()$packages |>
  as_tibble() |>
  select(package, version = loadedversion, date, source) |>
  arrange(package) |>
  janitor::clean_names(case = "title") |>
  gt::gt() |>
  gt::opt_interactive(use_search = TRUE) |>
  gtExtras::gt_theme_espn()
Table 1: R Packages and their versions used in the creation of this page and graphics