Chapter 11

Colour Scales and Legends

Author

Aditya Dahiya

Published

January 1, 2024

This Chapter has no exercises to solve, so I use various datasets to explore and customize colour scales and legends in ggplot2.

For this Chapter, let’s use the data on Engagement (Hours Watched) report from Netflix which lists, in descending order, all 18,000+ titles viewed for 50,000+ hours on the platform in the first half of 2023. Credits to the Data is Plural newsletter by Jeremy Singer-Vine for circulating this data-set. I recommend subscribing to this newsletter for some amazing data-sets.

Code
library(tidyverse)
library(readxl)
library(ggwordcloud)
library(lubridate)

set.seed(42)
netflix <- read_csv(here::here("data", "netflix.csv")) |> 
  slice_head(n = 100) |> 
  janitor::clean_names() |> 
  select(1:3) |> 
  mutate(title = str_remove_all(title, "[^[\\da-zA-Z ]]")) |> 
  mutate(title = str_replace(title, "\\s{3,}.*", "")) |> 
  mutate(month = floor_date(release_date, unit = "month"),
         year = year(release_date))

11.2 Continuous colour scales

For depicting use of continuous color scales, we use the dataset by Harsha Devulapalli on Movie Theaters in 8 Indian metropolitan Cities. Credits to the Data is Plural newsletter by Jeremy Singer-Vine for circulating this data-set.

The Figure 1 shows use of two fill scales, one for geom_col() and other for geom_point() , both within the same plot. This is possible with use of ggnewscale (Campitelli 2023) that allows us to specify a new fill or colour scale within the same plot.

Code
# Loading dataset

movies <- read_csv("https://raw.githubusercontent.com/HarshaDevulapalli/indian-movie-theatres/master/indian-movie-theatres.csv")

# Using two fill scales in 1 ggplot object
library(ggnewscale)

plotdf <- movies |> 
  group_by(theatre_chain) |> 
  summarise(
    n = n(),
    avg_price = mean(average_ticket_price, na.rm = TRUE)
  ) |> 
  drop_na() |> 
  filter(n >= 5) |> 
  mutate(
    theatre_chain = factor(theatre_chain),
    theatre_chain = fct_reorder(theatre_chain, avg_price),
    avg_price = round(avg_price, 0)
    )

# Creating a gradient in the geom_col fill
plotdf_fill <- plotdf  |> 
  
  # Group input by rows: to compute on a data frame a row-at-a-time.
  rowwise()  |> 
  
  # Create Vars
  summarise(theatre_chain = theatre_chain,
            avg_price = avg_price,
            fill_col = list(1:avg_price),
            height = 1) %>%
  # Long form
  unnest(cols = fill_col)
  
plotdf |> 
  ggplot(
    aes(x = avg_price,
        y = theatre_chain)) +
  geom_col(
    data = plotdf_fill,
    aes(fill = fill_col,
        x = height),
    width = 0.3,
    position = "stack"
  ) +
  
  # Annotations
  annotate(
    geom = "curve",
    x = plotdf |> filter(theatre_chain == "PVR") |> pull(avg_price),
    y = 9,
    xend = 300,
    yend = 4,
    linetype = 5,
    arrow = arrow(length = unit(2, "mm")),
    col = "darkgrey",
    curvature = -0.5
  ) +
  annotate(
    geom = "curve",
    x = plotdf |> filter(theatre_chain == "Inox") |> pull(avg_price),
    y = 8,
    xend = 300,
    yend = 4,
    linetype = 5,
    arrow = arrow(length = unit(2, "mm")),
    col = "darkgrey",
    curvature = -0.5
  ) +
  annotate(
    geom = "label",
    x = 300,
    y = 4,
    label = "Size of the dot and the number represent\nnumber of Movie Theatres owned by the Chain",
    hjust = "inward", 
    vjust = 1,
    size = 3, 
    fontface = "italic",
    col = "black"
  ) +
  
  # Colour Scales
  paletteer::scale_fill_paletteer_c("ggthemes::Orange-Gold") +
  
  # Using a Second Colour Scale in same ggplot2
  new_scale_fill() +
  # Lollipop Graph, circles
  geom_point(aes(x = avg_price, 
                 size = n,
                 fill = n),
             pch = 21,
             col = "white") +
  geom_text(
    aes(label = n),
    hjust = 0.5,
    vjust = 0.5
  ) +
  paletteer::scale_fill_paletteer_c("grDevices::PinkYl",
                                    direction = -1) +
  scale_size_continuous(range = c(8, 15)) +
  labs(
    x = "Average Ticket Price (in Rs.)",
    y = NULL,
    title = "Ticket prices in Indian Movie Theatre Chains",
    subtitle = "PVR is the most expensive, and also has the most movie theatres.\nThe numbers in circles represent the number of theatres run by the Chain."
  ) +
  scale_x_continuous(expand = expansion(c(0, 0.05))) +
  scale_y_discrete(expand = expansion(c(0.05, 0.1))) +
  theme_minimal() +
  theme(
    legend.position = "none",
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_line(linetype = 2),
    axis.line.x = element_line(arrow = arrow(length = unit(3, "mm"))),
    plot.title = element_text(face = "bold",
                              size = 18,
                              hjust = 0.5),
    plot.subtitle = element_text(face = "italic",
                                 hjust = 0.5),
    plot.title.position = "plot"
  ) 

Figure 1: Using multiple continuous colour scales in a lollipop chart

11.3 Discrete colour scales

Now, I use the data-set with ggwordcloud (Le Pennec and Slowikowski 2023) to make word-clouds of top Netflix Titles. The Figure 2 shows the top 10 titles on Netflix by number of hours viewed between January - June 2023, with default colour schemes in ggplot2. The word-cloud shows the use of a continuous colour scale for text colour based on the number of hours viewed.

Code
g1 <- netflix |> 
  slice_max(order_by = hours_viewed, n = 10) |> 
  ggplot(aes(
    x = hours_viewed, 
    y = reorder(title, hours_viewed),
    fill = title,
    label = title)) +
  geom_col() +
  geom_text(aes(x = 0),
            hjust = "inward",
            vjust = 0.5) +
  labs(x = "Hours viewed on Netflix", y = NULL,
       title = "Top 10 titles on Netflix",
       subtitle = "By number of hours viewed: January - June 2023") +
  scale_x_continuous(expand = expansion(0),
                     labels = scales::label_number_si()) +
  theme_minimal() +
  theme(legend.position = "none",
        axis.text.y = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_blank())

g2 <- netflix |> 
  mutate(title = str_replace(title, " Season\\s.*", "")) |> 
  slice_max(order_by = hours_viewed, n = 40) |> 
  ggplot(aes(label = title, 
             size = hours_viewed,
             color = hours_viewed)) +
  geom_text_wordcloud(shape = "circle") +
  scale_size_area(max_size = unit(10, "mm")) +
  labs(title = "Word cloud of top 30 titles on Netflix (Jan-Jun 2023)") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

g1
g2

(a) A horizontal Bar-Chart

(b) A wordcloud
Figure 2: Top 10 titles on Netflix by number of hours viewed: January - June 2023

The Figure 3 shows the same plots with paletteer (Hvitfeldt 2021a) scales.

Code
library(paletteer)

g1 + scale_fill_paletteer_d("ggthemes::Winter")

g2 + scale_color_paletteer_c("oompaBase::redscale")

(a) A horizontal Bar-Chart

(b) A wordcloud
Figure 3: Top 10 titles on Netflix by number of hours viewed: January - June 2023

11.7 Legend Position

To demonstrate the use of theme(legend.position = "..."), I use the data on New Delhi’s movie theaters to produce a customized colored map in shades of purple, as shown below in Figure 4

Code
#==============================================================================#
# Library Load-in---------------------------------------------------------------
#==============================================================================#
library(tidyverse)      # Data wrangling and plotting
library(osmdata)        # Wrapper for Overpass API from Open Street Maps
library(janitor)        # Cleaning names
library(sf)             # For plotting maps
library(here)           # Files location and loading
library(paletteer)      # Lots of Color Palettes in R
library(colorspace)     # Lightening and Darkening Colors
library(showtext)       # Using Fonts More Easily in R Graphs
library(fontawesome)    # Social Media icons
library(ggtext)         # Markdown Text in ggplot2
library(patchwork)      # For compiling plots
library(magick)         # Work with Images and Logos
library(ggimage)        # Background Image


#==============================================================================#
# Data Load-in------------------------------------------------------------------
#==============================================================================#

del_mov <- read_csv("https://raw.githubusercontent.com/HarshaDevulapalli/indian-movie-theatres/master/indian-movie-theatres.csv") |> 
  filter(city == "Delhi")

main_roads <- st_read(here::here("data", 
                                 "delhi_osm",
                                 "delhi_main_roads.shp"))
st_crs(main_roads) <- "WGS 84"

minor_roads <- st_read(here::here("data", 
                                  "delhi_osm",
                                  "delhi_minor_roads.shp")) 
st_crs(minor_roads) <- "WGS 84"

very_minor_roads <- st_read(here::here("data", 
                                       "delhi_osm",
                                       "delhi_veryminor_roads.shp")) 
st_crs(very_minor_roads) <- "WGS 84"

#==============================================================================#
# Data Wrangling----------------------------------------------------------------
#==============================================================================#

coords <- del_mov |> 
  summarise(
    top = max(lat),
    bottom = min(lat),
    left = min(lon),
    right = max(lon)
  ) |> 
  as_vector()

# Adjust to remove the one leftmost (westward) cniema hall - outlier
percentage_removal_factor = 0.2
coords[3] <- coords[3] + 
  ((coords[4] - coords[3]) * percentage_removal_factor)

# Impute average value to NAs

impute_na <- median(del_mov$average_ticket_price, na.rm = TRUE)
del_mov <- del_mov |> 
  mutate(average_ticket_price = 
           if_else(is.na(average_ticket_price),
                   impute_na,
                   average_ticket_price))


#==============================================================================#
# Options & Visualization Parameters--------------------------------------------
#==============================================================================#

# Load fonts
font_add_google("Limelight", 
                family = "title_font")       # Font for titles
font_add_google("Saira Extra Condensed", 
                family = "caption_font")     # Font for the caption
font_add_google("Bree Serif", 
                family = "body_font")        # Font for plot text
showtext_auto()

# Colour Palette
mypal_c <- paletteer::scale_colour_paletteer_c("ggthemes::Purple")
mypal <- paletteer::paletteer_d("rcartocolor::Purp")

# Define colours
low_col <- mypal[4]                   # Low colour
hi_col <- mypal[6]                    # High colour
bg_col <- mypal[3] |> lighten(0.9)    # Background Colour
text_col <- mypal[1] |> darken(0.6)   # Colour for the text
text_hil <- mypal[6] |> darken(0.4)   # Colour for the title

# Caption stuff
sysfonts::font_add(family = "Font Awesome 6 Brands",
                   regular = here::here("docs", "Font Awesome 6 Brands-Regular-400.otf"))
github <- "&#xf09b"
github_username <- "aditya-dahiya"
xtwitter <- "&#xe61b"
xtwitter_username <- "@adityadahiyaias"
linkedin <- "&#xf08c"
linkedin_username <- "dr-aditya-dahiya-ias"
social_caption <- glue::glue("<span style='font-family:\"Font Awesome 6 Brands\";'>{github};</span> <span style='color: {text_col}'>{github_username}  </span> <span style='font-family:\"Font Awesome 6 Brands\";'>{xtwitter};</span> <span style='color: {text_col}'>{xtwitter_username}</span> <span style='font-family:\"Font Awesome 6 Brands\";'>{linkedin};</span> <span style='color: {text_col}'>{linkedin_username}</span>")

# Add text to plot--------------------------------------------------------------
plot_title <- "Movie Theatres in Delhi"

subtitle_text <- "Most theatres are in North or South Delhi, with a central vacant band around Lutyens Delhi !"
plot_subtitle <- paste(strwrap(subtitle_text, 100), collapse = "\n")

plot_caption <- paste0("**Data:** Harsha Devulapalli  |  ", "**Graphics:** ", social_caption)

#==============================================================================#
# Data Visualization------------------------------------------------------------
#==============================================================================#




  
g <- ggplot() +
  geom_sf(
    data = 
      main_roads |> 
      mutate(geometry = st_simplify(
        geometry, 
        dTolerance = 50,
        preserveTopology = TRUE)),
    mapping = aes(geometry = geometry),
    color = low_col,
    linewidth = 1,
    alpha = 0.4) +
  geom_sf(
    data = 
      minor_roads |> 
      mutate(geometry = st_simplify(
        geometry, 
        dTolerance = 1,
        preserveTopology = TRUE)),
    color = low_col,
    linewidth = 0.7,
    alpha = 0.3) +
  geom_sf(
    data = 
      very_minor_roads |> 
      mutate(geometry = st_simplify(
        geometry, 
        dTolerance = 10,
        preserveTopology = TRUE)),
    color = low_col,
    linewidth = 0.3,
    alpha = 0.2) +
  geom_point(
    data = del_mov,
    mapping = aes(
      x = lon,
      y = lat,
      size = total_seats,
      fill = average_ticket_price
    ),
    pch = 21,
    color = text_hil,
    alpha = 0.6
  ) +
  ggrepel::geom_text_repel(
    data = del_mov,
    mapping = aes(
      x = lon,
      y = lat,
      label = theatre_name
    ),
    alpha = 0.95,
    family = "body_font",
    colour = text_col,
    seed = 42,
    size = 10,
    segment.color = text_col
  ) +
  coord_sf(
    xlim = coords[c("left", "right")],
    ylim = coords[c("bottom", "top")],
    expand = FALSE) +
  scale_fill_paletteer_c("ggthemes::Purple") +
  scale_size_continuous(range = c(1, 15)) +
  labs(title = plot_title,
       subtitle = plot_subtitle,
       caption = plot_caption,
       fill = "Average Ticket Price (in Rs.)",
       size = "Total Number of Seats") +
  theme_void() + 
  guides(fill = guide_colorbar(title.position = "top",
                               barheight = unit(0.5, "cm"),
                               barwidth = unit(8, "cm")),
         size = guide_legend(title.position = "top",
                             keywidth = unit(0.5, "cm"),
                             keyheight = unit(0.5, "cm"),
                             label.hjust = 0)) +
  theme(
    plot.caption =  element_textbox(family = "caption_font",
                                    hjust = 0.5,
                                    colour = text_col,
                                    size = unit(40, "cm")),
    plot.title   =     element_text(hjust = 0.5,
                                    size = unit(175, "cm"),
                                    margin = margin(0.3,0,0.2,0, 
                                                    unit = "cm"),
                                    family = "title_font",
                                    face = "bold",
                                    colour = text_col),
    plot.subtitle    = element_text(hjust = 0.5,
                                    size = unit(50, "cm"),
                                    family = "body_font",
                                    colour = text_col,
                                    margin = margin(0,0,0.2,0, 
                                                    unit = "cm")),
    plot.background =  element_rect(fill = bg_col,
                                    color = bg_col,
                                    linewidth = 0),
    legend.position = "bottom",
    legend.text = element_text(hjust = 0.5,
                               size = unit(40, "cm"),
                               family = "body_font",
                               colour = text_col),
    legend.title = element_text(hjust = 0.5,
                                size = 50,
                                family = "body_font",
                                colour = text_col,
                                margin = margin(0,0,0,0)),
    legend.box.margin = margin(0,0,0.5,0, unit = "cm"),
    legend.box = "horizontal",
    legend.spacing.y = unit(0.2, "cm")
  )




#=============================================================================#
# Image Saving-----------------------------------------------------------------
#=============================================================================#


ggsave(
  filename = here::here("docs", "delhimovies_tidy.png"),
  plot = g,
  device = "png", 
  dpi = "retina", 
  width = unit(10, "cm"), 
  height = unit(10, "cm"),
  bg = bg_col
)



#=============================================================================#
# Data Collection Work---------------------------------------------------------
#=============================================================================#


###########################################
# DO NOT RUN CODE: To download initial Delhi data
###########################################

# Saving the coordinates bounding box for Delhi Map
coords <- del_mov |> 
  summarize(
    top = max(lat),
    bottom = min(lat),
    left = min(lon),
    right = max(lon)
  ) |> 
  as_vector()

coords

# Code used for Delhi area: Downloading the Delhi map (1.4 GB !!) 
cty <- opq(bbox = coords)


cty_roads <- cty |>
  add_osm_feature(key = "highway") |>
  osmdata_sf()

main_roads <- cty_roads$osm_lines |>
  filter(highway %in% c("primary", "trunk")) |> 
  clean_names()

minor_roads <- cty_roads$osm_lines |>
  filter(highway %in% c("tertiary", "secondary"))

very_minor_roads <- cty_roads$osm_lines |>
  filter(highway %in% c("residential"))


st_write(
  obj = main_roads |> select(geometry),
  dsn = here::here("data", "delhi_main_roads.shp"),
  append = FALSE
)

st_write(
  obj = minor_roads |> select(geometry),
  dsn = here::here("data", "delhi_minor_roads.shp"),
  append = FALSE
)

st_write(
  obj = very_minor_roads |> select(geometry),
  dsn = here::here("data", "delhi_veryminor_roads.shp"),
  append = FALSE
)

# rm(main_roads)
# rm(minor_road)

Figure 4: Map of New Delhi with locations of Movie Theatres, along with number of seats in each (size of circles) and average ticket price (colour of the circle).

Some other packages for colour palettes in R

  • paletteer:
    • The {paletteer} R package (Hvitfeldt 2021b), developed by Emil Hvitfeldt, serves as a comprehensive repository of diverse color palettes sourced from various R packages. With a unified interface, {paletteer} aims to streamline the usage of these palettes, analogous to the “caret of palettes.” Featuring a collection of 2538 palettes obtained from CRAN packages, the palettes are categorized into discrete and continuous scales. The paletteer gallery facilitates easy exploration and implementation of these palettes in data visualization through ggplot2, providing users with copy/pastable R code for seamless integration.
  • RColorBrewer:
    • This package sources color palettes from ColorBrewer, delivering a diverse range of qualitative, sequential, and diverging color schemes for ggplot.
  • viridis:
    • Designed for both colorblind individuals and black-and-white printing, viridis provides perceptually uniform color maps that enhance data visualization.
  • viridisLite:
    • A streamlined version of viridis, viridisLite offers the same high-quality color maps with reduced dependencies for efficient use in ggplot.
  • wesanderson:
    • Inspired by Wes Anderson films, this package provides a unique and aesthetically pleasing set of color palettes, adding a distinctive touch to ggplot visuals.
  • ggsci:
    • Drawing inspiration from scientific journals like Nature and Science, ggsci offers color palettes that lend a professional and research-oriented look to ggplot visualizations.
  • nord:
    • Inspired by the Nord color scheme, this package delivers modern and elegant color palettes for ggplot, adding a contemporary feel to data visualizations.
  • iWantHue:
    • Enabling users to generate and explore color palettes based on criteria such as color count and harmony, iWantHue provides flexibility and customization for ggplot visuals.
  • colorspace:
    • Based on the HCL (Hue-Chroma-Luminance) color space, colorspace offers perceptually uniform and visually appealing color palettes for ggplot, enhancing the aesthetic quality of visualizations.
  • dichromat:
    • Specifically catering to individuals with color vision deficiencies, dichromat provides color palettes that prioritize accessibility for improved data visualization experiences in ggplot.
  • ggthemes:
    • Inspired by popular data visualization libraries and software like Excel, Tableau, and Stata, ggthemes offers a variety of color palettes and themes to diversify ggplot visuals.

References

Campitelli, Elio. 2023. “Ggnewscale: Multiple Fill and Colour Scales in ’Ggplot2’.” https://CRAN.R-project.org/package=ggnewscale.
Hvitfeldt, Emil. 2021a. “Paletteer: Comprehensive Collection of Color Palettes.” https://github.com/EmilHvitfeldt/paletteer.
———. 2021b. “Paletteer: Comprehensive Collection of Color Palettes.” https://github.com/EmilHvitfeldt/paletteer.
Le Pennec, Erwan, and Kamil Slowikowski. 2023. “Ggwordcloud: A Word Cloud Geom for ’Ggplot2’.” https://CRAN.R-project.org/package=ggwordcloud.