What percentage of population lives near a metro station?

An analysis for various cities, over time, on population density near metro stations and analysis using rsater data and Open Street Maps data

Data Visualization

Maps

{sf}

{osmdata}

Open Street Maps

Gecomputation

{rnaturalearth}

Author

Aditya Dahiya

Published

February 16, 2025

Loading required packages

The tidyverse package is loaded for data manipulation and visualization, while sf and terra handle spatial vector and raster data, respectively. The tidyterra package acts as a bridge between terra and ggplot2, enabling raster visualization in a tidy framework. For enhanced plot aesthetics, the script utilizes several ggplot2 extensions. scales provides improved axis scaling, fontawesome adds icons, and ggtext enables markdown formatting in text elements. Custom fonts are incorporated using sysfonts and displayed via showtext, ensuring that Google Fonts (e.g., Roboto Condensed, Oswald, Saira Extra Condensed) are seamlessly integrated. The patchwork package facilitates the composition of multiple plots.

A minimalist theme is established using theme_minimal() with a base_size of 24 and a body_font for text. Colors are defined and printed with seecolor, ensuring consistency in highlighted text and background colors. Additionally, custom captions incorporating Font Awesome icons are created using glue, dynamically styling GitHub and Twitter/X handles within the plot caption.

Code

# Data wrangling & visualization
library(tidyverse)  # Data manipulation & visualization

# Spatial data handling
library(sf)         # Import, export, and manipulate vector data
library(terra)      # Import, export, and manipulate raster data

# ggplot2 extensions
library(tidyterra)  # Helper functions for using terra with ggplot2

# Final plot tools
library(scales)               # Nice Scales for ggplot2
library(fontawesome)          # Icons display in ggplot2
library(ggtext)               # Markdown text in ggplot2
library(showtext)             # Display fonts in ggplot2
library(patchwork)            # Composing Plots

bts = 24 # Base Text Size
sysfonts::font_add_google("Roboto Condensed", "body_font")
sysfonts::font_add_google("Oswald", "title_font")
sysfonts::font_add_google("Saira Extra Condensed", "caption_font")
showtext::showtext_auto()
# A base Colour
bg_col <- "white"
seecolor::print_color(bg_col)

# Colour for highlighted text
text_hil <- "grey30"
seecolor::print_color(text_hil)

# Colour for the text
text_col <- "grey20"
seecolor::print_color(text_col)

theme_set(
  theme_minimal(
    base_size = bts,
    base_family = "body_font"
  ) +
    theme(
      text = element_text(
        colour = "grey30",
        lineheight = 0.3,
        margin = margin(0,0,0,0, "pt")
      ),
      plot.title = element_text(
        hjust = 0.5,
        margin = margin(0,0,0,0, "pt")
      ),
      plot.subtitle = element_text(
        hjust = 0.5,
        margin = margin(0,0,0,0, "pt")
      )
    )
)

# Caption stuff for the plot
sysfonts::font_add(
  family = "Font Awesome 6 Brands",
  regular = here::here("docs", "Font Awesome 6 Brands-Regular-400.otf")
)
github <- "&#xf09b"
github_username <- "aditya-dahiya"
xtwitter <- "&#xe61b"
xtwitter_username <- "@adityadahiyaias"
social_caption_1 <- glue::glue("<span style='font-family:\"Font Awesome 6 Brands\";'>{github};</span> <span style='color: {text_hil}'>{github_username}  </span>")
social_caption_2 <- glue::glue("<span style='font-family:\"Font Awesome 6 Brands\";'>{xtwitter};</span> <span style='color: {text_hil}'>{xtwitter_username}</span>")
plot_caption <- paste0(
  "**Data**:  Open Street Maps and CIESIN (Columbia University)",
  "  |  **Code:** ", 
  social_caption_1, 
  " |  **Graphics:** ", 
  social_caption_2
  )
rm(github, github_username, xtwitter, 
   xtwitter_username, social_caption_1, 
   social_caption_2)

Get metro-lines, metro stations and buffer zones in Delhi (India)

This R script retrieves and processes metro lines and stations in New Delhi using data from OpenStreetMap (OSM) via the osmextract package and administrative boundaries from the geodata package. First, a district-level map of Delhi is obtained with geodata::gadm(), filtered for the “NCT of Delhi” region. Next, the script extracts different spatial layers from OSM:

Metro lines using osmextract::oe_get(place = "Delhi", layer = "lines")
Metro stations (point data) using osmextract::oe_get(layer = "points")
Metro station polygons using osmextract::oe_get(layer = "multipolygons")

The metro station points are filtered from points_delhi based on OSM tags ("railway"=>"stop","subway"=>"yes"), while metro station polygons are identified from polygons_delhi and their centroids are calculated with st_centroid().

Finally, buffer zones of 1 km and 2 km around metro stations are created using st_buffer(), helping analyze areas within walkable distances to metro stops. These buffers are spatially clipped to Delhi’s boundaries.

Code

# Get metro-lines in New Delhi, and the metro stations using data from Open Street Maps using {osmextract}
# Get Delhi Map from GADM / {geodata} with geodata::gadm()
# District Wise Map
delhi_map <- geodata::gadm(
  country = "India",
  level = 2,
  path = tempdir()
) |> 
  st_as_sf() |> 
  janitor::clean_names() |> 
  filter(name_1 == "NCT of Delhi") |> 
  select(name_1, geometry)


lines_delhi <- osmextract::oe_get(
  place = "Delhi",
  layer = "lines",
  download_directory = "C:/Users/dradi/OneDrive/Desktop/pop_raster_data_temp"
  #download_directory = "C:/Users/dradi/Desktop/pop_raster_data_temp"
)

points_delhi <- osmextract::oe_get(
  place = "Delhi",
  layer = "points",
  download_directory = "C:/Users/dradi/OneDrive/Desktop/pop_raster_data_temp"
  #download_directory = "C:/Users/dradi/Desktop/pop_raster_data_temp"
)

polygons_delhi <- osmextract::oe_get(
  place = "Delhi",
  layer = "multipolygons",
  download_directory = "C:/Users/dradi/OneDrive/Desktop/pop_raster_data_temp"
  #download_directory = "C:/Users/dradi/Desktop/pop_raster_data_temp"
)

# points_delhi |> 
#   names()

subway_stops1 <- points_delhi |> 
  select(osm_id, name, other_tags, geometry) |> 
  # st_drop_geometry() |>
  # as_tibble() |>
  filter(str_detect(other_tags, 'railway"=>"stop","subway"=>"yes"')) |> 
  select(-other_tags)


subway_stops2 <- polygons_delhi |> 
  # st_drop_geometry() |> 
  # as_tibble() |> 
  filter(str_detect(other_tags, "subway")) |> 
  select(osm_id, name, geometry) |> 
  mutate(geometry = st_centroid(geometry))

subway_stops <- bind_rows(
  subway_stops1,
  subway_stops2
) |> 
  st_intersection(delhi_map)

rm(subway_stops1, subway_stops2)

subway_lines <- lines_delhi |> 
  # st_drop_geometry() |> 
  # as_tibble() |> 
  filter(str_detect(railway, "subway")) |> 
  select(osm_id, name, railway, geometry) |> 
  st_intersection(delhi_map)

subway_stops_buffer_1km <- subway_stops |> 
  st_union() |> 
  st_buffer(dist = 1000) |> 
  st_intersection(delhi_map)

subway_stops_buffer_2km <- subway_stops |> 
  st_union() |> 
  st_buffer(dist = 2000) |> 
  st_intersection(delhi_map)

rm(lines_delhi, points_delhi, polygons_delhi)

An initial basic plot

This R code creates a spatial visualization of Delhi’s subway system using the ggplot2 package and its geom_sf() function to handle spatial features. The base map (delhi_map) is plotted first, followed by layers representing subway stops (subway_stops in blue), subway lines (subway_lines in dark grey), and buffer zones around subway stops at 1 km and 2 km (subway_stops_buffer_1km and subway_stops_buffer_2km, shaded in semi-transparent red). The alpha() function from scales is used to set transparency levels for better visualization.

Code

g <- ggplot() +
  geom_sf(data = delhi_map) +
  geom_sf(
    data = subway_stops,
    colour = "blue",
    size = 1,
    alpha = 0.5
  ) +
  geom_sf(
    data = subway_lines,
    colour = "grey20",
    linewidth = 0.2
  ) +
  geom_sf(
    data = subway_stops_buffer_2km,
    fill = alpha("red", 0.3), 
    colour = "transparent",
    linewidth = 0.1
  ) +
  geom_sf(
    data = subway_stops_buffer_1km,
    fill = alpha("red", 0.3), 
    colour = "transparent",
    linewidth = 0.1
  ) +
  labs(
    title = "Delhi: Subway Stops and Lines",
    subtitle = "Areas within 1 km and 2 km of subway stops are shaded"
  ) +
  theme(
    panel.grid = element_line(
      linewidth = 0.2
    )
  )

ggsave(
  plot = g, 
  filename = here::here(
    "projects", "images",
    "cities_pop_subway_1.png"
  ),
  height = 1200,
  width = 1200,
  units = "px",
  bg = "white"
)

Figure 1: A basic map of Delhi, with metro stations and 1-km and 2-km buffer zones around metro stations.

Population Density Raster for New Delhi

Full article and code on how to get these population density rasters for any city in the world is here.

This R script processes and visualizes population density data for New Delhi using raster data from (Liu et al. 2024) focusing on areas near subway stops. The script employs the terra package to handle raster operations and the sf package for spatial vector data. The 2022 population density raster is then loaded with rast(), cropped, and masked to fit the delhi_map boundary. Negative values are replaced with a small constant (0.001) to avoid computational issues. The script extracts non-missing values from the raster, calculates summary statistics (mean, total_pop), and estimates population distribution within 1 km and 2 km buffers around subway stops using terra::extract(). A ggplot2-based map visualizes population density with geom_spatraster() (from ggspatial). A scale_fill_gradient2() function transforms density values with a square root scale for better readability. The map overlays:

Delhi’s boundary (geom_sf(delhi_map))
Subway stops & lines (geom_sf(subway_stops, subway_lines))
Population buffers (geom_sf(subway_stops_buffer_1km, subway_stops_buffer_2km))

Code

# output_file <- paste0("GlobPOP_Count_30arc_", i, "_I32.tiff")
# 
# # ALERT: Downloads Massive amounts of Data ---------------------------
# for (i in year_ranges) {
#   url <- paste0(
#     "https://zenodo.org/records/11179644/files/GlobPOP_Count_30arc_", 
#     i, 
#     "_I32.tiff"
#   )
#   
#   output_file <- paste0("GlobPOP_Count_30arc_", i, "_I32.tiff")
#   
#   # Attempt to download the file with error handling
#   tryCatch({
#     download.file(url, output_file, mode = "wb")
#     cat("Successfully downloaded:", output_file, "\n")
#   }, error = function(e) {
#     cat("Error downloading:", output_file, "-", conditionMessage(e), "\n")
#   })
# }
# --------------------------------------------------------------------

# A 2022 population raster of Delhi
rast_2022 <- rast("GlobPOP_Count_30arc_2022_I32.tiff") |> 
  terra::crop(delhi_map) |> 
  terra::mask(delhi_map, touches = FALSE)

rast_2022[rast_2022 <= 0] <- 0.001

names(rast_2022) <- c("year_2022")
  
total_rast_pop <- rast_2022 |> 
  values() |> 
  as_tibble() |> 
  filter(!is.na(year_2022)) |> 
  summarise(
    n = n(),
    mean_pop = mean(year_2022, na.rm = T),
    total_pop = n * mean_pop
  )
  
pop_compute <- rast_2022 |> 
  terra::extract(
    c(subway_stops_buffer_1km, subway_stops_buffer_2km) |> 
        st_as_sf() |> 
        rename(geometry = x)
  ) |> 
  as_tibble() |> 
  group_by(ID) |> 
  summarise(
    n = n(),
    mean_pop = mean(year_2022, na.rm = T)
  ) |> 
  mutate(
  year = 2022,
  zone = c("1 km", "2 km"),
  total_pop = n * mean_pop,
  perc_pop = total_pop / total_rast_pop$total_pop
)

g <- ggplot() +
  geom_spatraster(
    data = rast_2022
  ) +
  scale_fill_gradient2(
    low = "white",
    high = "grey20",
    na.value = "transparent",
    transform = "sqrt",
    name = "Population density\n(persons / sq. km.)",
    breaks = c(0, 100, 2000, 1e4, 3e4, 5e4),
    labels = scales::label_number(scale_cut = cut_short_scale())
  ) +
  geom_sf(
    data = delhi_map,
    fill = NA
  ) +
  geom_sf(
    data = subway_stops,
    colour = "blue",
    size = 0.4,
    alpha = 0.5
  ) +
  geom_sf(
    data = subway_lines,
    colour = "blue",
    linewidth = 0.1,
    linetype = "longdash"
  ) +
  geom_sf(
    data = subway_stops_buffer_2km,
    fill = alpha("red", 0.1), 
    colour = "transparent",
    linewidth = 0.1
  ) +
  geom_sf(
    data = subway_stops_buffer_1km,
    fill = alpha("red", 0.2), 
    colour = "transparent",
    linewidth = 0.1
  ) +
  labs(
    title = "Delhi: Population Density vs. Subway Stops (Year: 2022)",
    subtitle = "Areas within 1 km and 2 km of a Metro (Subway) stop are coloured in red.",
    caption = plot_caption
  ) +
  theme(
    legend.position = "inside",
    plot.caption = element_textbox(
      halign = 0.5, 
      hjust = 0.5,
      size = bts * 0.6
    ),
    panel.grid = element_line(linewidth = 0.6),
    legend.position.inside = c(0, 0),
    legend.justification = c(0, 0),
    legend.direction = "horizontal",
    legend.title.position = "top",
    legend.title = element_text(
      margin = margin(0,0,2,0, "pt"),
      lineheight = 0.3
    ),
    legend.text = element_text(
      margin = margin(1,0,0,0, "pt")
    ),
    legend.key.height = unit(2, "pt"),
    plot.margin = margin(5,0,5,0, "pt"),
    plot.title = element_text(
      margin = margin(5,0,5,0, "pt")
    )
  )

ggsave(
  plot = g, 
  filename = here::here(
    "projects", "images",
    "cities_pop_subway_2.png"
  ),
  height = 1200,
  width = 1200,
  units = "px",
  bg = "white"
)

Figure 2: A map, like earlier figure, but with a population density raster map of Delhi, in grayscale.

Code for plotting and comparing multiple years (1990-2022)

This R script analyzes population distribution around subway stations in Delhi over time using raster and vector spatial data. It utilizes the {terra} package (terra) for raster data processing, {sf} (sf) for handling vector spatial data, and {ggplot2} (ggplot2) for visualization. The script begins by selecting years to analyze and defining buffer zones of 1 km and 2 km around subway stations. It then iterates through each year, loading and cropping population raster data to match the Delhi map boundaries, replacing zero values with a small positive value to avoid computational errors. The raster is masked to exclude areas outside Delhi, and population values are extracted within the defined buffer zones.

The extracted population data is summarized using {dplyr} (dplyr) functions such as group_by(), summarise(), and mutate(), calculating mean and total population within each buffer. The script constructs a multi-layered raster stack to store population data across years and visualizes the results with {ggplot2}. A bar chart compares the proportion of Delhi’s population within 1 km and 2 km of metro stations over time, highlighting the relatively stable percentage despite metro expansion.

The analysis shows that the Delhi Metro did not significantly increase population density around metro stations over time. Instead, the metro appears to have been built to connect already densely populated areas rather than driving new population growth in those locations. This suggests that metro development follows existing population patterns, rather than causing major demographic shifts.

Code

# Years to plot
selected_year = c(seq(1991, 2018, 3), 2020)

# a blank tibble to start with
pop_compute <- tibble()

# Buffer Zones of 1 km and 2 km around the Subway Stations
compute_areas <- c(subway_stops_buffer_1km, subway_stops_buffer_2km) |> 
            st_as_sf() |> 
            rename(geometry = x)

for(i in selected_year) {
  rast <- rast(paste0("GlobPOP_Count_30arc_", i, "_I32.tiff")) |> 
     terra::crop(delhi_map) |> 
     terra::mask(delhi_map, touches = FALSE)
  
  rast[rast <= 0] <- 0.001
  
  names(rast) <- "year_vals"
    
  total_rast_pop <- rast |> 
    values() |> 
    as_tibble() |> 
    filter(!is.na(year_vals)) |> 
    summarise(
      n = n(),
      mean_pop = mean(year_vals, na.rm = T),
      total_pop = n * mean_pop
    )
    
  pop_compute <- bind_rows(
    pop_compute,
    
    rast |> 
      terra::extract(compute_areas) |> 
      as_tibble() |> 
      group_by(ID) |> 
      summarise(
        n = n(),
        mean_pop = mean(year_vals, na.rm = T)
      ) |> 
      mutate(
      year = i,
      zone = c("1 km", "2 km"),
      total_pop = n * mean_pop,
      tot_del_pop = total_rast_pop$total_pop,
      perc_pop = total_pop / total_rast_pop$total_pop
    )
  )
  
  paste0("rast_", i) |> 
    assign(rast)
    
}

## Compile into a one multi-layered raster -----------------------------
# Initialize an empty SpatRaster object
rast_stack <- NULL
# Loop through each year and add the raster if it exists
for (y in selected_year) {
  rast_name <- paste0("rast_", y)  # Construct variable name
  
  if (exists(rast_name)) {  # Check if raster exists
    rast <- get(rast_name)  # Retrieve raster
    
    if (is.null(rast_stack)) {
      rast_stack <- rast  # Initialize with first available raster
    } else {
      rast_stack <- c(rast_stack, rast)  # Append to SpatRaster
    }
  } else {
    message(paste("Skipping year", y, "as raster is missing"))
  }
}

names(rast_stack) <- as.character(selected_year)
varnames(rast_stack) <- as.character(selected_year)

pop_compute |> 
  ggplot(
    aes(x = year, y = tot_del_pop)
  ) +
  geom_col() +
  facet_wrap(~zone) +
  scale_y_continuous(labels = label_number(scale_cut = cut_short_scale()))

plotdf1 <- pop_compute |> 
  mutate(
    lyr = as.character(year)
  ) |> 
  mutate(
    x_var = case_when(
      zone == "1 km" ~ 76.9,
      zone == "2 km" ~ 77.0,
      .default = NA
    ),
    y_var = 28.4 + ((0.1 * perc_pop) * 1 / 0.75)
  )
  

ggplot() +
  geom_col(
    data = plotdf1,
    mapping = aes(
      y = y_var,
      x = x_var,
      fill = zone
    ),
    colour = "grey30",
    linewidth = 0.3
  ) +
  geom_text(
    data = plotdf1,
    mapping = aes(
      label = paste0(round(perc_pop * 100, 1),"%",
                     "\n<", zone),
      y = y_var,
      x = x_var
    ),
    colour = "grey30",
    vjust = 1.2,
    hjust = 0.5,
    size = 2,
    lineheight = 0.45
  ) +
  facet_wrap(
    ~lyr
  ) +
  scale_fill_manual(
    name = "Distance to nearest\nMetro Station",
    values = c(
      alpha("red", 0.3),
      alpha("yellow", 0.3)
    )
  ) +
  coord_sf(
    default_crs = "EPSG:4326",
    ylim = c(28.4, 28.9),
    xlim = c(76.85, 77.33)
  ) +
  theme_minimal(
    base_size = 12
  )

pop_compute |> 
  ggplot(
    aes(
      x = year,
      y = perc_pop
    )
  ) +
  geom_col() +
  scale_x_continuous(
    breaks = selected_year
  ) +
  facet_wrap(~zone) +
  theme(
    axis.text.x = element_text(
      angle = 90
    )
  )

# Convert 2021 and 2022 rasters to integer class, instead of double
# rast_stack$`2021` <- as.int(rast_stack$`2021`)

g1 <- ggplot() +
  geom_spatraster(
    data = rast_stack
  ) +
  scale_fill_gradient2(
    low = "white",
    high = "grey20",
    na.value = "transparent",
    transform = "sqrt",
    name = "Population\ndensity (persons\n/ sq. km.)",
    breaks = c(0, 100, 2000, 1e4, 3e4, 5e4),
    limits = c(100, 6e4),
    oob = scales::squish,
    labels = scales::label_number(scale_cut = cut_short_scale())
  ) +
  geom_sf(
    data = delhi_map,
    fill = NA,
    linewidth = 0.9
  ) +
  geom_sf(
    data = subway_stops_buffer_2km,
    fill = alpha("yellow", 0.3), 
    colour = "transparent",
    linewidth = 0.1
  ) +
  geom_sf(
    data = subway_stops_buffer_1km,
    fill = alpha("red", 0.3), 
    colour = "transparent",
    linewidth = 0.1
  ) +
  geom_sf(
    data = subway_stops,
    colour = "blue",
    size = 1,
    alpha = 0.5
  ) +
  geom_sf(
    data = subway_lines,
    colour = "blue",
    linewidth = 0.2
  ) +
  
  ggnewscale::new_scale_fill()+
  geom_col(
    data = plotdf1,
    mapping = aes(
      y = y_var,
      x = x_var,
      fill = zone
    ),
    colour = "grey30",
    linewidth = 0.3
  ) +
  geom_text(
    data = plotdf1,
    mapping = aes(
      label = paste0(round(perc_pop * 100, 1),"%",
                     "\n<", zone),
      y = y_var,
      x = x_var
    ),
    colour = "grey30",
    vjust = 1.2,
    hjust = 0.5,
    size = 11,
    lineheight = 0.3,
    family = "caption_font"
  ) +
  scale_fill_manual(
    name = "Distance to\nnearest Metro\nStation",
    values = c(
      alpha("red", 0.3),
      alpha("yellow", 0.3)
    )
  ) +
  
  facet_wrap(
    ~lyr,
    ncol = 4
  ) +
  coord_sf(
    default_crs = "EPSG:4326",
    ylim = c(28.4, 28.9),
    xlim = c(76.8, 77.35),
    expand = FALSE
  ) +
  
  labs(
    title = "What percent of Delhi's population lives close to a metro station?",
    subtitle = str_wrap("Delhi Metro (Subway) was started in 2002, First phase was completed in 2006, phase II in 2011, and phase III in 2021. Surprisingly, the percentage of population that lives in areas that are now within 1-2 km of metro stations has not changed since early 1990s!", 100),
    caption = plot_caption,
    x = NULL, 
    y = NULL
  ) +
  theme(
    text = element_text(
      colour = "grey30"
    ),
    legend.position = "inside",
    panel.grid = element_line(
      linewidth = 0.6,
      linetype = 3
    ),
    
    # Customizing the Legend
    legend.position.inside = c(1, 0),
    legend.justification = c(1, 0),
    legend.direction = "vertical",
    legend.box = "horizontal",
    legend.box.margin = margin(0,1,0,0, "pt"),
    legend.margin = margin(0,0,0,0, "pt"),
    legend.title.position = "top",
    legend.title = element_text(
      margin = margin(0,0,2,0, "pt"),
      lineheight = 0.3,
      size = bts * 1.7
    ),
    legend.text = element_text(
      margin = margin(0,0,0,2, "pt"),
      size = bts * 1.2
    ),
    legend.key.height = unit(30, "pt"),
    legend.key.width = unit(10, "pt"),
    
    
    plot.margin = margin(5,0,5,0, "pt"),
    plot.title = element_text(
      margin = margin(15,0,5,0, "pt"),
      size = 4 * bts,
      hjust = 0.5
    ),
    plot.subtitle = element_text(
      margin = margin(0,0,0,0, "pt"),
      lineheight = 0.3,
      hjust = 0.5,
      size = 2.5 * bts
    ),
    plot.caption = element_textbox(
      halign = 0.5, 
      hjust = 0.5,
      size = bts * 1.2,
      margin = margin(150,0,0,0, "pt")
    ),
    panel.spacing = unit(0, "pt"),
    panel.background = element_rect(
      fill = "transparent",
      colour = "transparent"
    ),
    strip.text = element_text(
      margin = margin(5,0,0,0, "pt"),
      hjust = 0.5,
      size = bts * 3
    ),
    axis.ticks = element_blank(),
    axis.ticks.length = unit(0, "pt")
  )

# An inset graph of changes over years
strip_labels <- c(
  "% Population within 1 km",
  "% Population within 2 km"
)
names(strip_labels) <- unique(plotdf1$zone)

g2 <- ggplot(
  data = plotdf1,
  mapping = aes(
    x = as.character(year), 
    y = perc_pop,
    fill = zone
    )
  ) +
  geom_col(
    colour = "grey20",
    linewidth = 0.2
  ) +
  geom_text(
    mapping = aes(
      label = paste0(round(100 * perc_pop, 1), "%")
    ),
    hjust = 0.5,
    vjust = -0.35,
    size = 12,
    colour = "grey30",
    family = "caption_font"
  ) +
  scale_y_continuous(
    labels = label_percent(),
    expand = expansion(c(0, 0.2))
  ) +
  scale_fill_manual(
    values = c(
      alpha("red", 0.3),
      alpha("yellow", 0.3)
    )
  ) +
  facet_wrap(
    ~zone,
    labeller = labeller(zone = strip_labels)
  ) +
  labs(x = NULL, y = NULL) +
  theme(
    panel.grid = element_blank(),
    panel.grid.major.y = element_line(
      linewidth = 0.3, linetype = "longdash",
      colour = "grey50"
    ),
    legend.position = "none",
    axis.ticks.length = unit(0, "pt"),
    axis.text = element_text(
      size = 36
    ),
    strip.text = element_text(
      size = 66,
      margin = margin(0,0,0,0, "pt"),
      colour = "grey30"
    )
  )

g <- g1 +
  inset_element(
    p = g2,
    left = 0, right = 1,
    bottom = 0.02, top = 0.17,
    align_to = "full"
  )

ggsave(
  plot = g, 
  filename = here::here(
    "projects", "images",
    "cities_pop_subway_3.png"
  ),
  height = 1200 * (3 * 5 / 4),
  width = 1200 * 3,
  units = "px",
  bg = "white"
)

Figure 3: The graphic illustrates the proportion of Delhi’s population residing within 1 km and 2 km of a metro station across multiple years. Each facet represents a different year, showing how population distribution relative to metro accessibility has evolved over time. The bar heights indicate the percentage of the total population within the respective buffer zones, while the background map provides spatial context for metro stations, lines, and population density. An inset chart further summarizes the temporal trends for both buffer zones.

Same Analysis for London

The following code does the exact same analysis for Greater London (UK).

Code

# Get a basic map of Greater London Area
london_map <- spData::lnd |>
  st_union() |> 
  st_as_sf() |> 
  st_transform("EPSG:4326")

# Get Subway Lines and Subway Stops (called Tube in London)
library(osmdata)

subway_lines_london <- opq(bbox = st_bbox(london_map)) |> 
  add_osm_feature(
    key = "railway",
    value = "subway"
  ) |> 
  osmdata_sf()

subway2 <- subway_lines_london$osm_lines |> 
  st_intersection(london_map)

subway_stations_london <- opq(bbox = st_bbox(london_map)) |> 
  add_osm_feature(
    key = "railway",
    value = "subway_entrance"
  ) |> 
  osmdata_sf()

subway3 <- subway_stations_london$osm_points |> 
  st_intersection(london_map)

# subway_route_london <- opq(bbox = st_bbox(london_map)) |> 
#   add_osm_feature(
#     key = "railway",
#     value = c("light_rail", "monorail", "tram")
#   ) |> 
#   osmdata_sf()
# 
# subway4 <- subway_route_london$osm_lines |> 
#   st_intersection(london_map)

# A test map for london and subway stations
ggplot() +
  geom_sf(data = london_map) +
  geom_sf(
    data = subway3,
    colour = "red",
    size = 0.5
  ) +
  geom_sf(
    data = subway2,
    colour = "blue"
  )


subway_stops <- subway3

subway_lines <- subway2

subway_stops_buffer_1km <- subway_stops |> 
  st_union() |> 
  st_buffer(dist = 1000) |> 
  st_intersection(london_map)

subway_stops_buffer_2km <- subway_stops |> 
  st_union() |> 
  st_buffer(dist = 2000) |> 
  st_intersection(london_map)

# Years to plot
selected_year = seq(1991, 2022, 3)

# a blank tibble to start with
pop_compute <- tibble()

# Buffer Zones of 1 km and 2 km around the Subway Stations
compute_areas <- c(subway_stops_buffer_1km, subway_stops_buffer_2km) |> 
            st_as_sf() |> 
            rename(geometry = x)

for(i in selected_year) {
  rast <- rast(paste0("GlobPOP_Count_30arc_", i, "_I32.tiff")) |> 
     terra::crop(london_map) |> 
     terra::mask(london_map, touches = FALSE)
  
  rast[rast <= 0] <- 0.001
  
  names(rast) <- "year_vals"
    
  total_rast_pop <- rast |> 
    values() |> 
    as_tibble() |> 
    filter(!is.na(year_vals)) |> 
    summarise(
      n = n(),
      mean_pop = mean(year_vals, na.rm = T),
      total_pop = n * mean_pop
    )
    
  pop_compute <- bind_rows(
    pop_compute,
    
    rast |> 
      terra::extract(compute_areas) |> 
      as_tibble() |> 
      group_by(ID) |> 
      summarise(
        n = n(),
        mean_pop = mean(year_vals, na.rm = T)
      ) |> 
      mutate(
      year = i,
      zone = c("1 km", "2 km"),
      total_pop = n * mean_pop,
      tot_del_pop = total_rast_pop$total_pop,
      perc_pop = total_pop / total_rast_pop$total_pop
    )
  )
  
  paste0("rast_", i) |> 
    assign(rast)
    
}

## Compile into a one multi-layered raster -----------------------------
# Initialize an empty SpatRaster object
rast_stack <- NULL
# Loop through each year and add the raster if it exists
for (y in selected_year) {
  rast_name <- paste0("rast_", y)  # Construct variable name
  
  if (exists(rast_name)) {  # Check if raster exists
    rast <- get(rast_name)  # Retrieve raster
    
    if (is.null(rast_stack)) {
      rast_stack <- rast  # Initialize with first available raster
    } else {
      rast_stack <- c(rast_stack, rast)  # Append to SpatRaster
    }
  } else {
    message(paste("Skipping year", y, "as raster is missing"))
  }
}

names(rast_stack) <- as.character(selected_year)
varnames(rast_stack) <- as.character(selected_year)

# Check total population of London
# pop_compute |> 
#   ggplot(
#     aes(x = year, y = tot_del_pop)
#   ) +
#   geom_col() +
#   facet_wrap(~zone) +
#   scale_y_continuous(labels = label_number(scale_cut = cut_short_scale()))

# Rework the dataset so that it can be displayed on coord_sf()
plotdf1 <- pop_compute |> 
  mutate(
    lyr = as.character(year)
  ) |> 
  mutate(
    x_var = case_when(
      zone == "1 km" ~ 0.22,
      zone == "2 km" ~ 0.32,
      .default = NA
    ),
    y_var = 51.3 + ((0.1 * perc_pop) * 1 / 0.75)
  )
  

ggplot() +
  geom_col(
    data = plotdf1,
    mapping = aes(
      y = y_var,
      x = x_var,
      fill = zone
    ),
    colour = "grey30",
    linewidth = 0.3
  ) +
  geom_text(
    data = plotdf1,
    mapping = aes(
      label = paste0(round(perc_pop * 100, 1),"%",
                     "\n<", zone),
      y = y_var,
      x = x_var
    ),
    colour = "grey30",
    vjust = 1.2,
    hjust = 0.5,
    size = 2,
    lineheight = 0.45
  ) +
  facet_wrap(
    ~lyr,
  ) +
  scale_fill_manual(
    name = "Distance to nearest\nTube Station",
    values = c(
      alpha("red", 0.3),
      alpha("yellow", 0.3)
    )
  ) +
  coord_sf(
    default_crs = "EPSG:4326",
    ylim = c(51.28, 51.7),
    xlim = c(-0.53, 0.45),
    expand = FALSE
  ) +
  theme_minimal(
    base_size = 12
  )

# Check change in percentage population near Tube Station over time
# pop_compute |> 
#   ggplot(
#     aes(
#       x = year,
#       y = perc_pop
#     )
#   ) +
#   geom_col() +
#   scale_x_continuous(
#     breaks = selected_year
#   ) +
#   facet_wrap(~zone) +
#   theme(
#     axis.text.x = element_text(
#       angle = 90
#     )
#   )

# Convert 2021 and 2022 rasters to integer class, instead of double
# rast_stack$`2021` <- as.int(rast_stack$`2021`)

g1 <- ggplot() +
  geom_spatraster(
    data = rast_stack
  ) +
  scale_fill_gradient2(
    low = "white",
    high = "grey20",
    na.value = "transparent",
    transform = "sqrt",
    name = "Population\ndensity (persons\n/ sq. km.)",
    breaks = c(0, 100, 2000, 1e4),
    limits = c(10, 2e4),
    oob = scales::squish,
    labels = scales::label_number(scale_cut = cut_short_scale())
  ) +
  geom_sf(
    data = london_map,
    fill = NA,
    linewidth = 0.7
  ) +
  geom_sf(
    data = subway_stops_buffer_2km,
    fill = alpha("yellow", 0.2), 
    colour = "transparent",
    linewidth = 0.1
  ) +
  geom_sf(
    data = subway_stops_buffer_1km,
    fill = alpha("red", 0.2), 
    colour = "transparent",
    linewidth = 0.1
  ) +
  geom_sf(
    data = subway_stops,
    colour = "blue",
    size = 0.5,
    alpha = 0.5
  ) +
  geom_sf(
    data = subway_lines,
    colour = "blue",
    linewidth = 0.1
  ) +
  
  ggnewscale::new_scale_fill()+
  geom_col(
    data = plotdf1,
    mapping = aes(
      y = y_var,
      x = x_var,
      fill = zone
    ),
    colour = "grey30",
    linewidth = 0.3
  ) +
  geom_text(
    data = plotdf1,
    mapping = aes(
      label = paste0(round(perc_pop * 100, 1),"%",
                     "\n<", zone),
      y = y_var,
      x = x_var
    ),
    colour = "grey30",
    vjust = 1.2,
    hjust = 0.5,
    size = 9,
    lineheight = 0.3,
    family = "caption_font"
  ) +
  scale_fill_manual(
    name = "Distance to\nnearest Tube\nStation",
    values = c(
      alpha("red", 0.3),
      alpha("yellow", 0.3)
    )
  ) +
  
  facet_wrap(
    ~lyr,
    ncol = 4
  ) +
  coord_sf(
    default_crs = "EPSG:4326",
    ylim = c(51.28, 51.7),
    xlim = c(-0.53, 0.4),
    expand = FALSE
  ) +
  
  labs(
    title = "What percent of London's population lives close to a Tube station?",
    subtitle = str_wrap("The London tube system is over 150 years old. Of its more than 270 Stations, very few (~20) have been added in last 3 decades. However, percentage of London's population that lives within 1 km and 2 km of a Tube Station has been slowly growing.", 100),
    caption = plot_caption,
    x = NULL, 
    y = NULL
  ) +
  theme(
    text = element_text(
      colour = "grey30"
    ),
    legend.position = "inside",
    panel.grid = element_line(
      linewidth = 0.6,
      linetype = 3
    ),
    
    # Customizing the Legend
    legend.position.inside = c(1, 0),
    legend.justification = c(1, 0),
    legend.direction = "vertical",
    legend.box = "horizontal",
    legend.box.margin = margin(0,1,0,0, "pt"),
    legend.margin = margin(0,0,0,0, "pt"),
    legend.title.position = "top",
    legend.title = element_text(
      margin = margin(0,0,2,0, "pt"),
      lineheight = 0.3,
      size = bts * 1.7
    ),
    legend.text = element_text(
      margin = margin(0,0,0,2, "pt"),
      size = bts * 1.2
    ),
    legend.key.height = unit(20, "pt"),
    legend.key.width = unit(15, "pt"),
    
    
    plot.margin = margin(5,0,5,0, "pt"),
    plot.title = element_text(
      margin = margin(15,0,10,0, "pt"),
      size = 4 * bts,
      hjust = 0.5
    ),
    plot.subtitle = element_text(
      margin = margin(0,0,0,0, "pt"),
      lineheight = 0.3,
      hjust = 0.5,
      size = 2.5 * bts
    ),
    plot.caption = element_textbox(
      halign = 0.5, 
      hjust = 0.5,
      size = bts * 1.2,
      margin = margin(150,0,0,0, "pt")
    ),
    panel.spacing = unit(0, "pt"),
    panel.background = element_rect(
      fill = "transparent",
      colour = "transparent"
    ),
    strip.text = element_text(
      margin = margin(25,0,5,0, "pt"),
      hjust = 0.5,
      size = bts * 3
    ),
    axis.ticks = element_blank(),
    axis.ticks.length = unit(0, "pt")
  )

# An inset graph of changes over years
strip_labels <- c(
  "% Population within 1 km",
  "% Population within 2 km"
)
names(strip_labels) <- unique(plotdf1$zone)

g2 <- ggplot(
  data = plotdf1,
  mapping = aes(
    x = as.character(year), 
    y = perc_pop,
    fill = zone
    )
  ) +
  geom_col(
    colour = "grey20",
    linewidth = 0.2
  ) +
  geom_text(
    mapping = aes(
      label = paste0(round(100 * perc_pop, 1), "%")
    ),
    hjust = 0.5,
    vjust = -0.35,
    size = 12,
    colour = "grey30",
    family = "caption_font"
  ) +
  scale_y_continuous(
    labels = label_percent(),
    expand = expansion(c(0, 0.2))
  ) +
  scale_fill_manual(
    values = c(
      alpha("red", 0.3),
      alpha("yellow", 0.3)
    )
  ) +
  coord_cartesian(
    ylim = c(0.4, 0.7)
  ) +
  facet_wrap(
    ~zone,
    labeller = labeller(zone = strip_labels)
  ) +
  labs(x = NULL, y = NULL) +
  theme(
    panel.grid = element_blank(),
    panel.grid.major.y = element_line(
      linewidth = 0.3, linetype = "longdash",
      colour = "grey50"
    ),
    legend.position = "none",
    axis.ticks.length = unit(0, "pt"),
    axis.text = element_text(
      size = 36
    ),
    strip.text = element_text(
      size = 66,
      margin = margin(0,0,0,0, "pt"),
      colour = "grey30"
    )
  )

g <- g1 +
  inset_element(
    p = g2,
    left = 0, right = 1,
    bottom = 0.02, top = 0.17,
    align_to = "full"
  )

ggsave(
  plot = g, 
  filename = here::here(
    "projects", "images",
    "cities_pop_subway_4.png"
  ),
  height = 1200 * 3.4,
  width = 1200 * 3,
  units = "px",
  bg = "white"
)

Figure 4: This graphic illustrates the spatial relationship between London’s population and its Tube network. The blue lines represent different London Underground routes, while blue dots mark station locations. The shaded areas show 1 km (red) and 2 km (yellow) buffer zones around stations, indicating regions with easy access to public transport. Unlike newer metro systems, London’s Tube is over 150 years old, with only about 20 new stations added in the last three decades. Despite this, the percentage of Londoners living within 1 km and 2 km of a Tube station has been gradually increasing, reflecting urban densification around existing infrastructure.

Similar Analysis for New York City

Code

# Load required packages
library(tigris)

# Get New York City Administrative boundary (county-based)
nyc_map <- counties(state = "NY", cb = TRUE, class = "sf") |> 
  dplyr::filter(NAME %in% c("New York", "Kings", 
                            "Queens", "Bronx", "Richmond")) |> 
  st_union() |> 
  st_as_sf() |> 
  st_transform("EPSG:4326")

# Get Coastline Map 
nyc_map2 <- rnaturalearth::ne_countries(
  country = "United States of America",
  scale = "large", 
  returnclass = "sf") |> 
  st_crop(st_bbox(nyc_map))

# Combine to get City map excluding Water
nyc_map <- st_intersection(nyc_map, nyc_map2)

# Test Check the map
ggplot(
  data = st_intersection(nyc_map, nyc_map2)
) +
  geom_sf()

# Get Subway Lines and Subway Stops (called Tube in London)
library(osmdata)

subway_lines_nyc <- opq(bbox = st_bbox(nyc_map)) |> 
  add_osm_feature(
    key = "railway",
    value = "subway"
  ) |> 
  osmdata_sf()

subway2 <- subway_lines_nyc$osm_lines |> 
  st_intersection(nyc_map)

subway_stations_nyc <- opq(bbox = st_bbox(nyc_map)) |> 
  add_osm_feature(
    key = "railway",
    value = "subway_entrance"
  ) |> 
  osmdata_sf()

subway3 <- subway_stations_nyc$osm_points |> 
  st_intersection(nyc_map)

# subway_route_nyc <- opq(bbox = st_bbox(nyc_map)) |> 
#   add_osm_feature(
#     key = "railway",
#     value = c("light_rail", "monorail", "tram")
#   ) |> 
#   osmdata_sf()
# 
# subway4 <- subway_route_nyc$osm_lines |> 
#   st_intersection(nyc_map)

# A test map for nyc and subway stations
ggplot() +
  geom_sf(data = nyc_map) +
  geom_sf(
    data = subway3,
    colour = "red",
    size = 0.5
  ) +
  geom_sf(
    data = subway2,
    colour = "blue"
  )


subway_stops <- subway3

subway_lines <- subway2

subway_stops_buffer_1km <- subway_stops |> 
  st_union() |> 
  st_buffer(dist = 1000) |> 
  st_intersection(nyc_map)

subway_stops_buffer_2km <- subway_stops |> 
  st_union() |> 
  st_buffer(dist = 2000) |> 
  st_intersection(nyc_map)

# Years to plot
selected_year = seq(1991, 2022, 3)

# a blank tibble to start with
pop_compute <- tibble()

# Buffer Zones of 1 km and 2 km around the Subway Stations
compute_areas <- c(subway_stops_buffer_1km, subway_stops_buffer_2km) |> 
            st_as_sf() |> 
            rename(geometry = x)

for(i in selected_year) {
  rast <- rast(paste0("GlobPOP_Count_30arc_", i, "_I32.tiff")) |> 
     terra::crop(nyc_map) |> 
     terra::mask(nyc_map, touches = FALSE)
  
  rast[rast <= 0] <- 0.001
  
  names(rast) <- "year_vals"
    
  total_rast_pop <- rast |> 
    values() |> 
    as_tibble() |> 
    filter(!is.na(year_vals)) |> 
    summarise(
      n = n(),
      mean_pop = mean(year_vals, na.rm = T),
      total_pop = n * mean_pop
    )
    
  pop_compute <- bind_rows(
    pop_compute,
    
    rast |> 
      terra::extract(compute_areas) |> 
      as_tibble() |> 
      group_by(ID) |> 
      summarise(
        n = n(),
        mean_pop = mean(year_vals, na.rm = T)
      ) |> 
      mutate(
      year = i,
      zone = c("1 km", "2 km"),
      total_pop = n * mean_pop,
      tot_del_pop = total_rast_pop$total_pop,
      perc_pop = total_pop / total_rast_pop$total_pop
    )
  )
  
  paste0("rast_", i) |> 
    assign(rast)
    
}

## Compile into a one multi-layered raster -----------------------------
# Initialize an empty SpatRaster object
rast_stack <- NULL
# Loop through each year and add the raster if it exists
for (y in selected_year) {
  rast_name <- paste0("rast_", y)  # Construct variable name
  
  if (exists(rast_name)) {  # Check if raster exists
    rast <- get(rast_name)  # Retrieve raster
    
    if (is.null(rast_stack)) {
      rast_stack <- rast  # Initialize with first available raster
    } else {
      rast_stack <- c(rast_stack, rast)  # Append to SpatRaster
    }
  } else {
    message(paste("Skipping year", y, "as raster is missing"))
  }
}

names(rast_stack) <- as.character(selected_year)
varnames(rast_stack) <- as.character(selected_year)

# Check total population of nyc
pop_compute |>
  ggplot(
    aes(x = year, y = tot_del_pop)
  ) +
  geom_col() +
  facet_wrap(~zone) +
  scale_y_continuous(labels = label_number(scale_cut = cut_short_scale()))

# Rework the dataset so that it can be displayed on coord_sf()
plotdf1 <- pop_compute |> 
  mutate(
    lyr = as.character(year)
  ) |> 
  mutate(
    x_max = case_when(
      zone == "1 km" ~ -74.16,
      zone == "2 km" ~ -74.06,
      .default = NA
    ),
    x_min = case_when(
      zone == "1 km" ~ -74.24,
      zone == "2 km" ~ -74.14,
      .default = NA
    ),
    y_max = 40.7 + ((0.1 * perc_pop) * 1 / 0.75),
    y_min = 40.7
  ) |> 
  mutate(
    geometry = pmap(
      list(x_min, x_max, y_min, y_max),
      ~ st_polygon(list(matrix(
        c(..1, ..3,  # Bottom-left
          ..2, ..3,  # Bottom-right
          ..2, ..4,  # Top-right
          ..1, ..4,  # Top-left
          ..1, ..3), # Closing the polygon
        ncol = 2, byrow = TRUE
      )))
    )
  ) |> 
  st_as_sf() |> 
  st_set_crs("EPSG:4326")


ggplot() +
  geom_sf(
    data = plotdf1,
    mapping = aes(fill = zone),
    linewidth = 0.2,
    colour = "grey30"
  ) +
  geom_text(
    data = plotdf1,
    mapping = aes(
      label = paste0(round(perc_pop * 100, 1),"%",
                     "\n<", zone),
      y = y_max,
      x = (x_min + x_max)/2
    ),
    colour = "grey30",
    vjust = 1.2,
    hjust = 0.5,
    size = 2,
    lineheight = 0.45
  ) +
  facet_wrap(
    ~lyr,
  ) +
  scale_fill_manual(
    name = "Distance to nearest\nTube Station",
    values = c(
      alpha("red", 0.3),
      alpha("yellow", 0.3)
    )
  ) +
  coord_sf(
    default_crs = "EPSG:4326",
    ylim = c(40.5, 40.9),
    xlim = c(-74.35, -73.7),
    expand = FALSE
  ) +
  theme_minimal(
    base_size = 12
  )

# Check change in percentage population near Tube Station over time
pop_compute |>
  ggplot(
    aes(
      x = year,
      y = perc_pop
    )
  ) +
  geom_col() +
  scale_x_continuous(
    breaks = selected_year
  ) +
  facet_wrap(~zone) +
  theme(
    axis.text.x = element_text(
      angle = 90
    )
  )

# Convert 2021 and 2022 rasters to integer class, instead of double
rast_stack$`2021` <- as.int(rast_stack$`2021`)

g1 <- ggplot() +
  geom_spatraster(
    data = rast_stack
  ) +
  scale_fill_gradient2(
    low = "white",
    high = "grey20",
    na.value = "transparent",
    transform = "sqrt",
    name = "Population\ndensity (persons\n/ sq. km.)",
    breaks = c(0, 100, 2000, 1e4, 3e4),
    limits = c(10, 5e4),
    oob = scales::squish,
    labels = scales::label_number(scale_cut = cut_short_scale())
  ) +
  geom_sf(
    data = nyc_map,
    fill = NA,
    linewidth = 0.7
  ) +
  geom_sf(
    data = subway_stops_buffer_2km,
    fill = alpha("yellow", 0.2), 
    colour = "transparent",
    linewidth = 0.1
  ) +
  geom_sf(
    data = subway_stops_buffer_1km,
    fill = alpha("red", 0.2), 
    colour = "transparent",
    linewidth = 0.1
  ) +
  geom_sf(
    data = subway_stops,
    colour = "blue",
    size = 0.5,
    alpha = 0.5
  ) +
  geom_sf(
    data = subway_lines,
    colour = "blue",
    linewidth = 0.1
  ) +
  
  ggnewscale::new_scale_fill()+
  geom_sf(
    data = plotdf1,
    mapping = aes(fill = zone),
    linewidth = 0.2,
    colour = "grey30"
  ) +
  geom_text(
    data = plotdf1,
    mapping = aes(
      label = paste0(round(perc_pop * 100, 1),"%",
                     "\n<", zone),
      y = y_max,
      x = (x_min + x_max)/2
    ),
    colour = "grey30",
    vjust = 1.2,
    hjust = 0.5,
    size = 8,
    lineheight = 0.4
  ) +
  scale_fill_manual(
    name = "Distance to\nnearest Tube\nStation",
    values = c(
      alpha("red", 0.3),
      alpha("yellow", 0.3)
    )
  ) +
  coord_sf(
    default_crs = "EPSG:4326",
    ylim = c(40.48, 40.94),
    xlim = c(-74.28, -73.68),
    expand = FALSE,
    clip = "off"
  ) +
  
  facet_wrap(
    ~lyr,
    ncol = 4
  ) +
  labs(
    title = "Almost all of New York City's population lives close to Subway !",
    subtitle = str_wrap("New York City's percentage population that lives within 1 km of a subway stop is very high, much above other cities, given the high number of Subway Stops and lines. It has been consistent over last 3 decades, since the Subway system is much older.", 100),
    caption = plot_caption,
    x = NULL, 
    y = NULL
  ) +
  theme(
    text = element_text(
      colour = "grey30"
    ),
    legend.position = "inside",
    panel.grid = element_line(
      linewidth = 0.6,
      linetype = 3
    ),
    
    # Customizing the Legend
    legend.position.inside = c(1, 0),
    legend.justification = c(1, 0),
    legend.direction = "vertical",
    legend.box = "horizontal",
    legend.box.margin = margin(0,1,0,0, "pt"),
    legend.margin = margin(0,0,0,0, "pt"),
    legend.title.position = "top",
    legend.title = element_text(
      margin = margin(0,0,2,0, "pt"),
      lineheight = 0.3,
      size = bts * 1.7
    ),
    legend.text = element_text(
      margin = margin(0,0,0,2, "pt"),
      size = bts * 1.2
    ),
    legend.key.height = unit(30, "pt"),
    legend.key.width = unit(15, "pt"),
    
    
    plot.margin = margin(5,0,5,0, "pt"),
    plot.title = element_text(
      margin = margin(15,0,10,0, "pt"),
      size = 4 * bts,
      hjust = 0.5
    ),
    plot.subtitle = element_text(
      margin = margin(0,0,0,0, "pt"),
      lineheight = 0.3,
      hjust = 0.5,
      size = 2.5 * bts
    ),
    plot.caption = element_textbox(
      halign = 0.5, 
      hjust = 0.5,
      size = bts * 1.2,
      margin = margin(150,0,0,0, "pt")
    ),
    panel.spacing = unit(0, "pt"),
    panel.background = element_rect(
      fill = "transparent",
      colour = "transparent"
    ),
    strip.text = element_text(
      margin = margin(25,0,-25,0, "pt"),
      hjust = 0.2,
      size = bts * 3
    ),
    axis.ticks = element_blank(),
    axis.ticks.length = unit(0, "pt")
  )

# An inset graph of changes over years
strip_labels <- c(
  "% Population within 1 km",
  "% Population within 2 km"
)
names(strip_labels) <- unique(plotdf1$zone)

g2 <- ggplot(
  data = plotdf1,
  mapping = aes(
    x = as.character(year), 
    y = perc_pop,
    fill = zone
    )
  ) +
  geom_col(
    colour = "grey20",
    linewidth = 0.2
  ) +
  geom_text(
    mapping = aes(
      label = paste0(round(100 * perc_pop, 1), "%")
    ),
    hjust = 0.5,
    vjust = -0.35,
    size = 12,
    colour = "grey30",
    family = "caption_font"
  ) +
  scale_y_continuous(
    labels = label_percent(),
    expand = expansion(c(0, 0.2))
  ) +
  scale_fill_manual(
    values = c(
      alpha("red", 0.3),
      alpha("yellow", 0.3)
    )
  ) +
  coord_cartesian(
    ylim = c(0.6, 1)
  ) +
  facet_wrap(
    ~zone,
    labeller = labeller(zone = strip_labels)
  ) +
  labs(x = NULL, y = NULL) +
  theme(
    panel.grid = element_blank(),
    panel.grid.major.y = element_line(
      linewidth = 0.15, 
      linetype = "longdash",
      colour = "grey70"
    ),
    legend.position = "none",
    axis.ticks.length = unit(0, "pt"),
    axis.text = element_text(
      size = 36
    ),
    strip.text = element_text(
      size = 66,
      margin = margin(0,0,0,0, "pt"),
      colour = "grey30"
    )
  )

g <- g1 +
  inset_element(
    p = g2,
    left = 0, right = 1,
    bottom = 0.02, top = 0.17,
    align_to = "full"
  )

ggsave(
  plot = g, 
  filename = here::here(
    "projects", "images",
    "cities_pop_subway_5.png"
  ),
  height = 1200 * 3.4,
  width = 1200 * 3,
  units = "px",
  bg = "white"
)

Figure 5: This graphic visualizes New York City’s extensive subway network and its relationship with population distribution. The dark blue lines represent subway routes, while the dark blue dots mark station locations. The shaded areas indicate regions within 1 km (red) and 2 km (yellow) of subway stops. The analysis reveals that around 85% of the city’s population lives within 1 km of a subway station, and nearly 95% within 2 km—a significantly higher proportion than most cities. This consistency over the past three decades highlights that New York’s subway was built early to lead to rise of dense neighborhoods, and is no longer driving new population movement, as it is already much higher concentration than other cities.

Save a thumbnail for the page

Code

library(magick)
image_read(
  here::here("projects", "images",
             "cities_pop_subway_5.png")
) |> 
  image_resize("x400") |> 
  image_write(
    here::here(
      "projects", "images",
      "cities_pop_subways.png"
    )
  )

References

Liu, Luling, Xin Cao, Shijie Li, and Na Jie. 2024. “A 31-Year (19902020) Global Gridded Population Dataset Generated by Cluster Analysis and Statistical Learning.” Scientific Data 11 (1). https://doi.org/10.1038/s41597-024-02913-0.