Chapter 5: Geometry operations

Key Learnings from, and Solutions to the exercises in Chapter 4 of the book Geocomputation with R by Robin Lovelace, Jakub Nowosad and Jannes Muenchow.

Geocomputation with R
Textbook Solutions
Author

Aditya Dahiya

Published

December 16, 2024

library(sf)        # Simple Features in R
library(terra)     # Handling rasters in R
library(tidyterra) # For plotting rasters in ggplot2
library(magrittr)  # Using pipes with raster objects
library(tidyverse) # All things tidy; Data Wrangling
library(spData)    # Spatial Datasets
library(patchwork) # Composing plots

5.1 Introduction

  • Previous chapters introduced geographic datasets’ structure (Chapter 2), attribute-based manipulation (Chapter 3), and spatial relations (Chapter 4).
  • Focus of this chapter: Manipulating geographic elements of spatial objects.
    • Examples: Creating buffers, simplifying/converting vector geometries, and raster aggregation/resampling.
  • Section 5.2: Transforming vector geometries using:
    • Unary operations: Simplifications, buffers, centroids, and affine transformations (Sections 5.2.1–5.2.4).
    • Binary operations: Modifying geometries through clipping and unions (Sections 5.2.5–5.2.7).
    • Type transformations: Converting geometry types, e.g., polygons to lines (Section 5.2.8).
  • Section 5.3: Raster transformations:
    • Alter pixel size, resolution, extent, and origin.
    • Align raster datasets for map algebra.

5.2 Geometric operations on vector data

  • Focus: Operations that modify the geometry of vector (sf) objects.
  • Key distinction: Works directly on geometry-level objects of class sfc, in addition to sf objects.
  • Examples: Drilling into geometry to transform, simplify, or reshape vector data.

5.2.1 Simplification

  • Generalizes vector geometries (lines/polygons) for smaller scale maps, reducing memory, disk space, and bandwidth usage. Useful for publishing interactive maps by simplifying complex geometries.
  • Key Functions and Algorithms:
    • st_simplify() from the sf package (Pebesma and Bivand 2023):
      • Implements the Douglas-Peucker algorithm (Douglas and Peucker 1973).

      • Controlled by dTolerance (generalization level in metres, or map units).

      • Simplifies individual geometries but does not preserve topology, leading to overlaps or holes.

Note

topology (noun): the way in which parts of something are organized, arranged or connected

  • ms_simplify() from the rmapshaper package (Teucher and Russell 2023):
    • Uses the Visvalingam algorithm (Visvalingam and Whyatt 1993).
    • Retains topology by default (keep_shapes = TRUE) and allows fine control over the vertex retention (keep: the % of vertices that are to be retained, given as a proportion).
  • smooth() from the smoothr package:
    • Smooths edges using techniques like Gaussian kernel regression, Chaikin’s algorithm, or spline interpolation.
    • Does not reduce vertex count and does not preserve topology.
    • Key parameter: smoothness (controls Gaussian bandwidth).
  • Examples of Simplification are shown in Figure 1
  • Applications of Smoothing:
    • Suitable for geometries derived from raster vectorization (e.g., Chapter 6).
Code
# Download India's Official states' map from
# https://github.com/Aditya-Dahiya/projects_presentations/tree/main/data/india_map

#### Base Map
india_states <- read_sf("India_State_Boundary.shp")

g <- india_states |> 
  ggplot() +
  geom_sf() +
  theme_minimal() +
  labs(
    title = "Base Map of India: in full detail",
    subtitle = "Source: Census of India"
  )

ggsave(
  filename = here::here("book_solutions", "images",
                        "ch5-2-1_1.png"),
  plot = g,
  height = 1600,
  width = 1200,
  units = "px"
)

#### st_simplify()
g <- india_states |> 
  st_simplify(dTolerance = 100000) |>    # 100 km tolerance
  ggplot() +
  geom_sf() +
  theme_minimal() +
  labs(
    title = "India: st_simplify(dTolerance = 100000)",
    subtitle = "Douglas-Peucker algorithm. Topology is lost."
  )

ggsave(
  filename = here::here("book_solutions", "images",
                        "ch5-2-1_2.png"),
  plot = g,
  height = 1600,
  width = 1200,
  units = "px"
)

#### rmapshaper::ms_simplify()
g <- india_states |> 
  rmapshaper::ms_simplify(keep = 0.0001, keep_shapes = TRUE) |>    
  ggplot() +
  geom_sf() +
  theme_minimal() +
  labs(
    title = "India: rmapshaper::ms_simplify(keep_shapes = TRUE)",
    subtitle = "Visvalingam algorithm. Topology is retained."
  )

ggsave(
  filename = here::here("book_solutions", "images",
                        "ch5-2-1_3.png"),
  plot = g,
  height = 1600,
  width = 1200,
  units = "px"
)

#### smoothr::smooth() - 3 methods
g <- india_states |> 
  st_simplify(dTolerance = 10000) |>  # To save computing time
  smoothr::smooth(method = "ksmooth",
                  smoothness = 5) |>    
  ggplot() +
  geom_sf() +
  theme_minimal() +
  labs(
    title = "smoothr::smooth(method = \"ksmooth\")",
    subtitle = "Gaussian kernel regression. Topology is lost."
  )

ggsave(
  filename = here::here("book_solutions", "images",
                        "ch5-2-1_4.png"),
  plot = g,
  height = 1600,
  width = 1200,
  units = "px"
)

g <- india_states |> 
  st_simplify(dTolerance = 50000) |>  # To save computing time
  smoothr::smooth(method = "chaikin") |>    
  ggplot() +
  geom_sf() +
  theme_minimal() +
  labs(
    title = "smoothr::smooth(method = \"chaikin\")",
    subtitle = "Chaikin’s corner cutting algorithm. Topology is lost."
  )

ggsave(
  filename = here::here("book_solutions", "images",
                        "ch5-2-1_5.png"),
  plot = g,
  height = 1600,
  width = 1200,
  units = "px"
)

g <- india_states |> 
  st_simplify(dTolerance = 50000) |>  # To save computing time
  smoothr::smooth(method = "spline") |>    
  ggplot() +
  geom_sf() +
  theme_minimal() +
  labs(
    title = "smoothr::smooth(method = \"spline\")",
    subtitle = "Spline interpolation. Topology is lost."
  )

ggsave(
  filename = here::here("book_solutions", "images",
                        "ch5-2-1_6.png"),
  plot = g,
  height = 1600,
  width = 1200,
  units = "px"
)

Official Map of India (full details)

Using st_simplify(dTolerance = 100000), i.e. 100 km resolution

Using rmapshaper::ms_simplify(keep_shape = TRUE) to retain topology

Using gaussian kernel regression with smoothr::smooth(method = “ksmooth”)

Using Chalkin’s corner cutting algorithm with smoothr::smooth(method = “chalkin”)

Using spline interpolation with smoothr::smooth(method = “spline”)
Figure 1

smoothr

A short note on the {smoothr} package (Strimas-Mackey 2023), which uses three different types of algorithms:-.

Note
Code
library(sf)
library(smoothr)
library(tidyverse)

# Smooth polygons using different methods
p_smooth_chaikin <- smooth(jagged_polygons, method = "chaikin")
p_smooth_ksmooth <- smooth(jagged_polygons, method = "ksmooth")
p_smooth_spline <- smooth(jagged_polygons, method = "spline")

# Combine data for plotting
plot_data <- bind_rows(
  mutate(st_as_sf(p_smooth_chaikin), method = "chaikin"),
  mutate(st_as_sf(p_smooth_ksmooth), method = "ksmooth"),
  mutate(st_as_sf(p_smooth_spline), method = "spline"),
  mutate(st_as_sf(jagged_polygons), method = "original")
)

# Assign colors to methods
method_colors <- c(
  chaikin = "#E41A1C",
  ksmooth = "#4DAF4A",
  spline = "#377EB8"
)

# Convert geometry for plotting
plot_data <- plot_data |> 
  mutate(geometry = st_sfc(geometry)) |> 
  st_as_sf()

p2 <- plot_data |> 
  filter(method != "original")

p1 <- plot_data |> 
  filter(method == "original") |> 
  select(-method)

# Plot with ggplot2
g <- ggplot(data = p2) +
  geom_sf(aes(geometry = geometry, 
              color = method),
          size = 0.7,
          linewidth = 0.5,
          fill = "transparent") +
  geom_sf(
    data = p1,
    fill = alpha("grey50", 0.5),
    colour = "transparent"
  ) +
  scale_color_manual(values = method_colors) +
  facet_grid(id ~ method) +
  guides(fill = "none") +
  theme_void() +
  theme(
    legend.position = "none",
    plot.title.position = "plot",
    strip.text.y = element_blank()
    ) +
  labs(
    title = "Simplification with {smoothr}",
    colour = "Method"
  )

ggsave(
  filename = here::here("book_solutions", "images",
                        "ch5-2-1_7.png"),
  plot = g,
  height = 2000,
  width = 1200,
  units = "px"
)

5.2.2 Centroids

  • Identify the center of geographic objects, creating single-point representations of complex geometries using st_centroid()

  • Types of Centroids: (shown in Figure 2)

    1. Geographic Centroid (center of mass):
      • Balances a spatial object (like balancing a plate).
      • Useful for creating simple point representations or estimating distances between polygons.
      • Calculated with st_centroid() from the sf package.
      • Limitation: Centroids may fall outside the object (e.g., doughnut-shaped polygons).
    2. Point on Surface:
      • Ensures the point lies within the object boundaries.
      • Useful for labeling irregular polygons, such as islands or multipolygon geometries.
      • Calculated with st_point_on_surface().
  • Other Centroid Types: Chebyshev center and visual center

Code
sysfonts::font_add_google("Saira Extra Condensed", "caption_font")
showtext::showtext_auto()

ggplot2::theme_set(
  theme_minimal(
    base_size = 30,
    base_family = "caption_font"
  ) +
    theme(
      text = element_text(
        lineheight = 0.3,
        hjust = 0.5
      ),
      plot.title.position = "plot"
    )
)

# Focussing on the Island Chains of India
andaman <- india_states |> 
  filter(
    State_Name == "Andaman & Nicobar"
  )
  
g1 <- ggplot(andaman) +
  geom_sf() +
  labs(
    title = "Base Map",
    subtitle = "Andaman & Nicobar\nIslands (India)"
  )

g2 <- ggplot() +
  geom_sf(
    data = andaman
    ) +
  geom_sf(
    data = st_centroid(andaman),
    colour = "red",
    size = 4, 
    pch = 1,
    stroke = 2
  ) +
  labs(
    title = "st_centroid()",
    subtitle = "Andaman & Nicobar\nIslands (India)"
  )

g3 <- ggplot() +
  geom_sf(
    data = andaman
    ) +
  geom_sf(
    data = st_centroid(andaman, of_largest_polygon = TRUE),
    colour = "red",
    size = 4, 
    pch = 1,
    stroke = 2,
    fill = "transparent"
  ) +
  labs(
    title = "st_centroid\n(of_largest_polygon = TRUE)",
    subtitle = "Andaman & Nicobar\nIslands (India)"
  )

g4 <- ggplot() +
  geom_sf(
    data = andaman
    ) +
  geom_sf(
    data = st_point_on_surface(andaman),
    colour = "red",
    size = 4, 
    pch = 1,
    stroke = 2,
    fill = "transparent"
  ) +
  labs(
    title = "st_point_on_surface()",
    subtitle = "Andaman & Nicobar\nIslands (India)"
  )

ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-2-2_1.png"),
  plot = patchwork::wrap_plots(g1, g2, g3, g4, nrow = 1),
  height = 1900,
  width = 2400,
  units = "px"
)
Figure 2: Various centroids using st_centroid() and st_point_on_surface()

5.2.3 Buffers

  • Buffers are polygons representing areas within a specified distance from a geometric feature (point, line, or polygon).
  • Purpose: Used for geographic data analysis (not just visualization). Examples:
    • How many points are within a given distance of a line?
    • Which demographic groups are within travel range of a new shop?
  • st_buffer() from the sf package. Example Visualization is shown in Figure 3
    • Input: Geometry and dist (distance in CRS units, e.g., meters).
    • Output: One buffer polygon per geometry feature.
  • Other Key Arguments in st_buffer():
    • nQuadSegs (GEOS engine):
      • Number of segments per quadrant (default: 30).
      • Adjust: Decrease for memory concerns, or increase for high resolution output.
    • max_cells (S2 engine):
      • Higher values create smoother buffers (slower computation).
    • endCapStyle and joinStyle (GEOS engine):
      • Control buffer edge appearance (useful for lines).
    • singleSide (GEOS engine):
      • Buffer on one or both sides of the geometry.
Code
a1 <- andaman |> 
  st_cast("POLYGON")

a2 <- st_buffer(a1, dist = 20000) |> 
  mutate(id = as_factor(row_number()))

a3 <- st_buffer(a1, dist = 20000, nQuadSegs = 0.5) |> 
  mutate(id = as_factor(row_number()))

g1 <- ggplot() +
  geom_sf(data = a1) +
  labs(
    title = "Base Map with\nst_cast(\"POLYGON\")",
    subtitle = "Nicobar Islands"
  ) +
  coord_sf(
    ylim = c(6.5, 9.5),
    default_crs = 4326
  )

g2 <- ggplot() +
  geom_sf(
    data = a2,
    mapping = aes(fill = id),
    alpha = 0.2,
    colour = "transparent"
  ) +
  geom_sf(
    data = a1
  ) +
  labs(
    title = "With 20 km buffer\n around each island",
    subtitle = "Nicobar Islands; each\nbuffer in separate colour"
  ) +
  theme(legend.position = "none") +
  coord_sf(
    ylim = c(6.5, 9.5),
    default_crs = 4326
  )


g3 <- ggplot() +
  geom_sf(
    data = a3,
    mapping = aes(fill = id),
    alpha = 0.2,
    colour = "transparent"
  ) +
  geom_sf(
    data = a1
  ) +
  labs(
    title = "With 20 km buffer\n around each island",
    subtitle = "Nicobar Islands;\narugment (nQuadSegs = 1)"
  ) +
  theme(legend.position = "none") +
  coord_sf(
    ylim = c(6.5, 9.5),
    default_crs = 4326
  )

ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-2-3_1.png"),
  plot = patchwork::wrap_plots(g1, g2, g3, nrow = 1),
  height = 1200,
  width = 2000,
  units = "px"
)
Figure 3: Use of st_buffer()

5.2.4 Affine Transformations

  • Definition: Transformations that preserve lines and parallelism but not necessarily angles or lengths.
  • Types of Affine Transformations:
    1. Shifting (Translation): Moves every point by a constant distance.

      • Example: Adding a vector to shift all y-coordinates north by 400 km distance while keeping x-coordinates unchanged using:

        n_shift <- n1 |> 
          add(c(0, 400000)) |> 
          st_set_crs(st_crs(n1))
      • Note: This converts the CRS of the new sfc object to NA and thus needs st_set_crs() to return it back to the original CRS.

    2. Scaling: Enlarges or shrinks geometries.

      • Global Scaling:
        • Multiplies all coordinates relative to the origin, preserving topological relations.
      • Local Scaling:
        • Scales geometries around specific points (e.g., centroids).

        • Steps:

          1. Shift geometries so the centroid becomes (0,0).
          2. Scale by a factor.
          3. Shift back to original centroid coordinates.
        • Example: Enlarge the geometries by a factor of 2.5.

          n1_scale <- (n1 - n1_centroid) |> 
            multiply_by(2.5) |> 
            add(n1_centroid) |> 
            st_set_crs(st_crs(n1))
    3. Rotation: Rotates coordinates using a rotation matrix.

      • Rotation matrix: Define a function to create the rotation matrix and apply it to the geometry. R=[cosθ/sinθ​ − sinθ/cosθ​]
  • Replacing Old Geometry: Use st_set_geometry() from the sf package to finally Replace original geometry with scaled versions (shifted, rotated or scaled)
  • Applications:
    • Shifting: For label placement.
    • Scaling: In non-contiguous cartograms (see Section 9.6).
    • Rotation: Correcting distorted geometries during re-projection.
Code
sysfonts::font_add_google("Saira Extra Condensed", "caption_font")
showtext::showtext_auto()

ggplot2::theme_set(
  theme_minimal(
    base_size = 30,
    base_family = "caption_font"
  ) +
    theme(
      text = element_text(
        lineheight = 0.3,
        hjust = 0.5
      ),
      plot.title.position = "plot",
      plot.title = element_text(hjust = 1),
      plot.subtitle = element_text(hjust = 1),
      panel.grid = element_line(
        linewidth = 0.2
      )
    )
)

df1 <- andaman |> 
  st_cast("POLYGON") |> 
  mutate(id = row_number()) |> 
  filter(id < 10) |> 
  mutate(
    name = case_when(
      id %in% c(4,8, 9, 7) ~ "Nicobar Islands",
      .default = "Andaman Islands"
    )
  )


# Pull out only sfc class (i.e. geometry for Andaman Islands)
a1 <- df1 |> 
  filter(name == "Andaman Islands") |> 
  st_geometry()

# Pull out only sfc class (i.e. geometry for Nicobar Islands)
n1 <- df1 |> 
  filter(name == "Nicobar Islands") |> 
  st_geometry()


g1 <- df1 |> 
  ggplot(aes(fill = name)) +
  geom_sf(colour = "transparent") +
  geom_sf_text(aes(label = id)) +
  coord_sf(
    ylim = c(7, 13.5),
    default_crs = 4326
  ) +
  labs(
    title = "Base Map",
    subtitle = "10 Largest Islands amongst\nAndamand and Nicobar Island chain",
    fill = NULL, x = NULL, y = NULL
  ) +
  scale_fill_manual(values = c("#89973DFF", "#E8B92FFF")) +
  theme(
    legend.position = "left"
  )

g2 <- ggplot() +
  geom_sf(data = a1, fill = "#89973DFF", colour = "transparent") +
  geom_sf(data = n1, fill = "#E8B92FFF", colour = "transparent") +
  coord_sf(
    ylim = c(7, 13.5),
    default_crs = 4326
  ) + 
  labs(
    title = "Plotting as separate\nsfc objects",
    subtitle = "10 Largest Islands"
  )

#################### Shifting #########################

n_shift <- n1 |> 
  add(c(0, 400000)) |> 
  st_set_crs(st_crs(n1))

g3 <- ggplot() +
  geom_sf(data = a1, fill = "#89973DFF", colour = "transparent") +
  geom_sf(
    data = n_shift, 
    fill = "#E8B92FFF", 
    colour = "transparent"
    ) +
  coord_sf(
    ylim = c(7, 13.5),
    default_crs = 4326
  ) +
  labs(
    title = "Shifting sfc objects",
    subtitle = "Bring Nicobar Islands\ncloser to the Andamans"
  )

#################### Scaling ##########################

n1_centroid <- st_centroid(n1)

n1_scale <- (n1 - n1_centroid) |> 
  multiply_by(2.5) |> 
  add(n1_centroid) |> 
  st_set_crs(st_crs(n1))

g4 <- ggplot() +
  geom_sf(data = a1, fill = "#89973DFF", colour = "transparent") +
  geom_sf(
    data = n1_scale, 
    fill = "#E8B92FFF", 
    colour = "transparent"
    ) +
  coord_sf(
    ylim = c(7, 13.5),
    default_crs = 4326
  ) + 
  labs(
    title = "Scaling sfc objects",
    subtitle = "Enlarging Nicobar Islands\nby 2.5 times."
  )

##################### Rotation ########################
rotation = function(a){
  r = a * pi / 180 #degrees to radians
  matrix(c(cos(r), sin(r), -sin(r), cos(r)), nrow = 2, ncol = 2)
} 

n1_rotate <- (n1 - n1_centroid) |> 
  multiply_by(rotation(90)) |> 
  add(n1_centroid) |> 
  st_set_crs(st_crs(n1))


g5 <- ggplot() +
  geom_sf(data = a1, fill = "#89973DFF", colour = "transparent") +
  geom_sf(
    data = n1_rotate, 
    fill = "#E8B92FFF", 
    colour = "transparent"
    ) +
  coord_sf(
    ylim = c(7, 13.5),
    default_crs = 4326
  ) + 
  labs(
    title = "Rotating sfc objects",
    subtitle = "Rotating Nicobar Islands\nclockwise by 90 degrees"
  )

g <- patchwork::wrap_plots(g1, g3, g4, g5) +
  patchwork::plot_layout(widths = c(1,1,1,1,1), nrow = 1)

ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-2-4_1.png"),
  plot = g,
  height = 1200,
  width = 2500,
  units = "px"
)

Data Viz demonstration

Here’s a more visually appealing version of the same graphic, produced using complete code given on this page.

This plot demonstrates the application of spatial transformations on the Andaman and Nicobar Islands using the `sf` package in R. It showcases four techniques: base mapping, northward shifting, scaling (enlargement), and rotation (90° clockwise), highlighting their effects on spatial geometries. The `facet_wrap` function neatly organizes the transformations for comparison, while `geom_sf` and custom labels enhance the visualization.

This plot demonstrates the application of spatial transformations on the Andaman and Nicobar Islands using the `sf` package in R. It showcases four techniques: base mapping, northward shifting, scaling (enlargement), and rotation (90° clockwise), highlighting their effects on spatial geometries. The `facet_wrap` function neatly organizes the transformations for comparison, while `geom_sf` and custom labels enhance the visualization.

5.2.5 Clipping

  • Definition: A form of spatial subsetting that modifies the geometry column of affected features. Applies to lines, polygons, and their multi equivalents (not points).

  • Purpose: Identifies or extracts areas of overlap or subsets of spatial features. Commonly used in geographic data analysis to focus on regions of interest.

  • Logical Operations and Spatial Equivalents. Inspired by Figure 12.1 of R for Data Science (2e). Spatial equivalents to logical operators (e.g., AND, OR, NOT) allow flexible geometry subsetting (as shown in Figure 4)

    • Intersection (AND): st_intersection()

    • Union (OR): st_union()

    • Difference (NOT): st_difference()

    • Exclusive OR (XOR): st_sym_difference()

  • Applications:

    • Identifying overlapping regions.

    • Creating subsets of spatial data for specific analysis or visualization.

Code
sysfonts::font_add_google("Nova Mono", "body_font")
showtext::showtext_auto()

theme_custom <- function(...) {
  ggthemes::theme_map(
    base_size = 20,
    base_family = "body_font"
    ) +
    labs(
      x = NULL, y = NULL
    ) +
    theme(
      plot.title = element_text(
        hjust = 0.5,
        lineheight = 0.3,
        margin = margin(5,0,2,0, "mm")
      ),
      plot.margin = margin(0,0,0,0, "mm"),
      ...
    )
}
# Define the center points of the circles
sf_circles <- tibble(
  x = c(0, 2, 4, 1, 3),
  y = c(0, 0, 0, -1, -1),
  colour_var = c("blue", "black", "red", "yellow", "green"),
  label_var = LETTERS[1:5]
  ) |> 
  # Convert the points to an sf object
  st_as_sf(
  coords = c("x", "y"), 
  crs = NA
  ) |> 
  # Create circular geometries using st_buffer
  mutate(geometry = st_buffer(geometry, dist = 1))

g1 <- sf_circles |> 
  ggplot(
    mapping = aes(
      label = label_var
      )
    ) +
  geom_sf(
    fill = "transparent",
    linewidth = 0.5,
    colour = "grey10"
  ) +
  geom_sf_text(
    colour = "grey10",
    fontface = "bold",
    size = 16,
    family = "body_font"
  ) +
  labs(
    title = "5 overlapping circles plotted with {sf}"
  ) + 
  theme_custom()

# Naming the individual circles

pull_a_circle <- function(ch_pick){
  sf_circles |> 
    filter(label_var == ch_pick)
}
a1 <- pull_a_circle("A")  
b1 <- pull_a_circle("B")  
c1 <- pull_a_circle("C")  
d1 <- pull_a_circle("D")  
e1 <- pull_a_circle("E")  

g2 <- g1 +
  geom_sf(
    data = a1 |> st_difference(d1),
    fill = alpha("grey", 0.7)
  ) +
  ggtitle("A |> st_difference(D)")

g3 <- g1 +
  geom_sf(
    data = d1 |> st_difference(a1),
    fill = alpha("grey", 0.7)
  ) +
  ggtitle("D |> st_difference(A)")

g4 <- g1 +
  geom_sf(
    data = d1 |> st_difference(a1) |> st_difference(b1),
    fill = alpha("grey", 0.7)
  ) +
  ggtitle("D |> st_difference(A) |>\nst_difference(B)")

g5 <- g1 +
  geom_sf(
    data = a1 |> st_union(d1),
    fill = alpha("grey", 0.7)
  ) +
  ggtitle("st_union(A, D)")

g6 <- g1 +
  geom_sf(
    data = a1 |> st_intersection(d1),
    fill = alpha("grey", 0.7)
  ) +
  ggtitle("st_intersection(A, D)")

g7 <- g1 +
  geom_sf(
    data = st_sym_difference(a1, d1),
    fill = alpha("grey", 0.7)
  ) +
  ggtitle("st_sym_difference(A, D)")

non_overlap <- a1 |> 
            st_sym_difference(d1) |> 
            st_sym_difference(b1) |> 
            st_sym_difference(c1) |> 
            st_sym_difference(e1)
g8 <- g1 +
  geom_sf(
    data = non_overlap,
    fill = alpha("grey", 0.7)
  ) +
  labs(title = "An st_sym_difference() chain")

overlap <- a1 |> 
  st_intersection(d1) |> 
  st_union(st_intersection(d1, b1)) |> 
  st_union(st_intersection(b1, e1)) |> 
  st_union(st_intersection(e1, c1))

g9 <- g1 +
  geom_sf(
    data = overlap,
    fill = alpha("grey", 0.7)
  ) +
  labs(title = "A st_union() and\nst_interaction() chain")

custom_layout_design <- "
  AAAAAA
  AAAAAA
  BBCCDD
  BBCCDD
  EEFFGG
  EEFFGG
  HHHIII
  HHHIII
"

g <- patchwork::wrap_plots(
  g1, g2, g3, g4, g5, g6, g7, g8, g9
  ) + 
  patchwork::plot_layout(
    design = custom_layout_design
  ) +
  patchwork::plot_annotation(
    title = "Clipping {sf} objects\n(Boolean Algebra examples)",
    subtitle = "Using functions like st_intersection(), st_union(), st_difference()\n& st_sym_difference()",
    theme = theme(
      plot.title = element_text(
        family = "body_font",
        size = 54, 
        lineheight = 0.3,
        hjust = 0.5,
        face = "bold"
      ),
      plot.subtitle = element_text(
        family = "body_font",
        size = 30, 
        lineheight = 0.3,
        hjust = 0.5
      )
    )
  )


ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-2-5_1.png"),
  plot = g,
  height = 2000,
  width = 1600,
  units = "px",
  bg = "white"
)
Figure 4: Various methods and examples (inspired by Boolean Algebra) for clipping {sf} objects in R

5.2.6 Sub-setting and Clipping

  • Clipping: Modifies geometry to match a subsetting object. Subsetting: Selects features that intersect or partly intersect with a clipping object. An example: Points randomly distributed within the bounding box of the five concentric circles. Some points are inside one circle, some inside two circles, or neither. Then, we subset points intersecting with one, two or no circles.
  • Key Functions:
    1. st_sample(): Generates random points within a geometry.
    2. Clipping and Subsetting Approaches:
      • Way #1: Use the intersection of x and y (x_and_y) as a direct subsetting object: p[x_and_y]
      • Way #2: Find the intersection between points (p) and x_and_y, modifying overlapping geometries: st_intersection(p, x_and_y) , or, using st_interesects() when working in a pipe (|>) chain.
      • Way #3: Use st_intersects() to determine logical overlap between p and the subsetting objects: sel_p_xy = st_intersects(p, x, sparse = FALSE)[, 1] & st_intersects(p, y, sparse = FALSE)[, 1] and then subset, using p_xy3 = p[sel_p_xy]
  • Preferred Implementation:
    • Way #2 (concise and efficient) and it is the tidyverse approach with |> compatibility. Example shown in Figure 5
Code
sysfonts::font_add_google("Fira Sans Condensed", "body_font")
showtext::showtext_auto()

theme_custom <- function(...) {
  ggthemes::theme_map(
    base_size = 20,
    base_family = "body_font"
    ) +
    labs(
      x = NULL, y = NULL
    ) +
    theme(
      plot.title = element_text(
        hjust = 0.5,
        lineheight = 0.3,
        margin = margin(5,0,2,0, "mm")
      ),
      plot.margin = margin(0,0,0,0, "mm"),
      ...
    )
}
# Define the center points of the circles
sf_circles <- tibble(
  x = c(0, 2, 4, 1, 3),
  y = c(0, 0, 0, -1, -1),
  colour_var = c("blue", "black", "red", "yellow", "green"),
  label_var = LETTERS[1:5]
  ) |> 
  # Convert the points to an sf object
  st_as_sf(
  coords = c("x", "y"), 
  crs = NA
  ) |> 
  # Create circular geometries using st_buffer
  mutate(geometry = st_buffer(geometry, dist = 1))

# Naming the individual circles
pull_a_circle <- function(ch_pick){
  sf_circles |> 
    filter(label_var == ch_pick)
}
a1 <- pull_a_circle("A")  
b1 <- pull_a_circle("B")  
c1 <- pull_a_circle("C")  
d1 <- pull_a_circle("D")  
e1 <- pull_a_circle("E")  

one_circle <- a1 |> 
            st_sym_difference(d1) |> 
            st_sym_difference(b1) |> 
            st_sym_difference(c1) |> 
            st_sym_difference(e1)
overlap <- a1 |> 
  st_intersection(d1) |> 
  st_union(st_intersection(d1, b1)) |> 
  st_union(st_intersection(b1, e1)) |> 
  st_union(st_intersection(e1, c1))
rm(a1, b1, c1, d1, e1)

set.seed(42)

random_points <- sf_circles |> 
  # Get a bounding box
  st_bbox() |> 
  # Covert it into a polygon
  st_as_sfc() |> 
  
  # Get a sample of points within this polygon
  st_sample(size = 100) |> 
  
  # Convert into a sf object
  st_as_sf() |> 
  
  # Add identifiers for where the points fall
  mutate(
    colour_var = case_when(
      st_intersects(x, overlap, sparse = F) ~ "Two Circles",
      st_intersects(x, one_circle, sparse = F) ~ "One Circle",
      .default = "Outside"
    ),
    colour_var = fct(
      colour_var,
      levels = c(
        "Outside",
        "One Circle",
        "Two Circles"
      )
    )
  )

g <- ggplot() +
  geom_sf(
    data = sf_circles, 
    fill = "transparent",
    linewidth = 0.2,
    colour = "grey10"
    ) +
  geom_sf(
    data = random_points,
    mapping = aes(
      geometry = x,
      colour = colour_var
    ),
    alpha = 0.75,
    size = 0.7,
    stroke = 0.1
  ) +
  labs(
    title = "Clipping and Subsetting",
    subtitle = "Subsetting random points into those that overlap none, one or two circles",
    colour = "Point lies within"
  ) +
  paletteer::scale_colour_paletteer_d("khroma::highcontrast") +
  ggthemes::theme_map(
    base_family = "body_font",
    base_size = 16
  ) +
  theme(
    plot.title = element_text(
      size = 24,
      hjust = 0.5,
      margin = margin(0,0,0,0, "mm")
    ),
    plot.subtitle = element_text(
      hjust = 0.5,
      lineheight = 0.3,
      margin = margin(0,0,0,0, "mm")
    ),
    legend.position = "inside",
    legend.position.inside = c(0.5, 0),
    legend.justification = c(0.5, 1),
    legend.direction = "horizontal",
    legend.text = element_text(
      margin = margin(0,0,0,0, "mm")
    ),
    legend.title = element_text(
      margin = margin(0,0,0,0, "mm"),
      hjust = 0.5
    ),
    legend.margin = margin(0,0,0,0, "mm"),
    legend.key.size = unit(5, "pt"),
    legend.title.position = "top"
  )

ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-2-6_1.png"),
  plot = g,
  height = 500,
  width = 800,
  units = "px",
  bg = "white"
)
Figure 5: Sub-setting points with clipped {sf} objects - an example

5.2.7 Geometry Unions

  • Spatial Aggregation and Geometry Dissolution:
    • Spatial aggregation dissolves boundaries of touching polygons within the same group automatically. Example: Aggregating 22 districts of the State of Haryana into an overall boundary map with aggregate() (base R approach) or summarize() (tidyverse approach), for example in Figure 6
    • Geometric Operation Behind the Scenes: Functions aggregate() and summarize() internally call st_union() from the sf package to dissolve boundaries and merge geometries.
  • Union of Geometries: Visualization Insight example is shown below.
Code
sysfonts::font_add_google("Fira Sans Condensed", "body_font")
showtext::showtext_auto()

haryana1 <- read_sf(here::here(
  "data", "haryana_map",
  "HARYANA_DISTRICT_BDY.shp"
))

g1 <- haryana1 |>
  ggplot() +
  geom_sf(
    aes(fill = District)
  ) +
  ggthemes::theme_map(
    base_family = "body_font"
  ) +
  coord_sf(expand = FALSE) +
  labs(
    title = "Districts of Haryana State (India)",
    subtitle = "The raw map, showing each district."
  )  +
  theme(
    plot.title = element_text(
      hjust = 0.5
    ),
    plot.subtitle = element_text(
      hjust = 0.5,
      lineheight = 0.35
    ),
    legend.position = "none"
  )

g2 <- haryana1 |>
  st_union() |> 
  ggplot() +
  geom_sf() +
  ggthemes::theme_map(
    base_family = "body_font"
  ) +
  coord_sf(expand = FALSE) +
  labs(
    title = "Boundary of Haryana State (India)",
    subtitle = "Generated using sf::st_union()"
  )  +
  theme(
    plot.title = element_text(
      hjust = 0.5
    ),
    plot.subtitle = element_text(
      hjust = 0.5,
      lineheight = 0.35
    ),
    legend.position = "none"
  )

ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-2-7_2.png"),
  plot = (g1 + g2),
  height = 400,
  width = 600,
  units = "px",
  bg = "white"
)
Figure 6: A district map of Haryana, converted into an outer boundary map using st_union()
Code
sysfonts::font_add_google("Fira Sans Condensed", "body_font")
showtext::showtext_auto()

haryana <- read_sf(
  here::here(
    "data", "haryana_map",
    "HARYANA_SUBDISTRICT_BDY.shp"
  )
) |> 
  janitor::clean_names() |> 
  st_simplify(dTolerance = 100) |> 
  mutate(
    district = str_replace_all(district, ">", "A"),
    district = str_replace_all(district, "\\|", "I"),
    district = str_to_title(district),
    tehsil = str_to_title(str_replace_all(tehsil, ">", "A"))
  )

g1 <- ggplot() +
  geom_sf(data = haryana, linewidth = 0.1) +
  ggthemes::theme_map(
    base_family = "body_font"
  ) +
  coord_sf(expand = FALSE) +
  labs(
    title = "Tehsils of Haryana State (India)",
    subtitle = "The raw map, showing each tehsil /\nsubdivision of Haryana."
  )  +
  theme(
    plot.title = element_text(
      hjust = 0.5
    ),
    plot.subtitle = element_text(
      hjust = 0.5,
      lineheight = 0.35
    )
  )

g2 <- haryana |> 
  group_by(district) |> 
  summarise() |> 
  ggplot() +
  geom_sf(
    linewidth = 0.1
  ) +
  ggthemes::theme_map(
    base_family = "body_font"
  ) +
  coord_sf(expand = FALSE) +
  labs(
    title = "Districts of Haryana State (India)",
    subtitle = "Aggregating tehsils with group_by(..) |> summarise()\nwhich uses st_union() at the backend."
  ) +
  theme(
    plot.title = element_text(
      hjust = 0.5
    ),
    plot.subtitle = element_text(
      hjust = 0.5,
      lineheight = 0.35
    )
  )

g3 <- haryana |> 
  group_by(district) |> 
  summarise() |> 
  st_cast("LINESTRING") |> 
  ggplot() +
  geom_sf(
    linewidth = 0.1
  ) +
  ggthemes::theme_map(
    base_family = "body_font"
  ) +
  coord_sf(expand = FALSE) +
  labs(
    title = "Casting geometry into LINESTRING",
    subtitle = "group_by(district) |> summarise() |> st_cast(\"LINESTRING\").\nLoss of some geometries, but internal slivers are removed."
  ) +
  theme(
    plot.title = element_text(
      hjust = 0.5
    ),
    plot.subtitle = element_text(
      hjust = 0.5,
      lineheight = 0.35
    )
  )

ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-2-7_1.png"),
  plot = (g1 + g2 + g3),
  height = 500,
  width = 900,
  units = "px",
  bg = "white"
)
Note

When using the st_union() function in spatial analysis, “inside small lines” or “slivers” appearing in the resulting combined geometry often indicates slight discrepancies in the input geometries, particularly at boundary points, causing the union operation to create small, extra line segments where the geometries nearly overlap but don’t perfectly align, as shown in Figure 7

Figure 7: A sub-district / Tehsil map of Haryana, when converted into a district map with summarise() [which basically uses st_union() only] produces an imperfect result with internal slivers, as the boundary nodes are not overlapping amongst the sub-districts / tehsils. This depicts an inherent (limitation? / caution?) of sf::st_union().

References

Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With Applications in r.” https://doi.org/10.1201/9780429459016.
Strimas-Mackey, Matthew. 2023. “Smoothr: Smooth and Tidy Spatial Features.” https://CRAN.R-project.org/package=smoothr.
Teucher, Andy, and Kenton Russell. 2023. “Rmapshaper: Client for ’Mapshaper’ for ’Geospatial’ Operations.” https://CRAN.R-project.org/package=rmapshaper.