Chapter 5: Geometry operations

Key Learnings from, and Solutions to the exercises in Chapter 4 of the book Geocomputation with R by Robin Lovelace, Jakub Nowosad and Jannes Muenchow.

Geocomputation with R
Textbook Solutions
Author

Aditya Dahiya

Published

December 16, 2024

Code
library(sf)        # Simple Features in R
library(terra)     # Handling rasters in R
library(tidyterra) # For plotting rasters in ggplot2
library(magrittr)  # Using pipes with raster objects
library(tidyverse) # All things tidy; Data Wrangling
library(spData)    # Spatial Datasets
library(patchwork) # Composing plots

5.1 Introduction

  • Previous chapters introduced geographic datasets’ structure (Chapter 2), attribute-based manipulation (Chapter 3), and spatial relations (Chapter 4).
  • Focus of this chapter: Manipulating geographic elements of spatial objects.
    • Examples: Creating buffers, simplifying/converting vector geometries, and raster aggregation/resampling.
  • Section 5.2: Transforming vector geometries using:
    • Unary operations: Simplifications, buffers, centroids, and affine transformations (Sections 5.2.1–5.2.4).
    • Binary operations: Modifying geometries through clipping and unions (Sections 5.2.5–5.2.7).
    • Type transformations: Converting geometry types, e.g., polygons to lines (Section 5.2.8).
  • Section 5.3: Raster transformations:
    • Alter pixel size, resolution, extent, and origin.
    • Align raster datasets for map algebra.

5.2 Geometric operations on vector data

  • Focus: Operations that modify the geometry of vector (sf) objects.
  • Key distinction: Works directly on geometry-level objects of class sfc, in addition to sf objects.
  • Examples: Drilling into geometry to transform, simplify, or reshape vector data.

5.2.1 Simplification

  • Generalizes vector geometries (lines/polygons) for smaller scale maps, reducing memory, disk space, and bandwidth usage. Useful for publishing interactive maps by simplifying complex geometries.
  • Key Functions and Algorithms:
    • st_simplify() from the sf package (Pebesma and Bivand 2023):
      • Implements the Douglas-Peucker algorithm (Douglas and Peucker 1973).

      • Controlled by dTolerance (generalization level in metres, or map units).

      • Simplifies individual geometries but does not preserve topology, leading to overlaps or holes.

Note

topology (noun): the way in which parts of something are organized, arranged or connected

  • ms_simplify() from the rmapshaper package (Teucher and Russell 2023):
    • Uses the Visvalingam algorithm (Visvalingam and Whyatt 1993).
    • Retains topology by default (keep_shapes = TRUE) and allows fine control over the vertex retention (keep: the % of vertices that are to be retained, given as a proportion).
  • smooth() from the smoothr package:
    • Smooths edges using techniques like Gaussian kernel regression, Chaikin’s algorithm, or spline interpolation.
    • Does not reduce vertex count and does not preserve topology.
    • Key parameter: smoothness (controls Gaussian bandwidth).
  • Examples of Simplification are shown in Figure 1
  • Applications of Smoothing:
    • Suitable for geometries derived from raster vectorization (e.g., Chapter 6).
Code
# Download India's Official states' map from
# https://github.com/Aditya-Dahiya/projects_presentations/tree/main/data/india_map

#### Base Map
india_states <- read_sf("India_State_Boundary.shp")

g <- india_states |> 
  ggplot() +
  geom_sf() +
  theme_minimal() +
  labs(
    title = "Base Map of India: in full detail",
    subtitle = "Source: Census of India"
  )

ggsave(
  filename = here::here("book_solutions", "images",
                        "ch5-2-1_1.png"),
  plot = g,
  height = 1600,
  width = 1200,
  units = "px"
)

#### st_simplify()
g <- india_states |> 
  st_simplify(dTolerance = 100000) |>    # 100 km tolerance
  ggplot() +
  geom_sf() +
  theme_minimal() +
  labs(
    title = "India: st_simplify(dTolerance = 100000)",
    subtitle = "Douglas-Peucker algorithm. Topology is lost."
  )

ggsave(
  filename = here::here("book_solutions", "images",
                        "ch5-2-1_2.png"),
  plot = g,
  height = 1600,
  width = 1200,
  units = "px"
)

#### rmapshaper::ms_simplify()
g <- india_states |> 
  rmapshaper::ms_simplify(keep = 0.0001, keep_shapes = TRUE) |>    
  ggplot() +
  geom_sf() +
  theme_minimal() +
  labs(
    title = "India: rmapshaper::ms_simplify(keep_shapes = TRUE)",
    subtitle = "Visvalingam algorithm. Topology is retained."
  )

ggsave(
  filename = here::here("book_solutions", "images",
                        "ch5-2-1_3.png"),
  plot = g,
  height = 1600,
  width = 1200,
  units = "px"
)

#### smoothr::smooth() - 3 methods
g <- india_states |> 
  st_simplify(dTolerance = 10000) |>  # To save computing time
  smoothr::smooth(method = "ksmooth",
                  smoothness = 5) |>    
  ggplot() +
  geom_sf() +
  theme_minimal() +
  labs(
    title = "smoothr::smooth(method = \"ksmooth\")",
    subtitle = "Gaussian kernel regression. Topology is lost."
  )

ggsave(
  filename = here::here("book_solutions", "images",
                        "ch5-2-1_4.png"),
  plot = g,
  height = 1600,
  width = 1200,
  units = "px"
)

g <- india_states |> 
  st_simplify(dTolerance = 50000) |>  # To save computing time
  smoothr::smooth(method = "chaikin") |>    
  ggplot() +
  geom_sf() +
  theme_minimal() +
  labs(
    title = "smoothr::smooth(method = \"chaikin\")",
    subtitle = "Chaikin’s corner cutting algorithm. Topology is lost."
  )

ggsave(
  filename = here::here("book_solutions", "images",
                        "ch5-2-1_5.png"),
  plot = g,
  height = 1600,
  width = 1200,
  units = "px"
)

g <- india_states |> 
  st_simplify(dTolerance = 50000) |>  # To save computing time
  smoothr::smooth(method = "spline") |>    
  ggplot() +
  geom_sf() +
  theme_minimal() +
  labs(
    title = "smoothr::smooth(method = \"spline\")",
    subtitle = "Spline interpolation. Topology is lost."
  )

ggsave(
  filename = here::here("book_solutions", "images",
                        "ch5-2-1_6.png"),
  plot = g,
  height = 1600,
  width = 1200,
  units = "px"
)

Official Map of India (full details)

Using st_simplify(dTolerance = 100000), i.e. 100 km resolution

Using rmapshaper::ms_simplify(keep_shape = TRUE) to retain topology

Using gaussian kernel regression with smoothr::smooth(method = “ksmooth”)

Using Chalkin’s corner cutting algorithm with smoothr::smooth(method = “chalkin”)

Using spline interpolation with smoothr::smooth(method = “spline”)
Figure 1

smoothr

A short note on the {smoothr} package (Strimas-Mackey 2023), which uses three different types of algorithms:-.

Note
Code
library(sf)
library(smoothr)
library(tidyverse)

# Smooth polygons using different methods
p_smooth_chaikin <- smooth(jagged_polygons, method = "chaikin")
p_smooth_ksmooth <- smooth(jagged_polygons, method = "ksmooth")
p_smooth_spline <- smooth(jagged_polygons, method = "spline")

# Combine data for plotting
plot_data <- bind_rows(
  mutate(st_as_sf(p_smooth_chaikin), method = "chaikin"),
  mutate(st_as_sf(p_smooth_ksmooth), method = "ksmooth"),
  mutate(st_as_sf(p_smooth_spline), method = "spline"),
  mutate(st_as_sf(jagged_polygons), method = "original")
)

# Assign colors to methods
method_colors <- c(
  chaikin = "#E41A1C",
  ksmooth = "#4DAF4A",
  spline = "#377EB8"
)

# Convert geometry for plotting
plot_data <- plot_data |> 
  mutate(geometry = st_sfc(geometry)) |> 
  st_as_sf()

p2 <- plot_data |> 
  filter(method != "original")

p1 <- plot_data |> 
  filter(method == "original") |> 
  select(-method)

# Plot with ggplot2
g <- ggplot(data = p2) +
  geom_sf(aes(geometry = geometry, 
              color = method),
          size = 0.7,
          linewidth = 0.5,
          fill = "transparent") +
  geom_sf(
    data = p1,
    fill = alpha("grey50", 0.5),
    colour = "transparent"
  ) +
  scale_color_manual(values = method_colors) +
  facet_grid(id ~ method) +
  guides(fill = "none") +
  theme_void() +
  theme(
    legend.position = "none",
    plot.title.position = "plot",
    strip.text.y = element_blank()
    ) +
  labs(
    title = "Simplification with {smoothr}",
    colour = "Method"
  )

ggsave(
  filename = here::here("book_solutions", "images",
                        "ch5-2-1_7.png"),
  plot = g,
  height = 2000,
  width = 1200,
  units = "px"
)

5.2.2 Centroids

  • Identify the center of geographic objects, creating single-point representations of complex geometries using st_centroid()

  • Types of Centroids: (shown in Figure 2)

    1. Geographic Centroid (center of mass):
      • Balances a spatial object (like balancing a plate).
      • Useful for creating simple point representations or estimating distances between polygons.
      • Calculated with st_centroid() from the sf package.
      • Limitation: Centroids may fall outside the object (e.g., doughnut-shaped polygons).
    2. Point on Surface:
      • Ensures the point lies within the object boundaries.
      • Useful for labeling irregular polygons, such as islands or multipolygon geometries.
      • Calculated with st_point_on_surface().
  • Other Centroid Types: Chebyshev center and visual center

Code
sysfonts::font_add_google("Saira Extra Condensed", "caption_font")
showtext::showtext_auto()

ggplot2::theme_set(
  theme_minimal(
    base_size = 30,
    base_family = "caption_font"
  ) +
    theme(
      text = element_text(
        lineheight = 0.3,
        hjust = 0.5
      ),
      plot.title.position = "plot"
    )
)

# Focussing on the Island Chains of India
andaman <- india_states |> 
  filter(
    State_Name == "Andaman & Nicobar"
  )
  
g1 <- ggplot(andaman) +
  geom_sf() +
  labs(
    title = "Base Map",
    subtitle = "Andaman & Nicobar\nIslands (India)"
  )

g2 <- ggplot() +
  geom_sf(
    data = andaman
    ) +
  geom_sf(
    data = st_centroid(andaman),
    colour = "red",
    size = 4, 
    pch = 1,
    stroke = 2
  ) +
  labs(
    title = "st_centroid()",
    subtitle = "Andaman & Nicobar\nIslands (India)"
  )

g3 <- ggplot() +
  geom_sf(
    data = andaman
    ) +
  geom_sf(
    data = st_centroid(andaman, of_largest_polygon = TRUE),
    colour = "red",
    size = 4, 
    pch = 1,
    stroke = 2,
    fill = "transparent"
  ) +
  labs(
    title = "st_centroid\n(of_largest_polygon = TRUE)",
    subtitle = "Andaman & Nicobar\nIslands (India)"
  )

g4 <- ggplot() +
  geom_sf(
    data = andaman
    ) +
  geom_sf(
    data = st_point_on_surface(andaman),
    colour = "red",
    size = 4, 
    pch = 1,
    stroke = 2,
    fill = "transparent"
  ) +
  labs(
    title = "st_point_on_surface()",
    subtitle = "Andaman & Nicobar\nIslands (India)"
  )

ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-2-2_1.png"),
  plot = patchwork::wrap_plots(g1, g2, g3, g4, nrow = 1),
  height = 1900,
  width = 2400,
  units = "px"
)
Figure 2: Various centroids using st_centroid() and st_point_on_surface()

5.2.3 Buffers

  • Buffers are polygons representing areas within a specified distance from a geometric feature (point, line, or polygon).
  • Purpose: Used for geographic data analysis (not just visualization). Examples:
    • How many points are within a given distance of a line?
    • Which demographic groups are within travel range of a new shop?
  • st_buffer() from the sf package. Example Visualization is shown in Figure 3
    • Input: Geometry and dist (distance in CRS units, e.g., meters).
    • Output: One buffer polygon per geometry feature.
  • Other Key Arguments in st_buffer():
    • nQuadSegs (GEOS engine):
      • Number of segments per quadrant (default: 30).
      • Adjust: Decrease for memory concerns, or increase for high resolution output.
    • max_cells (S2 engine):
      • Higher values create smoother buffers (slower computation).
    • endCapStyle and joinStyle (GEOS engine):
      • Control buffer edge appearance (useful for lines).
    • singleSide (GEOS engine):
      • Buffer on one or both sides of the geometry.
Code
a1 <- andaman |> 
  st_cast("POLYGON")

a2 <- st_buffer(a1, dist = 20000) |> 
  mutate(id = as_factor(row_number()))

a3 <- st_buffer(a1, dist = 20000, nQuadSegs = 0.5) |> 
  mutate(id = as_factor(row_number()))

g1 <- ggplot() +
  geom_sf(data = a1) +
  labs(
    title = "Base Map with\nst_cast(\"POLYGON\")",
    subtitle = "Nicobar Islands"
  ) +
  coord_sf(
    ylim = c(6.5, 9.5),
    default_crs = 4326
  )

g2 <- ggplot() +
  geom_sf(
    data = a2,
    mapping = aes(fill = id),
    alpha = 0.2,
    colour = "transparent"
  ) +
  geom_sf(
    data = a1
  ) +
  labs(
    title = "With 20 km buffer\n around each island",
    subtitle = "Nicobar Islands; each\nbuffer in separate colour"
  ) +
  theme(legend.position = "none") +
  coord_sf(
    ylim = c(6.5, 9.5),
    default_crs = 4326
  )


g3 <- ggplot() +
  geom_sf(
    data = a3,
    mapping = aes(fill = id),
    alpha = 0.2,
    colour = "transparent"
  ) +
  geom_sf(
    data = a1
  ) +
  labs(
    title = "With 20 km buffer\n around each island",
    subtitle = "Nicobar Islands;\narugment (nQuadSegs = 1)"
  ) +
  theme(legend.position = "none") +
  coord_sf(
    ylim = c(6.5, 9.5),
    default_crs = 4326
  )

ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-2-3_1.png"),
  plot = patchwork::wrap_plots(g1, g2, g3, nrow = 1),
  height = 1200,
  width = 2000,
  units = "px"
)
Figure 3: Use of st_buffer()

5.2.4 Affine Transformations

  • Definition: Transformations that preserve lines and parallelism but not necessarily angles or lengths.
  • Types of Affine Transformations:
    1. Shifting (Translation): Moves every point by a constant distance.

      • Example: Adding a vector to shift all y-coordinates north by 400 km distance while keeping x-coordinates unchanged using:

        n_shift <- n1 |> 
          add(c(0, 400000)) |> 
          st_set_crs(st_crs(n1))
      • Note: This converts the CRS of the new sfc object to NA and thus needs st_set_crs() to return it back to the original CRS.

    2. Scaling: Enlarges or shrinks geometries.

      • Global Scaling:
        • Multiplies all coordinates relative to the origin, preserving topological relations.
      • Local Scaling:
        • Scales geometries around specific points (e.g., centroids).

        • Steps:

          1. Shift geometries so the centroid becomes (0,0).
          2. Scale by a factor.
          3. Shift back to original centroid coordinates.
    3. Rotation: Rotates coordinates using a rotation matrix.

      • Rotation matrix: Define a function to create the rotation matrix and apply it to the geometry. R=[cosθ/sinθ​ − sinθ/cosθ​]
  • Replacing Old Geometry: Once the affine transformation has been completed, use st_set_geometry() from the sf package to finally Replace original geometry with scaled versions (shifted, rotated or scaled)
  • Applications:
    • Shifting: For label placement.
    • Scaling: In non-contiguous cartograms.
    • Rotation: Correcting distorted geometries during re-projection.
Code
sysfonts::font_add_google("Saira Extra Condensed", "caption_font")
showtext::showtext_auto()

ggplot2::theme_set(
  theme_minimal(
    base_size = 30,
    base_family = "caption_font"
  ) +
    theme(
      text = element_text(
        lineheight = 0.3,
        hjust = 0.5
      ),
      plot.title.position = "plot",
      plot.title = element_text(hjust = 1),
      plot.subtitle = element_text(hjust = 1),
      panel.grid = element_line(
        linewidth = 0.2
      )
    )
)

df1 <- andaman |> 
  st_cast("POLYGON") |> 
  mutate(id = row_number()) |> 
  filter(id < 10) |> 
  mutate(
    name = case_when(
      id %in% c(4,8, 9, 7) ~ "Nicobar Islands",
      .default = "Andaman Islands"
    )
  )


# Pull out only sfc class (i.e. geometry for Andaman Islands)
a1 <- df1 |> 
  filter(name == "Andaman Islands") |> 
  st_geometry()

# Pull out only sfc class (i.e. geometry for Nicobar Islands)
n1 <- df1 |> 
  filter(name == "Nicobar Islands") |> 
  st_geometry()


g1 <- df1 |> 
  ggplot(aes(fill = name)) +
  geom_sf(colour = "transparent") +
  geom_sf_text(aes(label = id)) +
  coord_sf(
    ylim = c(7, 13.5),
    default_crs = 4326
  ) +
  labs(
    title = "Base Map",
    subtitle = "10 Largest Islands amongst\nAndamand and Nicobar Island chain",
    fill = NULL, x = NULL, y = NULL
  ) +
  scale_fill_manual(values = c("#89973DFF", "#E8B92FFF")) +
  theme(
    legend.position = "left"
  )

g2 <- ggplot() +
  geom_sf(data = a1, fill = "#89973DFF", colour = "transparent") +
  geom_sf(data = n1, fill = "#E8B92FFF", colour = "transparent") +
  coord_sf(
    ylim = c(7, 13.5),
    default_crs = 4326
  ) + 
  labs(
    title = "Plotting as separate\nsfc objects",
    subtitle = "10 Largest Islands"
  )

#################### Shifting #########################

n_shift <- n1 |> 
  add(c(0, 400000)) |> 
  st_set_crs(st_crs(n1))

g3 <- ggplot() +
  geom_sf(data = a1, fill = "#89973DFF", colour = "transparent") +
  geom_sf(
    data = n_shift, 
    fill = "#E8B92FFF", 
    colour = "transparent"
    ) +
  coord_sf(
    ylim = c(7, 13.5),
    default_crs = 4326
  ) +
  labs(
    title = "Shifting sfc objects",
    subtitle = "Bring Nicobar Islands\ncloser to the Andamans"
  )

#################### Scaling ##########################

n1_centroid <- st_centroid(n1)

n1_scale <- (n1 - n1_centroid) |> 
  multiply_by(2.5) |> 
  add(n1_centroid) |> 
  st_set_crs(st_crs(n1))

g4 <- ggplot() +
  geom_sf(data = a1, fill = "#89973DFF", colour = "transparent") +
  geom_sf(
    data = n1_scale, 
    fill = "#E8B92FFF", 
    colour = "transparent"
    ) +
  coord_sf(
    ylim = c(7, 13.5),
    default_crs = 4326
  ) + 
  labs(
    title = "Scaling sfc objects",
    subtitle = "Enlarging Nicobar Islands\nby 2.5 times."
  )

##################### Rotation ########################
rotation = function(a){
  r = a * pi / 180 #degrees to radians
  matrix(c(cos(r), sin(r), -sin(r), cos(r)), nrow = 2, ncol = 2)
} 

n1_rotate <- (n1 - n1_centroid) |> 
  multiply_by(rotation(90)) |> 
  add(n1_centroid) |> 
  st_set_crs(st_crs(n1))


g5 <- ggplot() +
  geom_sf(data = a1, fill = "#89973DFF", colour = "transparent") +
  geom_sf(
    data = n1_rotate, 
    fill = "#E8B92FFF", 
    colour = "transparent"
    ) +
  coord_sf(
    ylim = c(7, 13.5),
    default_crs = 4326
  ) + 
  labs(
    title = "Rotating sfc objects",
    subtitle = "Rotating Nicobar Islands\nclockwise by 90 degrees"
  )

g <- patchwork::wrap_plots(g1, g3, g4, g5) +
  patchwork::plot_layout(widths = c(1,1,1,1,1), nrow = 1)

ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-2-4_1.png"),
  plot = g,
  height = 1200,
  width = 2500,
  units = "px"
)

Data Viz demonstration

Here’s a more visually appealing version of the same graphic, produced using complete code given on this page.

This plot demonstrates the application of spatial transformations on the Andaman and Nicobar Islands using the `sf` package in R. It showcases four techniques: base mapping, northward shifting, scaling (enlargement), and rotation (90° clockwise), highlighting their effects on spatial geometries. The `facet_wrap` function neatly organizes the transformations for comparison, while `geom_sf` and custom labels enhance the visualization.

This plot demonstrates the application of spatial transformations on the Andaman and Nicobar Islands using the `sf` package in R. It showcases four techniques: base mapping, northward shifting, scaling (enlargement), and rotation (90° clockwise), highlighting their effects on spatial geometries. The `facet_wrap` function neatly organizes the transformations for comparison, while `geom_sf` and custom labels enhance the visualization.

5.2.5 Clipping

  • Definition: A form of spatial subsetting that modifies the geometry column of affected features. Applies to lines, polygons, and their multi equivalents (not points).

  • Purpose: Identifies or extracts areas of overlap or subsets of spatial features. Commonly used in geographic data analysis to focus on regions of interest.

  • Logical Operations and Spatial Equivalents. Inspired by Figure 12.1 of R for Data Science (2e). Spatial equivalents to logical operators (e.g., AND, OR, NOT) allow flexible geometry subsetting (as shown in Figure 4)

    • Intersection (AND): st_intersection()

    • Union (OR): st_union()

    • Difference (NOT): st_difference()

    • Exclusive OR (XOR): st_sym_difference()

  • Applications:

    • Identifying overlapping regions.

    • Creating subsets of spatial data for specific analysis or visualization.

Code
sysfonts::font_add_google("Nova Mono", "body_font")
showtext::showtext_auto()

theme_custom <- function(...) {
  ggthemes::theme_map(
    base_size = 20,
    base_family = "body_font"
    ) +
    labs(
      x = NULL, y = NULL
    ) +
    theme(
      plot.title = element_text(
        hjust = 0.5,
        lineheight = 0.3,
        margin = margin(5,0,2,0, "mm")
      ),
      plot.margin = margin(0,0,0,0, "mm"),
      ...
    )
}
# Define the center points of the circles
sf_circles <- tibble(
  x = c(0, 2, 4, 1, 3),
  y = c(0, 0, 0, -1, -1),
  colour_var = c("blue", "black", "red", "yellow", "green"),
  label_var = LETTERS[1:5]
  ) |> 
  # Convert the points to an sf object
  st_as_sf(
  coords = c("x", "y"), 
  crs = NA
  ) |> 
  # Create circular geometries using st_buffer
  mutate(geometry = st_buffer(geometry, dist = 1))

g1 <- sf_circles |> 
  ggplot(
    mapping = aes(
      label = label_var
      )
    ) +
  geom_sf(
    fill = "transparent",
    linewidth = 0.5,
    colour = "grey10"
  ) +
  geom_sf_text(
    colour = "grey10",
    fontface = "bold",
    size = 16,
    family = "body_font"
  ) +
  labs(
    title = "5 overlapping circles plotted with {sf}"
  ) + 
  theme_custom()

# Naming the individual circles

pull_a_circle <- function(ch_pick){
  sf_circles |> 
    filter(label_var == ch_pick)
}
a1 <- pull_a_circle("A")  
b1 <- pull_a_circle("B")  
c1 <- pull_a_circle("C")  
d1 <- pull_a_circle("D")  
e1 <- pull_a_circle("E")  

g2 <- g1 +
  geom_sf(
    data = a1 |> st_difference(d1),
    fill = alpha("grey", 0.7)
  ) +
  ggtitle("A |> st_difference(D)")

g3 <- g1 +
  geom_sf(
    data = d1 |> st_difference(a1),
    fill = alpha("grey", 0.7)
  ) +
  ggtitle("D |> st_difference(A)")

g4 <- g1 +
  geom_sf(
    data = d1 |> st_difference(a1) |> st_difference(b1),
    fill = alpha("grey", 0.7)
  ) +
  ggtitle("D |> st_difference(A) |>\nst_difference(B)")

g5 <- g1 +
  geom_sf(
    data = a1 |> st_union(d1),
    fill = alpha("grey", 0.7)
  ) +
  ggtitle("st_union(A, D)")

g6 <- g1 +
  geom_sf(
    data = a1 |> st_intersection(d1),
    fill = alpha("grey", 0.7)
  ) +
  ggtitle("st_intersection(A, D)")

g7 <- g1 +
  geom_sf(
    data = st_sym_difference(a1, d1),
    fill = alpha("grey", 0.7)
  ) +
  ggtitle("st_sym_difference(A, D)")

non_overlap <- a1 |> 
            st_sym_difference(d1) |> 
            st_sym_difference(b1) |> 
            st_sym_difference(c1) |> 
            st_sym_difference(e1)
g8 <- g1 +
  geom_sf(
    data = non_overlap,
    fill = alpha("grey", 0.7)
  ) +
  labs(title = "An st_sym_difference() chain")

overlap <- a1 |> 
  st_intersection(d1) |> 
  st_union(st_intersection(d1, b1)) |> 
  st_union(st_intersection(b1, e1)) |> 
  st_union(st_intersection(e1, c1))

g9 <- g1 +
  geom_sf(
    data = overlap,
    fill = alpha("grey", 0.7)
  ) +
  labs(title = "A st_union() and\nst_interaction() chain")

custom_layout_design <- "
  AAAAAA
  AAAAAA
  BBCCDD
  BBCCDD
  EEFFGG
  EEFFGG
  HHHIII
  HHHIII
"

g <- patchwork::wrap_plots(
  g1, g2, g3, g4, g5, g6, g7, g8, g9
  ) + 
  patchwork::plot_layout(
    design = custom_layout_design
  ) +
  patchwork::plot_annotation(
    title = "Clipping {sf} objects\n(Boolean Algebra examples)",
    subtitle = "Using functions like st_intersection(), st_union(), st_difference()\n& st_sym_difference()",
    theme = theme(
      plot.title = element_text(
        family = "body_font",
        size = 54, 
        lineheight = 0.3,
        hjust = 0.5,
        face = "bold"
      ),
      plot.subtitle = element_text(
        family = "body_font",
        size = 30, 
        lineheight = 0.3,
        hjust = 0.5
      )
    )
  )


ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-2-5_1.png"),
  plot = g,
  height = 2000,
  width = 1600,
  units = "px",
  bg = "white"
)
Figure 4: Various methods and examples (inspired by Boolean Algebra) for clipping {sf} objects in R

5.2.6 Sub-setting and Clipping

  • Clipping: Modifies geometry to match a subsetting object. Subsetting: Selects features that intersect or partly intersect with a clipping object. An example: Points randomly distributed within the bounding box of the five concentric circles. Some points are inside one circle, some inside two circles, or neither. Then, we subset points intersecting with one, two or no circles.
  • Key Functions:
    1. st_sample(): Generates random points within a geometry.
    2. Clipping and Subsetting Approaches:
      • Way #1: Use the intersection of x and y (x_and_y) as a direct subsetting object: p[x_and_y]
      • Way #2: Find the intersection between points (p) and x_and_y, modifying overlapping geometries: st_intersection(p, x_and_y) , or, using st_interesects() when working in a pipe (|>) chain.
      • Way #3: Use st_intersects() to determine logical overlap between p and the subsetting objects: sel_p_xy = st_intersects(p, x, sparse = FALSE)[, 1] & st_intersects(p, y, sparse = FALSE)[, 1] and then subset, using p_xy3 = p[sel_p_xy]
  • Preferred Implementation:
    • Way #2 (concise and efficient) and it is the tidyverse approach with |> compatibility. Example shown in Figure 5
Code
sysfonts::font_add_google("Fira Sans Condensed", "body_font")
showtext::showtext_auto()

theme_custom <- function(...) {
  ggthemes::theme_map(
    base_size = 20,
    base_family = "body_font"
    ) +
    labs(
      x = NULL, y = NULL
    ) +
    theme(
      plot.title = element_text(
        hjust = 0.5,
        lineheight = 0.3,
        margin = margin(5,0,2,0, "mm")
      ),
      plot.margin = margin(0,0,0,0, "mm"),
      ...
    )
}
# Define the center points of the circles
sf_circles <- tibble(
  x = c(0, 2, 4, 1, 3),
  y = c(0, 0, 0, -1, -1),
  colour_var = c("blue", "black", "red", "yellow", "green"),
  label_var = LETTERS[1:5]
  ) |> 
  # Convert the points to an sf object
  st_as_sf(
  coords = c("x", "y"), 
  crs = NA
  ) |> 
  # Create circular geometries using st_buffer
  mutate(geometry = st_buffer(geometry, dist = 1))

# Naming the individual circles
pull_a_circle <- function(ch_pick){
  sf_circles |> 
    filter(label_var == ch_pick)
}
a1 <- pull_a_circle("A")  
b1 <- pull_a_circle("B")  
c1 <- pull_a_circle("C")  
d1 <- pull_a_circle("D")  
e1 <- pull_a_circle("E")  

one_circle <- a1 |> 
            st_sym_difference(d1) |> 
            st_sym_difference(b1) |> 
            st_sym_difference(c1) |> 
            st_sym_difference(e1)
overlap <- a1 |> 
  st_intersection(d1) |> 
  st_union(st_intersection(d1, b1)) |> 
  st_union(st_intersection(b1, e1)) |> 
  st_union(st_intersection(e1, c1))
rm(a1, b1, c1, d1, e1)

set.seed(42)

random_points <- sf_circles |> 
  # Get a bounding box
  st_bbox() |> 
  # Covert it into a polygon
  st_as_sfc() |> 
  
  # Get a sample of points within this polygon
  st_sample(size = 100) |> 
  
  # Convert into a sf object
  st_as_sf() |> 
  
  # Add identifiers for where the points fall
  mutate(
    colour_var = case_when(
      st_intersects(x, overlap, sparse = F) ~ "Two Circles",
      st_intersects(x, one_circle, sparse = F) ~ "One Circle",
      .default = "Outside"
    ),
    colour_var = fct(
      colour_var,
      levels = c(
        "Outside",
        "One Circle",
        "Two Circles"
      )
    )
  )

g <- ggplot() +
  geom_sf(
    data = sf_circles, 
    fill = "transparent",
    linewidth = 0.2,
    colour = "grey10"
    ) +
  geom_sf(
    data = random_points,
    mapping = aes(
      geometry = x,
      colour = colour_var
    ),
    alpha = 0.75,
    size = 0.7,
    stroke = 0.1
  ) +
  labs(
    title = "Clipping and Subsetting",
    subtitle = "Subsetting random points into those that overlap none, one or two circles",
    colour = "Point lies within"
  ) +
  paletteer::scale_colour_paletteer_d("khroma::highcontrast") +
  ggthemes::theme_map(
    base_family = "body_font",
    base_size = 16
  ) +
  theme(
    plot.title = element_text(
      size = 24,
      hjust = 0.5,
      margin = margin(0,0,0,0, "mm")
    ),
    plot.subtitle = element_text(
      hjust = 0.5,
      lineheight = 0.3,
      margin = margin(0,0,0,0, "mm")
    ),
    legend.position = "inside",
    legend.position.inside = c(0.5, 0),
    legend.justification = c(0.5, 1),
    legend.direction = "horizontal",
    legend.text = element_text(
      margin = margin(0,0,0,0, "mm")
    ),
    legend.title = element_text(
      margin = margin(0,0,0,0, "mm"),
      hjust = 0.5
    ),
    legend.margin = margin(0,0,0,0, "mm"),
    legend.key.size = unit(5, "pt"),
    legend.title.position = "top"
  )

ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-2-6_1.png"),
  plot = g,
  height = 500,
  width = 800,
  units = "px",
  bg = "white"
)
Figure 5: Sub-setting points with clipped {sf} objects - an example

5.2.7 Geometry Unions

  • Spatial Aggregation and Geometry Dissolution:
    • Spatial aggregation dissolves boundaries of touching polygons within the same group automatically. Example: Aggregating 22 districts of the State of Haryana into an overall boundary map with aggregate() (base R approach) or summarize() (tidyverse approach), for example in Figure 6
    • Geometric Operation Behind the Scenes: Functions aggregate() and summarize() internally call st_union() from the sf package to dissolve boundaries and merge geometries.
  • Union of Geometries: Visualization Insight example is shown below.
Code
sysfonts::font_add_google("Fira Sans Condensed", "body_font")
showtext::showtext_auto()

haryana1 <- read_sf(here::here(
  "data", "haryana_map",
  "HARYANA_DISTRICT_BDY.shp"
))

g1 <- haryana1 |>
  ggplot() +
  geom_sf(
    aes(fill = District)
  ) +
  ggthemes::theme_map(
    base_family = "body_font"
  ) +
  coord_sf(expand = FALSE) +
  labs(
    title = "Districts of Haryana State (India)",
    subtitle = "The raw map, showing each district."
  )  +
  theme(
    plot.title = element_text(
      hjust = 0.5
    ),
    plot.subtitle = element_text(
      hjust = 0.5,
      lineheight = 0.35
    ),
    legend.position = "none"
  )

g2 <- haryana1 |>
  st_union() |> 
  ggplot() +
  geom_sf() +
  ggthemes::theme_map(
    base_family = "body_font"
  ) +
  coord_sf(expand = FALSE) +
  labs(
    title = "Boundary of Haryana State (India)",
    subtitle = "Generated using sf::st_union()"
  )  +
  theme(
    plot.title = element_text(
      hjust = 0.5
    ),
    plot.subtitle = element_text(
      hjust = 0.5,
      lineheight = 0.35
    ),
    legend.position = "none"
  )

ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-2-7_2.png"),
  plot = (g1 + g2),
  height = 400,
  width = 600,
  units = "px",
  bg = "white"
)
Figure 6: A district map of Haryana, converted into an outer boundary map using st_union()
Code
sysfonts::font_add_google("Fira Sans Condensed", "body_font")
showtext::showtext_auto()

haryana <- read_sf(
  here::here(
    "data", "haryana_map",
    "HARYANA_SUBDISTRICT_BDY.shp"
  )
) |> 
  janitor::clean_names() |> 
  st_simplify(dTolerance = 100) |> 
  mutate(
    district = str_replace_all(district, ">", "A"),
    district = str_replace_all(district, "\\|", "I"),
    district = str_to_title(district),
    tehsil = str_to_title(str_replace_all(tehsil, ">", "A"))
  )

g1 <- ggplot() +
  geom_sf(data = haryana, linewidth = 0.1) +
  ggthemes::theme_map(
    base_family = "body_font"
  ) +
  coord_sf(expand = FALSE) +
  labs(
    title = "Tehsils of Haryana State (India)",
    subtitle = "The raw map, showing each tehsil /\nsubdivision of Haryana."
  )  +
  theme(
    plot.title = element_text(
      hjust = 0.5
    ),
    plot.subtitle = element_text(
      hjust = 0.5,
      lineheight = 0.35
    )
  )

g2 <- haryana |> 
  group_by(district) |> 
  summarise() |> 
  ggplot() +
  geom_sf(
    linewidth = 0.1
  ) +
  ggthemes::theme_map(
    base_family = "body_font"
  ) +
  coord_sf(expand = FALSE) +
  labs(
    title = "Districts of Haryana State (India)",
    subtitle = "Aggregating tehsils with group_by(..) |> summarise()\nwhich uses st_union() at the backend."
  ) +
  theme(
    plot.title = element_text(
      hjust = 0.5
    ),
    plot.subtitle = element_text(
      hjust = 0.5,
      lineheight = 0.35
    )
  )

g3 <- haryana |> 
  group_by(district) |> 
  summarise() |> 
  st_cast("LINESTRING") |> 
  ggplot() +
  geom_sf(
    linewidth = 0.1
  ) +
  ggthemes::theme_map(
    base_family = "body_font"
  ) +
  coord_sf(expand = FALSE) +
  labs(
    title = "Casting geometry into LINESTRING",
    subtitle = "group_by(district) |> summarise() |> st_cast(\"LINESTRING\").\nLoss of some geometries, but internal slivers are removed."
  ) +
  theme(
    plot.title = element_text(
      hjust = 0.5
    ),
    plot.subtitle = element_text(
      hjust = 0.5,
      lineheight = 0.35
    )
  )

ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-2-7_1.png"),
  plot = (g1 + g2 + g3),
  height = 500,
  width = 900,
  units = "px",
  bg = "white"
)
Note

When using the st_union() function in spatial analysis, “inside small lines” or “slivers” appearing in the resulting combined geometry often indicates slight discrepancies in the input geometries, particularly at boundary points, causing the union operation to create small, extra line segments where the geometries nearly overlap but don’t perfectly align, as shown in Figure 7

Figure 7: A sub-district / Tehsil map of Haryana, when converted into a district map with summarise() [which basically uses st_union() only] produces an imperfect result with internal slivers, as the boundary nodes are not overlapping amongst the sub-districts / tehsils. This depicts an inherent (limitation? / caution?) of sf::st_union().

5.2.8 Type Transformations

  • Geometry Casting: Transform geometry types using st_cast() from the sf package.

  • Works on simple feature geometry (sfg), simple feature column (sfc), and simple feature objects (sf).

  • Reversible Casting:

    • Example: st_cast(linestring, "MULTIPOINT") reverts to multipoint geometry.
    • Retains original geometry for compatible transformations.
  • Feature Splitting:

    • Converts multi-objects to non-multi-objects by splitting (e.g., MULTIPOINT to multiple POINTs).
    • Expands rows with duplicate attribute values during splitting.
  • Use cases:

    • Multipoint to Linestring: For path length calculations from ordered points.
    • Linestring to Polygon: To compute area, e.g., lake boundary.
    • Added attributes (e.g., road names, lengths) in case of feature splitting.
    • Measurements like st_length() for each LINESTRING.
  • Geometry Type Transformations for Simple Features is shown in Table 1 (Credits: Geomputation with R, Book Table 5.1)

Code
# Load necessary libraries
library(tibble)
library(gt)

# Create the tibble
geometry_table <- tibble::tibble(
  `Input Geometry Type` = c(
    "POINT (1 point)",
    "MULTIPOINT (4 points)",
    "LINESTRING (1 linestring with 5 points)",
    "MULTILINESTRING (2 linestrings: one with 5 points, one with 2 points)",
    "POLYGON (1 polygon with 5 points)",
    "MULTIPOLYGON (2 polygons: each with 5 points)",
    "GEOMETRYCOLLECTION (2 geometries: a MULTIPOINT with 4 points and a LINESTRING with 5 points)"
  ),
  `POINT` = c(1, 4, 5, 7, 5, 10, 9),
  `MULTIPOINT` = c(1, 1, 1, 2, 1, 1, 1),
  `LINESTRING` = c(1, 1, 1, 2, 1, NA, NA),
  `MULTI-LINESTRING` = c(NA, 1, 1, 1, 1, 1, NA),
  `POLYGON` = c(NA, NA, NA, NA, 1, 2, NA),
  `MULTI-POLYGON` = c(NA, NA, NA, NA, NA, 1, NA),
  `GEOMETRY-COLLECTION` = c(NA, NA, NA, NA, NA, 1, 1)
)

# Render the table using gt
geometry_table |> 
  gt::gt() |> 
  gt::fmt_missing(columns = gt::everything(), missing_text = "") |> 
  gtExtras::gt_theme_538() |> 
  tab_style(
    style = cell_text(size = px(10)), 
    locations = cells_body(
      columns = `Input Geometry Type`, 
      rows = everything()             
    )
  ) |> 
  tab_style(
    style = cell_text(size = px(10),
                      align = "center"),
    locations = list(
      cells_column_labels(columns = everything()), 
      cells_stub(rows = everything())             
    )
  ) |> 
  tab_style(
    style = cell_text(align = "center"), 
    locations = cells_body(
      columns = -`Input Geometry Type`, 
      rows = everything()             
    )
  )
Table 1: All possible combinations of Geometry type transformations for Simple Features
Input Geometry Type POINT MULTIPOINT LINESTRING MULTI-LINESTRING POLYGON MULTI-POLYGON GEOMETRY-COLLECTION
POINT (1 point) 1 1 1



MULTIPOINT (4 points) 4 1 1 1


LINESTRING (1 linestring with 5 points) 5 1 1 1


MULTILINESTRING (2 linestrings: one with 5 points, one with 2 points) 7 2 2 1


POLYGON (1 polygon with 5 points) 5 1 1 1 1

MULTIPOLYGON (2 polygons: each with 5 points) 10 1
1 2 1 1
GEOMETRYCOLLECTION (2 geometries: a MULTIPOINT with 4 points and a LINESTRING with 5 points) 9 1



1

An attempt to recreate this table graphically with {ggplot2} and {sf}. Unfortunately, it didn’t work out.

Code
# Define the bounding box limits
xmin <- -2; xmax <- 2; ymin <- -2; ymax <- 2

# Create POINT
tbl_poi <- data.frame(x = 0, y = 0)
poi <- st_sfc(st_point(c(tbl_poi$x, tbl_poi$y)), crs = 4326)

# Create MULTIPOINT
tbl_mpoi <- data.frame(x = c(-1, 1, -1, 1), y = c(-1, -1, 1, 1))
mpoi <- st_sfc(st_multipoint(as.matrix(tbl_mpoi)), crs = 4326)

# Create LINESTRING
tbl_lin <- data.frame(x = seq(xmin, xmax, length.out = 5), y = seq(ymin, ymax, length.out = 5))
lin <- st_sfc(st_linestring(as.matrix(tbl_lin)), crs = 4326)

# Create MULTILINESTRING
tbl_mlin1 <- data.frame(x = seq(xmin, xmax, length.out = 5), y = seq(ymin, ymax, length.out = 5))
tbl_mlin2 <- data.frame(x = c(xmin, xmax), y = c(ymax, ymin))
mlin <- st_sfc(st_multilinestring(list(as.matrix(tbl_mlin1), as.matrix(tbl_mlin2))), crs = 4326)

# Create POLYGON
tbl_pol <- data.frame(x = c(xmin, xmax, xmax, xmin, xmin),
                      y = c(ymin, ymin, ymax, ymax, ymin))
pol <- st_sfc(st_polygon(list(as.matrix(tbl_pol))), crs = 4326)

# Create MULTIPOLYGON
tbl_mpol1 <- data.frame(x = c(-1, 0, 0, -1, -1),
                       y = c(-1, -1, 0, 0, -1))
tbl_mpol2 <- data.frame(x = c(0, 1, 1, 0, 0),
                       y = c(0, 0, 1, 1, 0))
mpol <- st_sfc(st_multipolygon(list(list(as.matrix(tbl_mpol1)), list(as.matrix(tbl_mpol2)))), crs = 4326)

# Create GEOMETRYCOLLECTION
gc <- st_sfc(st_geometrycollection(
  list(
    st_multipoint(as.matrix(tbl_mpoi)),
    st_linestring(as.matrix(tbl_lin))
  )
), crs = 4326)

names_geometries <- c(
  "POINT", "MULTIPOINT",
  "LINESTRING", "MULTILINESTRING",
  "POLYGON", "MULTIPOLYGON",
  "GEOMETRYCOLLECTION"
  )
# Clean workspace
rm(list = ls(pattern = "^tbl_"))
rm(list = ls(pattern = "min|max"))

# Combine all objects into an sf object for viewing
sf_objects <- st_sf(
  geometry = c(poi, mpoi, lin, mlin, pol, mpol, gc), 
  type = names_geometries
  )


theme_custom <- function(...){
  theme_minimal() +
    theme(
      axis.ticks = element_blank(),
      axis.text = element_blank(),
      ...
      )
}

# basic geometries made
sf_objects |> 
  ggplot() +
  geom_sf() +
  facet_wrap(~type)

blank_numbers <- c(
  4, 5, 6, 7,
  12, 13, 14,
  19, 20, 21, 26, 27, 28,
  33, 34, 35, 38,
  41, 42, 
  45, 46, 47, 48
)

for (i in 1:7){
  original_geometry = names_geometries[i]

  for (j in 1:7){
    plot_number <- ((7 * (i-1)) + j)
  
    if (plot_number %in% blank_numbers) {
      assign(
        paste0("g", ((7 * (i-1)) + j)),
        ggplot() + theme_custom()
      )
    } else {
      temp <- sf_objects |> 
        filter(type == original_geometry) |> 
        st_cast(names_geometries[j])
      assign(
        paste0("g", ((7 * (i-1)) + j)),
        ggplot(temp) +
          geom_sf() +
          scale_x_continuous(position = "top") +
          theme_custom()
        )
      }
    }
}

g <- patchwork::wrap_plots(
  mget(paste0("g", 1:49))
) +
  patchwork::plot_layout(
    ncol = 7, nrow = 7
  )

ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-2-8_1.png"),
  plot = g,
  height = 7800,
  width = 7800,
  units = "px",
  bg = "white"
)

5.3 Geometric operations on raster data

  • Geometric raster operations include shifting, flipping, mirroring, scaling, rotation, and warping of images. Purpose:
    • Enable georeferencing for overlaying images on accurate maps with a known CRS (Coordinate Reference System).
    • Applications include georectification, orthorectification, and image registration.
  • Techniques:
    • Georectification: Uses known ground control points for alignment (Liu and Mason 2009) (Loecher and Ropkins 2015).
    • Orthorectification: Accounts for local topography.
    • Image Registration: Aligns images from different sensors by matching their coordinate systems and resolutions.
  • Suitability of R:
    • Manual operations like georectification and orthorectification are better handled in GIS software like QGIS.
    • R is suitable for aligning multiple images by modifying extent, resolution, and origin.

5.3.1 Geometric Intersections

  • Purpose:

    • Extract raster values: Use spatial objects overlaid on a raster to extract values with subsetting syntax.

    • Keep spatial output: To retain a spatial output (e.g., raster structure), set drop = FALSE during subsetting.

    • Raster midpoints: Returns a raster object containing cells whose midpoints overlap with the clip area.

  • Comparison of Base R [] vs. terra::crop() shown in Figure 8

Feature Base R [] Subsetting terra::crop()
Purpose General selection of raster values Spatial clipping of rasters
Spatial Metadata Requires manual handling Automatically preserved
Syntax raster[condition, drop = FALSE] terra::crop(raster, extent)
Key Functionality Selects raster cells by condition Clips raster to the provided extent
Recommended For Basic operations, custom workflows Efficient raster clipping in geospatial analyses

terra::crop() is often more efficient and user-friendly for spatial raster workflows with pipe operator (|>) compared to base R’s [] subsetting.

Code
elev <- rast(system.file("raster/elev.tif", package = "spData"))
clip <- rast(xmin = -0.5, xmax = 0.5, ymin = -1, ymax = 1,
            resolution = 0.5, vals = rep(1, 8))


g1 <- ggplot() +
  geom_spatraster(data = elev) +
  scale_fill_grass_c(limits = c(1, 36)) +
  scale_x_continuous(limits = c(-1.5, 1.5)) +
  scale_y_continuous(limits = c(-1.5, 1.5)) +
  ggtitle("elev raster") +
  theme(legend.position = "none")

g2 <- ggplot() +
  geom_spatraster(data = clip) +
  scale_fill_grass_c(limits = c(1, 36)) +
  scale_x_continuous(limits = c(-1.5, 1.5)) +
  scale_y_continuous(limits = c(-1.5, 1.5)) +
  ggtitle("clip raster") +
  theme(legend.position = "none")

g3 <- ggplot() +
  geom_spatraster(data = elev[clip, drop = FALSE]) +
  scale_fill_grass_c(limits = c(1, 36)) +
  scale_x_continuous(limits = c(-1.5, 1.5)) +
  scale_y_continuous(limits = c(-1.5, 1.5)) +
  ggtitle("elev[clip, drop = FALSE]") +
  theme(legend.position = "none")

g4 <- ggplot() +
  geom_spatraster(data = elev |> terra::crop(clip)) +
  scale_fill_grass_c(limits = c(1, 36)) +
  scale_x_continuous(limits = c(-1.5, 1.5)) +
  scale_y_continuous(limits = c(-1.5, 1.5)) +
  ggtitle("elev |> terra::crop(clip)") +
  theme(
    legend.key.height = unit(10, "pt"),
    legend.key.width = unit(4, "pt")
  )

g <- g1 + g2 + g3 + g4 +
  plot_layout(
    nrow = 1
  )

ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-3-1_1.png"),
  plot = g,
  height = 700,
  width = 1800,
  units = "px",
  bg = "white"
)
Figure 8: Comparing base R [] and terra::crop() for subsetting rasters

5.3.2 Extent and Origin

  • Raster alignment necessity: Mismatches in resolution, projection, origin, or extent must be addressed for map algebra or merging. For example, Adding a raster with a 0.2-degree resolution to one with a 1-degree resolution is impossible without alignment.
  • Extent mismatch: Extending a raster adds rows/columns filled with NA values. Use extend() from the terra package for this.
  • Extent alignment: Performing operations on rasters with differing extents throws an error. Solution: Use another raster as a reference to align extents via terra::extend().
    • The terra::extend() function in R enlarges the spatial extent of a SpatRaster or SpatExtent. It can add rows and columns, with values filled using the fill argument (default NA). The y argument specifies how much to extend, either as an extent object or numeric values indicating rows/columns to add. The snap argument controls alignment (“near”, “in”, “out”). Outputs can be saved using filename and overwrite.
  • Origin:
    • The origin is the raster cell corner closest to coordinates (0, 0). Use terra::origin() to retrieve or modify the raster origin.
    • Misaligned origins cause raster cells not to overlap, making map algebra impossible. Adjust the origin with origin() to ensure proper alignment.
    • If two rasters are marginally apart, change tolerance argument for terra::terraOptions().
  • Visualization: Figure 9 illustrates raster extension and Figure 10 shows the effect of origin changes.
Code
sysfonts::font_add_google("Fira Sans Condensed", "body_font")
showtext::showtext_auto()
theme_set(
  theme_minimal(
    base_family = "body_font",
    base_size = 18
  )
)
elev <-  rast(system.file("raster/elev.tif", 
                          package = "spData"))
elev2 <- elev |> extend(c(1,2), fill = 36)

g1 <- ggplot() +
  geom_spatraster(data = elev) +
  scale_fill_grass_c() +
  labs(
    title = "Base Raster elev"
  )

g2 <- ggplot() +
  geom_spatraster(data = elev |> extend(c(1,2), fill = 36)) +
  scale_fill_grass_c() +
  labs(
    title = "elev |> extend(c(1,2), fill = 36) "
  )

g3 <- ggplot() +
  geom_spatraster(data = elev |> extend(elev2, fill = 1)) +
  scale_fill_grass_c() +
  labs(
    title = "elev |> extend(elev2)"
  )

g <- g1 + g2 + g3 +
  plot_layout(
    guides = "collect"
  ) + 
  plot_annotation(
    title = "Extending rasters with terra::extend()",
    theme = theme(
      legend.title = element_blank(),
      plot.title = element_text(
        size = 42
      ),
      plot.subtitle = element_text(
        size = 24
      )
    )
  )

ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-3-2_2.png"),
  plot = g,
  height = 800,
  width = 2000,
  units = "px",
  bg = "white"
)
Figure 9: Use of terra::extend() to increase or decrease size of rasters
Code
sysfonts::font_add_google("Fira Sans Condensed", "body_font")
showtext::showtext_auto()
theme_set(
  theme_minimal(
    base_family = "body_font",
    base_size = 18
  )
)

elev <-  rast(system.file("raster/elev.tif", 
                          package = "spData"))

g1 <- ggplot() +
  geom_spatraster(
    data = elev,
    alpha = 0.9
  ) +
  scale_fill_grass_c() +
  ggtitle("Original Raster")

elev2 <- elev
origin(elev2) <- c(0.2 , 0.2)
g2 <- g1 + 
  geom_spatraster(
    data = elev2,
    alpha = 0.9
  ) +
  labs(
    title = "Origin shifted by c(0.2, 0.2)",
    subtitle = "Shifting along x and y axes.")

elev3 <- elev
origin(elev3) <- c(0.6 , 0.1)
g3 <- g1 +
  geom_spatraster(
    data = elev3,
    alpha = 0.9
  ) +
  labs(
    title = "Origin shifted by c(0.6, 0.1)",
    subtitle = "Shifting by remainder beyond division by resolution.")

g <- g1 + g2 + g3 +
  plot_layout(
    guides = "collect"
  ) + 
  plot_annotation(
    title = "Effects of changing terra::origin() of a SpatRaster",
    subtitle = "Changing by more than half of resolution results in shifting by +/- amount of remainder of division of origin shift by resolution",
    theme = theme(
      legend.title = element_blank(),
      plot.title = element_text(
        size = 42
      ),
      plot.subtitle = element_text(
        size = 24
      )
    )
  )


ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-3-2_1.png"),
  plot = g,
  height = 800,
  width = 2000,
  units = "px",
  bg = "white"
)
Figure 10: Using terra::origin() to change origin of a raster in relation to c(0,0)

5.3.3 Aggregation and Disaggregation

  • Raster Resolution Adjustments:
    • Aggregation: Decreases raster resolution using the aggregate() function.
    • Disaggregation: Increases raster resolution using the disagg() function.
  • Aggregation Details: The output cell values represent the mean of input cells. Other functions like median(), sum(), etc., can also be applied.
    • Effect on Resolution: Aggregation increases resolution by reducing rows and columns.
  • Disaggregation Details: 2 Methods are shown in Figure 11
    • Default (method = "near"): Duplicates input cell values, creating a ‘blocky’ output.
    • Bilinear (method = "bilinear"): Computes output values using a weighted average of the four nearest input pixel centers.
    • Accuracy: Results in finer resolution, but values are interpolations of the lower-resolution source.
  • Comparison of Rasters:
    • Aggregated and disaggregated rasters differ from the original raster.
    • Use functions like compareGeom() or all.equal() to identify differences.
  • Key Considerations:
    • Aggregation simplifies data, while disaggregation interpolates it, often leading to approximate values based on the source resolution.
    • Resolution changes impact raster extent, dimensions, and data accuracy.
Code
original_raster <- rast(system.file("raster/dem.tif", package = "spDataLarge"))

g1 <- ggplot() +
  geom_spatraster(data = original_raster) +
  scale_fill_wiki_c() +
  labs(
    title = "Original Raster",
    subtitle = paste0(
      "Resolution: ", 
      terra::res(original_raster) |> 
        round(1) |> 
        paste0(collapse = ", "),
      "\nDimensions: ",
      dim(original_raster)[1:2] |> 
        paste0(collapse = ", ")
    )
  )

agg_raster <- terra::aggregate(
  x = original_raster, 
  fact = 5, 
  fun = mean
  )

g2 <- ggplot() +
  geom_spatraster(data = agg_raster) +
  scale_fill_wiki_c() +
  labs(
    title = "Aggregation by factor of 5",
    subtitle = paste0(
      "Resolution: ", 
      terra::res(agg_raster) |> 
        round(1) |> 
        paste0(collapse = ", "),
      "\nDimensions: ",
      dim(agg_raster)[1:2] |> 
        paste0(collapse = ", ")
    )
  )


disagg_raster1 <- terra::disagg(
  x = agg_raster,
  fact = 5,
  method = "near"
)

g3 <- ggplot() +
  geom_spatraster(data = disagg_raster1) +
  scale_fill_wiki_c() +
  labs(
    title = "Dis-Aggregation (method = near)",
    subtitle = paste0(
      "Resolution: ", 
      terra::res(disagg_raster1) |> 
        round(1) |> 
        paste0(collapse = ", "),
      "\nDimensions: ",
      dim(disagg_raster1)[1:2] |> 
        paste0(collapse = ", ")
    )
  )

disagg_raster2 <- terra::disagg(
  x = agg_raster,
  fact = 5,
  method = "bilinear"
)

g4 <- ggplot() +
  geom_spatraster(data = disagg_raster2) +
  scale_fill_wiki_c() +
  labs(
    title = "Dis-Aggregation (method = bilinear)",
    subtitle = paste0(
      "Resolution: ", 
      terra::res(disagg_raster2) |> 
        round(1) |> 
        paste0(collapse = ", "),
      "\nDimensions: ",
      dim(disagg_raster2)[1:2] |> 
        paste0(collapse = ", ")
    )
  )

g <- g1 + g2 + g3 + g4 +
  plot_layout(
    guides = "collect",
    nrow = 1
  ) + 
  plot_annotation(
    title = "Aggregating and Disaggregating rasters",
    subtitle = "Disaggregating a raster will not revert to original details as some data is lost.",
    theme = theme(
      legend.title = element_blank(),
      plot.title = element_text(
        size = 42
      ),
      plot.subtitle = element_text(
        size = 18
      )
    )
  )


ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-3-3_1.png"),
  plot = g,
  height = 1800,
  width = 5400,
  units = "px",
  bg = "white"
)
Figure 11: Aggregating and Disaggregating a raster by a factor of 5. Note the superiority of blinear method of disaggregatiion.

5.3.4 Resampling

  • Purpose: Adjusts pixel values for a target raster with different resolution and origin than the original raster (terra). Useful for combining rasters with varying resolutions/origins.

  • Methods: A comparison table on resampling methods available in the {terra} package’s resample() function with examples in Figure 12

No. Method Description Suitable For Complexity Use Case
1 Nearest Neighbor Assigns the value of the nearest cell of the original raster to the target cell. Categorical rasters Simple Discrete data like land-use classifications
2 Bilinear Interpolation Computes a weighted average of the four nearest cells from the original raster for the target cell. Continuous rasters Moderate Continuous data; smoother transitions, elevation models or temperature data.
3 Cubic Interpolation Uses the values of the 16 nearest cells, applying third-order polynomial functions for smooth output. Continuous rasters Higher Continuous data requiring smoother surfaces; resampling satellite imagery.
4 Cubic Spline Interpolation Uses the 16 nearest cells, applying cubic splines (piece-wise third-order polynomial functions). Continuous rasters Higher Continuous data where smoothness is crucial; terrain modeling.
5 Lanczos Resampling Uses the 36 nearest cells with a Lanczos windowed sinc function for smoother outputs. Continuous rasters High Best for high-quality image scaling; maintains sharpness and reduces aliasing in photographic images.
6 Sum Calculates the weighted sum of all non-NA contributing grid cells. Spatially extensive variables Variable Aggregating data like population counts; ensures the total sum remains consistent.
7 Min Finds the minimum value among all non-NA contributing grid cells. Continuous rasters Variable Where the minimum value is of interest, such as finding the lowest temperature in a region.
8 Q1 (First Quartile) Computes the first quartile value of all non-NA contributing grid cells. Continuous rasters Variable To understand the lower distribution of data values.
9 Median (Med) Computes the median value of all non-NA contributing grid cells. Continuous rasters Variable Reducing the impact of outliers; in skewed data distributions.
10 Q3 (Third Quartile) Computes the third quartile value of all non-NA contributing grid cells. Continuous rasters Variable Focussing on the upper distribution of data values.
11 Max Finds the maximum value among all non-NA contributing grid cells. Continuous rasters Variable Identifying peak values, such as maximum elevation or highest temperature.
12 Average Calculates the mean value of all non-NA contributing grid cells. Continuous rasters Variable For smoothing data; provides an overall average.
13 Mode Identifies the most frequent value among all non-NA contributing grid cells. Continuous rasters Variable For categorical data; to determine the most common land cover type.
14 RMS (Root Mean Square) Computes the root mean square of all non-NA contributing grid cells. Continuous rasters Variable In error analysis; For assessing the variability in data values.
  • Notes:

    • Nearest Neighbor is the preferred method for categorical data to prevent the creation of non-existent categories.
    • Bilinear, Cubic, Cubic Spline, and Lanczos methods are more appropriate for continuous data, offering varying degrees of smoothness and detail preservation.
  • Categorical vs. Continuous:

    • Only nearest neighbor is suitable for categorical rasters.
    • Other methods can be used for continuous rasters but vary in outcomes, complexity, and processing time.
  • Additional Options:

    • Statistical Resampling:
      • Methods like sum, min, med, mode, and rms compute statistics over contributing cells.
      • Example: sum ensures the total remains unchanged for spatially extensive variables (e.g., population).
  • Implementation in terra:

    • Use the resample() function (documentation).
    • Requires:
      • Input Raster (x): Original raster.
      • Target Raster (y): Raster with desired resolution/origin.
      • Method: Resampling method (e.g., “bilinear”).
  • Note: Raster Re-projection is a special case of raster re-sampling. A specific type of re-sampling used for target rasters with different Coordinate Reference Systems (CRS).

Code
sysfonts::font_add_google("Encode Sans Condensed", "body_font")
showtext::showtext_auto()
original_raster <- rast(system.file("raster/dem.tif", package = "spDataLarge"))

resampling_methods_tbl <- tibble(
  name = c(
    "near",
    "bilinear",
    "cubic",
    "cubicspline",
    "lanczos",
    "sum",
    "min",
    "q1",
    "median",
    "q3",
    "max",
    "average",
    "mode",
    "rms"
  ),
  description = c(
    "Nearest Neighbor",
    "Bilinear Interpolation",
    "Cubic Interpolation",
    "Cubic Spline Interpolation",
    "Lanczos Resampling",
    "Sum",
    "Minimum",
    "First Quartile",
    "Median",
    "Third Quartile",
    "Maximum",
    "Average (Mean)",
    "Mode",
    "Root Mean Square"
  )
)

# A Target Raster for re-sampling
target_rast <- rast(xmin = 794650, xmax = 798250, 
                   ymin = 8931750, ymax = 8935350,
                   resolution = 100, crs = "EPSG:32717")

theme_custom <- function(...){
  ggthemes::theme_map(
    base_family = "body_font",
    base_size = 45
  ) +
    theme(
      plot.title = element_text(
        size = 45, 
        margin = margin(0,0,2,0, "mm")
      ),
      legend.position = "bottom",
      legend.text = element_text(
        margin = margin(1,0,0,0, "mm")
      ),
      plot.margin = margin(0,0,0,0, "mm"),
      ...
    )
}
Code
g1 <- ggplot() +
  geom_spatraster(data = original_raster) +
  scale_fill_wiki_c() +
  coord_sf(expand = F) +
  labs(
    title = "Original Raster"
  ) +
  guides(fill = "none") +
  theme_custom()

for (i in c(2:7, 11, 12, 13, 14)) {
  assign(
    paste0("g", i),
    ggplot() +
      geom_spatraster(
        data = terra::resample(
          x = original_raster,
          y = target_rast,
          method = resampling_methods_tbl$name[i]
        )
      ) +
      scale_fill_wiki_c() +
      coord_sf(expand = F) +
      labs(
        title = paste0(resampling_methods_tbl$description[i]),
        fill = NULL
      ) +
      theme_custom(
        legend.key.height = unit(2, "mm"),
        legend.key.width = unit(20, "mm")
      )
  )
}

g <- g1 + g2 + g3 + g4 + g5 + g7 + g12 + g13 + g14 + 
  plot_layout(
    guides = "collect",
  ) + 
  plot_annotation(
    title = "Methods for Re-sampling rasters with {terra}",
    theme = theme(
      legend.title = element_blank(),
      plot.title = element_text(
        size = 75,
        margin = margin(5,0,5,0, "mm"),
        hjust = 0.5
      ),
      legend.position = "bottom"
    )
  )


ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-3-4_1.png"),
  plot = g,
  height = 2400 * 5/4,
  width = 2400,
  units = "px",
  bg = "white"
)
Figure 12: Various methods for re-sampling a raster with terra::resample()
Code
# Load required libraries
library(tibble)
library(gt)

# Create the resampling methods table as a tibble
resampling_methods <- tibble::tibble(
  `Resampling Method` = c(
    "Nearest Neighbor", "Bilinear Interpolation", "Cubic Interpolation", 
    "Cubic Spline Interpolation", "Lanczos Resampling", "Sum", "Min", 
    "Q1 (First Quartile)", "Median (Med)", "Q3 (Third Quartile)", 
    "Max", "Average", "Mode", "RMS (Root Mean Square)"
  ),
  Description = c(
    "Assigns the value of the nearest cell of the original raster to the target cell.",
    "Computes a weighted average of the four nearest cells from the original raster for the target cell.",
    "Uses the values of the 16 nearest cells, applying third-order polynomial functions for smooth output.",
    "Uses the 16 nearest cells, applying cubic splines (piece-wise third-order polynomial functions).",
    "Uses the 36 nearest cells with a Lanczos windowed sinc function for smoother outputs.",
    "Calculates the weighted sum of all non-NA contributing grid cells.",
    "Finds the minimum value among all non-NA contributing grid cells.",
    "Computes the first quartile value of all non-NA contributing grid cells.",
    "Computes the median value of all non-NA contributing grid cells.",
    "Computes the third quartile value of all non-NA contributing grid cells.",
    "Finds the maximum value among all non-NA contributing grid cells.",
    "Calculates the mean value of all non-NA contributing grid cells.",
    "Identifies the most frequent value among all non-NA contributing grid cells.",
    "Computes the root mean square of all non-NA contributing grid cells."
  ),
  `Suitable For` = c(
    "Categorical rasters", "Continuous rasters", "Continuous rasters", 
    "Continuous rasters", "Continuous rasters", "Spatially extensive variables", 
    "Continuous rasters", "Continuous rasters", "Continuous rasters", 
    "Continuous rasters", "Continuous rasters", "Continuous rasters", 
    "Continuous rasters", "Continuous rasters"
  ),
  Complexity = c(
    "Simple", "Moderate", "Higher", "Higher", "High", "Variable", 
    "Variable", "Variable", "Variable", "Variable", "Variable", 
    "Variable", "Variable", "Variable"
  ),
  `Processing Time` = c(
    "Fast", "Fast", "Moderate", "Moderate", "High", "Variable", 
    "Variable", "Variable", "Variable", "Variable", "Variable", 
    "Variable", "Variable", "Variable"
  ),
  `Use Case` = c(
    "Ideal for discrete data like land-use classifications; preserves original values without alteration.",
    "Suitable for continuous data; provides smoother transitions, useful for elevation models or temperature data.",
    "Appropriate for continuous data requiring smoother surfaces; beneficial for resampling satellite imagery.",
    "Effective for continuous data where smoothness is crucial; often used in terrain modeling.",
    "Best for high-quality image scaling; maintains sharpness and reduces aliasing in photographic images.",
    "Useful for aggregating data like population counts; ensures the total sum remains consistent after resampling.",
    "Applied in scenarios where the minimum value is of interest, such as finding the lowest temperature in a region.",
    "Useful in statistical analyses to understand the lower distribution of data values.",
    "Ideal for reducing the impact of outliers; provides a central tendency measure in skewed data distributions.",
    "Beneficial for statistical analyses focusing on the upper distribution of data values.",
    "Suitable for identifying peak values, such as maximum elevation or highest temperature in a dataset.",
    "Commonly used for smoothing data; provides an overall average, useful in environmental data analyses.",
    "Effective for categorical data to determine the most common category, such as predominant land cover type.",
    "Applied in error analysis; useful for assessing the magnitude of variability in data values."
  )
)

# Display the table using gt
resampling_methods |> 
  gt::gt() |> 
  gt::tab_header(
    title = "Resampling Methods for rasters in {terra}"
  ) |> 
  gtExtras::gt_theme_espn()
Table 2: Comparison of Resampling Methods for rasters in
Resampling Methods for rasters in {terra}
Resampling Method Description Suitable For Complexity Processing Time Use Case
Nearest Neighbor Assigns the value of the nearest cell of the original raster to the target cell. Categorical rasters Simple Fast Ideal for discrete data like land-use classifications; preserves original values without alteration.
Bilinear Interpolation Computes a weighted average of the four nearest cells from the original raster for the target cell. Continuous rasters Moderate Fast Suitable for continuous data; provides smoother transitions, useful for elevation models or temperature data.
Cubic Interpolation Uses the values of the 16 nearest cells, applying third-order polynomial functions for smooth output. Continuous rasters Higher Moderate Appropriate for continuous data requiring smoother surfaces; beneficial for resampling satellite imagery.
Cubic Spline Interpolation Uses the 16 nearest cells, applying cubic splines (piece-wise third-order polynomial functions). Continuous rasters Higher Moderate Effective for continuous data where smoothness is crucial; often used in terrain modeling.
Lanczos Resampling Uses the 36 nearest cells with a Lanczos windowed sinc function for smoother outputs. Continuous rasters High High Best for high-quality image scaling; maintains sharpness and reduces aliasing in photographic images.
Sum Calculates the weighted sum of all non-NA contributing grid cells. Spatially extensive variables Variable Variable Useful for aggregating data like population counts; ensures the total sum remains consistent after resampling.
Min Finds the minimum value among all non-NA contributing grid cells. Continuous rasters Variable Variable Applied in scenarios where the minimum value is of interest, such as finding the lowest temperature in a region.
Q1 (First Quartile) Computes the first quartile value of all non-NA contributing grid cells. Continuous rasters Variable Variable Useful in statistical analyses to understand the lower distribution of data values.
Median (Med) Computes the median value of all non-NA contributing grid cells. Continuous rasters Variable Variable Ideal for reducing the impact of outliers; provides a central tendency measure in skewed data distributions.
Q3 (Third Quartile) Computes the third quartile value of all non-NA contributing grid cells. Continuous rasters Variable Variable Beneficial for statistical analyses focusing on the upper distribution of data values.
Max Finds the maximum value among all non-NA contributing grid cells. Continuous rasters Variable Variable Suitable for identifying peak values, such as maximum elevation or highest temperature in a dataset.
Average Calculates the mean value of all non-NA contributing grid cells. Continuous rasters Variable Variable Commonly used for smoothing data; provides an overall average, useful in environmental data analyses.
Mode Identifies the most frequent value among all non-NA contributing grid cells. Continuous rasters Variable Variable Effective for categorical data to determine the most common category, such as predominant land cover type.
RMS (Root Mean Square) Computes the root mean square of all non-NA contributing grid cells. Continuous rasters Variable Variable Applied in error analysis; useful for assessing the magnitude of variability in data values.
Alternatives to {terra} Functions
  • Why Alternatives? While {terra} is user-friendly and performs well for large rasters, it may not be the most efficient for extensive rasters or numerous raster files.
  • Key Alternatives via C++ Library GDAL (accessed via {gdalUtilities} or sf::gdal_utils() function):
    • gdalinfo:
      • Provides detailed information about a raster file (e.g., resolution, CRS, bounding box).
      • Useful for metadata exploration.
    • gdal_translate:
      • Converts raster files between formats.
      • Can also modify properties like compression or CRS (e.g., t_srs = "EPSG:4326").
    • gdal_rasterize:
      • Converts vector data into raster format.
      • Ideal for creating rasterized maps from vector datasets.
    • gdalwarp:
      • Performs raster mosaicing, cropping, resampling, and reprojection.
      • Essential for combining rasters, altering extents, or changing resolutions.
  • Integration with R:
    • GDAL functions are written in C++ but can be accessed in R using:
      • sf::gdal_utils(): Wrapper for calling GDAL functions.
      • gdalUtilities package: Dedicated functions for common GDAL tasks.
      • System Commands: Directly call GDAL commands from the terminal within R.
    • Input/output handling:
      • GDAL functions work with file paths. Output is often saved as a file rather than in-memory objects like {terra}’s SpatRaster.
  • When to Use GDAL:
    • For handling very large datasets or batch processing of raster files.
    • When advanced raster operations (e.g., raster mosaics) are required.
    • For compatibility with diverse raster file formats and global standards.

5.4 Exercises

E1.

Generate and plot simplified versions of the nz dataset. Experiment with different values of keep (ranging from 0.5 to 0.00005) for ms_simplify() and dTolerance (from 100 to 100,000) st_simplify().

The plots for different levels of keep in ms_simplify() are shown in Figure 13.

Code
data("nz")

for (i in c(0.5, 0.05, 0.005, 0.0005, 0.00005)) {
  g <- ggplot(
    data = nz |> rmapshaper::ms_simplify(keep = i)
  ) +
    geom_sf() +
    theme_minimal() +
    theme(
      panel.grid = element_line(
        linewidth = 0.3, 
        linetype = 3
      )
    ) +
    labs(
      title = paste0("st_simplify(keep = ", i, ")")
    )
  print(g)
}
(a) keep = 0.5
(b) keep = 0.05
(c) keep = 0.005
(d) keep = 0.0005
(e) keep = 0.00005
Figure 13: Different keep levels in rmapshaper::ms_simplify()

The plots for different levels of dTolerance within st_simplify() are in Figure 14.

Code
data("nz")

for (i in c(100, 1000, 10000, 100000)) {
  g <- ggplot(
    data = nz |> st_simplify(dTolerance = i)
  ) +
    geom_sf() +
    theme_minimal() +
    theme(
      panel.grid = element_line(
        linewidth = 0.3, 
        linetype = 3
      )
    ) +
    labs(
      title = paste0("st_simplify(dTolerance = ", i, ")")
    )
  print(g)
}
(a) dTolerance = 100
(b) dTolerance = 1000
(c) dTolerance = 10000
(d) dTolerance = 100000
Figure 14: Different dTolerance levels in st_simplify()
  • At what value does the form of the result start to break down for each method, making New Zealand unrecognizable?

    For the rmapshaper::ms_simplify() method, the result starts to break down at keep = 0.005 . For the sf::st_simplify() method, the result starts to break down around dTolerance = 10000 .

  • Advanced: What is different about the geometry type of the results from st_simplify() compared with the geometry type of ms_simplify()? What problems does this create and how can this be resolved?

    The results from st_simplify() can be of multiple geometry types: MULTIPOLYGON or POLYGON, where as the results from ms_simplify() are always of a single geometry type: POLYGON.

    Problems Created by st_simplify()

    • The invalid geometries produced by st_simplify() can lead to errors in subsequent spatial operations, such as overlays, spatial joins, or intersections.

    • For example, functions like st_intersection() or st_union() may fail or produce incorrect results when operating on invalid geometries.

    Resolution: Using st_cast() to convert all geometries into a single type.

# Number of features in New Zealand Dataset
nrow(nz)
[1] 16
# New Zealand data after st_simplify(): some features are MULTIPLOYGON, and some are POLYGON
nz |> 
  st_simplify(dTolerance = 1000) |> 
  st_as_sfc()
Geometry set for 16 features 
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 1090144 ymin: 4748537 xmax: 2089533 ymax: 6191874
Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
First 5 geometries:
# Resolution: cast them into any one type using st_cast()
nz |> 
  st_simplify(dTolerance = 1000) |> 
  st_as_sfc() |> 
  st_cast("POLYGON")
Geometry set for 16 features 
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 1090144 ymin: 4823273 xmax: 2089533 ymax: 6191874
Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
First 5 geometries:
nz |> 
  st_simplify(dTolerance = 1000) |> 
  st_as_sfc() |> 
  st_cast("MULTIPOLYGON")
Geometry set for 16 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 1090144 ymin: 4748537 xmax: 2089533 ymax: 6191874
Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
First 5 geometries:
nz |> 
  rmapshaper::ms_simplify(keep = 0.005) |> 
  st_as_sfc()
Geometry set for 14 features 
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 1091224 ymin: 4830067 xmax: 2049387 ymax: 6001802
Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
First 5 geometries:

E2.

In the first exercise in Chapter Spatial Data Operations it was established that Canterbury region had 70 of the 101 highest points in New Zealand. Using st_buffer(), how many points in nz_height are within 100 km of Canterbury?

Out of the 101 heights, 95 of them fall within 100 km of the Canterbury region, as explained in the code below, and as shown in Figure 15

data("nz_height")

nz_height |> nrow()
[1] 101
canterbury_bf <- nz |> 
  filter(Name == "Canterbury") |> 
  st_buffer(dist = 100000)

nz_height |> 
  st_filter(canterbury_bf) |> 
  nrow()
[1] 95
Code
selected <- nz_height |> 
  st_filter(canterbury_bf) |> 
  pull(t50_fid)

nz_height <- nz_height |> 
  mutate(colour_var = t50_fid %in% selected)

ggplot() +
  geom_sf(data = nz) +
  geom_sf(
    data = canterbury_bf,
    fill = alpha("red", 0.1)
  ) +
  geom_sf(
    data = nz_height,
    mapping = aes(colour = colour_var),
    alpha = 0.7,
    pch = 17
  ) +
  scale_colour_manual(
    values = c("blue", "red")
  ) +
  labs(
    colour = "Within 100 km\nof Canterbury?"
  )
Figure 15

E3.

Find the geographic centroid of New Zealand. How far is it from the geographic centroid of Canterbury?

data(nz)

nz_centre <- nz |> 
  # Combining entire New Zealand into a geometry
  st_union() |> 
  st_centroid()

# Geographic Centroid of New Zealand
nz_centre |> 
  st_transform(crs = 4326) |> 
  as_vector() |> 
  round(digits = 3) |> 
  paste0(c(" Longitude;  ", " Latitude"), collapse = "")
[1] "172.842 Longitude;  -41.684 Latitude"
canterbury_centre <- nz |> 
  filter(Name == "Canterbury") |> 
  st_centroid() |> 
  st_as_sfc()

# Geographic Centroid of Canterbury
canterbury_centre |> 
  st_transform(crs = 4326) |> 
  as_vector() |> 
  round(digits = 3) |> 
  paste0(c(" Longitude;  ", " Latitude"), collapse = "")  
[1] "171.572 Longitude;  -43.573 Latitude"
# Distance between the two centroid
st_distance(nz_centre, canterbury_centre) |> 
  as.vector() |> 
  multiply_by(0.001) |> 
  round(digits = 2) |> 
  paste0(" km")
[1] "234.19 km"
line_centres <- st_union(
  nz_centre,
  canterbury_centre
) |> 
  st_cast("LINESTRING")

ggplot() +
  geom_sf(
    data = nz, 
    mapping = aes(fill = Name == "Canterbury"),
    colour = "white"
  ) +
  scale_fill_manual(values = c(alpha("blue", 0.2), 
                               alpha("red", 0.2))) +
  geom_sf(data = nz_centre, colour = "blue") +
  geom_sf(data = canterbury_centre, colour = "red") +
  geom_sf(data = line_centres, linetype = 3) +
  labs(
    title = paste0("Distance = ",
                   st_distance(nz_centre, canterbury_centre) |> 
                      as.vector() |> 
                      multiply_by(0.001) |> 
                      round(digits = 2) |> 
                      paste0(" km")
                  )
  ) +
  ggthemes::theme_map() +
  theme(
    legend.position = "none"
  )

E4.

Most world maps have a north-up orientation. A world map with a south-up orientation could be created by a reflection (one of the affine transformations not mentioned in this chapter) of the world object’s geometry. Write code to do so. Hint: you can to use the rotation() function from this chapter for this transformation. Bonus: create an upside-down map of your country.

The multiplication of an sfc object with custom function rotation(180) can be used to create a world map with south-up orientation, as shown in Figure 16 (b)

Code
# Use the rotation function from the textbook
rotation <- function(a){
  r = a * pi / 180 #degrees to radians
  matrix(c(cos(r), sin(r), -sin(r), cos(r)), nrow = 2, ncol = 2)
} 
data("world")
world_up <- world |> 
  st_geometry()

world_rotate <- (world_up * rotation(180)) |> 
  st_set_crs(st_crs(world_up))

ggplot() +
  geom_sf(data = world_up) +
  coord_sf(crs = "ESRI:54009") +
  theme_minimal()

ggplot() +
  geom_sf(data = world_rotate) +
  coord_sf(crs = "ESRI:54009") +
  theme_minimal()
(a) World map upright
(b) World map upside-down
Figure 16: Using a custom rotation() and multiplying the sfc object with rotation(180)

E5.

Run the code in Section 5.2.6. With reference to the objects created in that section, subset the point in p that is contained within xandy.

# Create the two points
b <- st_sfc(st_point(c(0, 1)), st_point(c(1, 1))) 
# Create circles around those points
b <- st_buffer(b, dist = 1)
# Name x and y -- the two circles
x = b[1]
y = b[2]
x_and_y = st_intersection(x, y)

# Bounding box of x and y
bb = st_bbox(st_union(x, y))
# Covert bounding box into a polygon
box = st_as_sfc(bb)
set.seed(2024)
p = st_sample(x = box, size = 10)
  • Using base subsetting operators. Answer is shown in Figure 17

    p_xy1 = p[x_and_y]
    
    ggplot() +
      geom_sf(data = b, fill = "transparent") +
      geom_sf(data = p, size = 2, alpha = 0.75) +
      geom_sf(data = p_xy1, 
              colour = "red",
              fill = "transparent",
              pch = 1,
              size = 6,
              stroke = 2)
    Figure 17: Using base r [] operator
  • Using an intermediary object created with st_intersection(). Answer is shown in Figure 18

    p1 <- p |> 
      st_as_sf() |> 
      mutate(highlight = st_intersects(x, x_and_y, sparse = F))
    
    ggplot() +
      geom_sf(data = b, fill = "transparent") +
      geom_sf(
        data = p1,
        mapping = aes(
          colour = highlight
        ),
        size = 3
      ) +
      theme(legend.position = "none")
    Figure 18: Using sf::st_intersection()

E6.

Calculate the length of the boundary lines of US states in meters. Which state has the longest border and which has the shortest? Hint: The st_length function computes the length of a LINESTRING or MULTILINESTRING geometry.

The longest border is of Texas (4,959.8 km) and the shortest border is of District of Columbia (60.2 km). The complete list is at Table 3 .

Code
data("us_states")

us_states |> 
  janitor::clean_names() |> 
  st_cast("MULTILINESTRING") |> 
  mutate(border = as.numeric(st_length(geometry))/1000) |> 
  st_drop_geometry() |> 
  select(name, region, border) |> 
  arrange(desc(border)) |> 
  gt::gt() |> 
  gt::cols_label(
    name = "State Name",
    region = "Region",
    border = "Length of Border (km)"
  ) |> 
  gt::fmt_number(
    columns = border
  ) |> 
  gt::opt_interactive()
Table 3: The border length of the different states in USA

E7.

Read the srtm.tif file into R (srtm = rast(system.file("raster/srtm.tif", package = "spDataLarge"))). This raster has a resolution of 0.00083 * 0.00083 degrees. Change its resolution to 0.01 * 0.01 degrees using all of the methods available in the terra package. Visualize the results. Can you notice any differences between the results of these re-sampling methods?

The different methods of resampling rasters and their demonstration is in Figure 19

Code
sysfonts::font_add_google("Encode Sans Condensed", "body_font")
showtext::showtext_auto()
srtm <- rast(system.file("raster/srtm.tif", package = "spDataLarge"))

# Create a target raster on which to re-sample
target_rast <- rast(
  xmin = xmin(ext(srtm)), 
  xmax = xmax(ext(srtm)), 
  ymin = ymin(ext(srtm)), 
  ymax = ymax(ext(srtm)),
  # resolution = res(srtm), 
  resolution = c(0.01, 0.01),
  crs = crs(srtm)
  )

original_raster <- srtm

resampling_methods_tbl <- tibble(
  name = c(
    "bilinear",
    "cubic",
    "lanczos",
    "min",
    "average",
    "mode"
  ),
  description = c(
    "Bilinear Interpolation",
    "Cubic Interpolation",
    "Lanczos Resampling",
    "Minimum",
    "Average (Mean)",
    "Mode"
  )
)

theme_custom <- function(...){
  ggthemes::theme_map(
    base_family = "body_font",
    base_size = 36
  ) +
    theme(
      plot.title = element_text(
        size = 45, 
        margin = margin(0,0,2,0, "mm")
      ),
      legend.position = "bottom",
      legend.text = element_text(
        margin = margin(1,0,0,0, "mm")
      ),
      plot.margin = margin(0,0,0,0, "mm"),
      ...
    )
}

g1 <- ggplot() +
  geom_spatraster(data = original_raster) +
  scale_fill_wiki_c() +
  coord_sf(expand = F) +
  labs(
    title = "Original Raster"
  ) +
  guides(fill = "none") +
  theme_custom()

for (i in 1:6) {
  assign(
    paste0("g", i+1),
    ggplot() +
      geom_spatraster(
        data = terra::resample(
          x = original_raster,
          y = target_rast,
          method = resampling_methods_tbl$name[i]
        )
      ) +
      scale_fill_wiki_c() +
      coord_sf(expand = F) +
      labs(
        title = paste0(resampling_methods_tbl$description[i]),
        fill = NULL
      ) +
      theme_custom(
        legend.key.height = unit(2, "mm"),
        legend.key.width = unit(20, "mm")
      )
  )
}

g <- g1 + g2 + g3 + g4 + g5 + g6 + 
  plot_layout(
    guides = "collect",
  ) + 
  plot_annotation(
    title = "Methods for Re-sampling rasters with {terra}",
    theme = theme(
      legend.title = element_blank(),
      plot.title = element_text(
        size = 60,
        margin = margin(5,0,5,0, "mm"),
        hjust = 0.5
      ),
      legend.position = "bottom"
    )
  )

ggsave(
  filename = here::here("book_solutions", 
                        "images", 
                        "chapter5-e7.png"),
  plot = g,
  height = 2400,
  width = 2400,
  units = "px",
  bg = "white"
)
Figure 19: The various methods of resampling in the {terra} for rasters.

Another method for converting into the resolution of (0.01, 0.01) is to use terra::aggregate() with fact = 0.01 / 0.00083.

Code
srtm <- rast(system.file("raster/srtm.tif", package = "spDataLarge"))

temp <- srtm |> 
  terra::aggregate(fact = 0.01 / 0.0008333333)

ggplot() +
  geom_spatraster(data = temp) +
  scale_fill_wiki_c() + 
  coord_sf(expand = FALSE) +
  theme_minimal()
Figure 20: Using terra::aggregate() to make resolution of 0.01

References

Loecher, Markus, and Karl Ropkins. 2015. RgoogleMapsandloa: UnleashingRGraphics Power on Map Tiles.” Journal of Statistical Software 63 (4). https://doi.org/10.18637/jss.v063.i04.
Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With Applications in r.” https://doi.org/10.1201/9780429459016.
Strimas-Mackey, Matthew. 2023. “Smoothr: Smooth and Tidy Spatial Features.” https://CRAN.R-project.org/package=smoothr.
Teucher, Andy, and Kenton Russell. 2023. “Rmapshaper: Client for ’Mapshaper’ for ’Geospatial’ Operations.” https://CRAN.R-project.org/package=rmapshaper.