Computing shortest routes in the sea that avoid land
Using data from Killer Whales encounters in Salish Sea to plot routes of their recorded encounters, and showing those routes that dont intersect land - i.e., removing erroneous data.
Routes
Background Map
Raster Map
{sf}
Author
Aditya Dahiya
Published
October 20, 2024
Dataset used: This #TidyTuesday dataset comes from the Center for Whale Research (CWR), which monitors Southern Resident killer whales in the Salish Sea, part of the Pacific Northwest. The dataset, scraped by Jadey Ryan and documented here, contains information on encounters from 2017 to 2024. Each encounter involves photographing and identifying individual whales. The data can be accessed via the {orcas} R package and includes variables like encounter duration, location, and pod. While the dataset is mostly tidy, some inconsistencies such as missing values and negative durations remain. |Source|Data.
Here’s my #TidyTuesday Visualization for this project in Figure 1, and the code used and the visualization webpage.
Method 1: Using {geosphere} and {sf}
Step 1: Loading libraries, getting the data and cleaning it
Code
# Loading the Librarieslibrary(tidyverse) # Data wranglinglibrary(sf) # SF objectslibrary(showtext) # Using google fonts in Rlibrary(geosphere) # Spherical trigonometry for geography# Set fonts for including in all graphicsfont_add_google("Saira Semi Condensed", "body_font")font_add_google("Saira Extra Condensed","caption_font")showtext_auto()# Load orcas dataorcas <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-10-15/orcas.csv')sessioninfo::session_info()$packages |>as_tibble() |> dplyr::select(package, version = loadedversion, date, source) |>filter(package %in%.packages()) |>arrange(package) |> janitor::clean_names(case ="title" ) |> gt::gt() |> gt::opt_interactive(use_search =TRUE ) |> gtExtras::gt_theme_espn()
Step 2: Clean the data, make it in {sf} format
Code
# A cleaner tibble to use for our visualizationdf_sf <- orcas |>as_tibble() |> dplyr::select(year, duration, begin_latitude, begin_longitude, end_latitude, end_longitude) |># Convert the duration of encounter into secondsmutate(# remove parenthesis content from durationduration =str_remove(duration, "\\s*\\(.*\\)"),# remove the "s" for secondsduration =str_extract(duration, "-?\\d+"),# convert the duration into numberduration =as.numeric(duration) ) |># Remove aberrant observation with durations less than zerofilter(duration >=0) |># Remove observations with missingnessdrop_na() |># Allot an ID number to each finally selected observationmutate(id =row_number()) |> dplyr::relocate(id, .before =everything())######################################################### Get a bounding box on the coordinates of encountersbbox_orcas <-bind_rows(# Geometry column of starting coordiantes sf::st_as_sf( df_sf |> dplyr::select(begin_latitude, begin_longitude),coords =c("begin_longitude","begin_latitude"),crs ='EPSG:4326' ) |>mutate(type ="start_coords"),# Geometry column of starting coordiantes sf::st_as_sf( df_sf |> dplyr::select(end_latitude, end_longitude),coords =c("end_longitude","end_latitude"),crs ='EPSG:4326' ) |>mutate(type ="end_coords")) |>st_bbox()####################################################### Display cleaned datadf_sf |> gt::gt() |> gt::opt_interactive() |> gtExtras::gt_theme_espn() |> gt::fmt_number(columns =-c(year, id),decimals =2 ) |> gt::tab_header(title ="Cleaned Data on starting and ending coordinates",subtitle ="Recorded Encounters of Orcas (Southern Killer Whales) in the Salish Sea (2017-2024)" )
Step 3: Computing routes using geosphere::gcIntermediate()
Step 4: Getting background maps for the Salish Sea area: USA and Canada
For this, we first convert our sf object on Map of USA and Canada, shown in Figure 3 (a) into a SpatVector. In the {terra} package in R, a SpatVector is the main class used to represent and work with vector data in geographic information system (GIS) contexts. A SpatVector can store points, lines, polygons, or any combination of these geometries, along with associated attributes (data linked to these geometries). We can create a SpatVector from:
Shapefiles (widely used for vector data in GIS)
Other vector file formats like GeoJSON, KML, or GPKG.
R objects such as data frames or matrices that contain coordinate information.
The reason to create a SpatVector is for performing spatial operations like buffering, intersecting, or spatial joins.The terra::vect() is the method to create these objects from various file formats or other R objects.
Then, we use the terra::crop() to crop USA and Canada map to a specified geographic extent Figure 3 (b) defined by bounding box of the sf objects of orcas created in previous step. When applied to a SpatVector object (vector data), terra::crop() trims the geometries (points, lines, or polygons) so that only the portions within a specified spatial extent remain. Lastly, we re-convert the cropped SpatVector back into an sf object, shown in ?@fig-basemap1-3
Code
# Extract country borders database_map <- rgeoboundaries::gb_adm0(country =c("USA", "Canada")) %>% rmapshaper::ms_simplify(0.5)# checking the size# object.size(base_map) |> print(units = "Mb")# 4.3 Mbggplot(base_map) +geom_sf() +coord_sf(crs = usmap::usmap_crs() )sea <- terra::crop(terra::vect(base_map), bbox_orcas)# Finally, reconvert the Cropped area back into an sf objectsea_sf <- sea |>st_as_sf()ggplot(sea_sf) +geom_sf()
Step 5: Computing and Removing routes intersecting with land (i.e., erroneous data)
Code
# Test if path is only sea. Each logical test if for each ID in the # df1 tibbletest_intersect <-lengths(st_intersects(st_transform(routes, st_crs(base_map)), base_map )) >0# Compute distance for each dist_encounter <-st_length(routes)# Create a second tibble of distance & paths for each encounterdf_routes <- routes |>st_transform(st_crs(base_map)) |>bind_cols(id = df_sf$id) |>bind_cols(whether_intersect_land = test_intersect) |>bind_cols(dist_encounter =as.numeric(dist_encounter)) |>left_join(df_sf |> dplyr::select(year, id, duration)) |>mutate(speed = dist_encounter / duration)# A vector of IDs whose paths dont intersect landids_to_plot <- df_routes |>filter(whether_intersect_land ==FALSE) |>pull(id)df_routes |> dplyr::relocate(geometry, .after =everything()) |> gt::gt() |># gt::cols_hide(geometry) |> gt::fmt_number(columns =c(dist_encounter, speed),decimals =1 ) |> gt::fmt(columns = geometry,fns =function(x) {str_sub(x, 1, 20)} ) |> gt::opt_interactive() |> gtExtras::gt_theme_espn()
Table 3: A table of intersecting and non-intersecting routes
Step 6: Plotting the non-intersecting routes in Figure 4
Code
# ggmap::register_stadiamaps("YOUR-KEY-HERE")bbox_stadiamap <-c(left = bbox_orcas["xmin"],right = bbox_orcas["xmax"],top = bbox_orcas["ymax"],bottom = bbox_orcas["ymin"])names(bbox_stadiamap) <-c("left", "right", "top", "bottom")# Getting the Stamen Maps for the background tiles as rasterstamen_tiles_lowres <- ggmap::get_stadiamap( bbox_stadiamap,zoom =8,maptype ="stamen_terrain")# Step: 1: # Credits: https://stackoverflow.com/questions/47749078/how-to-put-a-geom-sf-produced-map-on-top-of-a-ggmap-produced-raster by andyteucher on StackOverFlow (https://stackoverflow.com/users/1736291/andyteucher)# Define a function to fix the bbox to be in CRS EPSG:3857ggmap_bbox <-function(map) {# Extract the bounding box (in lat/lon) from the ggmap# to a numeric vector, and set the names to what# sf::st_bbox expects: map_bbox <-setNames(unlist(attr(map, "bb")),c("ymin", "xmin", "ymax", "xmax") )# Coonvert the bbox to an sf polygon, transform it to 3857,# and convert back to a bbox (convoluted, but it works) bbox_3857 <-st_bbox(st_transform(st_as_sfc(st_bbox(map_bbox, crs =4326) ), 3857 ) )# Overwrite the bbox of the ggmap object with the transformed coordinatesattr(map, "bb")$ll.lat <- bbox_3857["ymin"]attr(map, "bb")$ll.lon <- bbox_3857["xmin"]attr(map, "bb")$ur.lat <- bbox_3857["ymax"]attr(map, "bb")$ur.lon <- bbox_3857["xmax"] map}# Use the function to convert our downloaded Raster Files into # the new CRS and new bounding box CRSstamen_tiles_lowres2 <-ggmap_bbox(stamen_tiles_lowres)g <- ggmap::ggmap(stamen_tiles_lowres2) +geom_sf(data = df_routes |>filter(!whether_intersect_land) |>st_transform(crs =3857),mapping =aes(geometry = geometry ),color ="grey10",alpha =0.5,inherit.aes =FALSE,arrow =arrow(angle =20,length =unit(0.5, "mm") ),linewidth =0.2 ) +coord_sf(expand = F ) +labs(x ="Longitude", y ="Latitude" ) +theme_minimal()ggsave(plot = g,filename = here::here("geocomputation", "images", "computing_sea_routes_1.png"),width =900,height =700,units ="px",bg ="white")
Figure 4: This map visualizes the movements of Southern Resident killer whales, with arrows marking the starting and ending points of each recorded encounter. The concentration of arrows within a small area highlights the key regions in the Salish Sea where these encounters occur most frequently. Background map images provided by StadiaMaps.
The packages used in this analysis are shown in the Table 4
Code
# Loading the Librarieslibrary(tidyverse) # Data wranglinglibrary(rgeoboundaries) # Getting country & admin boundaries.library(rmapshaper) # For multi-polygon simplificationlibrary(sf) # SF objectslibrary(terra) # Spatial data analysislibrary(tidyterra) # Tidyverse methods for terra objectslibrary(gdistance) # Distances and Routes on Geographical Gridslibrary(showtext) # Using google fonts in R# Set fonts for including in all graphicsfont_add_google("Saira Semi Condensed", "body_font")font_add_google("Saira Extra Condensed","caption_font")showtext_auto()# Load orcas dataorcas <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-10-15/orcas.csv')sessioninfo::session_info()$packages |>as_tibble() |> dplyr::select(package, version = loadedversion, date, source) |>filter(package %in%.packages()) |>arrange(package) |> janitor::clean_names(case ="title" ) |> gt::gt() |> gt::opt_interactive(use_search =TRUE ) |> gtExtras::gt_theme_espn()
Table 4: List of packages used during this analysis and their versions
Step 2: Clean the data
Code
# A cleaner tibble to use for our visualizationdf1 <- orcas |>as_tibble() |> dplyr::select(year, duration, begin_latitude, begin_longitude, end_latitude, end_longitude) |># Convert the duration of encounter into secondsmutate(# remove parenthesis content from durationduration =str_remove(duration, "\\s*\\(.*\\)"),# remove the "s" for secondsduration =str_extract(duration, "-?\\d+"),# convert the duration into numberduration =as.numeric(duration) ) |># Remove aberrant observation with durations less than zerofilter(duration >=0) |># Remove observations with missingnessdrop_na() |># Allot an ID number to each finally selected observationmutate(id =row_number()) |> dplyr::relocate(id, .before =everything())df1 |> gt::gt() |> gt::opt_interactive() |> gtExtras::gt_theme_espn() |> gt::fmt_number(columns =-c(year, id),decimals =2 )
Table 5: A table of clean data that shows id, year, duration and starting and ending coordinates of each Killer Whale encounter
Step 3: Creating data is specific formats needed by {sf} {terra}
Code
# The starting Coordinates as an sf objectstart_coordinates <- df1 |> dplyr::select(id, begin_latitude, begin_longitude) |> sf::st_as_sf(coords =c("begin_longitude","begin_latitude"),crs ='EPSG:4326' ) |>mutate(type ="start")# The ending Coordinates as an sf object end_coordinates <- df1 |> dplyr::select(id, end_latitude, end_longitude) |> sf::st_as_sf(coords =c("end_longitude","end_latitude"),crs ='EPSG:4326' ) |>mutate(type ="end")# Compiling starting and ending coordinates into a tibbleorcas_sf <- start_coordinates |>bind_rows(end_coordinates)# Extracting the bounding box of that tibble to get our mapbb_orcas <-st_bbox(orcas_sf)orcas_sf |>print(n =10)
Step 4: Getting background maps for the Salish Sea area: USA and Canada
For description of the actions performed, please see Step 4 of the Method 1 above.
Code
# Extract country borders database_map <- rgeoboundaries::gb_adm0(country =c("USA", "Canada")) %>% rmapshaper::ms_simplify(0.5)# checking the size# object.size(base_map) |> print(units = "Mb")# 4.3 Mbggplot(base_map) +geom_sf() +coord_sf(crs = usmap::usmap_crs() )sea <- terra::crop(terra::vect(base_map), bb_orcas)ggplot(sea) +geom_sf()# Finally, reconvert the Cropped area back into an sf objectsea_sf <- sea |>st_as_sf()ggplot(sea_sf) +geom_sf()
(a) Map of entire USA and Canada from {rgeoboundaries}
(b) Using the terra::crop() from {terra} to focus on Salish Sea area
(c) The same Salish Sea area plotted as an sf object
Figure 5: Background Map of the Salish Sea Area in 2 steps: (4.1) Getting map of USA and Canada from {rgeoboundaries}, and (4.2) Cropping out the map of Salish Sea area
Step 5: Converting into rasters to plot compute distances and intersections
terra::rast() function is used to create a raster object or load an existing raster dataset (e.g., GeoTIFF, ASCII, or other raster formats). A raster is a grid of cells (pixels) where each cell has a value representing information such as elevation, temperature, land cover, etc. We use it here to Create an empty raster, defining the number of rows, columns, extent, and coordinate reference system (CRS) to create a raster template.
terra::rasterize() function is then used to convert vector data (points, lines, polygons) into raster format. This process assigns values from vector geometries to raster cells, typically based on whether the geometries overlap with the cells or using attributes from the vector data. For example, here we are Rasterizing polygons: i.e., for each land types: USA, Canada or other, we can rasterize it so that each raster cell represents one of these three.
Code
# Convert vector to raster and set highly diffferent pixel values based on whether an area is sea or land (i.e. not sea)# Step 5.1: Generate an empty raster defining the resolution by# number of rows and columns, and a CRS from sea_sfr <- terra::rast(sea_sf, ncols =500, nrows =500)ggplot() +geom_spatraster(data = r)rr <- terra::rasterize(sea_sf, r, "shapeISO") %>%mutate(shapeISO =case_when( shapeISO %in%c('CAN', 'USA') ~1, # assign value 1 to landTRUE~1000# assign value 1000 to sea ))ggplot() +geom_spatraster(data = rr)
Figure 6: The empty raster of 500 X 500 created using terra::rast()
Figure 7: The raster is enhanced by adding the polygons data from the salish sea area map we cropped in the last step.
Step 6: Computing the distance between starting and ending coordinates
The gdistance::transition() function is used to create a transition matrix from raster data. The Transition Matrix is a sparse matrix where each element represents the movement “cost” or “resistance” from one cell to its neighboring cells. The transition matrix enables the calculation of the most efficient path (e.g., the least-cost path) from one location to another. Thus, it is used for calculating least-cost paths, commute distances, and other kinds of spatial movement analyses.
The transition matrix helps in Movement Modeling: It is used to model movement across a landscape, such as wildlife migration, water flow, or human navigation, where each raster cell’s value might represent an obstacle or ease of travel. Note that we had assigned different value to land and sea raster points.
After creating a transition layer using gdistance::transition(), the gdistance::geoCorrection() function is used to apply geographic corrections to account for the varying distances between raster cells due to the curvature of the Earth or grid layout. This step is crucial when working with spatial data in a geographic coordinate system (e.g., latitude and longitude) where distances between cells are not uniform.
The gdistance::shortestPath() is then used to compute the shortest (or least-cost) path between two points on a raster grid, based on a transition matrix that describes the “cost” or “resistance” of moving from one cell to another. The function calculates this path by minimizing the total cost or resistance, taking into account the values in the transition matrix, which typically represent the difficulty or ease of moving through each cell.
Code
# For quick rendering of this .qmd file, I have not evaluated # this chunk of code, and rather saved the results of "distance"# as an .rds file and reloaded it.# Compute transition matrix from raster pixelsr_trans <- gdistance::transition(x =raster(rr), transitionFunction = mean, directions =16)# object.size(r_trans) |> print(units = "Mb")# 24.7 Mbr_trans <-geoCorrection(r_trans)# object.size(r_trans) |> print(units = "Mb")# 24.7 Mb# Compute the shortest path between start and end for the # first line of the transition matrix, and convert into sf object:distance <- gdistance::shortestPath( r_trans, c(df1 |>filter(id ==1) |>pull(begin_longitude), df1 |>filter(id ==1) |>pull(begin_latitude)), c(df1 |>filter(id ==1) |>pull(end_longitude), df1 |>filter(id ==1) |>pull(end_latitude)), output ="SpatialLines") |>st_as_sf()# Repeat the process for the other points / IDsfor (i in2:nrow(df1)) { temp <- gdistance::shortestPath( r_trans, c(df1 |>filter(id == i) |>pull(begin_longitude), df1 |>filter(id == i) |>pull(begin_latitude)), c(df1 |>filter(id == i) |>pull(end_longitude), df1 |>filter(id == i) |>pull(end_latitude)), output ="SpatialLines") |>st_as_sf() distance <- distance |>bind_rows(temp)}# Add a CRS to the newly created sf object distance <- distance |>st_set_crs(st_crs(base_map))saveRDS(distance, file = here::here("data", "orcas_distance.rds"))
Figure 8: The shortest paths computed using {gdistance} show us that many of them are passing over land - these seem to be errors in the data
Step 7: Check whether paths intersect land, and retain only non-intersecting routes
The sf::st_intersects() function is used to determine whether two spatial geometries intersect. It checks if any part of one geometry touches or overlaps with another.
The sf::st_length() function is then used to calculate the length of geometries represented in sf (simple features) objects. It returns the length of each geometry in the specified unit of measurement.
Code
# Test if path is only sea. Each logical test if for each ID in the # df1 tibbletest_intersect <-lengths(st_intersects(distance, base_map)) >0# Compute distance for each dist_encounter <-st_length(distance)# Create a second tibble of distance & paths for each encounterdf2 <- distance |>bind_cols(id = start_coordinates$id) |>bind_cols(whether_intersect_land = test_intersect) |>bind_cols(dist_encounter =as.numeric(dist_encounter)) |>left_join(df1 |> dplyr::select(year, id, duration)) |>mutate(speed = dist_encounter / duration)# A vector of IDs whose paths dont intersect landids_to_plot <- df2 |>filter(whether_intersect_land ==FALSE) |>pull(id)df2 |> dplyr::relocate(geometry, .after =everything()) |>slice_head(n =10) |>mutate(geometry =as.character(geometry)) |> gt::gt() |># gt::cols_hide(geometry) |> gt::fmt_number(columns =c(dist_encounter, speed),decimals =2 ) |> gt::fmt(columns = geometry,fns =function(x) {str_sub(x, 1, 50)} )