Chapter 22

Databases

Author

Aditya Dahiya

Published

October 5, 2023

library(DBI)
library(dbplyr)
library(tidyverse)
# Reading in the data as a database

connection <- DBI::dbConnect(duckdb::duckdb())
dbWriteTable(connection, name = "db_diamonds", value = ggplot2::diamonds)

db_df <- tbl(connection, "db_diamonds")

22.5.10 Exercises

Question 1

What is distinct() translated to? How about head()?

As we can see below, the dplyr function distinct() translates into SELECT DISTINCT <variable name> .

Similarly, the function head() translates to SELECT * FROM <table name> LIMIT 6.

db_df |>
  distinct(cut) |>
  show_query()
<SQL>
SELECT DISTINCT cut
FROM db_diamonds
db_df |>
  head() |>
  show_query()
<SQL>
SELECT *
FROM db_diamonds
LIMIT 6

Question 2

Explain what each of the following SQL queries do and try recreate them using dbplyr.

SELECT *  
FROM flights 
WHERE dep_delay < arr_delay  

The code above is equivalent to dplyr’s flights |> filter(dep_delay < arr_delay) .

SELECT *, distance / (airtime / 60) AS speed 
FROM flights

The code above is equivalent to dplyr’s flights |> mutate(speed = distance / (airtime / 60)) .