MOBI BOOT CAMP CORP. logoLearning Buddy
  • SIGN IN
  • Introduction
  • 1: Data Visualization with ggplot2
  • 2: Data Transformation with dplyr
  • 3: Data Tidying & Joins
  • 4: Exploratory Data Analysis
  • 5: Statistical Modeling
  • 6: Database Queries & SQL
    • Interacting with SQL in R
  • 7: Interactive Dashboards
  • 8. Bad Visualization Examples
  • 9. Glossary

Interacting with SQL in R

Why Learn Database Queries inside R?

In professional enterprise environments, datasets are almost never stored in CSV flat files on your local machine. They are stored in secure, central relational databases.

If you have a database of 500 million records, downloading the entire table to R's memory will freeze your R session or crash your computer.

Instead, you want to:

  1. Connect to the database from R.
  2. Query the data using SQL (Structured Query Language) or let R translate your tidyverse code into SQL.
  3. Bring only the filtered, aggregated query result back to R for plotting.

In this chapter, we will learn how to interface with SQLite databases using the DBI package and harness the dbplyr translation engine to convert standard dplyr verbs to SQL queries automatically.


1. Connecting to SQL Databases: DBI

R connects to databases using driver packages (such as RSQLite for SQLite files, or RPostgres for PostgreSQL servers) combined with the unified interface DBI.

library(DBI)

# Connect to an in-memory SQLite database
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Write R's mpg table into a SQL table named "cars"
dbWriteTable(con, "cars", mpg)

# List all tables in the database
dbListTables(con) # "cars"

2. Executing Raw SQL Queries: dbGetQuery()

To run raw SQL SELECT queries and return the results as standard R data frames:

# Query the database
result <- dbGetQuery(con, "
  SELECT manufacturer, model, hwy 
  FROM cars 
  WHERE hwy > 30 
  LIMIT 5
")

print(result)

Closing Connections

To prevent memory leaks and database locking, always disconnect from your database when done:

# Close the database connection
dbDisconnect(con)

3. Database Translation: dbplyr

Writing complex raw SQL strings inside R scripts can be tedious. The dbplyr package provides a translation engine that lets you write standard dplyr code (like filter, select, group_by), translates it to SQL, and executes it directly on the database!

Lazy Evaluation: tbl(), show_query(), & collect()

  1. tbl(con, "table"): Creates a virtual table reference. R does not download the data yet.
  2. show_query(): Displays the SQL query that R generated.
  3. collect(): Executes the query on the database, downloads the results, and saves them as a standard local tibble in your R environment.
# 1. Re-establish connection and table
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "cars", mpg)

# Create virtual reference
db_cars <- tbl(con, "cars")

# 2. Write standard dplyr code on the database table
db_pipeline <- db_cars |>
  select(manufacturer, model, hwy) |>
  filter(hwy > 30)

# 3. View the translated SQL query
show_query(db_pipeline)
# Output prints: SELECT `manufacturer`, `model`, `hwy` FROM `cars` WHERE (`hwy` > 30.0)

# 4. Fetch the final results to your local R environment
local_result <- collect(db_pipeline)
print(local_result)

dbDisconnect(con)

Hands-on Exercises

Exercise 1: Querying Cylinders

Create an in-memory database, insert the mpg table, and write a raw SQL query. Write R code to:

  1. Connect to an in-memory SQLite database named my_db using dbConnect().
  2. Write R's mpg table to a SQL database table named "vehicle_log" using dbWriteTable().
  3. Query the database using dbGetQuery() to retrieve the manufacturer, model, and cyl column values for all vehicles with exactly 5 cylinders (cyl = 5).
  4. Disconnect from the database.
# Write your code below and click Run Code
Click to view Answer
library(DBI)

# Connect
my_db <- dbConnect(RSQLite::SQLite(), ":memory:")

# Write table
dbWriteTable(my_db, "vehicle_log", mpg)

# Query
five_cyl_vehicles <- dbGetQuery(my_db, "
  SELECT manufacturer, model, cyl 
  FROM vehicle_log 
  WHERE cyl = 5
")
print(five_cyl_vehicles)

# Disconnect
dbDisconnect(my_db)

Exercise 2: Tidyverse to SQL Translation

Connect to an in-memory database and use dbplyr to build and inspect a query. Write R code using pipes to:

  1. Connect to an in-memory SQLite database named temp_db.
  2. Write R's mpg table to the database under the table name "cars_db".
  3. Create a table reference using tbl(temp_db, "cars_db") and pipe it.
  4. Filter for cars of class "suv".
  5. Group by manufacturer and calculate the average highway mileage mean_hwy = mean(hwy).
  6. Use show_query() to inspect the translated SQL code.
  7. Use collect() to run the query and download the final table to R.
  8. Disconnect from temp_db.
# Write your code below and click Run Code
Click to view Answer
library(DBI)
library(dplyr)

# Connect & write
temp_db <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(temp_db, "cars_db", mpg)

# Build pipeline on virtual table
db_ref <- tbl(temp_db, "cars_db")

query_pipeline <- db_ref |>
  filter(class == "suv") |>
  group_by(manufacturer) |>
  summarize(mean_hwy = mean(hwy))

# Show query
show_query(query_pipeline)

# Fetch data
final_table <- collect(query_pipeline)
print(final_table)

# Close connection
dbDisconnect(temp_db)
Privacy Policy | Terms & Conditions