Facets & Coordinate Systems
Why Learn Facets and Coordinates in Exploratory Data Analysis?
Imagine you are analyzing airline performance. You have a dataset of flight delays across dozens of airline carriers. You want to see the distribution of arrival delays for each carrier.
- If you try to overlay dozens of overlapping histograms on a single chart using different colors, the result is a messy "spaghetti plot" that is completely unreadable.
- If you use a single boxplot, you lose the detailed shapes of the distributions.
To solve this visual overload, you need Facets to split a single chart into multiple side-by-side subplots (one for each carrier).
Additionally, if your carrier names are long text strings, they can overlap and become unreadable on the horizontal axis. You need a Coordinate Flip to move the categories to the vertical axis, where each label reads clearly on its own line. Let's learn how facets and coordinate modifications work.
1. Faceting: Splitting the Canvas
Faceting partitions your plot into a grid of subplots based on categorical columns. By default, all panels share the same x and y scales, which makes them easy to compare.
Single Variable Facets: facet_wrap()
facet_wrap(~ variable) creates a 1D sequence of panels wrapped into a 2D grid. The tilde symbol ~ means "by".
library(tidyverse)
# Split scatter plot of displ vs hwy into separate panels for each car class
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class)
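The formula notation is one of two ways to name the faceting variable. As a minimal sketch, assuming ggplot2 3.0.0 or later, the same plot can be written with vars(), which some readers find easier to scan than the tilde. The object name p_vars is only illustrative.

```r
library(ggplot2)

# Same panels as facet_wrap(~ class), written with vars() instead of a formula
p_vars <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(vars(class))

p_vars
```

Both spellings should produce one panel per value of class.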
You can control the arrangement of subplots using nrow (number of rows) and ncol (number of columns):
# Force the panels to render in a single row
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class, nrow = 1)
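The panel layout and the shared scales can both be adjusted. The sketch below is illustrative only: it fixes the grid at three columns with ncol and passes scales = "free_y" so that each panel gets its own y-axis range. Freeing a scale shows more detail inside each panel but makes panels harder to compare directly, so it is a trade-off rather than a default recommendation.

```r
library(ggplot2)

# Three columns of panels; each panel chooses its own y-axis range
p_free <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class, ncol = 3, scales = "free_y")

p_free
```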
Two Variable Grid Facets: facet_grid()
facet_grid(row_variable ~ column_variable) creates a 2D matrix of subplots, cross-tabulating two categorical columns:
# Rows represent drive train types (drv), columns represent cylinder counts (cyl)
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)
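Not every drv and cyl combination has to exist in the data, and combinations with no rows show up as empty panels. As a hedged sketch, the code below first tabulates the combinations so you can see which cells are empty, then uses a dot in the formula to facet on rows only or on columns only. The object names are illustrative.

```r
library(ggplot2)

# Count cars for each drv x cyl pair; a zero count means an empty panel
combo_counts <- table(drv = mpg$drv, cyl = mpg$cyl)
print(combo_counts)

# Rows only: the dot means "no faceting variable on this dimension"
p_rows <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ .)

# Columns only
p_cols <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(. ~ cyl)

p_rows
p_cols
```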
2. Coordinate Systems
A coordinate system determines how the x and y positions of your data are combined to place each element on the plot. ggplot2 defaults to standard Cartesian coordinates, but provides alternative systems.
Axis Flipping: coord_flip()
Flipping the x and y axes is a simple way to display horizontal bar charts or boxplots, especially when categorical labels are long:
# Default: class labels run along the bottom, where long labels can overlap
ggplot(mpg, aes(x = class)) +
  geom_bar()
# Flipped: class labels sit on the vertical axis and read horizontally
ggplot(mpg, aes(x = class)) +
  geom_bar() +
  coord_flip()
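An alternative worth knowing, assuming ggplot2 3.3.0 or later: geoms such as geom_bar() and geom_boxplot() infer their orientation from the aesthetics you map. Mapping the category to y therefore gives a horizontal chart without calling coord_flip(). This is a sketch of that approach, not a replacement for the technique above.

```r
library(ggplot2)

# Horizontal bars by mapping the categorical variable to y directly
p_horizontal <- ggplot(mpg, aes(y = class)) +
  geom_bar()

p_horizontal
```

coord_flip() remains useful when a layer has no notion of orientation or when you are working with an older ggplot2 version.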
Polar Coordinates: coord_polar()
Polar coordinates express data points in terms of an angle and a distance (magnitude) from the center. Applying coord_polar() to a bar chart transforms it into a pie or donut-style chart:
# Simple stacked bar transformed into a polar chart
ggplot(mpg, aes(x = factor(1), fill = class)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")
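For comparison, here is a sketch of the same class counts with one bar per class and the default theta = "x". Each class gets an equal-angle wedge and its count sets the radius, which produces a Coxcomb (rose) chart rather than a pie. The object name p_rose is illustrative.

```r
library(ggplot2)

# One bar per class; in polar coordinates x becomes the angle and
# the bar height (the count) becomes the radius
p_rose <- ggplot(mpg, aes(x = class, fill = class)) +
  geom_bar(width = 1) +
  coord_polar()

p_rose
```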
3. Real-World Case Study: Temperature Jumps in Polar Coordinates
In environmental data science, time and seasonal data are naturally circular. We can use polar coordinates to visualize seasonal temperature swings and find when massive jumps occur during the year.
Base R's diff(x) function computes the difference between consecutive elements of a vector. We can use it to find the day-to-day temperature change in a weather dataset:
# Example of diff() on a vector
temperatures <- c(32, 35, 42, 38)
print(diff(temperatures)) # 3 7 -4
# diff() returns one fewer value than its input, so prepend NA to match the dataset's row count:
# daily_jumps <- abs(c(NA, diff(dataset$temperature)))
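To make the padding step concrete, here is a small runnable sketch. The data frame weather and its values are made up for illustration and are not part of any real dataset.

```r
# Hypothetical five-day temperature record (illustrative values only)
weather <- data.frame(
  day = 1:5,
  temperature = c(32, 35, 42, 38, 51)
)

# diff() yields 4 values for 5 rows; the leading NA marks the first day,
# which has no previous day to compare against
weather$temp_jump <- abs(c(NA, diff(weather$temperature)))

print(weather)
```

Taking abs() keeps the size of each change and discards its direction, so a 4-degree drop counts the same as a 4-degree rise.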
When we plot these jumps by day of the year (using yday() from the lubridate package) and apply polar coordinates, we get a radial calendar showing which seasons experience the highest weather volatility:
# Plotting extreme daily temperature jumps (>10 degrees) over a radial year
# ggplot(filter(aatemp, temp_jump > 10), aes(x = yday(DATE))) +
#   geom_histogram(binwidth = 5, fill = "orange", color = "white") +
#   coord_polar() +
#   labs(title = "Annual Distribution of Volatile Temperature Jumps")
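The commented code above depends on a dataset with DATE and temp_jump columns. As a hedged, self-contained sketch, the version below substitutes synthetic data: the weather data frame, the sine-plus-noise temperatures, and the 10-degree threshold are all assumptions made for illustration. It also computes the day of year with base R instead of yday(), to avoid an extra dependency.

```r
library(ggplot2)

set.seed(42)

# Synthetic daily temperatures for one non-leap year (illustrative only)
dates <- seq(as.Date("2023-01-01"), as.Date("2023-12-31"), by = "day")
n <- length(dates)
weather <- data.frame(
  DATE = dates,
  temp = 50 + 25 * sin(2 * pi * (seq_len(n) - 100) / 365) + rnorm(n, sd = 6)
)

# Absolute day-to-day change, padded with NA for the first day
weather$temp_jump <- abs(c(NA, diff(weather$temp)))

# Day of year, 1 to 365; base R counts from 0, so add 1
weather$day_of_year <- as.POSIXlt(weather$DATE)$yday + 1

# Keep only the large jumps; subset() drops the NA row automatically
volatile <- subset(weather, temp_jump > 10)

p_radial <- ggplot(volatile, aes(x = day_of_year)) +
  geom_histogram(binwidth = 5, fill = "orange", color = "white") +
  coord_polar() +
  labs(title = "Annual Distribution of Volatile Temperature Jumps")

p_radial
```

Because the noise here is the same all year, the synthetic chart should look roughly uniform around the circle. With real data, taller wedges would point to the more volatile parts of the year.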
Hands-on Exercises
Exercise 1: Engine Class Breakdown
Explore engine size (displ) vs. highway mileage (hwy) across different car drivetrain configurations (drv column: front-wheel, rear-wheel, 4WD).
Write R code to:
- Create a scatter plot of displ (x-axis) vs. hwy (y-axis).
- Color the points by class of the car.
- Split the plot into panels horizontally (in a single row) by the drv column using facet_wrap().
Answer:
library(tidyverse)
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  facet_wrap(~ drv, nrow = 1)
Exercise 2: Flipped Mileage Boxplots
Using the mpg dataset, write R code to:
- Create a boxplot showing class on the x-axis and highway mileage hwy on the y-axis.
- Fill the boxes with colors based on class.
- Flip the coordinate system using coord_flip() so that the boxplots render horizontally and the car class names are fully readable.
Answer:
library(tidyverse)
ggplot(mpg, aes(x = class, y = hwy, fill = class)) +
  geom_boxplot() +
  coord_flip()