
Themes, Labels & Customization

Why Customize Plots in Exploratory Data Analysis?

A default chart is useful for personal exploration, but it is insufficient for communication.

Imagine you are presenting an analysis of the mpg fuel-economy data to executives.

  • A default chart with column headers like displ and hwy as axis labels will confuse stakeholders who do not know the dataset's schema.
  • Points representing anomalies (like a single high-performance car with unusual mileage) will look like errors unless you label them directly on the chart and explain why they are there.
  • The chart background grid might look too cluttered for a slide deck.

To turn your graphs into data-driven stories, you must add clear titles and descriptions, customize text alignments, apply minimal presentation backgrounds, and export them in high-resolution formats. Let's learn how to customize our ggplot charts.


1. Descriptive Titles and Labels: labs()

Use the labs() function to set titles, subtitles, axis labels, captions, and legend names:

library(tidyverse)

ggplot(mpg, aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point(position = "jitter", alpha = 0.7) +
  labs(
    title = "Engine Size vs. Highway Fuel Efficiency",
    subtitle = "234 vehicle records, model years 1999 and 2008",
    caption = "Source: US EPA (fueleconomy.gov)",
    x = "Engine Displacement (Litres)",
    y = "Highway Mileage (Miles Per Gallon)",
    color = "Cylinder Count" # Renames the legend title
  )
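
Because labs() is added like any other layer, labels can also be attached to a plot that was built earlier and stored in a variable. The following is a minimal sketch of that pattern; the variable names base_plot and labelled_plot are illustrative and not part of the original lesson.

```r
library(ggplot2)

# Build the base plot once and keep it in a variable
base_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point(alpha = 0.7)

# Add the labels in a separate step; base_plot itself is left unchanged
labelled_plot <- base_plot +
  labs(
    title = "Engine Size vs. Highway Fuel Efficiency",
    x = "Engine Displacement (Litres)",
    y = "Highway Mileage (Miles Per Gallon)",
    color = "Cylinder Count"
  )

# Printing the object draws the chart
labelled_plot
```

Keeping the unlabelled plot around can be handy when the same chart needs different titles for different audiences.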

2. Text Annotations: annotate()

To call out a specific data point, outlier, or reference line directly on the canvas without mapping a whole table of labels, use the annotate() function:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(position = "jitter", alpha = 0.5) +
  # Add a text annotation above the large-engine cars with unusually high mileage
  annotate(
    geom = "text",
    x = 6, y = 33,
    label = "High Efficiency Outlier",
    color = "red",
    fontface = "bold"
  ) +
  # Add an arrow from the label down to the outlier (displ = 6.2, hwy = 26)
  annotate(
    geom = "segment",
    x = 6, y = 32,
    xend = 6.2, yend = 27,
    arrow = arrow(length = unit(0.2, "cm")),
    color = "red"
  )
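
Hard-coded coordinates can drift away from the data they are meant to highlight. One possible alternative, sketched here, is to look up the point with dplyr and feed its coordinates to annotate(). The threshold displ > 5, the label offsets, and the names best_large and annotated_plot are assumptions chosen for illustration.

```r
library(ggplot2)
library(dplyr)

# Find the large-engine car (displ > 5) with the best highway mileage;
# with_ties = FALSE keeps exactly one row if several cars share the top value
best_large <- mpg %>%
  filter(displ > 5) %>%
  slice_max(hwy, n = 1, with_ties = FALSE)

annotated_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  # Place the label a few units above the point it describes
  annotate(
    geom = "text",
    x = best_large$displ, y = best_large$hwy + 4,
    label = paste("Best large-engine mileage:", best_large$model),
    color = "red", fontface = "bold"
  ) +
  # Draw the arrow from just below the label down to the point
  annotate(
    geom = "segment",
    x = best_large$displ, y = best_large$hwy + 3,
    xend = best_large$displ, yend = best_large$hwy + 0.5,
    arrow = arrow(length = unit(0.2, "cm")),
    color = "red"
  )

annotated_plot
```

The jitter is left out in this sketch so that the arrow lands exactly on the plotted point.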

3. Tick Label Rotation: theme()

If the categories on your horizontal axis have long names, you can rotate the tick labels by passing element_text() to the axis.text.x argument of theme(). You can specify:

  • angle: Rotation angle in degrees (e.g., 45 or 90).
  • hjust: Horizontal justification from 0 (left) to 1 (right); 1 makes the end of each label meet its tick mark.
  • vjust: Vertical justification from 0 (bottom) to 1 (top); 1 aligns the top edge of the label.

ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "steelblue") +
  # Rotate tick labels by 45 degrees
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
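
As an alternative to setting angle, hjust, and vjust by hand, the axis guide can handle the rotation and pick the justification itself. This is a sketch that assumes ggplot2 3.3.0 or later, where guide_axis() is available; the name rotated_plot is illustrative.

```r
library(ggplot2)

# guide_axis(angle = 45) rotates the tick labels and chooses
# a matching justification automatically
rotated_plot <- ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "steelblue") +
  scale_x_discrete(guide = guide_axis(angle = 45))

rotated_plot
```

The theme() approach gives finer control over justification, while guide_axis() keeps the call shorter for common angles.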

4. Built-in Presentation Themes

ggplot2 provides several built-in themes that instantly style the background grid, borders, and margins:

  • theme_minimal(): A clean background with light grey grid lines and no outer border (recommended for reports).
  • theme_classic(): A simple axis-line layout with no grid lines (similar to academic journals).
  • theme_bw(): A black-and-white grid layout with an outer border.
  • theme_light(): Light grey borders and grid lines for a crisp presentation.

# Apply a clean minimal theme
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  theme_minimal()
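
Built-in themes are complete themes, so the order in which they are combined with theme() tweaks matters: a complete theme added later replaces earlier theme() settings. The sketch below illustrates this with two otherwise identical plots; the names bar_plot, tweak_kept, and tweak_lost are illustrative.

```r
library(ggplot2)

bar_plot <- ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "steelblue")

# Complete theme first, tweak second: the rotated labels are kept
tweak_kept <- bar_plot +
  theme_minimal(base_size = 14) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Tweak first, complete theme second: theme_minimal() replaces the
# earlier theme() call, so the labels are drawn unrotated
tweak_lost <- bar_plot +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  theme_minimal(base_size = 14)

tweak_kept
```

The base_size argument scales all text in the theme, which is often useful for slides.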

5. Visualizing Categorical Data: Integrating forcats

By default, ggplot2 orders the categories of a character column alphabetically, which rarely matches the story you want to tell. To arrange plots logically (for example, sorting bars by height, ordering boxplots by a numeric summary, or grouping small categories), you can call forcats functions directly inside aes():

1. Reordering Bar Charts by Frequency: fct_infreq()

To sort a categorical bar chart by its category volume rather than alphabetically:

library(forcats)

# Plot class sorted by count
ggplot(mpg, aes(y = fct_infreq(class))) +
  geom_bar(fill = "steelblue") +
  labs(
    title = "Common Vehicle Classes",
    y = "Vehicle Class",
    x = "Count"
  ) +
  theme_minimal()
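
fct_infreq() puts the most frequent level first, and on a vertical axis the first level is drawn at the bottom. If you would rather read the chart from the top down, one option is to wrap the result in fct_rev(). This is a sketch; the names class_by_count and top_down_plot are illustrative.

```r
library(ggplot2)
library(forcats)

# Inspect the level order outside the plot: most frequent class first
class_by_count <- fct_infreq(mpg$class)
levels(class_by_count)

# Reverse the order so the most common class is drawn at the top
top_down_plot <- ggplot(mpg, aes(y = fct_rev(fct_infreq(class)))) +
  geom_bar(fill = "steelblue") +
  labs(y = "Vehicle Class", x = "Count") +
  theme_minimal()

top_down_plot
```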

2. Reordering Boxplots by a Numeric Variable: fct_reorder()

To sort categories on your axis based on another numeric column (e.g. arranging car manufacturers by their median highway mileage hwy rather than alphabetically):

library(forcats)

# Reorder manufacturer based on median hwy mileage
ggplot(mpg, aes(x = hwy, y = fct_reorder(manufacturer, hwy, .fun = median))) +
  geom_boxplot(fill = "aquamarine3") +
  labs(
    title = "Highway Mileage by Manufacturer",
    subtitle = "Sorted by median mileage",
    y = "Manufacturer",
    x = "Highway Mileage (MPG)"
  ) +
  theme_minimal()
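
To check what fct_reorder() actually did, it can help to run it outside the plot and list the summary statistic in the new level order. The sketch below also shows the .desc argument for reversing the sort; the variable names are illustrative.

```r
library(ggplot2)  # provides the mpg dataset
library(forcats)

# Reorder manufacturers by median highway mileage (ascending by default)
by_median <- fct_reorder(mpg$manufacturer, mpg$hwy, .fun = median)

# Median hwy per manufacturer, listed in the new level order
medians_ascending <- tapply(mpg$hwy, by_median, median)
medians_ascending

# .desc = TRUE flips the order so the highest median comes first
by_median_desc <- fct_reorder(mpg$manufacturer, mpg$hwy, .fun = median, .desc = TRUE)
medians_descending <- tapply(mpg$hwy, by_median_desc, median)
```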

3. Lumping Rare Categories on Plots: fct_lump_n()

If your dataset contains dozens of small groups, plotting them all makes the chart messy. Use fct_lump_n() to keep only the n largest groups and combine the rest into a single level. That level is named "Other" by default; the other_level argument, used below, changes the name:

library(forcats)

# Keep the top 4 manufacturers, group the rest as 'Other'
ggplot(mpg, aes(y = fct_lump_n(manufacturer, n = 4, other_level = "Other Brands"))) +
  geom_bar(fill = "tomato2") +
  labs(
    title = "Top Car Manufacturers by Volume",
    y = "Manufacturer Group",
    x = "Count"
  ) +
  theme_minimal()
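
Before plotting, it may be worth confirming which groups survive the lumping and how many rows end up in the combined level. A small sketch using fct_count(), with the default "Other" label and illustrative variable names:

```r
library(ggplot2)  # provides the mpg dataset
library(forcats)

# Keep the 4 most common manufacturers; everything else becomes "Other"
lumped <- fct_lump_n(mpg$manufacturer, n = 4)

# Tabulate the result, largest group first
lumped_counts <- fct_count(lumped, sort = TRUE)
lumped_counts
```

forcats also offers fct_lump_min() and fct_lump_prop(), which lump by a minimum count or a minimum proportion instead of a fixed number of groups.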

6. Exporting High-Resolution Charts: ggsave()

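To save a plot to disk as a PDF, PNG, or SVG file, use the ggsave() function. The file format is inferred from the file extension, width and height are in inches by default, and SVG output requires the svglite package. By default, ggsave() saves the last plot displayed: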

# Generate plot
ggplot(mpg, aes(x = displ, y = hwy)) + 
  geom_point() + 
  theme_minimal()

# Save the plot to the local directory in high resolution (300 DPI)
ggsave("fuel_efficiency_plot.png", width = 8, height = 6, dpi = 300)

# Save as vector graphic PDF (ideal for prints and documents)
ggsave("fuel_efficiency_plot.pdf", width = 10, height = 7)

In-Browser Sandbox Limitation

Running ggsave() inside this ebook's code editors saves the file to a virtual, sandboxed file system in your browser memory rather than your physical hard drive. These saved files cannot be downloaded directly from the ebook. To export and save actual image or PDF files to your computer, copy and run this code locally in RStudio.
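
Relying on the last plot displayed can save the wrong chart in longer scripts. A more explicit pattern, sketched here, stores the plot in a variable and passes it through the plot argument. The temporary output path and the centimetre dimensions are assumptions for illustration; substitute your own path when running locally.

```r
library(ggplot2)

# Store the plot so the object to save is named explicitly
fuel_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  theme_minimal()

# Write to a temporary directory to avoid cluttering the working directory
out_file <- file.path(tempdir(), "fuel_efficiency_plot.png")

# units = "cm" switches width and height from the default inches
ggsave(out_file, plot = fuel_plot, width = 20, height = 15, units = "cm", dpi = 300)

# Confirm that the file was written
file.exists(out_file)
```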


Hands-on Exercises

Exercise 1: Presentation-Ready Chart

Create a scatter plot of hwy (highway miles per gallon) against cty (city miles per gallon) using the mpg dataset. Write R code to:

  1. Initialize the plot with cty on the x-axis and hwy on the y-axis, then add a point layer.
  2. Add labels using labs():
    • Title: "City vs. Highway Fuel Performance"
    • x-axis label: "City Mileage (MPG)"
    • y-axis label: "Highway Mileage (MPG)"
  3. Apply theme_minimal() to the chart.

Answer:

library(tidyverse)

ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  labs(
    title = "City vs. Highway Fuel Performance",
    x = "City Mileage (MPG)",
    y = "Highway Mileage (MPG)"
  ) +
  theme_minimal()

Exercise 2: Rotated Categorical Distribution

Create a boxplot of highway mileage (hwy) grouped by vehicle class (class). Write R code to:

  1. Plot class on the x-axis and hwy on the y-axis using geom_boxplot().
  2. Fill the boxplots by class (map fill to class).
  3. Add a title using labs() and apply theme_classic().
  4. Rotate the x-axis tick labels by 90 degrees (vertical) and align them to the tick marks using hjust = 1 and vjust = 0.5 inside theme().

Answer:

library(tidyverse)

ggplot(mpg, aes(x = class, y = hwy, fill = class)) +
  geom_boxplot() +
  labs(title = "Highway Mileage by Vehicle Class") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

Exercise 3: Sorting and Lumping Categorical Plots

Create a bar chart of car manufacturers showing the number of vehicles in the dataset, but sorted by frequency, and grouped into the top 5 plus an "Other" category. Write R code to:

  1. Plot manufacturer on the vertical axis (y axis).
  2. Wrap manufacturer in fct_lump_n() to keep only the top 5 manufacturers and lump the rest into "Other Manufacturers".
  3. Wrap that lumped variable inside fct_infreq() to sort the bars by volume (most frequent first).
  4. Render the bars using geom_bar(fill = "darkturquoise").
  5. Add a title using labs() and apply theme_minimal().

Answer:

library(tidyverse)
library(forcats)

# Lump to the top 5 manufacturers, then order the bars by frequency
ggplot(mpg, aes(y = fct_infreq(fct_lump_n(manufacturer, n = 5, other_level = "Other Manufacturers")))) +
  geom_bar(fill = "darkturquoise") +
  labs(
    title = "Top Car Manufacturers by Volume",
    y = "Manufacturer",
    x = "Number of Vehicles"
  ) +
  theme_minimal()