
Bad Visualization Examples

Why Learn Data Ethics and Visualization Pitfalls?

Data visualization is a double-edged sword. It has the power to simplify complex trends, but it also has the power to mislead, distort, and bias the reader's interpretation.

Sometimes, charts are designed poorly due to mechanical mistakes (like forgetting to treat category numbers as factors). Other times, charts are manipulated intentionally (like starting a bar chart's axis at 80% to exaggerate a minor 2% sales gain).

In data science, we must build charts that are both honest and readable. In this chapter, we will inspect six common visualization mistakes taught in the course and learn how to write correct ggplot2 code to fix them.


1. Truncated Y-Axis (Misleading Heights)

The Mistake

In a bar chart, the height of the bar represents the magnitude of the value. If you truncate the vertical axis (e.g. starting the y-axis at 80 instead of 0), you distort the proportion, making a tiny difference look massive.

library(tidyverse)

# Misleading: Exaggerates differences by starting y-axis at 20
# ggplot(mpg, aes(x = class, y = hwy)) +
#   geom_bar(stat = "summary", fun = "mean") +
#   coord_cartesian(ylim = c(20, 35)) # BAD! Exaggerates heights

The Rule

  • Bar charts must always start at 0 on the value axis.
  • Scatter plots or line charts can start at non-zero values if the goal is to examine fine variation, because they encode values by position rather than by bar length. Bar charts have no such exception.
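As a minimal sketch of the rule above, the misleading chart can be repaired by dropping the ylim override so the bars keep their zero baseline. If the small differences are the real point of the chart, one option is to switch to points, which encode position rather than length. The object names p_bar and p_point are illustrative only.

```r
library(ggplot2)

# Honest version: no ylim override, so every bar starts at 0
p_bar <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_bar(stat = "summary", fun = "mean") +
  labs(x = "Vehicle Class", y = "Mean Highway MPG")

# Alternative for fine variation: points show position, not length,
# so letting the axis start above 0 does not distort proportions
p_point <- ggplot(mpg, aes(x = class, y = hwy)) +
  stat_summary(fun = mean, geom = "point", size = 3) +
  labs(x = "Vehicle Class", y = "Mean Highway MPG")

p_bar
p_point
```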

2. Pie Chart Abuse (Visual Overload)

The Mistake

Human eyes are poor at comparing angles and area sizes, especially when a pie chart is split into more than 3 or 4 slices. If you have 10 categories, a pie chart becomes a colorful, unreadable wheel.

# Bad: Pie chart with many slices
# (Do not use coord_polar on stacked bars with high cardinality!)
# ggplot(mpg, aes(x = "", fill = manufacturer)) +
#   geom_bar(width = 1) + coord_polar(theta = "y")

The Fix

Replace pie charts with sorted horizontal bar charts. Bar charts line up categories along a single straight baseline, allowing the eye to immediately compare lengths:

# Good: Horizontal bar chart sorted by frequency
mpg |>
  count(class) |>
  ggplot(aes(x = reorder(class, n), y = n)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(x = "Vehicle Class", y = "Count")

3. Spaghetti Plots (Line Overlap)

The Mistake

Plotting 15 lines of time-series data on a single line chart creates a messy web. The reader cannot track individual trends.

# Bad: Too many lines smudged together
# ggplot(many_countries_data, aes(x = Year, y = GDP, color = Country)) + geom_line()

The Fix

  1. Faceting: Split the lines into a grid of small subplots using facet_wrap().
  2. Highlighting: Draw the target line in a bold, vibrant color and draw all other background comparison lines in light grey (color = "grey80").

# Good: Use faceting to split lines cleanly
# ggplot(many_countries_data, aes(x = Year, y = GDP)) +
#   geom_line() +
#   facet_wrap(~ Country)
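
The highlighting approach can be sketched in a similar way. Because many_countries_data is not available here, this example assumes the txhousing dataset that ships with ggplot2 as a stand-in, and the choice of "Austin" as the target series is arbitrary. The linewidth argument assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Stand-in data: txhousing ships with ggplot2 (monthly housing data for Texas cities)
target <- "Austin"  # illustrative choice of the series to highlight

background <- subset(txhousing, city != target)
highlight  <- subset(txhousing, city == target)

p <- ggplot(mapping = aes(x = date, y = median, group = city)) +
  # All comparison series in light grey, drawn first so they sit behind
  geom_line(data = background, color = "grey80", na.rm = TRUE) +
  # The target series drawn last, in a bold color and a thicker line
  geom_line(data = highlight, color = "firebrick", linewidth = 1.2, na.rm = TRUE) +
  labs(x = "Date", y = "Median Sale Price", title = paste("Highlighted:", target))

p
```

Drawing the grey layer first matters: layers are painted in the order they are added, so the highlighted line ends up on top.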

4. Continuous Scales on Categorical Numbers

The Mistake

If you map a category represented by numbers (like cylinder count 4, 5, 6, 8) to color, ggplot2 sees a numeric column and draws a continuous gradient. The legend then implies that in-between values, such as a cylinder count of 7, exist on the color spectrum, which is false.

# Bad: Continuous gradient legend for categorical numbers
ggplot(mpg, aes(x = displ, y = hwy, color = cyl)) +
  geom_point(size = 3)

The Fix

Wrap the column in factor() to force discrete category legend groupings:

# Good: Distinct color levels for categories
ggplot(mpg, aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point(size = 3) +
  labs(color = "Cylinders")
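
A quick way to spot this problem before plotting is to check how the column is stored. The sketch below inspects cyl and then converts it once in a copy of the data, so every later plot picks up the discrete scale without repeating factor(). The name mpg_cat is illustrative only.

```r
library(ggplot2)

# cyl is stored as an integer, which is why ggplot2 picks a continuous scale
class(mpg$cyl)

# Only a handful of distinct values exist, so it is really a category
sort(unique(mpg$cyl))

# Convert once up front; plots built from mpg_cat get a discrete legend
mpg_cat <- transform(mpg, cyl = factor(cyl))
levels(mpg_cat$cyl)

ggplot(mpg_cat, aes(x = displ, y = hwy, color = cyl)) +
  geom_point(size = 3) +
  labs(color = "Cylinders")
```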

5. Misleading Stacked Bar Baselines

The Mistake

Stacked bar charts place segments on top of each other. Only the bottom segment rests on the zero line. Every segment above it starts at a different height in each bar, so the reader has to compare lengths that do not share a baseline, which is hard to do accurately.

# Bad: Hard to compare the middle 'class' segments across years
ggplot(mpg, aes(x = year, fill = class)) +
  geom_bar(position = "stack")

The Fix

Use side-by-side bar charts by setting position = "dodge":

# Good: Elements stand next to each other on a shared baseline
ggplot(mpg, aes(x = factor(year), fill = class)) +
  geom_bar(position = "dodge") +
  labs(x = "Year")

6. Double Y-Axes (Implied Associations)

The Mistake

Creating a single chart with two separate vertical axes scales the data arbitrarily. You can manipulate the scaling factors to make two completely unrelated trends line up, implying a false causal relationship.

The Fix

Draw two separate subplots stacked vertically, sharing a common horizontal axis. This allows the reader to inspect correlation without misleading scaling overlaps.
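
One way to sketch this fix is with facet_wrap() and a free y-axis per panel. The example assumes the economics dataset that ships with ggplot2 and picks two of its columns, unemploy (thousands of people) and psavert (a percentage), purely as an illustration of series on very different scales.

```r
library(ggplot2)
library(tidyr)

# Reshape the two series into long form: one row per date and series
econ_long <- pivot_longer(
  economics[, c("date", "unemploy", "psavert")],
  cols = c(unemploy, psavert),
  names_to = "series",
  values_to = "value"
)

# One panel per series, stacked vertically; each panel gets its own y-axis,
# while the x-axis (date) stays shared across panels
p <- ggplot(econ_long, aes(x = date, y = value)) +
  geom_line() +
  facet_wrap(~ series, ncol = 1, scales = "free_y") +
  labs(x = "Date", y = NULL)

p
```

Each panel has its own clearly labeled axis, so no scaling factor has to be chosen to line the two series up.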


Hands-on Exercises

Exercise 1: Fixing a Gradient Legend

Identify the mistake in the code below and write the corrected ggplot2 code.

# Messy: color gradient scale for cylinders
ggplot(mpg, aes(x = displ, y = hwy, color = cyl)) + geom_point()

Fix it by treating the cylinder column as a discrete category.

Answer

library(tidyverse)

# Fix: Wrap cyl in factor()
ggplot(mpg, aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point() +
  labs(color = "Cylinder Count")

Exercise 2: Dodging Stacked Bars

The bar chart below stacks drive train types (drv) within each year.

ggplot(mpg, aes(x = factor(year), fill = drv)) + geom_bar(position = "stack")

Write R code to modify this chart, positioning the drive train bars side-by-side (position = "dodge") to allow direct visual comparison of their counts.

Answer

library(tidyverse)

ggplot(mpg, aes(x = factor(year), fill = drv)) +
  geom_bar(position = "dodge") +
  labs(x = "Year", fill = "Drivetrain")