MOBI BOOT CAMP CORP. logoLearning Buddy
  • SIGN IN
  • SQL & NoSQL Databases
  • 1. Relational Database Fundamentals
    • Introduction to Databases
    • The Record & Table Structure
    • Data Types
    • Primary & Foreign Keys
    • RDBMS Architecture
    • ACID Properties
    • Normalization (1NF, 2NF, 3NF)
    • Relational Pros and Cons
  • 2. SQL: Basic Data Manipulation (DML)
  • 3. SQL: Filtering and Sorting
  • 4. SQL: Aggregation and Relations
  • 5. SQL: Schema Management (DDL)
  • 6. Python and SQL
  • 7. NoSQL Databases
  • References

Normalization

Normalization is the process of organizing the columns (attributes) and tables (relations) of a relational database to minimize data redundancy and improve data integrity. It involves dividing larger tables into smaller, well-structured tables and defining relationships between them.

The primary goals of normalization are to:

  • Eliminate Redundant Data: Storing the same piece of data in multiple places is wasteful and can lead to inconsistencies.
  • Prevent Data Anomalies: Anomalies are problems that can occur during data insertion, updates, or deletion. For example, if a customer's address is stored with every order they place, changing their address would require updating multiple records, and it would be easy to miss one.
  • Improve Data Integrity: By ensuring that data is stored logically and consistently, normalization makes the database more reliable.

Normalization is achieved by following a set of rules called "normal forms." Let's walk through the most common ones with a Example.

Example: A Sales Order Table

Imagine we start with a single, unnormalized table to track sales orders.

Unnormalized Table

OrderID OrderDate CustomerID CustomerName CustomerEmail ProductID, ProductName, Quantity, Price
101 2025-09-24 55 Alice Smith alice@email.com (P1, Apple, 10, 0.50), (P2, Banana, 5, 0.25)
102 2025-09-24 82 Bob Johnson bob@email.com (P3, Carrot, 20, 0.10)
103 2025-09-25 55 Alice Smith alice@email.com (P1, Apple, 15, 0.50)

This table has several problems:

  • The last column contains multiple values (a repeating group), which is hard to query.
  • Customer information (Name, Email) is repeated for every order they place.

First Normal Form (1NF): Eliminate Repeating Groups

Rule: Each cell must hold a single, atomic value, and each record must be unique.

To achieve 1NF, we break the repeating product information into separate rows.

1NF Orders Table

OrderID OrderDate CustomerID CustomerName CustomerEmail ProductID ProductName Quantity Price
101 2025-09-24 55 Alice Smith alice@email.com P1 Apple 10 0.50
101 2025-09-24 55 Alice Smith alice@email.com P2 Banana 5 0.25
102 2025-09-24 82 Bob Johnson bob@email.com P3 Carrot 20 0.10
103 2025-09-25 55 Alice Smith alice@email.com P1 Apple 15 0.50

Problem Solved: Each cell now has a single value. New Problem: We have a lot of redundant data. CustomerName, OrderDate, etc., are repeated for each product in an order. Also, ProductName and Price are repeated every time a product is ordered.


Second Normal Form (2NF): Remove Partial Dependencies

Rule: Must be in 1NF, and all non-key attributes must be fully dependent on the entire primary key. (This rule only applies when there is a composite primary key).

Our primary key for the 1NF table is (OrderID, ProductID).

  • OrderDate, CustomerID, CustomerName, and CustomerEmail depend only on OrderID (a part of the primary key). This is a partial dependency.
  • ProductName and Price depend only on ProductID (another part of the primary key). This is also a partial dependency.

To achieve 2NF, we split the table into three separate tables:

Orders Table

OrderID OrderDate CustomerID
101 2025-09-24 55
102 2025-09-24 82
103 2025-09-25 55

Order_Items Table

OrderID ProductID Quantity
101 P1 10
101 P2 5
102 P3 20
103 P1 15

Products Table

ProductID ProductName Price
P1 Apple 0.50
P2 Banana 0.25
P3 Carrot 0.10

Problem Solved: We've eliminated the partial dependencies. New Problem: In the Orders table, we still have a transitive dependency.


Third Normal Form (3NF): Remove Transitive Dependencies

Rule: Must be in 2NF, and every non-key attribute must depend only on the primary key, not on other non-key attributes.

In our new Orders table, the primary key is OrderID.

  • CustomerID depends on OrderID.
  • However, CustomerName and CustomerEmail depend on CustomerID, which is a non-key attribute. This is a transitive dependency.

To achieve 3NF, we split the Orders table again:

Customers Table

CustomerID CustomerName CustomerEmail
55 Alice Smith alice@email.com
82 Bob Johnson bob@email.com

Orders Table (Final)

OrderID OrderDate CustomerID
101 2025-09-24 55
102 2025-09-24 82
103 2025-09-25 55

Our database is now in 3NF, with four well-structured tables (Customers, Products, Orders, Order_Items). Data redundancy is minimized.

ERD for Normalized Database (3NF)


The Modern Trend: Denormalization for Performance

Historically, the primary driver for normalization was to save disk space, which was incredibly expensive. Today, storage is cheap, but computational power (CPU) and query response time are often more critical.

A highly normalized database often requires complex JOIN operations to retrieve data. For example, to display a full order summary, we would need to join all four of our tables. For a database under heavy load, performing thousands of these complex joins per second can be very slow.

This has led to the practice of denormalization: the process of intentionally adding redundant data back into a database to improve query performance.

Why Denormalize? The goal is to reduce the number of joins needed for frequent queries.

Example: When displaying a customer's order history, we almost always need the product's name and price. In our 3NF design, getting this requires joining Order_Items with the Products table.

A denormalized approach might be to add ProductName and Price back into the Order_Items table:

Order_Items Table (Denormalized)

OrderID ProductID ProductName Quantity PriceAtTimeOfSale
101 P1 Apple 10 0.50
101 P2 Banana 5 0.25

Advantages:

  • Faster Reads: To get an order's details, we now only need to query this one table. No join is needed, which is a huge performance win for a very common operation.
  • Historical Accuracy: Storing the price in this table is actually a good thing. What if the price of an Apple changes in the Products table tomorrow? This denormalized table preserves the price at the time of the sale.

Disadvantages:

  • Increased Storage: We are now storing the product name and price multiple times.
  • More Complex Updates: If a product's name is corrected (e.g., a typo fix), we would theoretically need to update it in every Order_Items record. In practice, this is often deemed an acceptable trade-off.

Today, database design is a balancing act. While normalization provides a crucial foundation for data integrity, designers often selectively denormalize their databases to optimize for the most frequent and critical read operations, accepting a small amount of redundancy in exchange for speed. This philosophy is also a core principle behind many NoSQL database designs.

Privacy Policy | Terms & Conditions