GROUP BY Clause
The GROUP BY statement is one of the most powerful features in SQL. It is used to group rows that have the same values in specified columns into summary rows. It is almost always used with aggregate functions (COUNT, SUM, AVG, MAX, MIN) to perform a calculation on each group.
Database Preparation
Run this block to set up the Authors and Books tables.
DROP TABLE IF EXISTS Authors;
CREATE TABLE Authors (
AuthorID INTEGER PRIMARY KEY,
AuthorName VARCHAR(100)
);
INSERT INTO Authors (AuthorID, AuthorName)
VALUES (101, 'J.R.R. Tolkien'),
(102, 'George Orwell'),
(103, 'Frank Herbert');
DROP TABLE IF EXISTS Books;
CREATE TABLE Books (
BookID INTEGER PRIMARY KEY,
Title VARCHAR(255),
AuthorID INTEGER,
Genre VARCHAR(50),
PublishedYear INTEGER,
FOREIGN KEY (AuthorID) REFERENCES Authors(AuthorID)
);
INSERT INTO Books (BookID, Title, AuthorID, Genre, PublishedYear)
VALUES (1, 'The Hobbit', 101, 'Fantasy', 1937),
(2, '1984', 102, 'Dystopian', 1949),
(3, 'The Fellowship of the Ring', 101, 'Fantasy', 1954),
(4, 'Dune', 103, 'Sci-Fi', 1965);
SELECT 'Database Ready' AS Status;
Basic GROUP BY with COUNT
Query:
SELECT
AuthorID,
COUNT(BookID) AS NumberOfBooks
FROM
Books
GROUP BY
AuthorID;
Example 1: Using WHERE to Filter Rows Before Grouping
Query:
SELECT
AuthorID,
COUNT(BookID) AS NumberOfBooks
FROM
Books
WHERE
PublishedYear > 1950
GROUP BY
AuthorID;
Example 2: Using HAVING to Filter Groups After Aggregation
Query:
SELECT
AuthorID,
COUNT(BookID) AS NumberOfBooks
FROM
Books
GROUP BY
AuthorID
HAVING
COUNT(BookID) > 1;
Explanation:
FROM Books: The query considers theBookstable.GROUP BY AuthorID: It groups all rows with the sameAuthorIDtogether. This creates three groups: one for Author 101, one for 102, and one for 103.SELECT ... COUNT(BookID): For each of these groups, it counts the number ofBookIDs.
**Result:** | AuthorID | NumberOfBooks | | :--- | :--- | | 101 | 2 | | 102 | 1 | | 103 | 1 |
Filtering: WHERE vs. HAVING
This is a critical concept. Both WHERE and HAVING are used for filtering, but they operate at different stages of a query.
WHEREfilters rows before they are grouped.HAVINGfilters groups after they have been created.
You cannot use an aggregate function in a WHERE clause, but you can in a HAVING clause.
Example 1: Using WHERE to Filter Rows Before Grouping
Operation: Count how many books each author has, but only consider books published after 1950.
Query:
SELECT
AuthorID,
COUNT(BookID) AS NumberOfBooks
FROM
Books
WHERE
PublishedYear > 1950
GROUP BY
AuthorID;
Explanation:
FROM Books WHERE PublishedYear > 1950: The database first filters theBookstable, removing any rows for books published in or before 1950. The rows for "The Hobbit" (1937) and "1984" (1949) are eliminated.GROUP BY AuthorID: It then groups the remaining rows byAuthorID.COUNT(BookID): Finally, it counts the books in these new, smaller groups.
**Result:** | AuthorID | NumberOfBooks | | :--- | :--- | | 101 | 1 | | 103 | 1 | *(Author 102 is not in the result because their only book, "1984", was filtered out by the `WHERE` clause before the grouping happened.)*
Example 2: Using HAVING to Filter Groups After Aggregation
Operation: Find authors who have written more than one book.
Query:
SELECT
AuthorID,
COUNT(BookID) AS NumberOfBooks
FROM
Books
GROUP BY
AuthorID
HAVING
COUNT(BookID) > 1;
Explanation:
FROM Books GROUP BY AuthorID: The database first groups all rows byAuthorIDand calculates theCOUNTfor each group, just like in our first example. This intermediate result would be (101: 2 books, 102: 1 book, 103: 1 book).HAVING COUNT(BookID) > 1: It then filters these groups, keeping only those where the calculated count is greater than 1.
**Result:** | AuthorID | NumberOfBooks | | :--- | :--- | | 101 | 2 | *(Authors 102 and 103 are not in the result because their groups did not meet the `HAVING` condition.)*
Hands-on Exercise
Task: Count the number of books in each Genre.
Your Query:
-- Write your query here
Expected Results:
| Genre | BookCount |
|---|---|
| Fantasy | 2 |
| Dystopian | 1 |
| Sci-Fi | 1 |