SQL Aggregations Explained: COUNT, SUM, AVG, GROUP BY and HAVING with Examples
In Topic 2 you learned to retrieve and filter individual rows. That is useful — but it is not where the real analysis happens. A manager does not want to see 10,000 individual order rows. They want to know: what is our total revenue this year? Which region is most profitable? Which product category has the highest average order value?
Those questions cannot be answered by looking at individual rows. They require aggregation — combining multiple rows into a single summary figure. This topic covers the five aggregate functions every analyst uses daily, along with GROUP BY and HAVING which control how aggregations are grouped and filtered.
What Is Aggregation and Why Does It Matter
Aggregation means collapsing multiple rows of data into a single calculated value. Instead of seeing every individual sale, you see the total. Instead of every order’s profit, you see the average. Instead of each transaction, you see a count.
In pandas you did this with groupby() and agg() in Module 1. SQL does the same thing — but directly inside the database, before the data even reaches Python. This matters for performance. Aggregating 500,000 rows in SQL and returning a 4-row summary is dramatically faster than loading all 500,000 rows into Python and then aggregating them.
In a professional analyst workflow the rule is simple: aggregate in SQL, visualise and present in Python. Do the heavy lifting where the data lives.
The Five Aggregate Functions
Before combining these with GROUP BY, understand what each function does on its own when applied to an entire table.
COUNT — How Many Rows
COUNT counts the number of rows. It is the most frequently used aggregate function in day-to-day analyst work.
-- How many orders are in the dataset?
query("""
SELECT COUNT(*) AS total_orders
FROM superstore
""")
Output:
| total_orders |
|---|
| 9994 |
COUNT(*) counts all rows including nulls. COUNT(column_name) counts only rows where that column is not null — a useful distinction when checking data completeness:
-- How many rows have a valid postal code vs total rows?
query("""
SELECT
COUNT(*) AS total_rows,
COUNT(postal_code) AS rows_with_postal_code
FROM superstore
""")
If these two numbers differ, you have missing values in that column — which is exactly what you want to know before any analysis.
SUM — Total Value
SUM adds up all values in a numeric column. Used constantly for revenue, profit, quantity, and any other additive metric.
-- What is the total revenue and total profit across all orders?
query("""
SELECT
ROUND(SUM(sales), 2) AS total_revenue,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore
""")
Output:
| total_revenue | total_profit |
|---|---|
| 2297200.86 | 286397.02 |
ROUND(value, 2) keeps results to two decimal places. Always use it with financial figures — raw floating point results from SQL can be messy.
AVG — Average Value
AVG calculates the mean of a numeric column. Useful for understanding typical order size, average margin, average quantity, and so on.
-- What is the average order value and average profit per transaction?
query("""
SELECT
ROUND(AVG(sales), 2) AS avg_order_value,
ROUND(AVG(profit), 2) AS avg_profit
FROM superstore
""")
Mean vs median reminder: AVG calculates the mean, which is pulled upward by large outliers. A small number of very large orders can make the average look higher than what a typical order is worth. In Module 1 you compared mean and median in pandas — keep that in mind when interpreting AVG results from SQL.
MIN and MAX — Smallest and Largest Values
MIN and MAX return the lowest and highest values in a column. Useful for understanding the range of your data and spotting outliers.
-- What is the range of order values in the dataset?
query("""
SELECT
ROUND(MIN(sales), 2) AS smallest_order,
ROUND(MAX(sales), 2) AS largest_order,
ROUND(MIN(profit), 2) AS worst_profit,
ROUND(MAX(profit), 2) AS best_profit
FROM superstore
""")
Output:
| smallest_order | largest_order | worst_profit | best_profit |
|---|---|---|---|
| 0.44 | 22638.48 | -6599.98 | 8399.98 |
The worst profit being nearly -$6,600 on a single order is a significant finding. MAX and MIN surface these extremes instantly — without having to sort through thousands of rows manually.
GROUP BY — Aggregating by Category
Aggregate functions on their own give you a single number for the entire table. GROUP BY breaks that down by a category — giving you one summary row per group instead of one summary row for everything.
Syntax:
SELECT column, AGG_FUNCTION(column)<br>FROM table<br>GROUP BY column
This is the SQL equivalent of pandas groupby(). And just like in pandas, it is one of the most powerful tools in your analyst toolkit.
Total Sales and Profit by Region
-- Business question: how does each region perform?
query("""
SELECT
region,
COUNT(*) AS total_orders,
ROUND(SUM(sales), 2) AS total_revenue,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore
GROUP BY region
ORDER BY total_profit DESC
""")
Output:
| region | total_orders | total_revenue | total_profit |
|---|---|---|---|
| West | 3203 | 725457.82 | 108418.45 |
| East | 2848 | 678781.24 | 91522.78 |
| Central | 2323 | 501239.89 | 39706.36 |
| South | 1620 | 391721.91 | 46749.43 |
Four rows. One per region. Each one summarising thousands of individual transactions. This is what aggregation does — it turns raw data into something a manager can read and act on in seconds.
Grouping by Multiple Columns
You can GROUP BY more than one column to get more granular breakdowns:
-- Revenue and profit broken down by region AND category
query("""
SELECT
region,
category,
ROUND(SUM(sales), 2) AS total_revenue,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore
GROUP BY region, category
ORDER BY region, total_profit DESC
""")
This gives you one row per unique region-category combination — 12 rows total (4 regions × 3 categories). A clean, structured view of performance across two dimensions at once.
The GROUP BY Rule — Every Non-Aggregated Column Must Be Listed
This is the rule that trips up every beginner at least once. In a SELECT statement that uses GROUP BY, every column that is not inside an aggregate function must appear in the GROUP BY clause. If you select region and SUM(sales), you must GROUP BY region. If you select region, category, and SUM(sales), you must GROUP BY region, category.
-- This will throw an error — category is selected but not grouped
query("""
SELECT region, category, SUM(sales)
FROM superstore
GROUP BY region -- missing category here
""")
-- This is correct
query("""
SELECT region, category, SUM(sales)
FROM superstore
GROUP BY region, category
""")
When you get a GROUP BY error, the fix is almost always: add the missing column to the GROUP BY clause.
HAVING — Filtering Aggregated Results
WHERE filters individual rows before aggregation. HAVING filters the aggregated results after GROUP BY runs. This distinction is critical and confuses many beginners.
The simple rule:
- Use WHERE to filter rows — before grouping
- Use HAVING to filter groups — after grouping
Syntax:
SELECT column, AGG_FUNCTION(column)<br>FROM table<br>GROUP BY column<br>HAVING AGG_FUNCTION(column) condition
Filter Groups by Aggregate Value
-- Which regions have total profit above $50,000?
query("""
SELECT
region,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore
GROUP BY region
HAVING SUM(profit) > 50000
ORDER BY total_profit DESC
""")
You cannot use WHERE profit > 50000 here because at the time WHERE runs, the profit has not been summed yet. HAVING runs after the sum is calculated — that is why it can filter on it.
WHERE and HAVING Together
WHERE and HAVING can be used in the same query. WHERE filters the raw rows first, then GROUP BY aggregates, then HAVING filters the groups:
-- Among Technology orders only, which sub-categories have total sales above $100,000?
query("""
SELECT
sub_category,
ROUND(SUM(sales), 2) AS total_sales,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore
WHERE category = 'Technology'
GROUP BY sub_category
HAVING SUM(sales) > 100000
ORDER BY total_sales DESC
""")
Output:
| sub_category | total_sales | total_profit |
|---|---|---|
| Phones | 330007.05 | 44515.73 |
| Machines | 189238.63 | 3384.76 |
| Copiers | 149528.03 | 55617.82 |
Read the logic out loud: “From all Technology orders, group by sub-category, show only groups with over $100,000 in sales, sorted by sales.” WHERE handles the category filter. HAVING handles the post-aggregation filter. Both in the same query.
Practical Business Queries Using Aggregations
Here are three complete queries that combine everything in this topic to answer real analyst questions.
Which customer segment is most valuable?
query("""
SELECT
segment,
COUNT(*) AS total_orders,
ROUND(SUM(sales), 2) AS total_revenue,
ROUND(AVG(sales), 2) AS avg_order_value,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore
GROUP BY segment
ORDER BY total_revenue DESC
""")
Which sub-categories are losing money overall?
query("""
SELECT
sub_category,
ROUND(SUM(profit), 2) AS total_profit,
COUNT(*) AS total_orders
FROM superstore
GROUP BY sub_category
HAVING SUM(profit) < 0
ORDER BY total_profit ASC
""")
This query directly answers one of the most common business questions an analyst gets asked: which product lines are unprofitable and how bad is it?
Top performing sub-categories by profit margin
query("""
SELECT
sub_category,
ROUND(SUM(sales), 2) AS total_sales,
ROUND(SUM(profit), 2) AS total_profit,
ROUND(SUM(profit) / SUM(sales) * 100, 1) AS profit_margin_pct
FROM superstore
GROUP BY sub_category
ORDER BY profit_margin_pct DESC
LIMIT 10
""")
Notice SUM(profit) / SUM(sales) * 100 — you can do arithmetic directly inside a SELECT using aggregate functions. This calculates the overall profit margin for each sub-category, not an average of individual margins, which gives a more accurate business figure.
WHERE vs HAVING — The Full Picture
This distinction comes up so often it deserves its own summary:
| WHERE | HAVING | |
|---|---|---|
| When it runs | Before GROUP BY | After GROUP BY |
| What it filters | Individual rows | Aggregated groups |
| Can use aggregate functions | No | Yes |
| Example | WHERE region = 'West' | HAVING SUM(sales) > 10000 |
| Common use | Filter raw data before summarising | Keep only groups that meet a threshold |
A useful mental model: WHERE is a row-level gate. HAVING is a group-level gate. Data passes through WHERE first, gets grouped and aggregated, then passes through HAVING.
Common Mistakes with Aggregations
| Mistake | What Happens | Fix |
|---|---|---|
| Using WHERE instead of HAVING | Error or wrong results when filtering on aggregated values | Use HAVING for any condition involving SUM, COUNT, AVG, MIN, MAX |
| Forgetting to GROUP BY all non-aggregated columns | SQL error in strict databases | Add every non-aggregated SELECT column to GROUP BY |
| Not using ROUND on financial figures | Results show many decimal places | Wrap financial aggregations in ROUND(value, 2) |
| Dividing by SUM instead of individual values for margin | Slightly different result than expected | For overall margin use SUM(profit)/SUM(sales), not AVG(profit/sales) |
| Selecting an aggregated alias in HAVING | Error — alias not yet available | Repeat the full expression: HAVING SUM(sales) > 10000, not HAVING total_sales > 10000 |
Practice Exercises
Try these before moving to Topic 4. Each one maps to a real analyst scenario.
- What is the total revenue and total profit for each category? Sort by total profit descending.
- Which ship modes are used most frequently? Show the count of orders per ship mode.
- Find all sub-categories where the average order value is above $300.
- Which states have more than 100 orders? Show state name and order count.
- Among orders in the Consumer segment only, what is the total profit per category?
Summary — What You Can Now Do
- Use COUNT, SUM, AVG, MIN, and MAX to calculate summary statistics across an entire table
- Apply GROUP BY to break aggregations down by one or multiple categories
- Follow the GROUP BY rule — every non-aggregated SELECT column must appear in GROUP BY
- Use HAVING to filter groups after aggregation
- Combine WHERE and HAVING in the same query to filter both rows and groups
- Write business-ready summary queries that answer questions about revenue, profit, and performance by segment
Up next — Topic 4: JOINs
So far every query has touched a single table. In Topic 4 you learn INNER JOIN and LEFT JOIN — how to combine data from two tables into one result. This is where SQL becomes genuinely powerful for real-world databases where customer data, order data, and product data all live separately.