How to Use COUNT, SUM, AVG, GROUP BY and HAVING in SQL

SQL Aggregations Explained: COUNT, SUM, AVG, GROUP BY and HAVING with Examples

In Topic 2 you learned to retrieve and filter individual rows. That is useful — but it is not where the real analysis happens. A manager does not want to see 10,000 individual order rows. They want to know: what is our total revenue this year? Which region is most profitable? Which product category has the highest average order value?
Those questions cannot be answered by looking at individual rows. They require aggregation — combining multiple rows into a single summary figure. This topic covers the five aggregate functions every analyst uses daily, along with GROUP BY and HAVING which control how aggregations are grouped and filtered.

What Is Aggregation and Why Does It Matter

Aggregation means collapsing multiple rows of data into a single calculated value. Instead of seeing every individual sale, you see the total. Instead of every order’s profit, you see the average. Instead of each transaction, you see a count.
In pandas you did this with groupby() and agg() in Module 1. SQL does the same thing — but directly inside the database, before the data even reaches Python. This matters for performance. Aggregating 500,000 rows in SQL and returning a 4-row summary is dramatically faster than loading all 500,000 rows into Python and then aggregating them.
In a professional analyst workflow the rule is simple: aggregate in SQL, visualise and present in Python. Do the heavy lifting where the data lives.

The Five Aggregate Functions

Before combining these with GROUP BY, understand what each function does on its own when applied to an entire table.

COUNT — How Many Rows

COUNT counts the number of rows. It is the most frequently used aggregate function in day-to-day analyst work.

-- How many orders are in the dataset?
query("""

SELECT COUNT(*) AS total_orders
FROM superstore

""")

Output:

total_orders
9994

COUNT(*) counts all rows including nulls. COUNT(column_name) counts only rows where that column is not null — a useful distinction when checking data completeness:

-- How many rows have a valid postal code vs total rows?
query("""

SELECT
COUNT(*) AS total_rows,
COUNT(postal_code) AS rows_with_postal_code
FROM superstore

""")

If these two numbers differ, you have missing values in that column — which is exactly what you want to know before any analysis.

SUM — Total Value

SUM adds up all values in a numeric column. Used constantly for revenue, profit, quantity, and any other additive metric.

-- What is the total revenue and total profit across all orders?
query("""

SELECT
ROUND(SUM(sales), 2) AS total_revenue,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore

""")

Output:

total_revenue	total_profit
2297200.86	286397.02

ROUND(value, 2) keeps results to two decimal places. Always use it with financial figures — raw floating point results from SQL can be messy.

AVG — Average Value

AVG calculates the mean of a numeric column. Useful for understanding typical order size, average margin, average quantity, and so on.

-- What is the average order value and average profit per transaction?
query("""

SELECT
ROUND(AVG(sales), 2) AS avg_order_value,
ROUND(AVG(profit), 2) AS avg_profit
FROM superstore

""")

Mean vs median reminder: AVG calculates the mean, which is pulled upward by large outliers. A small number of very large orders can make the average look higher than what a typical order is worth. In Module 1 you compared mean and median in pandas — keep that in mind when interpreting AVG results from SQL.

MIN and MAX — Smallest and Largest Values

MIN and MAX return the lowest and highest values in a column. Useful for understanding the range of your data and spotting outliers.

-- What is the range of order values in the dataset?
query("""

SELECT
ROUND(MIN(sales), 2) AS smallest_order,
ROUND(MAX(sales), 2) AS largest_order,
ROUND(MIN(profit), 2) AS worst_profit,
ROUND(MAX(profit), 2) AS best_profit
FROM superstore

""")

Output:

smallest_order	largest_order	worst_profit	best_profit
0.44	22638.48	-6599.98	8399.98

The worst profit being nearly -$6,600 on a single order is a significant finding. MAX and MIN surface these extremes instantly — without having to sort through thousands of rows manually.

GROUP BY — Aggregating by Category

Aggregate functions on their own give you a single number for the entire table. GROUP BY breaks that down by a category — giving you one summary row per group instead of one summary row for everything.

Syntax:

SELECT column, AGG_FUNCTION(column)<br>FROM table<br>GROUP BY column

This is the SQL equivalent of pandas groupby(). And just like in pandas, it is one of the most powerful tools in your analyst toolkit.

Total Sales and Profit by Region

-- Business question: how does each region perform?
query("""

SELECT
region,
COUNT(*) AS total_orders,
ROUND(SUM(sales), 2) AS total_revenue,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore
GROUP BY region
ORDER BY total_profit DESC

""")

Output:

region	total_orders	total_revenue	total_profit
West	3203	725457.82	108418.45
East	2848	678781.24	91522.78
Central	2323	501239.89	39706.36
South	1620	391721.91	46749.43

Four rows. One per region. Each one summarising thousands of individual transactions. This is what aggregation does — it turns raw data into something a manager can read and act on in seconds.

Grouping by Multiple Columns

You can GROUP BY more than one column to get more granular breakdowns:

-- Revenue and profit broken down by region AND category
query("""

SELECT
region,
category,
ROUND(SUM(sales), 2) AS total_revenue,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore
GROUP BY region, category
ORDER BY region, total_profit DESC

""")

This gives you one row per unique region-category combination — 12 rows total (4 regions × 3 categories). A clean, structured view of performance across two dimensions at once.

The GROUP BY Rule — Every Non-Aggregated Column Must Be Listed

This is the rule that trips up every beginner at least once. In a SELECT statement that uses GROUP BY, every column that is not inside an aggregate function must appear in the GROUP BY clause. If you select region and SUM(sales), you must GROUP BY region. If you select region, category, and SUM(sales), you must GROUP BY region, category.

-- This will throw an error — category is selected but not grouped
query("""

SELECT region, category, SUM(sales)
FROM superstore
GROUP BY region -- missing category here

""")

-- This is correct
query("""

SELECT region, category, SUM(sales)
FROM superstore
GROUP BY region, category

""")

When you get a GROUP BY error, the fix is almost always: add the missing column to the GROUP BY clause.

HAVING — Filtering Aggregated Results

WHERE filters individual rows before aggregation. HAVING filters the aggregated results after GROUP BY runs. This distinction is critical and confuses many beginners.

The simple rule:

Use WHERE to filter rows — before grouping
Use HAVING to filter groups — after grouping

Syntax:

SELECT column, AGG_FUNCTION(column)<br>FROM table<br>GROUP BY column<br>HAVING AGG_FUNCTION(column) condition

Filter Groups by Aggregate Value

-- Which regions have total profit above $50,000?
query("""

SELECT
region,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore
GROUP BY region
HAVING SUM(profit) > 50000
ORDER BY total_profit DESC

""")

You cannot use WHERE profit > 50000 here because at the time WHERE runs, the profit has not been summed yet. HAVING runs after the sum is calculated — that is why it can filter on it.

WHERE and HAVING Together

WHERE and HAVING can be used in the same query. WHERE filters the raw rows first, then GROUP BY aggregates, then HAVING filters the groups:

-- Among Technology orders only, which sub-categories have total sales above $100,000?
query("""

SELECT
sub_category,
ROUND(SUM(sales), 2) AS total_sales,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore
WHERE category = 'Technology'
GROUP BY sub_category
HAVING SUM(sales) > 100000
ORDER BY total_sales DESC

""")

Output:

sub_category	total_sales	total_profit
Phones	330007.05	44515.73
Machines	189238.63	3384.76
Copiers	149528.03	55617.82

Read the logic out loud: “From all Technology orders, group by sub-category, show only groups with over $100,000 in sales, sorted by sales.” WHERE handles the category filter. HAVING handles the post-aggregation filter. Both in the same query.

Practical Business Queries Using Aggregations

Here are three complete queries that combine everything in this topic to answer real analyst questions.

Which customer segment is most valuable?

query("""

SELECT
segment,
COUNT(*) AS total_orders,
ROUND(SUM(sales), 2) AS total_revenue,
ROUND(AVG(sales), 2) AS avg_order_value,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore
GROUP BY segment
ORDER BY total_revenue DESC

""")

Which sub-categories are losing money overall?

query("""

SELECT
sub_category,
ROUND(SUM(profit), 2) AS total_profit,
COUNT(*) AS total_orders
FROM superstore
GROUP BY sub_category
HAVING SUM(profit) < 0
ORDER BY total_profit ASC

""")

This query directly answers one of the most common business questions an analyst gets asked: which product lines are unprofitable and how bad is it?

Top performing sub-categories by profit margin

query("""

SELECT
sub_category,
ROUND(SUM(sales), 2) AS total_sales,
ROUND(SUM(profit), 2) AS total_profit,
ROUND(SUM(profit) / SUM(sales) * 100, 1) AS profit_margin_pct
FROM superstore
GROUP BY sub_category
ORDER BY profit_margin_pct DESC
LIMIT 10

""")

Notice SUM(profit) / SUM(sales) * 100 — you can do arithmetic directly inside a SELECT using aggregate functions. This calculates the overall profit margin for each sub-category, not an average of individual margins, which gives a more accurate business figure.

WHERE vs HAVING — The Full Picture

This distinction comes up so often it deserves its own summary:

	WHERE	HAVING
When it runs	Before GROUP BY	After GROUP BY
What it filters	Individual rows	Aggregated groups
Can use aggregate functions	No	Yes
Example	`WHERE region = 'West'`	`HAVING SUM(sales) > 10000`
Common use	Filter raw data before summarising	Keep only groups that meet a threshold

A useful mental model: WHERE is a row-level gate. HAVING is a group-level gate. Data passes through WHERE first, gets grouped and aggregated, then passes through HAVING.

Common Mistakes with Aggregations

Mistake	What Happens	Fix
Using WHERE instead of HAVING	Error or wrong results when filtering on aggregated values	Use HAVING for any condition involving SUM, COUNT, AVG, MIN, MAX
Forgetting to GROUP BY all non-aggregated columns	SQL error in strict databases	Add every non-aggregated SELECT column to GROUP BY
Not using ROUND on financial figures	Results show many decimal places	Wrap financial aggregations in ROUND(value, 2)
Dividing by SUM instead of individual values for margin	Slightly different result than expected	For overall margin use SUM(profit)/SUM(sales), not AVG(profit/sales)
Selecting an aggregated alias in HAVING	Error — alias not yet available	Repeat the full expression: HAVING SUM(sales) > 10000, not HAVING total_sales > 10000

Practice Exercises

Try these before moving to Topic 4. Each one maps to a real analyst scenario.

What is the total revenue and total profit for each category? Sort by total profit descending.
Which ship modes are used most frequently? Show the count of orders per ship mode.
Find all sub-categories where the average order value is above $300.
Which states have more than 100 orders? Show state name and order count.
Among orders in the Consumer segment only, what is the total profit per category?

Summary — What You Can Now Do

Use COUNT, SUM, AVG, MIN, and MAX to calculate summary statistics across an entire table
Apply GROUP BY to break aggregations down by one or multiple categories
Follow the GROUP BY rule — every non-aggregated SELECT column must appear in GROUP BY
Use HAVING to filter groups after aggregation
Combine WHERE and HAVING in the same query to filter both rows and groups
Write business-ready summary queries that answer questions about revenue, profit, and performance by segment

Up next — Topic 4: JOINs

So far every query has touched a single table. In Topic 4 you learn INNER JOIN and LEFT JOIN — how to combine data from two tables into one result. This is where SQL becomes genuinely powerful for real-world databases where customer data, order data, and product data all live separately.

How to Use COUNT, SUM, AVG, GROUP BY and HAVING in SQL

SQL Aggregations Explained: COUNT, SUM, AVG, GROUP BY and HAVING with Examples

What Is Aggregation and Why Does It Matter

The Five Aggregate Functions

COUNT — How Many Rows

SUM — Total Value

AVG — Average Value

MIN and MAX — Smallest and Largest Values

GROUP BY — Aggregating by Category

HAVING — Filtering Aggregated Results

WHERE and HAVING Together

Practical Business Queries Using Aggregations

WHERE vs HAVING — The Full Picture

Common Mistakes with Aggregations

Practice Exercises

Summary — What You Can Now Do

Up next — Topic 4: JOINs

Comments

Leave a Reply Cancel reply

More posts

Distribution Analysis: Understanding the Shape of Data

Descriptive Statistics: Understanding Data Through Summary Measures

Exploratory Data Analysis: Understanding Data Before Analysis

Feature Creation & Data Transformation in Python