How to Use COUNT, SUM, AVG, GROUP BY and HAVING in SQL

SQL Aggregations Explained: COUNT, SUM, AVG, GROUP BY and HAVING with Examples

In Topic 2 you learned to retrieve and filter individual rows. That is useful — but it is not where the real analysis happens. A manager does not want to see 10,000 individual order rows. They want to know: what is our total revenue this year? Which region is most profitable? Which product category has the highest average order value?
Those questions cannot be answered by looking at individual rows. They require aggregation — combining multiple rows into a single summary figure. This topic covers the five aggregate functions every analyst uses daily, along with GROUP BY and HAVING which control how aggregations are grouped and filtered.

What Is Aggregation and Why Does It Matter

Aggregation means collapsing multiple rows of data into a single calculated value. Instead of seeing every individual sale, you see the total. Instead of every order’s profit, you see the average. Instead of each transaction, you see a count.
In pandas you did this with groupby() and agg() in Module 1. SQL does the same thing — but directly inside the database, before the data even reaches Python. This matters for performance. Aggregating 500,000 rows in SQL and returning a 4-row summary is dramatically faster than loading all 500,000 rows into Python and then aggregating them.
In a professional analyst workflow the rule is simple: aggregate in SQL, visualise and present in Python. Do the heavy lifting where the data lives.

The Five Aggregate Functions

Before combining these with GROUP BY, understand what each function does on its own when applied to an entire table.

COUNT — How Many Rows

COUNT counts the number of rows. It is the most frequently used aggregate function in day-to-day analyst work.

-- How many orders are in the dataset?
query("""

SELECT COUNT(*) AS total_orders
FROM superstore

""")

Output:

total_orders
9994

COUNT(*) counts all rows including nulls. COUNT(column_name) counts only rows where that column is not null — a useful distinction when checking data completeness:

-- How many rows have a valid postal code vs total rows?
query("""

SELECT
COUNT(*) AS total_rows,
COUNT(postal_code) AS rows_with_postal_code
FROM superstore

""")

If these two numbers differ, you have missing values in that column — which is exactly what you want to know before any analysis.

SUM — Total Value

SUM adds up all values in a numeric column. Used constantly for revenue, profit, quantity, and any other additive metric.

-- What is the total revenue and total profit across all orders?
query("""

SELECT
ROUND(SUM(sales), 2) AS total_revenue,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore

""")

Output:

total_revenuetotal_profit
2297200.86286397.02

ROUND(value, 2) keeps results to two decimal places. Always use it with financial figures — raw floating point results from SQL can be messy.

AVG — Average Value

AVG calculates the mean of a numeric column. Useful for understanding typical order size, average margin, average quantity, and so on.

-- What is the average order value and average profit per transaction?
query("""

SELECT
ROUND(AVG(sales), 2) AS avg_order_value,
ROUND(AVG(profit), 2) AS avg_profit
FROM superstore

""")

Mean vs median reminder: AVG calculates the mean, which is pulled upward by large outliers. A small number of very large orders can make the average look higher than what a typical order is worth. In Module 1 you compared mean and median in pandas — keep that in mind when interpreting AVG results from SQL.

MIN and MAX — Smallest and Largest Values

MIN and MAX return the lowest and highest values in a column. Useful for understanding the range of your data and spotting outliers.

-- What is the range of order values in the dataset?
query("""

SELECT
ROUND(MIN(sales), 2) AS smallest_order,
ROUND(MAX(sales), 2) AS largest_order,
ROUND(MIN(profit), 2) AS worst_profit,
ROUND(MAX(profit), 2) AS best_profit
FROM superstore

""")

Output:

smallest_orderlargest_orderworst_profitbest_profit
0.4422638.48-6599.988399.98

The worst profit being nearly -$6,600 on a single order is a significant finding. MAX and MIN surface these extremes instantly — without having to sort through thousands of rows manually.

GROUP BY — Aggregating by Category

Aggregate functions on their own give you a single number for the entire table. GROUP BY breaks that down by a category — giving you one summary row per group instead of one summary row for everything.

Syntax:

SELECT column, AGG_FUNCTION(column)<br>FROM table<br>GROUP BY column

This is the SQL equivalent of pandas groupby(). And just like in pandas, it is one of the most powerful tools in your analyst toolkit.

Total Sales and Profit by Region

-- Business question: how does each region perform?
query("""

SELECT
region,
COUNT(*) AS total_orders,
ROUND(SUM(sales), 2) AS total_revenue,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore
GROUP BY region
ORDER BY total_profit DESC

""")

Output:

regiontotal_orderstotal_revenuetotal_profit
West3203725457.82108418.45
East2848678781.2491522.78
Central2323501239.8939706.36
South1620391721.9146749.43

Four rows. One per region. Each one summarising thousands of individual transactions. This is what aggregation does — it turns raw data into something a manager can read and act on in seconds.

Grouping by Multiple Columns

You can GROUP BY more than one column to get more granular breakdowns:

-- Revenue and profit broken down by region AND category
query("""

SELECT
region,
category,
ROUND(SUM(sales), 2) AS total_revenue,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore
GROUP BY region, category
ORDER BY region, total_profit DESC

""")

This gives you one row per unique region-category combination — 12 rows total (4 regions × 3 categories). A clean, structured view of performance across two dimensions at once.

The GROUP BY Rule — Every Non-Aggregated Column Must Be Listed

This is the rule that trips up every beginner at least once. In a SELECT statement that uses GROUP BY, every column that is not inside an aggregate function must appear in the GROUP BY clause. If you select region and SUM(sales), you must GROUP BY region. If you select region, category, and SUM(sales), you must GROUP BY region, category.

-- This will throw an error — category is selected but not grouped
query("""

SELECT region, category, SUM(sales)
FROM superstore
GROUP BY region -- missing category here

""")
-- This is correct
query("""

SELECT region, category, SUM(sales)
FROM superstore
GROUP BY region, category

""")

When you get a GROUP BY error, the fix is almost always: add the missing column to the GROUP BY clause.

HAVING — Filtering Aggregated Results

WHERE filters individual rows before aggregation. HAVING filters the aggregated results after GROUP BY runs. This distinction is critical and confuses many beginners.

The simple rule:

  • Use WHERE to filter rows — before grouping
  • Use HAVING to filter groups — after grouping

Syntax:

SELECT column, AGG_FUNCTION(column)<br>FROM table<br>GROUP BY column<br>HAVING AGG_FUNCTION(column) condition

Filter Groups by Aggregate Value

-- Which regions have total profit above $50,000?
query("""

SELECT
region,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore
GROUP BY region
HAVING SUM(profit) > 50000
ORDER BY total_profit DESC

""")

You cannot use WHERE profit > 50000 here because at the time WHERE runs, the profit has not been summed yet. HAVING runs after the sum is calculated — that is why it can filter on it.

WHERE and HAVING Together

WHERE and HAVING can be used in the same query. WHERE filters the raw rows first, then GROUP BY aggregates, then HAVING filters the groups:

-- Among Technology orders only, which sub-categories have total sales above $100,000?
query("""

SELECT
sub_category,
ROUND(SUM(sales), 2) AS total_sales,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore
WHERE category = 'Technology'
GROUP BY sub_category
HAVING SUM(sales) > 100000
ORDER BY total_sales DESC

""")

Output:

sub_categorytotal_salestotal_profit
Phones330007.0544515.73
Machines189238.633384.76
Copiers149528.0355617.82

Read the logic out loud: “From all Technology orders, group by sub-category, show only groups with over $100,000 in sales, sorted by sales.” WHERE handles the category filter. HAVING handles the post-aggregation filter. Both in the same query.

Practical Business Queries Using Aggregations

Here are three complete queries that combine everything in this topic to answer real analyst questions.

Which customer segment is most valuable?

query("""

SELECT
segment,
COUNT(*) AS total_orders,
ROUND(SUM(sales), 2) AS total_revenue,
ROUND(AVG(sales), 2) AS avg_order_value,
ROUND(SUM(profit), 2) AS total_profit
FROM superstore
GROUP BY segment
ORDER BY total_revenue DESC

""")

Which sub-categories are losing money overall?

query("""

SELECT
sub_category,
ROUND(SUM(profit), 2) AS total_profit,
COUNT(*) AS total_orders
FROM superstore
GROUP BY sub_category
HAVING SUM(profit) < 0
ORDER BY total_profit ASC

""")

This query directly answers one of the most common business questions an analyst gets asked: which product lines are unprofitable and how bad is it?

Top performing sub-categories by profit margin

query("""

SELECT
sub_category,
ROUND(SUM(sales), 2) AS total_sales,
ROUND(SUM(profit), 2) AS total_profit,
ROUND(SUM(profit) / SUM(sales) * 100, 1) AS profit_margin_pct
FROM superstore
GROUP BY sub_category
ORDER BY profit_margin_pct DESC
LIMIT 10

""")

Notice SUM(profit) / SUM(sales) * 100 — you can do arithmetic directly inside a SELECT using aggregate functions. This calculates the overall profit margin for each sub-category, not an average of individual margins, which gives a more accurate business figure.

WHERE vs HAVING — The Full Picture

This distinction comes up so often it deserves its own summary:

WHEREHAVING
When it runsBefore GROUP BYAfter GROUP BY
What it filtersIndividual rowsAggregated groups
Can use aggregate functionsNoYes
ExampleWHERE region = 'West'HAVING SUM(sales) > 10000
Common useFilter raw data before summarisingKeep only groups that meet a threshold

A useful mental model: WHERE is a row-level gate. HAVING is a group-level gate. Data passes through WHERE first, gets grouped and aggregated, then passes through HAVING.

Common Mistakes with Aggregations

MistakeWhat HappensFix
Using WHERE instead of HAVINGError or wrong results when filtering on aggregated valuesUse HAVING for any condition involving SUM, COUNT, AVG, MIN, MAX
Forgetting to GROUP BY all non-aggregated columnsSQL error in strict databasesAdd every non-aggregated SELECT column to GROUP BY
Not using ROUND on financial figuresResults show many decimal placesWrap financial aggregations in ROUND(value, 2)
Dividing by SUM instead of individual values for marginSlightly different result than expectedFor overall margin use SUM(profit)/SUM(sales), not AVG(profit/sales)
Selecting an aggregated alias in HAVINGError — alias not yet availableRepeat the full expression: HAVING SUM(sales) > 10000, not HAVING total_sales > 10000

Practice Exercises

Try these before moving to Topic 4. Each one maps to a real analyst scenario.

  1. What is the total revenue and total profit for each category? Sort by total profit descending.
  2. Which ship modes are used most frequently? Show the count of orders per ship mode.
  3. Find all sub-categories where the average order value is above $300.
  4. Which states have more than 100 orders? Show state name and order count.
  5. Among orders in the Consumer segment only, what is the total profit per category?

Summary — What You Can Now Do

  • Use COUNT, SUM, AVG, MIN, and MAX to calculate summary statistics across an entire table
  • Apply GROUP BY to break aggregations down by one or multiple categories
  • Follow the GROUP BY rule — every non-aggregated SELECT column must appear in GROUP BY
  • Use HAVING to filter groups after aggregation
  • Combine WHERE and HAVING in the same query to filter both rows and groups
  • Write business-ready summary queries that answer questions about revenue, profit, and performance by segment

Up next — Topic 4: JOINs

So far every query has touched a single table. In Topic 4 you learn INNER JOIN and LEFT JOIN — how to combine data from two tables into one result. This is where SQL becomes genuinely powerful for real-world databases where customer data, order data, and product data all live separately.

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *