SQL Subqueries Explained: WHERE and FROM Subqueries for Data Analysts with Examples

What Is a Subquery?

A subquery is a SELECT statement written inside another SELECT statement. The inner query runs first, produces a result, and the outer query uses that result to complete its own logic.

You have already written queries that filter rows, aggregate data, and join tables. A subquery combines those capabilities into a single statement — letting you answer multi-step business questions without breaking them into separate queries or creating temporary tables.

Here is the simplest way to think about it. Imagine you want to find all orders where the sales value is above average. You cannot write WHERE sales > AVG(sales) directly — SQL does not allow aggregate functions inside a WHERE clause. But you can write a subquery that calculates the average first, then use that result in the WHERE condition:

-- Find all orders above the average sales value
query("""

SELECT order_id, customer_name, sales
FROM superstore
WHERE sales > (SELECT AVG(sales) FROM superstore)
ORDER BY sales DESC
LIMIT 10

""")

The inner query SELECT AVG(sales) FROM superstore runs first and returns a single number. The outer query then uses that number as the filter threshold. This is a subquery in its most fundamental form.

Why Subqueries Matter for Analysts

New analysts sometimes wonder whether subqueries are necessary — after all, you could run two separate queries and use the result of the first manually. That works for one-off exploration but breaks down quickly in real work

.Subqueries let you build self-contained, reusable queries that answer complex questions in a single execution. They make your SQL readable, auditable, and easy to hand off to a colleague. When you save a query in a reporting tool or share it with your team, everything is in one place — not split across two separate statements that need to be run in a specific order.

They also open the door to a category of questions that would be genuinely difficult to answer any other way — questions like “which customers spend more than the average customer?” or “which products outsell the category average?” These comparisons require knowing a benchmark first, then filtering against it. Subqueries are the natural SQL tool for that.

Types of Subqueries Covered in This Topic

There are two subquery patterns every analyst needs to know:

WHERE subquery — the inner query runs and produces a value or list of values that the outer query filters on. Used for comparison and filtering against calculated benchmarks.

FROM subquery — the inner query runs and produces a temporary table that the outer query selects from. Used for multi-step aggregation and pre-filtering before a summary.

Both patterns follow the same core principle: the inner query runs first, produces a result, and the outer query uses that result.

WHERE Subqueries

A WHERE subquery places a SELECT statement inside the WHERE clause. The inner query must return either a single value or a list of values that the outer WHERE condition can compare against.

Comparing Against a Single Calculated Value

This is the most common WHERE subquery pattern. Calculate a benchmark with the inner query, then filter rows against it in the outer query.

-- Which orders have sales above the overall average?
query("""

SELECT
order_id,
customer_name,
region,
ROUND(sales, 2) AS sales
FROM superstore
WHERE sales > (
SELECT AVG(sales)
FROM superstore
)
ORDER BY sales DESC
LIMIT 10

""")

The inner query returns one number — the average sales value across all orders. The outer query filters to rows where individual sales exceed that number. This gives you above-average orders without needing to know the average value in advance.

-- Which orders have profit above the average profit?
query("""

SELECT
order_id,
customer_name,
category,
ROUND(profit, 2) AS profit
FROM superstore
WHERE profit > (
SELECT AVG(profit)
FROM superstore
)
ORDER BY profit DESC
LIMIT 10

""")

Using IN with a Subquery

When the inner query returns multiple values instead of one, use IN to check whether the outer query’s column matches any value in that list.

-- Find all orders placed by customers in the Corporate segment
query("""

SELECT
order_id,
customer_id,
order_date,
region
FROM orders
WHERE customer_id IN (
SELECT customer_id
FROM customers
WHERE segment = 'Corporate'
)
ORDER BY order_date DESC
LIMIT 10

""")

The inner query returns a list of customer IDs belonging to the Corporate segment. The outer query then filters the orders table to only show orders where the customer ID appears in that list. This achieves a similar result to a JOIN — but the logic reads differently and is sometimes clearer depending on the question being asked.

-- Find orders containing Technology products with high profit
query("""

SELECT
o.order_id,
o.order_date,
o.region
FROM orders o
WHERE o.order_id IN (
SELECT order_id
FROM order_items
WHERE category = 'Technology'
AND profit > 500
)
ORDER BY o.order_date DESC
LIMIT 10

""")

NOT IN — Exclusion Filter

The reverse of IN is NOT IN — filter to rows where the value does not appear in the subquery result. Useful for finding records that are absent from another dataset.

-- Find customers who have never ordered a Technology product
query("""

SELECT DISTINCT
customer_id,
customer_name,
segment
FROM customers
WHERE customer_id NOT IN (
SELECT DISTINCT o.customer_id
FROM orders o
INNER JOIN order_items oi
ON o.order_id = oi.order_id
WHERE oi.category = 'Technology'
)
ORDER BY customer_name
LIMIT 10

""")

NOT IN and NULLs — important warning: If the subquery result contains any NULL values, NOT IN returns no rows at all. This is a subtle but serious bug. Always add WHERE column IS NOT NULL inside a NOT IN subquery to be safe.

FROM Subqueries

A FROM subquery places a SELECT statement in the FROM clause, treating the result of the inner query as a temporary table. The outer query then selects from that temporary table as if it were a real one.

This pattern is used when you need to aggregate data in two stages — for example, first summarise by one dimension, then summarise or filter that summary further.

Syntax:

SELECT columns<br>FROM (<br>SELECT columns<br>FROM table<br>GROUP BY something<br>) AS subquery_alias<br>WHERE condition

The alias after the closing bracket — AS subquery_alias — is required. SQL needs a name to refer to the temporary table in the outer query.

Two-Stage Aggregation

-- Step 1 inner query: calculate total sales per customer
-- Step 2 outer query: find customers above a revenue threshold
query("""

SELECT
customer_name,
segment,
ROUND(total_sales, 2) AS total_sales
FROM (
SELECT
c.customer_name,
c.segment,
SUM(oi.sales) AS total_sales
FROM orders o
INNER JOIN customers c
ON o.customer_id = c.customer_id
INNER JOIN order_items oi
ON o.order_id = oi.order_id
GROUP BY c.customer_name, c.segment
) AS customer_summary
WHERE total_sales > 5000
ORDER BY total_sales DESC
LIMIT 10

""")

Notice the outer query uses WHERE total_sales > 5000 — filtering on the aggregated column from the inner query. You cannot do this with HAVING in a simple GROUP BY because HAVING filters groups, not a pre-calculated summary table. The FROM subquery pattern gives you this flexibility.

Filtering a Summary Before Further Analysis

-- Find regions where average order value exceeds $250
query("""

SELECT
region,
ROUND(avg_order_value, 2) AS avg_order_value,
total_orders
FROM (
SELECT
o.region,
AVG(oi.sales) AS avg_order_value,
COUNT(*) AS total_orders
FROM orders o
INNER JOIN order_items oi
ON o.order_id = oi.order_id
GROUP BY o.region
) AS region_summary
WHERE avg_order_value > 250
ORDER BY avg_order_value DESC

""")

The inner query calculates average order value and total orders per region. The outer query then filters to regions where the average exceeds $250. This is clean, readable, and easy to modify — just change the threshold in the outer WHERE clause.

Ranking Categories by Performance

-- Which product categories have above-average total profit?
query("""

SELECT
category,
ROUND(total_profit, 2) AS total_profit
FROM (
SELECT
category,
SUM(profit) AS total_profit
FROM order_items
GROUP BY category
) AS category_totals
WHERE total_profit > (
SELECT AVG(total_profit)
FROM (
SELECT SUM(profit) AS total_profit
FROM order_items
GROUP BY category
) AS avg_calc
)
ORDER BY total_profit DESC

""")

This query has a subquery inside a FROM clause and another subquery inside a WHERE clause — both in the same statement. It reads from the inside out: the innermost queries run first, their results feed the next level, and the outermost query produces the final answer. This is advanced but follows the exact same rules you have already learned.

Subquery vs JOIN — When to Use Which

Both subqueries and JOINs can answer many of the same questions. Choosing between them is partly about correctness and partly about readability.

Situation	Better Choice	Why
Filtering based on a calculated value (avg, max)	Subquery in WHERE	JOINs cannot filter on aggregations directly
Finding records absent from another table	Subquery with NOT IN or LEFT JOIN + IS NULL	Both work — LEFT JOIN is safer with NULLs
Combining columns from two tables in the result	JOIN	Subqueries in WHERE do not add columns to output
Two-stage aggregation	FROM subquery	Cleaner than a JOIN for pre-summarised data
Simple lookup across two tables	JOIN	Faster and more readable for straightforward matches
Filtering to a dynamic list from another table	Either — IN subquery or INNER JOIN both work	JOIN is generally faster on large datasets

The practical guideline: use a JOIN when you need columns from both tables in your output. Use a subquery when you are filtering or calculating based on a value derived from another query and do not need to show those extra columns.

Common Subquery Mistakes

Mistake	What Happens	Fix
Missing alias on FROM subquery	SQL error — every derived table needs a name	Always add `AS alias_name` after the closing bracket
Inner query returns multiple rows in a single-value context	SQL error	Use IN instead of = when the subquery can return multiple rows
NOT IN with NULLs in subquery result	Returns zero rows silently	Add `WHERE column IS NOT NULL` inside the NOT IN subquery
Deeply nested subqueries that are hard to read	Difficult to debug and maintain	Break into steps using pandas after pulling data, or use CTEs in future
Using a subquery when a JOIN would be simpler	Slower and harder to read	If you need columns from both tables, use a JOIN

Practice Exercises

Find all orders where sales are below the average sales value. Show order ID, customer name, and sales. Sort by sales ascending.
Find all customers from the Consumer segment who have placed more than 5 orders. Use a FROM subquery to first count orders per customer, then filter.
Find all sub-categories where total profit is above the average sub-category profit. Use a FROM subquery for the totals and a WHERE subquery for the average.
Using NOT IN, find all customers who have never placed an order in the West region.
Find the top 3 regions by average order value using a FROM subquery. Show region, average order value, and total orders.

Summary — What You Can Now Do

Explain what a subquery is and why it runs before the outer query
Write a WHERE subquery to filter rows against a single calculated value
Use IN and NOT IN with subqueries to filter against a list of values
Write a FROM subquery to create a temporary summary table for further filtering
Combine WHERE and FROM subqueries in the same query for multi-step analysis
Choose between a subquery and a JOIN based on what the question requires
Avoid common subquery errors including missing aliases and NOT IN with NULLs

Up next — Topic 6: SQL Meets Python

Topic 6 brings everything together — connecting SQLite to Python, running queries with pd.read_sql(), deciding when to filter in SQL versus pandas, and building a workflow where SQL retrieves the data and Python does the analysis. This is the bridge between Module 2 and everything that follows in the course.

SQL Subqueries Explained: WHERE and FROM Subqueries for Data Analysts with Examples

What Is a Subquery?

Why Subqueries Matter for Analysts

Types of Subqueries Covered in This Topic

WHERE Subqueries

Using IN with a Subquery

NOT IN — Exclusion Filter

FROM Subqueries

Subquery vs JOIN — When to Use Which

Common Subquery Mistakes

Practice Exercises

Summary — What You Can Now Do

Up next — Topic 6: SQL Meets Python

Comments

One response to “SQL Subqueries Explained: WHERE and FROM Subqueries for Data Analysts with Examples”

Leave a Reply Cancel reply

More posts

Distribution Analysis: Understanding the Shape of Data

Descriptive Statistics: Understanding Data Through Summary Measures

Exploratory Data Analysis: Understanding Data Before Analysis

Feature Creation & Data Transformation in Python