SQL JOINs Explained: INNER JOIN and LEFT JOIN for Data Analysts with Examples

What Is a JOIN and Why Do You Need It

Every query in Topics 2 and 3 touched a single table. That works fine when all your data lives in one place — but in real databases it almost never does.

Customer details live in a customers table. Orders live in an orders table. Products live in a products table. These tables are kept separate deliberately — it avoids storing the same customer name and address on every single order they place. Instead, each order stores a customer ID, and that ID links back to the customer record.

This is efficient for storage and data integrity. But it means that to answer most real business questions, you need to combine two or more tables. That is exactly what a JOIN does.

A JOIN connects two tables based on a shared column — usually an ID that appears in both. The result is a new combined table containing columns from both sources, matched row by row.

Without JOINs, a relational database is just a collection of disconnected tables. With JOINs, it becomes a connected system you can query across freely.

Setting Up the Superstore Tables for JOINs

In Topics 2 and 3 the Superstore data was one flat table. To practice JOINs properly you need it split into separate tables the way a real database works. Run this setup code once in your notebook to create the tables:

import sqlite3
import pandas as pd

conn = sqlite3.connect('superstore.db')

# Load the original flat table
df = pd.read_csv('superstore_sales.csv')
df.columns = [col.strip().replace(' ', '_').lower() for col in df.columns]

# Create customers table
customers = df[['customer_id', 'customer_name', 'segment']].drop_duplicates()
customers.to_sql('customers', conn, if_exists='replace', index=False)

# Create orders table
orders = df[['order_id', 'customer_id', 'order_date',
             'ship_date', 'ship_mode', 'region',
             'city', 'state']].drop_duplicates()
orders.to_sql('orders', conn, if_exists='replace', index=False)

# Create order_items table
order_items = df[['order_id', 'product_id', 'sub_category',
                  'category', 'sales', 'quantity',
                  'discount', 'profit']].drop_duplicates()
order_items.to_sql('order_items', conn, if_exists='replace', index=False)

print("Tables created successfully.")
conn.close()

Now you have three separate tables — customers, orders, and order_items — linked by shared ID columns. This mirrors how a real company database is structured.

def query(sql):
return pd.read_sql(sql, conn)

conn = sqlite3.connect('superstore.db')

How a JOIN Works — The Mental Model

Before writing JOIN syntax, understand what is happening conceptually.

Imagine two tables sitting side by side. The orders table has a column called customer_id. The customers table also has a column called customer_id. A JOIN says: for every row in orders, find the matching row in customers where the customer_id values are equal, and combine them into one wider row.

The column you join on is called the join key. It must exist in both tables and contain matching values. In the Superstore setup:

•   orders.customer_id links to customers.customer_id
•   order_items.order_id links to orders.order_id

The difference between JOIN types is what happens when a match is not found. That is the entire distinction between INNER JOIN and LEFT JOIN.

INNER JOIN — Only Matching Rows

Syntax:

SELECT columns<br>FROM table_one<br>INNER JOIN table_two ON table_one.key = table_two.key

INNER JOIN returns only rows where a match exists in both tables. If a row in the left table has no matching row in the right table, it is excluded from the result. If a row in the right table has no match in the left table, it is also excluded.

Think of it as the intersection — only rows that exist in both tables come through.

Your First INNER JOIN

-- Combine orders with customer details
query("""

SELECT
o.order_id,
o.order_date,
o.region,
c.customer_name,
c.segment
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
LIMIT 10

""")

Output (first 5 rows):

order_idorder_dateregioncustomer_namesegment
CA-2020-1521562020-11-08SouthClaire GuteConsumer
CA-2020-1386882020-06-12WestDarrin Van HuffCorporate
US-2019-1089662019-10-11SouthSean O’DonnellConsumer
CA-2019-1158122019-06-09EastBrosina HoffmanConsumer
CA-2019-1144122019-04-15WestAndrew AllenConsumer

Notice o. and c. before column names. These are table aliases — shorthand so you don’t have to type the full table name every time. orders o means “refer to the orders table as o.” This becomes essential when both tables have columns with the same name.

Joining Three Tables

Most real analyst queries join more than two tables. Here is how to bring orders, customers, and order_items together in one query:

-- Full picture: customer details + order details + financials
query("""

SELECT
c.customer_name,
c.segment,
o.order_date,
o.region,
oi.category,
oi.sub_category,
ROUND(oi.sales, 2) AS sales,
ROUND(oi.profit, 2) AS profit
FROM orders o
INNER JOIN customers c
ON o.customer_id = c.customer_id
INNER JOIN order_items oi
ON o.order_id = oi.order_id
LIMIT 10

""")

Each INNER JOIN adds another table into the result. The pattern is always the same — join on the shared key column between the two tables being connected.

INNER JOIN with WHERE and ORDER BY

JOINs combine cleanly with everything from Topics 2 and 3:

-- High-value orders in the West region with customer details
query("""

SELECT
c.customer_name,
c.segment,
o.region,
oi.category,
ROUND(oi.sales, 2) AS sales,
ROUND(oi.profit, 2) AS profit
FROM orders o
INNER JOIN customers c
ON o.customer_id = c.customer_id
INNER JOIN order_items oi
ON o.order_id = oi.order_id
WHERE o.region = 'West'
AND oi.sales > 1000
ORDER BY oi.sales DESC
LIMIT 10

""")

LEFT JOIN — Keep All Rows from the Left Table

Syntax:

SELECT columns<br>FROM table_one<br>LEFT JOIN table_two ON table_one.key = table_two.key

LEFT JOIN returns all rows from the left table, plus matching rows from the right table. When no match exists in the right table, the columns from the right table come back as NULL.

The key difference from INNER JOIN: no rows from the left table are ever dropped. Even if they have no match on the right side, they appear in the result — just with NULL values in the right table’s columns.

When to Use LEFT JOIN

LEFT JOIN is the right choice when you want to keep all records from one table regardless of whether they have matching records in another. Common scenarios:

  • Find customers who have never placed an order
  • Find products that have never been sold
  • Identify records in one system that are missing from another
  • Audit data completeness across two sources

LEFT JOIN Example — Finding Unmatched Records

To demonstrate LEFT JOIN clearly, first add a test customer with no orders:

Add a customer who has never ordered

import pandas as pd
import sqlite3
conn = sqlite3.connect('superstore.db')
new_customer = pd.DataFrame([{
'customer_id': 'TEST-001',
'customer_name': 'Test Customer',
'segment': 'Consumer'
}])
new_customer.to_sql('customers', conn, if_exists='append', index=False)
conn.commit()

Now run a LEFT JOIN to find customers with no orders:

-- Find customers who have never placed an order
query("""

SELECT
c.customer_id,
c.customer_name,
c.segment,
o.order_id
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL

""")

Output:

customer_idcustomer_namesegmentorder_id
TEST-001Test CustomerConsumerNULL

The NULL in order_id tells you this customer exists in the customers table but has no matching record in the orders table. The INNER JOIN version of this query would have excluded this row entirely — you would never know this customer existed.

This pattern — LEFT JOIN followed by WHERE right_table.key IS NULL — is one of the most useful SQL techniques for data quality auditing.

Handling NULLs After a JOIN

NULL values appearing after a LEFT JOIN are expected and useful — they signal missing matches. But they need to be handled carefully in any further calculations or filtering.

COALESCE — Replacing NULL with a Default Value

COALESCE returns the first non-NULL value from a list of arguments. Use it to replace NULLs with a meaningful default:

-- Replace NULL order_id with a readable label
query("""

SELECT
c.customer_name,
COALESCE(o.order_id, 'No orders yet') AS order_status
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL

""")

COALESCE is also useful when joining tables where a column might be populated in one table but missing in another. Rather than seeing NULL in your output, you get a clean fallback value.

NULL in Aggregate Functions After a JOIN

One important behaviour: aggregate functions like COUNT, SUM, and AVG ignore NULL values automatically. This matters after a LEFT JOIN because unmatched rows produce NULLs in the right table’s columns.

-- Count orders per customer — unmatched customers show 0, not NULL
query("""

SELECT
c.customer_name,
COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_name
ORDER BY order_count DESC
LIMIT 10

""")

COUNT(o.order_id) counts non-NULL values only — so customers with no orders correctly show 0. If you used COUNT() here you would get 1 for every customer including unmatched ones, because COUNT() counts the row itself regardless of NULL values.

INNER JOIN vs LEFT JOIN — When to Use Which

SituationUse
You only want rows that exist in both tablesINNER JOIN
You want all rows from the left table, matched or notLEFT JOIN
Finding records that are missing from another tableLEFT JOIN + WHERE right key IS NULL
Combining sales data with product detailsINNER JOIN
Auditing customers with no ordersLEFT JOIN
Getting complete order history including customer infoINNER JOIN
Data completeness check across two systemsLEFT JOIN

The default choice for most analyst queries is INNER JOIN — you usually only want complete, matched records. Reach for LEFT JOIN specifically when missing matches are meaningful information rather than just gaps to exclude.

A Complete Business Query Using JOINs

Here is a realistic analyst query combining JOINs, WHERE, GROUP BY, and ORDER BY to answer a full business question:

Question: Which customer segments generate the most revenue and profit, broken down by product category?

query("""

SELECT
c.segment,
oi.category,
COUNT(DISTINCT o.order_id) AS total_orders,
ROUND(SUM(oi.sales), 2) AS total_revenue,
ROUND(SUM(oi.profit), 2) AS total_profit,
ROUND(SUM(oi.profit) /
SUM(oi.sales) * 100, 1) AS profit_margin_pct
FROM orders o
INNER JOIN customers c
ON o.customer_id = c.customer_id
INNER JOIN order_items oi
ON o.order_id = oi.order_id
GROUP BY c.segment, oi.category
ORDER BY c.segment, total_revenue DESC

""")

This single query pulls from three tables, aggregates across two dimensions, calculates a derived metric, and produces a result a manager could read directly in a meeting. That is the power of combining JOINs with everything from the previous topics.

Common JOIN Mistakes

MistakeWhat HappensFix
Joining on the wrong columnIncorrect or empty resultsDouble-check which columns are the shared keys between tables
Forgetting table aliases when column names clashSQL error — ambiguous column nameAlways use aliases when the same column name exists in both tables
Using INNER JOIN when LEFT JOIN is neededSilently drops unmatched rowsAsk yourself — do I care about rows with no match? If yes, use LEFT JOIN
Not handling NULLs after LEFT JOINWrong aggregation resultsUse COALESCE for display, COUNT(column) not COUNT(*) for counting
Joining without ON clauseCartesian product — every row matched to every rowAlways include the ON condition

Practice Exercises

  1. Join the orders and customers tables. Show the customer name, region, and order date for all orders placed in 2021.
  2. Join all three tables. Find the top 5 customers by total profit across all their orders.
  3. Using a LEFT JOIN, find any order IDs in order_items that do not have a matching record in the orders table.
  4. Join orders and customers. Group by segment and region. Show total revenue per combination sorted by revenue descending.
  5. Join all three tables. Filter to Furniture category only. Show total sales and profit per customer segment.

Summary — What You Can Now Do

  • Explain what a JOIN does and why relational databases require them
  • Write an INNER JOIN to combine two or more tables on a shared key column
  • Write a LEFT JOIN to keep all rows from the left table including unmatched ones
  • Use LEFT JOIN with WHERE IS NULL to find records missing from a second table
  • Handle NULLs after a JOIN using COALESCE and COUNT(column) vs COUNT(*)
  • Combine JOINs with WHERE, GROUP BY, ORDER BY, and HAVING in a single query
  • Choose between INNER JOIN and LEFT JOIN based on whether unmatched rows matter

Up next — Topic 5: Subqueries

Topic 5 covers queries inside queries — how to use a SELECT result as a filter in WHERE, or as a derived table in FROM. Subqueries let you answer multi-step business questions in a single SQL statement without needing temporary tables or multiple separate queries.

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *