SQL Query Basics: SELECT, WHERE, ORDER BY, LIMIT and DISTINCT Explained with Examples
Before writing complex analysis, you need to master five keywords. SELECT, WHERE, ORDER BY, LIMIT, and DISTINCT. These five alone let you answer most basic business questions from any database. Every query you write in this module — and most queries you will write in your career — is built on this foundation.
Why These Five Keywords Matter
When a data analyst joins a company, one of the first things they are asked to do is pull data. Not build a model. Not create a dashboard. Just answer a question: how many orders came in last month? Which region had the highest sales? Which customers haven’t ordered in 90 days?
All of those questions are answered with the five keywords in this topic. SELECT chooses what to show. FROM says where to look. WHERE filters to the relevant rows. ORDER BY sorts the results. LIMIT keeps things manageable. Master these and you can answer the majority of day-to-day analyst requests without writing a single line of Python.
This is also why SQL appears in more data analyst job postings than any other technical skill. It is the universal language of data retrieval. Every database system — MySQL, PostgreSQL, SQL Server, BigQuery, Snowflake — understands these five keywords. Learn them once and they work everywhere.
How SQL Thinks
Every SQL query follows the same skeleton structure. Write it in this order every time:
SELECT -- what columns do you want?<br>FROM -- which table?<br>WHERE -- which rows? (filter — optional)<br>ORDER BY -- in what order? (sort — optional)<br>LIMIT -- how many rows? (cap — optional)
You write queries top to bottom in this order. Internally the database reads FROM first, then WHERE, then SELECT — but you don’t need to worry about that yet. Write top to bottom and it works.
Before starting, add this to the top of your notebook. It sets up your database connection and creates a helper function so every query is one clean line:
import sqlite3
import pandas as pd
# Connect to your Superstore database from Topic 1
conn = sqlite3.connect('superstore.db')
# Helper — run any SQL query in one line
def query(sql):
return pd.read_sql(sql, conn)
SELECT — Choosing Your Columns
Syntax: SELECT column1, column2 FROM table_name
SELECT tells the database which columns to return. FROM tells it which table to look in. These two always go together and appear in every query you will ever write.
The asterisk * means “all columns”:
-- Get everything from the superstore table<br>query("""<br>SELECT *<br>FROM superstore<br>""")
Avoid SELECT * in real work. It retrieves every column including ones you don’t need, which slows things down on large tables. Name your columns explicitly.
In practice, always specify only the columns you need:
-- Show order dates, regions, and financials only<br>query("""<br>SELECT order_date, region, category, sales, profit<br>FROM superstore<br>""")
Expected output (first 5 rows):
| order_date | region | category | sales | profit |
|---|---|---|---|---|
| 2020-11-08 | South | Furniture | 261.96 | 41.91 |
| 2020-11-08 | South | Office Supplies | 731.94 | 219.58 |
| 2020-06-12 | West | Office Supplies | 14.62 | 6.87 |
| 2020-06-12 | West | Technology | 957.58 | -383.03 |
| 2020-06-12 | West | Furniture | 22.37 | 2.52 |
Renaming Columns with AS
You can rename any column in your output using AS. This is useful when column names are long, unclear, or when you want your output readable for a non-technical audience. The original column name in the database stays unchanged — AS only affects what comes back to you:
query("""<br>SELECT<br>order_id AS order,<br>customer_name AS customer,<br>sales AS revenue,<br>profit AS net_profit<br>FROM superstore<br>""")
A practical use case: if you are preparing output to paste into a report or share with a manager, clean column names matter. net_profit reads better than profit in a business context, and revenue is more universally understood than sales depending on your company’s terminology.
LIMIT — Controlling How Many Rows Come Back
Syntax: SELECT … LIMIT n
LIMIT caps the number of rows returned. Always use it when exploring a large table — you never want to accidentally load tens of thousands of rows into a notebook.
-- Quick preview — return only the first 10 rows<br>query("""<br>SELECT order_date, customer_name, sales<br>FROM superstore<br>LIMIT 10<br>""")
Good habit: Start every exploration with LIMIT 10. Understand what you’re looking at. Then remove the limit when you’re ready to work with the full dataset.
LIMIT also has an important use in combination with ORDER BY — which you will see later in this topic. When you sort by a value and apply a limit, you get a clean top-N or bottom-N result. For example, the top 10 most profitable orders, or the 5 most recent transactions. This pattern appears constantly in real analyst work.
DISTINCT — Finding Unique Values
Syntax: SELECT DISTINCT column FROM table
DISTINCT removes duplicate values and returns only unique entries. Use it routinely when you first encounter a new dataset to understand what values exist in a column.
-- What regions exist in the dataset?<br>query("""<br>SELECT DISTINCT region<br>FROM superstore<br>""")
Output:
| region |
|---|
| South |
| West |
| Central |
| East |
You can also apply DISTINCT across multiple columns to see unique combinations:
-- What unique category and sub-category combinations exist?<br>query("""<br>SELECT DISTINCT category, sub_category<br>FROM superstore<br>ORDER BY category, sub_category<br>""")
When DISTINCT spans multiple columns it returns unique combinations, not unique values per column individually.
Why DISTINCT Is More Useful Than It Looks
New analysts often underestimate DISTINCT. Beyond simple column exploration, it helps you catch data quality problems early. If a column that should only have four region values suddenly shows five — including a misspelling like “Westt” — DISTINCT will surface it immediately. Use it as a data validation step every time you work with a new table.
-- Validate ship mode values — should only be 4 options<br>query("""<br>SELECT DISTINCT ship_mode<br>FROM superstore<br>ORDER BY ship_mode<br>""")
If you see unexpected values, that is a data quality issue worth flagging before any analysis begins.
WHERE — Filtering Rows
Syntax: SELECT … FROM … WHERE condition
WHERE is the most powerful basic clause in SQL. It filters rows so only the ones matching your condition come back. Everything else is excluded.
WHERE Operators
| Operator | What It Does | Example |
|---|---|---|
| = | Equals | region = 'West' |
| != or <> | Not equals | region != 'West' |
| > / < | Greater / less than | sales > 1000 |
| BETWEEN | Range, inclusive | sales BETWEEN 100 AND 500 |
| LIKE | Pattern match | customer_name LIKE 'A%' |
| IN | Matches a list | region IN ('East', 'West') |
| IS NULL | Value is missing | postal_code IS NULL |
| AND | Both conditions true | region = 'West' AND sales > 500 |
| OR | Either condition true | region = 'West' OR region = 'East' |
Important: Text values always need single quotes — ‘West’ not West. Numbers do not — sales > 500 not sales > ‘500’. This is one of the most common beginner errors.
Basic Filter Examples
-- All orders from the West region<br>query("""<br>SELECT order_id, customer_name, region, sales<br>FROM superstore<br>WHERE region = 'West'<br>""")
-- High-value orders only — sales above $1,000<br>query("""<br>SELECT order_id, customer_name, sales, profit<br>FROM superstore<br>WHERE sales > 1000<br>LIMIT 10<br>""")
Using BETWEEN for Ranges
BETWEEN is cleaner than writing two separate conditions when you are filtering a range. Both the lower and upper values are inclusive:
-- Orders with sales between $500 and $1,000<br>query("""<br>SELECT order_id, customer_name, sales, profit<br>FROM superstore<br>WHERE sales BETWEEN 500 AND 1000<br>ORDER BY sales DESC<br>""")
This is identical to writing WHERE sales >= 500 AND sales <= 1000 but easier to read, especially when shared with others.
Combining Conditions with AND and OR
AND means both conditions must be true. OR means at least one must be true.
-- High-value orders specifically in the West region<br>query("""<br>SELECT order_id, customer_name, region, sales<br>FROM superstore<br>WHERE region = 'West'<br>AND sales > 1000<br>""")
Use IN instead of multiple OR conditions — it’s cleaner and easier to read:
-- Orders from East or West<br>query("""<br>SELECT order_id, region, sales<br>FROM superstore<br>WHERE region IN ('East', 'West')<br>""")
Pattern Matching with LIKE
The % wildcard means “zero or more of any character.” The _ wildcard means exactly one character. Here is how both work in practice:
-- Customers whose name starts with 'A'<br>query("""<br>SELECT DISTINCT customer_name, segment<br>FROM superstore<br>WHERE customer_name LIKE 'A%'<br>ORDER BY customer_name<br>""")
-- Find any sub-category containing the word 'Chair'<br>query("""<br>SELECT DISTINCT sub_category<br>FROM superstore<br>WHERE sub_category LIKE '%Chair%'<br>""")
-- Find products where the second letter is 'a'<br>-- _ means exactly one character<br>query("""<br>SELECT DISTINCT sub_category<br>FROM superstore<br>WHERE sub_category LIKE '_a%'<br>""")
Case sensitivity note: SQLite’s LIKE is case-insensitive for ASCII letters by default, so LIKE ‘west’ will still match ‘West’. However, this varies between database systems. In PostgreSQL, LIKE is case-sensitive. Build the habit of matching case correctly so your queries work everywhere.
Filtering for Missing Values with IS NULL
NULL in SQL means a value is missing or unknown. You cannot check for NULL using the equals operator — you must use IS NULL or IS NOT NULL:
-- Find rows where postal code is missing<br>query("""<br>SELECT order_id, customer_name, city, postal_code<br>FROM superstore<br>WHERE postal_code IS NULL<br>""")
-- Find all rows that DO have a postal code<br>query("""<br>SELECT order_id, customer_name, city, postal_code<br>FROM superstore<br>WHERE postal_code IS NOT NULL<br>LIMIT 10<br>""")
NULL handling trips up beginners constantly.
Remember: = NULL never works. Always use IS NULL.
Finding Loss-Making Orders
-- Which orders lost money? Sorted worst first.<br>query("""<br>SELECT order_id, customer_name, category, sales, profit<br>FROM superstore<br>WHERE profit < 0<br>ORDER BY profit ASC<br>LIMIT 10<br>""")
Output (top 5 shown):
| order_id | customer_name | category | sales | profit |
|---|---|---|---|---|
| CA-2019-138422 | Cindy Stewart | Technology | 4199.93 | -1199.97 |
| CA-2021-144568 | Zuschuss Donatelli | Technology | 3199.95 | -1079.98 |
| US-2020-108966 | Bill Donatelli | Furniture | 2799.96 | -839.98 |
| CA-2020-160415 | Erin Smith | Technology | 2799.96 | -839.98 |
| CA-2021-119549 | Dave Brooks | Furniture | 2799.96 | -699.98 |
ORDER BY — Sorting Results
Syntax: ORDER BY column ASC | DESC
ORDER BY sorts the output rows. ASC is ascending (smallest to largest, A to Z) and is the default. DESC is descending (largest to smallest, Z to A).
-- Top 10 highest-value orders<br>query("""<br>SELECT order_id, customer_name, region, sales<br>FROM superstore<br>ORDER BY sales DESC<br>LIMIT 10<br>""")
You can sort by multiple columns. The database sorts by the first column, then breaks ties using the second:
-- Sort by region alphabetically, then by sales within each region<br>query("""<br>SELECT region, category, customer_name, sales<br>FROM superstore<br>ORDER BY region ASC, sales DESC<br>LIMIT 20<br>""")
Putting It All Together
Here is one query using all five keywords to answer a specific business question:
Question: What are the top 5 profitable Technology orders in the West region?
query("""<br>SELECT<br>order_id,<br>customer_name,<br>sub_category,<br>sales,<br>profit<br>FROM superstore<br>WHERE category = 'Technology'<br>AND region = 'West'<br>AND profit > 0<br>ORDER BY profit DESC<br>LIMIT 5<br>""")
Output:
| order_id | customer_name | sub_category | sales | profit |
|---|---|---|---|---|
| CA-2021-145317 | Ken Black | Copiers | 3299.98 | 1187.99 |
| CA-2020-127180 | Anne McFarland | Copiers | 2799.98 | 1007.99 |
| CA-2021-162688 | Phillip Schmitt | Phones | 1919.97 | 614.39 |
| CA-2020-100294 | Lena Creighton | Phones | 1679.97 | 537.59 |
| CA-2019-118977 | Sung Shields | Phones | 1439.97 | 460.79 |
Read that query out loud: “Select these columns from Superstore, where category is Technology and region is West and profit is positive, sorted by profit descending, top 5 only.” That maps word-for-word to the business question. This is the core skill of SQL — translating a plain English question into precise syntax.
Common Beginner Mistakes
| Mistake | Wrong | Right |
|---|---|---|
| Quotes on text | WHERE region = West | WHERE region = 'West' |
| NULL comparison | WHERE postal_code = NULL | WHERE postal_code IS NULL |
| Alias in WHERE | WHERE revenue > 500 (alias) | WHERE sales > 500 (original name) |
| AND vs OR confusion | Gets zero results | If a column can’t be two values at once, use OR or IN |
| LIKE case sensitivity | LIKE 'west' misses ‘West’ in some databases | Match case exactly or use LOWER(region) LIKE 'west' |
| ORDER BY without LIMIT | Loads entire table sorted | Add LIMIT n when you only need top or bottom rows |
Practice Exercises
Try writing these queries yourself against your Superstore database before looking at the answers. These are the kinds of questions a real manager might ask.
- Show all orders from the Furniture category with sales greater than $500, sorted by sales descending.
- How many unique customer names are in the dataset? (Hint: use DISTINCT then count the rows in Python)
- Find all orders where the ship mode is ‘First Class’ and the region is ‘East’.
- Show the 10 most recent orders by order date.
- 5. Find all orders where profit is between -$100 and $0 — orders that lost a small amount. Sort by profit ascending.
Writing these yourself — even if you need to look back at the examples — is more valuable than reading the topic a second time. Muscle memory in SQL comes from typing queries, not from reading them.
Summary — What You Can Now Do
By the end of this topic you should be able to write all of these from memory:
- ∙ Select specific columns from a table and rename them with AS
- ∙ Use DISTINCT to find unique values and catch data quality issues early
- ∙ Use LIMIT to safely cap results during exploration
- ∙ Filter rows with WHERE using =, >, <, BETWEEN, IN, LIKE, IS NULL, and IS NOT NULL
- ∙ Use % and _ wildcards in LIKE for pattern matching
- ∙ Combine multiple filters with AND and OR
- ∙ Sort results with ORDER BY across one or multiple columns
- ∙ Write a complete query combining all five keywords to answer a specific business question
Up next — Topic 3: Aggregations
COUNT, SUM, AVG, MIN, MAX, GROUP BY, and HAVING. This is where SQL stops working with individual rows and starts answering questions like “what is total revenue per region?” and “which category has the highest average order value?” That is where the real business analysis begins.
Leave a Reply