How to Write Your First SQL Query: A Beginner’s Guide

SQL Query Basics: SELECT, WHERE, ORDER BY, LIMIT and DISTINCT Explained with Examples

Before writing complex analysis, you need to master five keywords. SELECT, WHERE, ORDER BY, LIMIT, and DISTINCT. These five alone let you answer most basic business questions from any database. Every query you write in this module — and most queries you will write in your career — is built on this foundation.

Why These Five Keywords Matter

When a data analyst joins a company, one of the first things they are asked to do is pull data. Not build a model. Not create a dashboard. Just answer a question: how many orders came in last month? Which region had the highest sales? Which customers haven’t ordered in 90 days?
All of those questions are answered with the five keywords in this topic. SELECT chooses what to show. FROM says where to look. WHERE filters to the relevant rows. ORDER BY sorts the results. LIMIT keeps things manageable. Master these and you can answer the majority of day-to-day analyst requests without writing a single line of Python.
This is also why SQL appears in more data analyst job postings than any other technical skill. It is the universal language of data retrieval. Every database system — MySQL, PostgreSQL, SQL Server, BigQuery, Snowflake — understands these five keywords. Learn them once and they work everywhere.

How SQL Thinks

Every SQL query follows the same skeleton structure. Write it in this order every time:

SELECT -- what columns do you want?<br>FROM -- which table?<br>WHERE -- which rows? (filter — optional)<br>ORDER BY -- in what order? (sort — optional)<br>LIMIT -- how many rows? (cap — optional)

You write queries top to bottom in this order. Internally the database reads FROM first, then WHERE, then SELECT — but you don’t need to worry about that yet. Write top to bottom and it works.
Before starting, add this to the top of your notebook. It sets up your database connection and creates a helper function so every query is one clean line:

import sqlite3
import pandas as pd

# Connect to your Superstore database from Topic 1
conn = sqlite3.connect('superstore.db')

# Helper — run any SQL query in one line
def query(sql):
    return pd.read_sql(sql, conn)

SELECT — Choosing Your Columns

Syntax: SELECT column1, column2 FROM table_name
SELECT tells the database which columns to return. FROM tells it which table to look in. These two always go together and appear in every query you will ever write.
The asterisk * means “all columns”:

-- Get everything from the superstore table<br>query("""<br>SELECT *<br>FROM superstore<br>""")

Avoid SELECT * in real work. It retrieves every column including ones you don’t need, which slows things down on large tables. Name your columns explicitly.

In practice, always specify only the columns you need:

-- Show order dates, regions, and financials only<br>query("""<br>SELECT order_date, region, category, sales, profit<br>FROM superstore<br>""")

Expected output (first 5 rows):

order_dateregioncategorysalesprofit
2020-11-08SouthFurniture261.9641.91
2020-11-08SouthOffice Supplies731.94219.58
2020-06-12WestOffice Supplies14.626.87
2020-06-12WestTechnology957.58-383.03
2020-06-12WestFurniture22.372.52

Renaming Columns with AS

You can rename any column in your output using AS. This is useful when column names are long, unclear, or when you want your output readable for a non-technical audience. The original column name in the database stays unchanged — AS only affects what comes back to you:

query("""<br>SELECT<br>order_id AS order,<br>customer_name AS customer,<br>sales AS revenue,<br>profit AS net_profit<br>FROM superstore<br>""")

A practical use case: if you are preparing output to paste into a report or share with a manager, clean column names matter. net_profit reads better than profit in a business context, and revenue is more universally understood than sales depending on your company’s terminology.

LIMIT — Controlling How Many Rows Come Back

Syntax: SELECT … LIMIT n

LIMIT caps the number of rows returned. Always use it when exploring a large table — you never want to accidentally load tens of thousands of rows into a notebook.

-- Quick preview — return only the first 10 rows<br>query("""<br>SELECT order_date, customer_name, sales<br>FROM superstore<br>LIMIT 10<br>""")

Good habit: Start every exploration with LIMIT 10. Understand what you’re looking at. Then remove the limit when you’re ready to work with the full dataset.

LIMIT also has an important use in combination with ORDER BY — which you will see later in this topic. When you sort by a value and apply a limit, you get a clean top-N or bottom-N result. For example, the top 10 most profitable orders, or the 5 most recent transactions. This pattern appears constantly in real analyst work.

DISTINCT — Finding Unique Values

Syntax: SELECT DISTINCT column FROM table

DISTINCT removes duplicate values and returns only unique entries. Use it routinely when you first encounter a new dataset to understand what values exist in a column.

-- What regions exist in the dataset?<br>query("""<br>SELECT DISTINCT region<br>FROM superstore<br>""")

Output:

region
South
West
Central
East

You can also apply DISTINCT across multiple columns to see unique combinations:

-- What unique category and sub-category combinations exist?<br>query("""<br>SELECT DISTINCT category, sub_category<br>FROM superstore<br>ORDER BY category, sub_category<br>""")

When DISTINCT spans multiple columns it returns unique combinations, not unique values per column individually.


Why DISTINCT Is More Useful Than It Looks

New analysts often underestimate DISTINCT. Beyond simple column exploration, it helps you catch data quality problems early. If a column that should only have four region values suddenly shows five — including a misspelling like “Westt” — DISTINCT will surface it immediately. Use it as a data validation step every time you work with a new table.

-- Validate ship mode values — should only be 4 options<br>query("""<br>SELECT DISTINCT ship_mode<br>FROM superstore<br>ORDER BY ship_mode<br>""")

If you see unexpected values, that is a data quality issue worth flagging before any analysis begins.

WHERE — Filtering Rows

Syntax: SELECT … FROM … WHERE condition

WHERE is the most powerful basic clause in SQL. It filters rows so only the ones matching your condition come back. Everything else is excluded.


WHERE Operators

OperatorWhat It DoesExample
=Equalsregion = 'West'
!= or <>Not equalsregion != 'West'
> / <Greater / less thansales > 1000
BETWEENRange, inclusivesales BETWEEN 100 AND 500
LIKEPattern matchcustomer_name LIKE 'A%'
INMatches a listregion IN ('East', 'West')
IS NULLValue is missingpostal_code IS NULL
ANDBoth conditions trueregion = 'West' AND sales > 500
OREither condition trueregion = 'West' OR region = 'East'

Important: Text values always need single quotes — ‘West’ not West. Numbers do not — sales > 500 not sales > ‘500’. This is one of the most common beginner errors.
Basic Filter Examples

-- All orders from the West region<br>query("""<br>SELECT order_id, customer_name, region, sales<br>FROM superstore<br>WHERE region = 'West'<br>""")
-- High-value orders only — sales above $1,000<br>query("""<br>SELECT order_id, customer_name, sales, profit<br>FROM superstore<br>WHERE sales > 1000<br>LIMIT 10<br>""")

Using BETWEEN for Ranges

BETWEEN is cleaner than writing two separate conditions when you are filtering a range. Both the lower and upper values are inclusive:

-- Orders with sales between $500 and $1,000<br>query("""<br>SELECT order_id, customer_name, sales, profit<br>FROM superstore<br>WHERE sales BETWEEN 500 AND 1000<br>ORDER BY sales DESC<br>""")

This is identical to writing WHERE sales >= 500 AND sales <= 1000 but easier to read, especially when shared with others.


Combining Conditions with AND and OR

AND means both conditions must be true. OR means at least one must be true.

-- High-value orders specifically in the West region<br>query("""<br>SELECT order_id, customer_name, region, sales<br>FROM superstore<br>WHERE region = 'West'<br>AND sales > 1000<br>""")

Use IN instead of multiple OR conditions — it’s cleaner and easier to read:

-- Orders from East or West<br>query("""<br>SELECT order_id, region, sales<br>FROM superstore<br>WHERE region IN ('East', 'West')<br>""")

Pattern Matching with LIKE

The % wildcard means “zero or more of any character.” The _ wildcard means exactly one character. Here is how both work in practice:

-- Customers whose name starts with 'A'<br>query("""<br>SELECT DISTINCT customer_name, segment<br>FROM superstore<br>WHERE customer_name LIKE 'A%'<br>ORDER BY customer_name<br>""")
-- Find any sub-category containing the word 'Chair'<br>query("""<br>SELECT DISTINCT sub_category<br>FROM superstore<br>WHERE sub_category LIKE '%Chair%'<br>""")
-- Find products where the second letter is 'a'<br>-- _ means exactly one character<br>query("""<br>SELECT DISTINCT sub_category<br>FROM superstore<br>WHERE sub_category LIKE '_a%'<br>""")

Case sensitivity note: SQLite’s LIKE is case-insensitive for ASCII letters by default, so LIKE ‘west’ will still match ‘West’. However, this varies between database systems. In PostgreSQL, LIKE is case-sensitive. Build the habit of matching case correctly so your queries work everywhere.


Filtering for Missing Values with IS NULL

NULL in SQL means a value is missing or unknown. You cannot check for NULL using the equals operator — you must use IS NULL or IS NOT NULL:

-- Find rows where postal code is missing<br>query("""<br>SELECT order_id, customer_name, city, postal_code<br>FROM superstore<br>WHERE postal_code IS NULL<br>""")
-- Find all rows that DO have a postal code<br>query("""<br>SELECT order_id, customer_name, city, postal_code<br>FROM superstore<br>WHERE postal_code IS NOT NULL<br>LIMIT 10<br>""")

NULL handling trips up beginners constantly.
Remember: = NULL never works. Always use IS NULL.

Finding Loss-Making Orders

-- Which orders lost money? Sorted worst first.<br>query("""<br>SELECT order_id, customer_name, category, sales, profit<br>FROM superstore<br>WHERE profit < 0<br>ORDER BY profit ASC<br>LIMIT 10<br>""")

Output (top 5 shown):

order_idcustomer_namecategorysalesprofit
CA-2019-138422Cindy StewartTechnology4199.93-1199.97
CA-2021-144568Zuschuss DonatelliTechnology3199.95-1079.98
US-2020-108966Bill DonatelliFurniture2799.96-839.98
CA-2020-160415Erin SmithTechnology2799.96-839.98
CA-2021-119549Dave BrooksFurniture2799.96-699.98

ORDER BY — Sorting Results

Syntax: ORDER BY column ASC | DESC

ORDER BY sorts the output rows. ASC is ascending (smallest to largest, A to Z) and is the default. DESC is descending (largest to smallest, Z to A).

-- Top 10 highest-value orders<br>query("""<br>SELECT order_id, customer_name, region, sales<br>FROM superstore<br>ORDER BY sales DESC<br>LIMIT 10<br>""")

You can sort by multiple columns. The database sorts by the first column, then breaks ties using the second:

-- Sort by region alphabetically, then by sales within each region<br>query("""<br>SELECT region, category, customer_name, sales<br>FROM superstore<br>ORDER BY region ASC, sales DESC<br>LIMIT 20<br>""")

Putting It All Together

Here is one query using all five keywords to answer a specific business question:
Question: What are the top 5 profitable Technology orders in the West region?

query("""<br>SELECT<br>order_id,<br>customer_name,<br>sub_category,<br>sales,<br>profit<br>FROM superstore<br>WHERE category = 'Technology'<br>AND region = 'West'<br>AND profit > 0<br>ORDER BY profit DESC<br>LIMIT 5<br>""")

Output:

order_idcustomer_namesub_categorysalesprofit
CA-2021-145317Ken BlackCopiers3299.981187.99
CA-2020-127180Anne McFarlandCopiers2799.981007.99
CA-2021-162688Phillip SchmittPhones1919.97614.39
CA-2020-100294Lena CreightonPhones1679.97537.59
CA-2019-118977Sung ShieldsPhones1439.97460.79

Read that query out loud: “Select these columns from Superstore, where category is Technology and region is West and profit is positive, sorted by profit descending, top 5 only.” That maps word-for-word to the business question. This is the core skill of SQL — translating a plain English question into precise syntax.

Common Beginner Mistakes

MistakeWrongRight
Quotes on textWHERE region = WestWHERE region = 'West'
NULL comparisonWHERE postal_code = NULLWHERE postal_code IS NULL
Alias in WHEREWHERE revenue > 500 (alias)WHERE sales > 500 (original name)
AND vs OR confusionGets zero resultsIf a column can’t be two values at once, use OR or IN
LIKE case sensitivityLIKE 'west' misses ‘West’ in some databasesMatch case exactly or use LOWER(region) LIKE 'west'
ORDER BY without LIMITLoads entire table sortedAdd LIMIT n when you only need top or bottom rows

Practice Exercises

Try writing these queries yourself against your Superstore database before looking at the answers. These are the kinds of questions a real manager might ask.

  1. Show all orders from the Furniture category with sales greater than $500, sorted by sales descending.
  2. How many unique customer names are in the dataset? (Hint: use DISTINCT then count the rows in Python)
  3. Find all orders where the ship mode is ‘First Class’ and the region is ‘East’.
  4. Show the 10 most recent orders by order date.
  5. 5. Find all orders where profit is between -$100 and $0 — orders that lost a small amount. Sort by profit ascending.


Writing these yourself — even if you need to look back at the examples — is more valuable than reading the topic a second time. Muscle memory in SQL comes from typing queries, not from reading them.

Summary — What You Can Now Do

By the end of this topic you should be able to write all of these from memory:

  • ∙ Select specific columns from a table and rename them with AS
  • ∙ Use DISTINCT to find unique values and catch data quality issues early
  • ∙ Use LIMIT to safely cap results during exploration
  • ∙ Filter rows with WHERE using =, >, <, BETWEEN, IN, LIKE, IS NULL, and IS NOT NULL
  • ∙ Use % and _ wildcards in LIKE for pattern matching
  • ∙ Combine multiple filters with AND and OR
  • ∙ Sort results with ORDER BY across one or multiple columns
  • ∙ Write a complete query combining all five keywords to answer a specific business question

Up next — Topic 3: Aggregations

COUNT, SUM, AVG, MIN, MAX, GROUP BY, and HAVING. This is where SQL stops working with individual rows and starts answering questions like “what is total revenue per region?” and “which category has the highest average order value?” That is where the real business analysis begins.

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *