SQL for Beginners
SQL Basics for Beginners
This post summarizes the most frequently used SQL syntax and concepts, including filtering, aggregation, joins, subqueries, and window functions, especially helpful when using tools like pandasql or BigQuery.
1. SQL Execution Order
Although SQL queries are typically written in the order of WITH, SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY and LIMIT, they are logically executed as the following:
1
FROM > WHERE > GROUP BY > HAVING > SELECT > DISTINCT > ORDER BY > LIMIT
You might not understand what these means yet, but don’t worry. We’ll cover it soon.
2. Filtering with Conditions
Basic WHERE Clause
The WHERE clause is used to filter rows based on specific conditions. Only rows that satisfy the condition will be returned in the result.
1
2
3
SELECT *
FROM table
WHERE score > 80;
In this example, only rows where the score is greater than 80 will be selected.
Logical Operators
Logical operators like AND, OR, IN, NOT, and Between A and B are used to combine multiple conditions in a WHERE clause.
1
WHERE gender = 'M' AND age Between 20 and 30;
This query returns only rows where the gender is male ('M') and the age is between 20 and 30.
1
WHERE city IN ('Seoul', 'Busan');
This condition filters for rows where the city is either 'Seoul' or 'Busan'.
NULL Check
Use IS NULL to filter for rows where a column has no value. This is useful when identifying missing or incomplete data.
1
WHERE email IS NULL;
This query returns only the rows where the email column contains no value. NOTE! It is NULL ,not zero or empty string, but truly missing data.
3. Sorting and Limiting Results
Sorting Rows
The ORDER BY clause is used to sort the result set based on one or more columns. By default, it sorts in ascending order (ASC), but you can specify descending order (DESC) as well.
1
ORDER BY score DESC;
This sorts the result by the score column in descending order, showing the highest scores first.
Limiting Output
The LIMIT clause restricts the number of rows returned in the result set. It is especially useful when previewing large datasets.
1
LIMIT 10;
This returns only the first 10 rows of the result set.
4. Aggregation & Grouping
Aggregate Functions
Aggregate functions summarize data across multiple rows. Common examples include:
SUM(): total sumAVG(): average valueCOUNT(): number of rowsMIN()/MAX(): smallest / largest value
1
2
3
SELECT department, AVG(salary)
FROM employees
GROUP BY department;
This groups employees by department and returns the average salary for each.
Filtering Groups: HAVING
HAVING filters results after aggregation. Unlike WHERE, it can reference aggregate functions.
1
2
3
4
SELECT category, SUM(sales) AS total_sales
FROM orders
GROUP BY category
HAVING total_sales > 100000;
This returns only the categories whose total sales exceed 100,000.
Note: If you try to use WHERE with SUM(sales) instead of HAVING, it will cause an error. Because WHERE filters rows before aggregation, it cannot access aggregate functions like SUM().
5. Subqueries
Subqueries allow you to use the result of one query inside another. They are often used in WHERE or FROM clauses when the data you need depends on other data.
Subquery in WHERE Clause
You can place a subquery inside a WHERE clause to filter rows based on a result from another query.
1
2
3
4
5
6
7
SELECT *
FROM users
WHERE id IN (
SELECT user_id
FROM purchases
WHERE amount > 10000
);
This query selects users who have made a purchase over 10,000 in amount. The subquery returns user_ids that meet the condition, and the outer query filters the users table to include only those IDs.
Subquery in FROM Clause (Inline View)
You can also place a subquery in the FROM clause, treating it as a temporary table (called an inline view).
1
2
3
4
5
6
SELECT region, AVG(temp)
FROM (
SELECT region, temp
FROM weather
WHERE month = 'July'
) AS sub;
This query first filters the weather data for July, then calculates the average temperature per region based on that filtered subset.
6. Table Joins
Joining tables allows you to combine rows from two or more tables based on a related column. This is useful when data is spread across multiple tables.
Inner Join
An INNER JOIN returns only the rows where there is a matching value in both tables.
1
2
3
4
SELECT A.name, B.score
FROM students AS A
JOIN scores AS B
ON A.id = B.student_id;
This query returns the names and scores of students, only if they have matching entries in both students and scores tables based on the student ID.
Left Join
A LEFT JOIN returns all rows from the left table and the matched rows from the right table. If there is no match, NULLs will be returned for columns from the right table.
1
2
3
4
SELECT A.name, B.score
FROM students AS A
LEFT JOIN scores AS B
ON A.id = B.student_id;
This query returns all students along with their scores if available. If a student has no score entry, the score column will show NULL.
Note: RIGHT JOIN is similar but returns all rows from the right table instead, with NULLs for missing matches from the left table.
7. Window Functions
Window functions allow you to perform calculations across rows that are related to the current row, without collapsing them into a single result. These are useful for tasks like ranking, cumulative totals, or accessing values from other rows in a group.
Rank & Row Numbers
These functions assign rankings or row numbers to each row within a partition, based on the specified order.
1
2
3
4
SELECT name, score,
RANK() OVER (ORDER BY score DESC) AS rank,
ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num
FROM students;
This query ranks students by score in descending order. RANK() assigns the same rank to tied scores (with gaps), while ROW_NUMBER() assigns a unique row number to each row regardless of ties.
Lag & Lead
These functions retrieve values from preceding (LAG) or following (LEAD) rows within a partition.
1
2
3
4
SELECT user_id, timestamp,
LAG(timestamp, 1) OVER (PARTITION BY user_id ORDER BY timestamp) AS prev_time,
LEAD(timestamp, 1) OVER (PARTITION BY user_id ORDER BY timestamp) AS next_time
FROM logs;
This example returns both the previous and next timestamp for each user, ordered by time. If there is no previous or next row, the result will be NULL.
8. Other Useful Syntax
Pattern Matching with LIKE
The LIKE operator is used in a WHERE clause to search for a specific pattern in a column. % is a wildcard that matches zero or more characters.
1
2
WHERE name LIKE 'Kim%'; -- Starts with 'Kim'
WHERE comment LIKE '%error%'; -- Contains 'error'
These queries filter for rows where name starts with ‘Kim’ or where comment contains the word ‘error’.
Aliasing Columns (AS)
Aliases let you rename a column or table temporarily for readability or convenience.
1
2
SELECT name AS n
FROM users;
This renames the name column to n in the result set.
Union for Appending Rows
UNION is used to combine the results of two SELECT queries. It removes duplicate rows by default. To include duplicates, use UNION ALL.
1
2
3
SELECT * FROM Table1 AS t1
UNION
SELECT * FROM Table2 AS t2;
This merges the rows from both tables. All columns must match in number and type.
9. Type Conversion
The CAST() function allows you to convert a value from one data type to another. This is useful when the default type doesn’t match your analysis needs, such as turning a string into a number.
1
2
SELECT CAST(price AS INT) AS p_int
FROM sales;
This example assumes that the price column is originally a string(str) and needs to be converted into an integer(int) to perform calculations.
10. Notes for pandasql and BigQuery Users
- Use
julianday()for datetime subtraction inpandasql. - In
pandasql,UNIONremoves duplicates by default; no need to addDISTINCT. (In fact, explicitly addingDISTINCTmay cause an error.) IFstatements are written asIIF(condition, true_result, false_result)
DONE!
You’ve now completed a full beginner friendly guide to SQL.
Was this guide helpful?
Leave a comment if you have questions,
Hit Like if it made your day!