SQL, Properly Explained
SQL was standardized as ANSI X3.135 in 1986 and ISO 9075 in 1987. The standard has been revised roughly every five years since (SQL:1992, :1999, :2003, :2008, :2011, :2016, :2023), and at no point has any major database fully implemented it. The dialects you actually meet β PostgreSQL, MySQL, SQLite, T-SQL, Oracle, BigQuery, Snowflake, Spark, ClickHouse β agree on the core 80% of CRUD and disagree on virtually everything else. A SQL formatter that doesn't know which dialect it's looking at is guessing.
SQL has the most successful syntax in computing β it has outlived four generations of frameworks and is still what every analyst, engineer, and AI system reaches for when there's a database in the loop. It also has more legacy weirdness per square inch than almost any language in current use. Both facts trace back to the same source: SQL is older than C, older than Unix as we know it, and was designed for a market (relational database vendors in the late 1970s) that explicitly wanted to embrace-extend-extinguish each other's syntax.
This post is for the person who reads SQL across multiple systems and would like the recurring confusions to stop being surprises.
What SQL actually is
SQL is a declarative query language for relational data. You describe the result you want; the database planner figures out how to compute it. Every SQL statement is fundamentally one of:
- DML (Data Manipulation Language):
SELECT,INSERT,UPDATE,DELETE,MERGE. - DDL (Data Definition Language):
CREATE,ALTER,DROP,TRUNCATEfor tables, indexes, views, schemas. - DCL (Data Control Language):
GRANT,REVOKE. - TCL (Transaction Control Language):
BEGIN,COMMIT,ROLLBACK,SAVEPOINT.
The SELECT statement is overwhelmingly the most-used and most-misunderstood. Its full grammar (in the standard, simplified):
SELECT [DISTINCT] columns
FROM tables
[JOIN ... ON ...]
WHERE row_predicate
GROUP BY grouping_columns
HAVING group_predicate
ORDER BY columns
LIMIT n OFFSET m
Almost everyone who's written SQL more than once knows that order. The thing they don't know is that the database doesn't execute it in that order.
The clause execution order
The order you write SQL clauses is for human convenience. The order the database processes them, conceptually, is:
FROMβ pick tables, apply joins.WHEREβ filter rows.GROUP BYβ collapse rows into groups.HAVINGβ filter groups.SELECTβ compute output columns.DISTINCTβ drop duplicate rows.ORDER BYβ sort.LIMIT/OFFSETβ paginate.
Three consequences fall out of this order, and they're responsible for most of the "why doesn't this query work" questions:
- You can't reference a
SELECTalias inWHERE. The alias doesn't exist yet at WHERE-evaluation time. PostgreSQL is strict about this; some dialects (MySQL, BigQuery) allow it as an extension. - You can reference a
SELECTalias inORDER BY. ORDER BY runs after SELECT. HAVINGoperates on groups;WHEREoperates on rows. Filtering onCOUNT(*) > 5belongs in HAVING. Filtering oncountry = 'US'belongs in WHERE β pushing it into HAVING still works, but is slower because you're carrying rows through GROUP BY that you'll then throw away.
The execution order is the conceptual one; the actual query planner reorders aggressively for performance. But the dependency rules above are real semantics, not just optimization.
NULL is not what you think
NULL is SQL's "unknown." It is not zero. It is not empty string. It is not false. The semantics propagate eerily.
SELECT NULL = NULL; -- NULL (NOT true!)
SELECT NULL <> NULL; -- NULL
SELECT NULL = ''; -- NULL
SELECT NULL + 1; -- NULL
SELECT NULL || 'x'; -- NULL (Postgres)
SELECT 1 IN (1, 2, NULL); -- TRUE
SELECT 1 NOT IN (1, 2, NULL); -- NULL β not FALSE
The rule: any operation involving NULL returns NULL (with a few documented exceptions like COALESCE, IS NULL, IS NOT NULL).
The killer is NOT IN with NULL in the subquery result:
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM banned);
If banned contains a single NULL user_id, this query returns no rows, ever. Because for any user id, id NOT IN (..., NULL, ...) evaluates as NULL, which is not TRUE, so WHERE rejects the row. This bug has shipped to production at every company.
Defenses: prefer NOT EXISTS over NOT IN. Or filter NULLs out of the subquery: WHERE id NOT IN (SELECT user_id FROM banned WHERE user_id IS NOT NULL). Or use a left anti-join.
Another NULL surprise: COUNT(*) includes NULLs; COUNT(column) does not. COUNT(DISTINCT column) also drops NULLs. People who use the wrong COUNT to "count rows where this column is set" usually meant COUNT(*) and got COUNT(column).
Sorting and grouping treat NULLs as a single group, even though NULL = NULL is NULL. Consistency wasn't the point.
JOINs: the four kinds and the cross-join trap
Five join types, in increasing scariness:
INNER JOIN(or justJOIN): rows where the predicate matches in both tables.LEFT JOIN(orLEFT OUTER JOIN): every row from the left, plus matching rows from the right (NULLs where no match).RIGHT JOIN: mirror of LEFT. Avoid; people read top-to-bottom and reverse-direction joins are confusing. Rewrite as LEFT.FULL OUTER JOIN: every row from either side, NULLs where no match. MySQL doesn't support this directly until 8.0; emulate withLEFT JOIN UNION ALL RIGHT JOIN.CROSS JOIN(or,in old syntax): every row from the left paired with every row from the right.O(N*M)rows.
The trap: forgetting the JOIN predicate.
SELECT * FROM orders, customers;
That's a CROSS JOIN. If orders has 1M rows and customers has 100k, you've just asked for 100 billion rows. Most planners will start producing them. ANSI-style JOIN ... ON is loud about missing predicates; comma-style is silent. Stop using comma-joins.
Another subtle one: LEFT JOIN ... ON vs LEFT JOIN ... WHERE. The predicates aren't equivalent.
-- A: keeps every order, even ones with no matching shipment
SELECT * FROM orders o
LEFT JOIN shipments s ON s.order_id = o.id AND s.status = 'sent';
-- B: drops orders whose shipment isn't 'sent' OR missing
SELECT * FROM orders o
LEFT JOIN shipments s ON s.order_id = o.id
WHERE s.status = 'sent';
In B, the WHERE clause turns the LEFT JOIN back into an INNER JOIN, because a non-matching s.status is NULL, and NULL = 'sent' is NULL, which fails the WHERE. People write B intending A all the time.
The dialect map
The SQL standard nominally agrees on the keywords. The dialects disagree on:
- Identifier quoting. Standard: double quotes (
"my column"). MySQL: backticks. T-SQL: square brackets[...]or double quotes ifSET QUOTED_IDENTIFIER ON. Postgres: double quotes, case-sensitive. - String literals. Standard: single quotes. MySQL allows double quotes for strings unless
ANSI_QUOTESis set. - String concatenation. Standard:
||. MySQL:CONCAT(...)(or||only withPIPES_AS_CONCAT). T-SQL:+. LIMIT n OFFSET min Postgres / MySQL / SQLite. T-SQL usesOFFSET m ROWS FETCH NEXT n ROWS ONLY(since 2012). Oracle: similarFETCH FIRSTsyntax in 12c+.- Auto-increment. Postgres:
SERIAL(legacy) orGENERATED ... AS IDENTITY(preferred). MySQL:AUTO_INCREMENT. SQLite:INTEGER PRIMARY KEYis implicitly rowid. T-SQL:IDENTITY(1,1). - Upsert. Postgres:
ON CONFLICT ... DO UPDATE. MySQL:ON DUPLICATE KEY UPDATEorINSERT ... ON DUPLICATE KEY UPDATE. SQLite:ON CONFLICT(similar to Postgres). T-SQL:MERGE(with footnotes β Microsoft has documented enough MERGE bugs that careful users prefer atomic INSERT+UPDATE). - Boolean type. Postgres: real
BOOLEAN. MySQL:BOOLEANis alias forTINYINT(1). SQL Server: no boolean β useBIT. SQLite: any expression result. - JSON functions. Postgres:
->,->>,jsonb_*, mature. MySQL 5.7+:JSON_EXTRACT,->, similar but different operator semantics. SQLite:json_extract. SQL Server:JSON_VALUE,JSON_QUERY. BigQuery, Snowflake: their own. Don't expect portability. - Date arithmetic. Postgres:
now() - interval '1 day'. MySQL:DATE_SUB(NOW(), INTERVAL 1 DAY). T-SQL:DATEADD(DAY, -1, GETDATE()). SQLite:datetime('now', '-1 day').
A formatter that produces output for "SQL" without committing to a dialect will silently mis-tokenize at least some inputs. The version on this site supports 14 dialects explicitly because that's the only way the keyword lists, function names, and identifier rules stay correct.
SQL injection
The lesson, restated for the millionth time because every cohort of new programmers re-learns it: never build SQL by string concatenation with user input.
# Catastrophically wrong
cursor.execute(f"SELECT * FROM users WHERE name = '{name}'")
# Correct
cursor.execute("SELECT * FROM users WHERE name = %s", (name,))
The wrong version, given name = "'; DROP TABLE users; --", builds:
SELECT * FROM users WHERE name = ''; DROP TABLE users; --'
The right version sends the SQL and the parameter to the database separately; the database parses the SQL once and binds the parameter as data, never as code. There is no clever escape function that's safe β the only safe path is parameterized queries (a.k.a. prepared statements, a.k.a. bind variables).
Edge cases people miss:
- Identifier interpolation. Parameters bind values, not column or table names.
SELECT * FROM ?doesn't work. If you must dynamically name a table, validate against an allowlist before string-concatenating. LIKEwildcards. A user-supplied search term might contain%or_, which are wildcards. Escape them in the application or use database-specific functions.- ORM injections. Most ORMs are safe by default but offer raw-SQL escape hatches. Audit the escape hatches.
- Stored procedures. A stored procedure that builds dynamic SQL with
EXECand concatenated input is just as injectable as application code. - Second-order injection. Storing a malicious string and executing it later (e.g., from a log table). Defense: parameterize on the use, not just the input.
The first generation of injection defenses (escape ' to '') is provably incomplete and you should never rely on it. Use parameterized queries; use ORMs; if you must build SQL dynamically, build the structure with code-controlled string concatenation and bind the values separately.
Performance: the pieces a formatter can't help with
A pretty SQL query can still be a slow SQL query. The headline failure modes:
- Missing indexes on JOIN columns, WHERE filter columns, ORDER BY columns. Always check the query plan (
EXPLAIN ANALYZEon Postgres,EXPLAINon MySQL, query plan on SSMS). - Implicit type coercion preventing index use.
WHERE phone = 5551234wherephoneisVARCHARmay force a full-table scan on MySQL because of how it casts. Always match the column type literally. SELECT *in production code. It's fine for ad-hoc; it's a footgun in code where someone adds aBLOBcolumn three years later and queries get 100x slower.- N+1 queries from ORMs that lazy-load relationships. A formatter shows you the SQL but not the loop.
ORpredicates that split queries into two index scans. Sometimes faster asUNION ALLof two separate selects.- Functions on the left side of WHERE.
WHERE LOWER(name) = 'alice'can't use an index onname. Use a functional index or store a lowercase column.
Formatting the query nicely makes the bug legible. It doesn't make it fast.
Common pitfalls
NOT INwith a NULL-returning subquery. Rewrite asNOT EXISTS.COUNT(column)when you wantedCOUNT(*).- GROUP BY column ordinals.
GROUP BY 1works in MySQL/Postgres/BigQuery; ANSI standard requires named columns; T-SQL allows it. Names are clearer. - Implicit cross-joins from a forgotten ON clause.
- Mixing aggregates and non-aggregates without grouping. Postgres rejects; MySQL pre-5.7 silently picks an arbitrary row. Use
GROUP BYproperly. - Date arithmetic across timezones. Always store UTC. Display in the user's timezone at the application layer. The blog post on Unix timestamps goes deeper.
0 = ''in MySQL. Implicit coercion strikes again. Quote strings.COUNT(DISTINCT a, b). Postgres counts NULLs as distinct; some other dialects don't. Check.- Locking surprises.
SELECT ... FOR UPDATEsemantics differ subtly across MySQL InnoDB, Postgres, and SQL Server. Read the docs for the engine you're on. - MERGE in T-SQL has documented bugs around concurrent execution. Aaron Bertrand's "Use Caution with SQL Server's MERGE Statement" is required reading if you write MERGE for SQL Server.
Reading SQL well
If you're handed a 200-line query and asked to debug it, the order of operations is:
- Format it. Indent the clauses. Line up the JOINs. Lowercase or uppercase the keywords consistently.
- Identify the FROM and JOIN structure. Draw it on paper if it's hairy.
- Read the WHERE clause for filtering, then GROUP BY for aggregation, then HAVING for post-aggregate filtering.
- SELECT last β it's the output, not the logic.
- ORDER BY and LIMIT are presentation, not query semantics.
Most "what does this query do" questions resolve to "the formatter unmixed the clauses and the structure became visible." That's most of what a SQL formatter is for: not making the query fast, not making it correct, just making it readable enough that the person debugging it can find the bug.
Beyond formatting, the rules that actually keep you out of trouble: parameterize every user input, treat NULL as a third value not a falsy one, prefer NOT EXISTS over NOT IN, never trust comma-joins, and don't expect anything beyond core CRUD to be portable across dialects.
Format SQL for any major dialect
The SQL formatter on this site supports 14 dialects (PostgreSQL, MySQL, SQLite, T-SQL, PL/SQL, BigQuery, Redshift, Spark SQL, and more) with configurable indentation, keyword case, and minification. Useful when you're reading a one-line CI-generated query that's actually 200 columns of joined CTEs. All client-side.
Open the SQL toolRelated guides
Keep the session useful with adjacent reading instead of exiting after one article.
QR Codes, Properly Explained
How QR codes actually work β finder patterns, Reed-Solomon error correction, static vs. dynamic redirects, and the real reasons codes fail in print.
Base64, Properly Explained
A 1989 hack for smuggling binary through 7-bit email transports β and why we still use it for JWTs, data URIs, and a hundred other places. Two alphabets, one common decode failure, and the things it categorically isn't.
URL Encoding, Properly Explained
Why %20 and + both mean space, why encodeURI and encodeURIComponent are not interchangeable, and how the HTML form spec quietly invented its own incompatible variant. RFC 3986 vs application/x-www-form-urlencoded.