</>
Back to Blog
Format 2026-06-01 10 min read

JSON, Properly Explained

Douglas Crockford published json.org in 2001 with a spec that fit on a single page. He was emphatic that he didn't invent JSON — he extracted a subset of JavaScript object literal syntax, named it, and refused to add anything to it. Twenty-five years later that refusal is the reason JSON is everywhere and XML is in retirement homes. The cost of 'we don't add features' is paid every day by people who wanted comments in their config file.

JSONRFC 8259ECMA-404JSON5NDJSONJSON Pointer

Most developers meet JSON as "the format APIs return." That framing is a trap, because it implies JSON is a protocol with semantics. It isn't. JSON is a syntax — six rules for arranging six kinds of values into a tree of text. Everything else is convention layered on top by whoever's reading it.

That distinction explains nearly every JSON bug you'll meet in production.

The grammar fits on a napkin

JSON has six value types: object, array, string, number, boolean, and null. Objects are unordered key/value collections delimited by {}. Arrays are ordered lists delimited by []. Strings are double-quoted with a small set of escape sequences. Numbers are decimal literals with optional fraction and exponent. Booleans are true and false. The literal null represents nothing.

That's the whole language. There are no statements, no expressions, no functions, no references, no dates, no integers vs floats, no binary, no comments. ECMA-404 is normative and ten pages long, and most of those pages are railroad diagrams.

The two specs that actually exist are RFC 8259 (the IETF version, current since 2017, supersedes RFC 7159 and RFC 4627) and ECMA-404 (the standards-body version, which is even thinner). They agree on the grammar. They disagree slightly on what constitutes a valid top-level value — which is usually irrelevant unless you're writing a parser.

Things JSON deliberately doesn't have

Crockford has been remarkably consistent about refusing extensions. The case for each absence is worth understanding because every other format in the same niche made the opposite choice and paid for it.

  • No comments. Crockford's argument: as soon as you allow comments, people put parser directives in them, and now your comments are part of your protocol. He once said removing comments from JSON was "the only thing I have any regrets about." Most developers think this was a mistake. They're probably wrong, given what happened to the config-file ecosystem (more on that below).
  • No trailing commas. Allowing them makes diffs cleaner — every line of an object or array becomes a self-contained edit instead of one line being special. JSON5 adds them. Standard JSON parsers reject them, hard.
  • No dates. A date in JSON is a string. The convention is RFC 3339 / ISO 8601 (2026-06-01T12:00:00Z), but the spec doesn't enforce that, and you'll meet APIs that return Unix milliseconds, Unix seconds, /Date(123456)/, or pretty-printed locale strings. There is no way to disambiguate without out-of-band knowledge.
  • No integer type. JSON has one number type. Whether 1 is an integer or 1.0 is a float is up to the parser. JavaScript's JSON.parse returns number for both, which is an IEEE 754 double, which can't represent integers larger than 2^53 - 1 exactly. This is the cause of the most common silent JSON bug — see below.
  • No NaN or Infinity. They're not valid JSON. JSON.stringify({x: NaN}) in JavaScript serializes them as null, which is a different value, which silently wrecks downstream parsers expecting numbers.
  • No binary. Want to embed bytes? Base64 them and put them in a string. Pay the 33% size tax. Or use a different format.
  • No references / cycles. Try to JSON.stringify a graph with a cycle and the parser throws. There's no &id / *id syntax like YAML's anchors.

The cost of every one of these absences is real. The benefit is that JSON parsers are tiny, fast, predictable, and exist in every language without a single ambiguity to negotiate. Crockford's bet was that simplicity at the format level would beat features. He won.

The number trap

This is the one that bites everyone in production at least once.

JSON.parse('{"id": 12345678901234567890}').id
// => 12345678901234570000

JavaScript's JSON.parse rounds large integers to the nearest representable double. Twitter learned this the hard way around 2010 when their tweet IDs crossed 2^53; clients silently corrupted IDs and threaded conversations broke. They added a id_str field as a string and told everyone to use that.

Most languages other than JavaScript handle this better. Python's json.loads returns a int of arbitrary precision. Go's encoding/json will, if you use json.Number. Java's Jackson can be configured for BigInteger. But the format itself doesn't tell the parser which is correct, so your producer and consumer have to agree out of band.

The pragmatic rule: if an integer can exceed 2^53 - 1 (= 9,007,199,254,740,991), serialize it as a string. Database IDs, transaction IDs, anything snowflake-shaped. Lose 6 bytes per ID, gain a system that doesn't silently corrupt data on the JavaScript side.

Object key uniqueness

RFC 8259 says: "The names within an object SHOULD be unique." Note the SHOULD. Not MUST.

{"foo": 1, "foo": 2}

This is technically valid JSON. The behavior is "implementation-defined." JavaScript's JSON.parse returns the last value ({foo: 2}). Python's json.loads does the same. Java's Jackson by default takes the last. Go's encoding/json into a struct also takes the last; into a map, also the last.

Almost every implementation agrees by accident, but nothing in the spec requires it. If you're generating JSON and have a duplicate key, you should not assume your consumer sees the value you think it does. If you're parsing JSON and the input has a duplicate key, that input is hostile or buggy and should probably be rejected.

This is also a known security vector: if a request validator and a request handler are different parsers, an attacker can craft {"role": "user", "role": "admin"} and let one parser see "user" while the other sees "admin." Real CVEs have been filed against this.

The escape table

Inside a JSON string, exactly these characters require escaping:

Required Why
" Closes the string
\ Starts an escape
Control characters U+0000 through U+001F Per spec

Plus six named escapes: \b \f \n \r \t \" (and \\), and the \uXXXX form for any Unicode codepoint. The forward slash escape \/ is permitted but never required — your parser may emit it for HTML embedding safety, but the spec doesn't require it. JSON also makes the same UTF-16 surrogate-pair design choice as JavaScript: characters outside the BMP must be encoded as a high/low surrogate pair, like 😀 for 😀.

Importantly: U+2028 and U+2029 (line separator and paragraph separator) are valid in JSON strings unescaped, but were invalid in JavaScript source until ES2019. Until 2019, evaluating JSON.parse output with eval or putting it inline in a <script> tag could throw a syntax error if the string contained one of those characters. (Don't eval JSON. But this is why some older code escapes those even when they're not required.)

NDJSON, JSON Lines, and the streaming problem

Standard JSON is a single value, top to bottom. To stream a million records as a single JSON array, you have to write [, then every record, then ], and any consumer has to parse the whole thing or use a streaming parser.

The pragmatic workaround: NDJSON / JSON Lines — one valid JSON value per line, separated by \n, no enclosing array. Easy to produce (just print(json.dumps(record)) in a loop), trivially streamable (read line, parse line), and resumable (if processing crashes at line 4,000,000, you start from there). Used by Elasticsearch's bulk API, BigQuery imports, OpenAI's batch API, log shippers everywhere.

Files conventionally use .ndjson or .jsonl. There's no formal standard, but everyone agrees on the shape.

JSON5, JSONC, HJSON: when you actually wanted YAML

Configuration files are JSON's worst use case. No comments, no trailing commas, no multiline strings, strict double-quoting — all of these cause friction every day. Several extensions have been proposed:

  • JSON5 — adds comments, trailing commas, single-quoted strings, unquoted keys, hex numbers, leading/trailing decimals, and +/Infinity/NaN. Used by Babel, ESLint configs, some others.
  • JSONC (JSON with Comments) — Microsoft's variant: just adds // and /* comments. Used by VS Code's settings.json, tsconfig.json.
  • HJSON — JSON with significant whitespace and minimal punctuation. Niche.

Each of these is a different format that looks like JSON. None of them parses with a standard JSON parser. If you ship a file ending in .json that needs JSONC's comment support, you've created a class of bugs where downstream tools (CI, deploy scripts, language-server validators) see "invalid JSON" for what looks valid to your editor.

A useful rule: a file named *.json should parse with JSON.parse. If it can't, name it *.jsonc or *.json5 and accept that you've stepped outside the standard.

JSON Pointer, JSON Patch, JSON Schema — the things layered on top

JSON the format has no semantics. JSON the ecosystem has spent twenty years adding semantics in separate, layered specs:

  • JSON Pointer (RFC 6901) — addresses one node in a JSON document via a slash-separated path: /items/0/name. Like an XPath, but trivial.
  • JSON Patch (RFC 6902) — describes a diff as an array of operations: [{"op": "replace", "path": "/items/0/name", "value": "x"}]. Used by Kubernetes, ETCD, some HTTP PATCH APIs.
  • JSON Schema (currently draft-2020-12, almost a Real Standard) — a separate JSON document that describes the shape and validation rules for another JSON document. The spec is enormous compared to JSON itself; the implementations vary in compliance; the keywords accumulated organically. Useful, occasionally maddening.
  • JSON-LD — adds semantic-web metadata (@context, @id, @type) so a JSON object can be interpreted as RDF triples. Used by Schema.org structured data, ActivityPub, some W3C things.

You don't need to know any of these to use JSON, but you'll meet them in API specs.

Common pitfalls

  • Large integers losing precision in JavaScript. Send IDs > 2^53 as strings.
  • Duplicate keys. Don't produce them. Don't assume parser behavior on input.
  • Trailing commas sneaking in from a JS object literal. JSON.parse('{"x":1,}') throws.
  • Unquoted keys. {x: 1} is a JavaScript object literal, not JSON. JSON requires "x".
  • Single quotes. Same — valid JS, invalid JSON.
  • Comments in *.json files. Your editor parses them; your CI doesn't.
  • NaN / Infinity in producers. JSON.stringify writes them as null; some custom serializers write NaN literally, which is invalid JSON; consumers will choke either way. Sanitize numbers before serializing.
  • Mixed encoding. JSON strings are Unicode. UTF-8 is the standard transport encoding (RFC 8259 §8.1). If you have a Latin-1 producer and a UTF-8 consumer, you have a bug.
  • Date conventions. Always RFC 3339 / ISO 8601 with timezone, always as a string, always document it in the API contract. Not Unix milliseconds, not pretty-printed dates.
  • Embedded JSON-in-JSON. When a string field's value is a JSON-serialized object, you've doubled the escape complexity and lost type safety. Don't do this unless a downstream system literally requires it.

When a different format is the right answer

JSON is the right format for: HTTP APIs, structured logs, configuration files when the user doesn't need comments, anywhere you need a polyglot text format.

Better choices when:

  • Configuration with comments and trailing commas → TOML (and YAML if you must, with all its caveats).
  • Schema-strict, perf-critical messages → Protobuf, Avro, Cap'n Proto, MessagePack. JSON's text overhead and parser cost matter at scale.
  • Streaming events → NDJSON if you must stay JSON-shaped, otherwise something like Avro container files.
  • Human prose → Markdown. JSON-as-prose-vehicle ({"text": "..."}) is fine for transport, awful for editing.
  • Binary blobs → don't Base64 into JSON if there's any volume. Use a multipart format or a separate channel.

The lesson

Crockford's discipline about not adding features is what made JSON the lingua franca. Every alternative — YAML, XML, INI, TOML, even Protobuf — has more expressive features and a smaller install base. The tradeoff that wins, in 2026 as in 2001, is "simple enough that every parser agrees, every language has support, every developer already knows it."

The features you wish JSON had — comments, trailing commas, dates, integers — are features you can add at the application layer with conventions, schema, or a sibling format. The simplicity at the protocol layer is non-negotiable; that's the whole point.

If you walk away with one heuristic: don't fight the format. Use JSON for what it's good at, reach for something else when you need what JSON refused to give you, and never put data in JSON that needs precision JSON can't carry.

Format and validate locally

The JSON tool on this site formats, minifies, and validates JSON entirely in the browser, with line-level error messages from the underlying JavaScript parser. Useful for verifying config files before they hit production. Nothing leaves your browser.

Open the JSON tool

Related guides

Keep the session useful with adjacent reading instead of exiting after one article.

View all guides

Cookie Consent

We use cookies to enhance your experience and show relevant ads. You can customize your preferences.