JSON String Escaping, Properly Explained
JSON's escape rules have been frozen since 2017 (RFC 8259, ECMA-404 second edition). Douglas Crockford deliberately kept JSON's string syntax as a tiny subset of ECMAScript, and that subset choice has consequences that show up most often when round-tripping data through systems with slightly different ideas about what 'string' means.
The escape table
JSON strings allow only a handful of escape sequences, listed exhaustively in RFC 8259 Β§7:
| Escape | Meaning |
|---|---|
\" |
Double quote |
\\ |
Backslash |
\/ |
Forward slash (optional, see below) |
\b |
Backspace, U+0008 |
\f |
Form feed, U+000C |
\n |
Line feed, U+000A |
\r |
Carriage return, U+000D |
\t |
Tab, U+0009 |
\uXXXX |
Any code point in the BMP |
Anything not in this list is illegal as an escape. \a, \v, \x41 β all rejected by a strict parser. JavaScript's string literals support those; JSON's deliberately don't.
The other thing the spec is strict about: control characters U+0000 through U+001F must be escaped inside a string. A literal newline byte in the middle of a JSON string is a parse error, even though it's invisible. This is one of the few places JSON is more demanding than humans expect.
Why \/ exists
The forward slash escape is permitted but never required. "https://example.com" and "https:\/\/example.com" are equivalent JSON.
The reason it exists: HTML allows </ to terminate a <script> tag inside string literals. If you embed JSON in a server-rendered <script> block without escaping, an attacker-controlled string containing </script> can break out of the script context. Encoders that escape / as \/ short-circuit this entire class of injection. Several JSON serializers β including PHP's json_encode by default and many CSP-aware libraries β escape / for exactly this reason.
If you see \/ in output and wonder why, this is the answer.
Non-ASCII and \uXXXX
\uXXXX lets you write any BMP code point as four hex digits. "Γ©" is Γ©. Most JSON encoders use this defensively for non-ASCII characters, even though plain UTF-8 in the string is also legal β emitting \uXXXX is robust against transports that mangle non-ASCII bytes.
For code points above the BMP (anything past U+FFFF β most emoji, all of the supplementary planes), JSON has no native escape. Instead it borrows UTF-16's surrogate pair convention: "π" (U+1F600) is written as "π", two \uXXXX escapes that together encode the supplementary code point. This is unambiguously the right thing for UTF-16-native languages like JavaScript and Java; it's a minor wart for everyone else, who has to recognize the pair and decode it.
The lone surrogate trap
Here's the fun one. JSON allows \uD800 to appear by itself.
U+D800 is a surrogate code point β illegal in well-formed Unicode, valid only as part of a surrogate pair. But RFC 8259 doesn't strictly require well-formed Unicode in strings. A document like {"x": "\uD800"} parses cleanly in most libraries.
The catch: that string cannot be serialized as UTF-8. UTF-8 has no representation for a lone surrogate. A round trip JSON β string β UTF-8 β string β JSON either errors out or silently substitutes U+FFFD (the replacement character), depending on the library.
This is the source of the most surreal JSON bugs. A document parses, gets persisted, and a few hops later the data has changed. The signature is a lone U+FFFD showing up in your text where a meaningful character used to be β usually one half of an emoji that lost its other surrogate somewhere along the way.
If you control the producer, never emit lone surrogates. If you control the consumer, decide explicitly whether to reject them or substitute them. Don't leave it to library defaults.
The U+2028 / U+2029 quirk
Until 2019, JSON was not a subset of JavaScript.
The reason: JSON allows U+2028 (line separator) and U+2029 (paragraph separator) to appear unescaped in strings. JavaScript string literals didn't β those characters were illegal in JS string syntax. So a JSON document containing one of them was valid JSON and invalid JS.
This mattered because of JSONP β the trick where you serve a JSON-like payload wrapped in callback(...) and the browser executes it as JS. A JSON document with a stray U+2028 would fail to parse as JavaScript even though JSON.parse accepted it. Plenty of production outages traced back to this exact mismatch.
ES2019 fixed it by allowing U+2028 and U+2029 in JS string literals. Old engines without that fix are still around in long-tail places (embedded webviews, museum-piece browsers), so production code that emits JSON for direct JS consumption still escapes those two code points defensively as β¨ and β©.
Common pitfalls
- Manually building JSON with string concatenation. A user-controlled string with a
"or\will break the document. This is also a JSON injection vector, the JSON cousin of SQL injection. Use a serializer. - Round-tripping through tools with different surrogate handling. Python's
json.dumps(ensure_ascii=False)produces UTF-8 with literal characters;json.loadsof a document containing\uD800happily produces an invalid surrogate. Two tools, both correct, incompatible result. - Forgetting that control characters need escaping.
\nbecomes\\nliteral in the JSON; an actual newline byte is a parse error. - Treating
\/as wrong. It's optional but legal. Stripping it out of input on the assumption that "well-formed" JSON doesn't have it will eat real data. - Trusting
JSON.parse(str) === str. It doesn't, in either direction. A round-trip can change Unicode normalization, surrogate pairing, and key ordering. Treat JSON as a transport, not as a faithful echo.
Practical rules
- Use a real serializer. Always. The number of cases your hand-rolled escape handles correctly is smaller than you think.
- For untrusted strings being embedded in HTML inside a
<script>tag: prefer a context-aware serializer that escapes<,>,&,',β¨,β©, and/β or render the JSON into a<script type="application/json">block instead, which sidesteps script-tag breakouts entirely. - For storage and APIs: emit UTF-8 with non-ASCII characters left as themselves; reserve
\uXXXXfor control characters. The result is smaller and easier to inspect. - Reject lone surrogates at boundaries. Don't let them propagate.
- Treat JSON output as one-way: parse-then-reserialize is not guaranteed byte-equal, even if it's logically equivalent.
Round-trip any string
The JSON escape tool on this site shows you the exact escape sequences a string would produce, including \uXXXX form for non-ASCII, and decodes the result back. Local-only, useful for verifying that a string survives the round trip.
Open the JSON escape toolRelated guides
Keep the session useful with adjacent reading instead of exiting after one article.
QR Codes, Properly Explained
How QR codes actually work β finder patterns, Reed-Solomon error correction, static vs. dynamic redirects, and the real reasons codes fail in print.
Base64, Properly Explained
A 1989 hack for smuggling binary through 7-bit email transports β and why we still use it for JWTs, data URIs, and a hundred other places. Two alphabets, one common decode failure, and the things it categorically isn't.
URL Encoding, Properly Explained
Why %20 and + both mean space, why encodeURI and encodeURIComponent are not interchangeable, and how the HTML form spec quietly invented its own incompatible variant. RFC 3986 vs application/x-www-form-urlencoded.