</>
Back to Blog
Encoding 2026-06-06 5 min read

JSON String Escaping, Properly Explained

JSON's escape rules have been frozen since 2017 (RFC 8259, ECMA-404 second edition). Douglas Crockford deliberately kept JSON's string syntax as a tiny subset of ECMAScript, and that subset choice has consequences that show up most often when round-tripping data through systems with slightly different ideas about what 'string' means.

JSONJSON EscapeRFC 8259Surrogate PairsUTF-8

The escape table

JSON strings allow only a handful of escape sequences, listed exhaustively in RFC 8259 Β§7:

Escape Meaning
\" Double quote
\\ Backslash
\/ Forward slash (optional, see below)
\b Backspace, U+0008
\f Form feed, U+000C
\n Line feed, U+000A
\r Carriage return, U+000D
\t Tab, U+0009
\uXXXX Any code point in the BMP

Anything not in this list is illegal as an escape. \a, \v, \x41 β€” all rejected by a strict parser. JavaScript's string literals support those; JSON's deliberately don't.

The other thing the spec is strict about: control characters U+0000 through U+001F must be escaped inside a string. A literal newline byte in the middle of a JSON string is a parse error, even though it's invisible. This is one of the few places JSON is more demanding than humans expect.

Why \/ exists

The forward slash escape is permitted but never required. "https://example.com" and "https:\/\/example.com" are equivalent JSON.

The reason it exists: HTML allows </ to terminate a <script> tag inside string literals. If you embed JSON in a server-rendered <script> block without escaping, an attacker-controlled string containing </script> can break out of the script context. Encoders that escape / as \/ short-circuit this entire class of injection. Several JSON serializers β€” including PHP's json_encode by default and many CSP-aware libraries β€” escape / for exactly this reason.

If you see \/ in output and wonder why, this is the answer.

Non-ASCII and \uXXXX

\uXXXX lets you write any BMP code point as four hex digits. "Γ©" is Γ©. Most JSON encoders use this defensively for non-ASCII characters, even though plain UTF-8 in the string is also legal β€” emitting \uXXXX is robust against transports that mangle non-ASCII bytes.

For code points above the BMP (anything past U+FFFF β€” most emoji, all of the supplementary planes), JSON has no native escape. Instead it borrows UTF-16's surrogate pair convention: "πŸ˜€" (U+1F600) is written as "πŸ˜€", two \uXXXX escapes that together encode the supplementary code point. This is unambiguously the right thing for UTF-16-native languages like JavaScript and Java; it's a minor wart for everyone else, who has to recognize the pair and decode it.

The lone surrogate trap

Here's the fun one. JSON allows \uD800 to appear by itself.

U+D800 is a surrogate code point β€” illegal in well-formed Unicode, valid only as part of a surrogate pair. But RFC 8259 doesn't strictly require well-formed Unicode in strings. A document like {"x": "\uD800"} parses cleanly in most libraries.

The catch: that string cannot be serialized as UTF-8. UTF-8 has no representation for a lone surrogate. A round trip JSON β†’ string β†’ UTF-8 β†’ string β†’ JSON either errors out or silently substitutes U+FFFD (the replacement character), depending on the library.

This is the source of the most surreal JSON bugs. A document parses, gets persisted, and a few hops later the data has changed. The signature is a lone U+FFFD showing up in your text where a meaningful character used to be β€” usually one half of an emoji that lost its other surrogate somewhere along the way.

If you control the producer, never emit lone surrogates. If you control the consumer, decide explicitly whether to reject them or substitute them. Don't leave it to library defaults.

The U+2028 / U+2029 quirk

Until 2019, JSON was not a subset of JavaScript.

The reason: JSON allows U+2028 (line separator) and U+2029 (paragraph separator) to appear unescaped in strings. JavaScript string literals didn't β€” those characters were illegal in JS string syntax. So a JSON document containing one of them was valid JSON and invalid JS.

This mattered because of JSONP β€” the trick where you serve a JSON-like payload wrapped in callback(...) and the browser executes it as JS. A JSON document with a stray U+2028 would fail to parse as JavaScript even though JSON.parse accepted it. Plenty of production outages traced back to this exact mismatch.

ES2019 fixed it by allowing U+2028 and U+2029 in JS string literals. Old engines without that fix are still around in long-tail places (embedded webviews, museum-piece browsers), so production code that emits JSON for direct JS consumption still escapes those two code points defensively as 
 and 
.

Common pitfalls

  • Manually building JSON with string concatenation. A user-controlled string with a " or \ will break the document. This is also a JSON injection vector, the JSON cousin of SQL injection. Use a serializer.
  • Round-tripping through tools with different surrogate handling. Python's json.dumps(ensure_ascii=False) produces UTF-8 with literal characters; json.loads of a document containing \uD800 happily produces an invalid surrogate. Two tools, both correct, incompatible result.
  • Forgetting that control characters need escaping. \n becomes \\n literal in the JSON; an actual newline byte is a parse error.
  • Treating \/ as wrong. It's optional but legal. Stripping it out of input on the assumption that "well-formed" JSON doesn't have it will eat real data.
  • Trusting JSON.parse(str) === str. It doesn't, in either direction. A round-trip can change Unicode normalization, surrogate pairing, and key ordering. Treat JSON as a transport, not as a faithful echo.

Practical rules

  • Use a real serializer. Always. The number of cases your hand-rolled escape handles correctly is smaller than you think.
  • For untrusted strings being embedded in HTML inside a <script> tag: prefer a context-aware serializer that escapes <, >, &, ', 
, 
, and / β€” or render the JSON into a <script type="application/json"> block instead, which sidesteps script-tag breakouts entirely.
  • For storage and APIs: emit UTF-8 with non-ASCII characters left as themselves; reserve \uXXXX for control characters. The result is smaller and easier to inspect.
  • Reject lone surrogates at boundaries. Don't let them propagate.
  • Treat JSON output as one-way: parse-then-reserialize is not guaranteed byte-equal, even if it's logically equivalent.

Round-trip any string

The JSON escape tool on this site shows you the exact escape sequences a string would produce, including \uXXXX form for non-ASCII, and decodes the result back. Local-only, useful for verifying that a string survives the round trip.

Open the JSON escape tool

Related guides

Keep the session useful with adjacent reading instead of exiting after one article.

View all guides

Cookie Consent

We use cookies to enhance your experience and show relevant ads. You can customize your preferences.