Encoding 2026-06-08 7 min read

URL Encoding, Properly Explained

Tim Berners-Lee added percent-encoding to URLs in 1991 because the only existing precedent for escaping characters in plain text was shell quoting, and that was a mess. The result has gone through six RFC revisions and is still a leading source of subtle bugs — because two slightly different variants are now both 'standard,' depending on whether you're building a URL or submitting an HTML form.


What it actually is

Percent-encoding is a way to put any byte into a URL as a printable ASCII triple: % followed by two hex digits. %20 is byte 0x20, the ASCII space. %E4%BD%A0 is three bytes (0xE4 0xBD 0xA0) which together happen to be UTF-8 for "你". The encoding doesn't know or care; it operates on bytes.
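A minimal JavaScript sketch of that byte-level view, assuming a runtime that provides TextEncoder (modern browsers and Node.js). The helper name percentEncodeBytes is made up for illustration, and it deliberately encodes every byte, which a real encoder would not do:

```javascript
// Hypothetical helper: percent-encode every byte, with no exceptions.
// Real encoders leave unreserved characters alone; this one exists only
// to show that the encoding works on bytes, not on characters.
function percentEncodeBytes(bytes) {
  return Array.from(bytes, (b) =>
    "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

// Text becomes bytes first (here via UTF-8), then bytes become triples.
const utf8Bytes = new TextEncoder().encode("你"); // bytes 0xE4 0xBD 0xA0
const tripled = percentEncodeBytes(utf8Bytes);    // "%E4%BD%A0"

// The built-in encoder also goes through UTF-8, so it yields the same triples.
const builtIn = encodeURIComponent("你");          // "%E4%BD%A0"
console.log(tripled, builtIn, tripled === builtIn);
```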

Which means percent-encoding by itself doesn't specify how text becomes bytes. RFC 3986 (the current URI spec, from 2005) says use UTF-8 unless a scheme says otherwise. Most modern URL handlers do this. Older ones sometimes don't, which is why a Latin-1 query parameter that works fine in IE6 becomes garbled in Chrome: the bytes are encoded with one charset and decoded as another. The percent-encoding round-trips perfectly; the interpretation of the resulting bytes is where the bug lives.
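The mismatch can be reproduced in a few lines. This is a sketch, not a general-purpose decoder: percentDecodeToBytes is a hypothetical helper that assumes its input contains only ASCII and well-formed %XX triples, and the sample value "caf%E9" is an assumed example of "café" encoded as Latin-1.

```javascript
// Hypothetical helper: turn a percent-encoded ASCII string into raw bytes.
function percentDecodeToBytes(encoded) {
  const bytes = [];
  for (let i = 0; i < encoded.length; i++) {
    if (encoded[i] === "%") {
      bytes.push(parseInt(encoded.slice(i + 1, i + 3), 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      bytes.push(encoded.charCodeAt(i));
    }
  }
  return Uint8Array.from(bytes);
}

const wireBytes = percentDecodeToBytes("caf%E9"); // bytes 0x63 0x61 0x66 0xE9

// Latin-1 maps each byte straight to the code point with the same number.
const asLatin1 = String.fromCharCode(...wireBytes);         // "café"
// A lone 0xE9 is not valid UTF-8, so a lenient decoder substitutes U+FFFD.
const asUtf8 = new TextDecoder("utf-8").decode(wireBytes);  // "caf\uFFFD"
console.log(asLatin1, asUtf8);
```

The percent-decoding step is identical in both cases; only the final bytes-to-text step differs. The strict built-in, decodeURIComponent("caf%E9"), refuses the input outright and throws a URIError.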

Reserved vs unreserved

The RFC 3986 alphabet splits characters into two camps:

Unreserved. Letters, digits, and the four characters -, ., _, ~. These never need encoding. An encoder that emits %41 instead of A produces an equivalent URL but goes against the spec's advice: RFC 3986 §2.3 says producers should not percent-encode unreserved characters.

Reserved. Characters with structural meaning in a URL. Two further sub-camps:

  • gen-delims: :, /, ?, #, [, ], @. These delimit the major URL components.
  • sub-delims: !, $, &, ', (, ), *, +, ,, ;, =. These have meaning inside specific components.

Whether a reserved character needs encoding depends on where it sits. A ? in a path segment must be encoded; otherwise the parser thinks the query string starts there. Inside a query string, only the first ? acts as the delimiter, and RFC 3986 lets later occurrences stand as plain data. The rules are scope-dependent, which is the part nobody internalizes.

This is why JavaScript ships two functions:

  • encodeURI() assumes you're encoding an entire URL, so it leaves reserved characters such as :, /, ?, #, and & alone.
  • encodeURIComponent() assumes you're encoding a single component (one path segment, one query value), so it encodes every reserved character except !, ', (, ), and *.

You almost always want encodeURIComponent. The only legitimate use of encodeURI is escaping spaces in a URL you've otherwise hand-built, which is rare and usually a sign you should be using a URL builder library instead.
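A short sketch of the difference, using an assumed search value and a hypothetical example.com endpoint:

```javascript
const userInput = "rock & roll?";

// encodeURI treats the input as a whole URL, so & and ? pass through untouched.
const viaEncodeURI = encodeURI(userInput);                   // "rock%20&%20roll?"
// encodeURIComponent treats the input as data, so & and ? are escaped.
const viaEncodeURIComponent = encodeURIComponent(userInput); // "rock%20%26%20roll%3F"

// Hypothetical endpoint, for illustration only.
const brokenUrl = "https://example.com/search?q=" + viaEncodeURI;
const correctUrl = "https://example.com/search?q=" + viaEncodeURIComponent;

// In the broken URL the raw & ends the q parameter early.
const brokenQ = new URL(brokenUrl).searchParams.get("q");   // "rock "
const correctQ = new URL(correctUrl).searchParams.get("q"); // "rock & roll?"
console.log(JSON.stringify(brokenQ), JSON.stringify(correctQ));
```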

The form-data variant

When HTML forms submit, browsers don't use RFC 3986 percent-encoding. They use a variant called application/x-www-form-urlencoded, which is defined by the WHATWG specs (the serializer and parser now live in the URL Standard, which the HTML spec references). The differences:

  • Spaces become + instead of %20.
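  • The character set used to convert text to bytes comes from the form's accept-charset attribute or, failing that, the document's encoding. That is usually UTF-8 but not guaranteed.
  • More characters are encoded. Everything except ASCII letters, digits, *, -, ., and _ is escaped, so ~ becomes %7E and ! becomes %21.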

This is why a query string from an HTML form looks like ?q=hello+world and one built by JS using encodeURIComponent looks like ?q=hello%20world. Both are legal, both decode to "hello world" — but only because virtually every server-side parser knows to handle both.
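Both spellings can be produced side by side in JavaScript. The sketch below leans on URLSearchParams, which serializes and parses using the form-urlencoded rules; the parameter names q and p are arbitrary.

```javascript
const phrase = "hello world";

// RFC 3986 style: the space becomes %20.
const rfcStyle = "q=" + encodeURIComponent(phrase);              // "q=hello%20world"
// Form style: URLSearchParams serializes as application/x-www-form-urlencoded.
const formStyle = new URLSearchParams({ q: phrase }).toString(); // "q=hello+world"

// A form-aware parser accepts either spelling.
const fromRfc = new URLSearchParams(rfcStyle).get("q");   // "hello world"
const fromForm = new URLSearchParams(formStyle).get("q"); // "hello world"

// The form serializer also escapes characters encodeURIComponent leaves alone.
const tildeRfc = encodeURIComponent("~user");                     // "~user"
const tildeForm = new URLSearchParams({ p: "~user" }).toString(); // "p=%7Euser"
console.log(rfcStyle, formStyle, tildeRfc, tildeForm);
```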

It's also why + in a URL is ambiguous. Inside a query string, it usually means space. Inside a path, it usually doesn't. If you have a literal + you want to preserve in a query value, you must encode it as %2B. This is the source of the perennial "phone numbers in query strings" bug: +1-555-0100 comes back as " 1-555-0100", with a space where the + was, if your decoder follows form-encoded rules.
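The phone-number case, sketched with URLSearchParams standing in for a form-style decoder and decodeURIComponent standing in for a strict one (the parameter name tel is an assumption):

```javascript
const phone = "+1-555-0100";

// Raw value dropped into a query string: a form-style decoder reads + as a space.
const naive = new URLSearchParams("tel=" + phone).get("tel");
// naive is " 1-555-0100" (note the leading space)

// Encoding first turns + into %2B, which survives form decoding.
const encodedPhone = encodeURIComponent(phone); // "%2B1-555-0100"
const safe = new URLSearchParams("tel=" + encodedPhone).get("tel");
// safe is "+1-555-0100"

// A strict RFC 3986 decoder never touches +, so the same raw string reads differently.
const strict = decodeURIComponent(phone); // "+1-555-0100"
console.log(JSON.stringify(naive), safe, strict);
```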

Double encoding

The single most common bug in this space.

You take ?q=hello world, encode it to ?q=hello%20world, then encode the whole URL again somewhere downstream — %20 becomes %2520, because the % itself got percent-encoded as %25. Your server now sees the literal string hello%20world as the query value, including the percent sign, instead of hello world.

This happens whenever someone encodes once before passing to a library that encodes again. Or when a frontend percent-encodes for display, then a backend percent-encodes for redirection. The signature is %25 showing up where it shouldn't. The fix is figuring out which layer should own encoding and removing it from everywhere else — usually the layer closest to the wire wins, and everything upstream should hand it raw strings.
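A small sketch of the failure and of the %25 signature. The function looksDoubleEncoded is a hypothetical heuristic, not a reliable detector: a value that legitimately contains the text "%20" looks exactly the same after one correct encoding pass.

```javascript
const rawValue = "hello world";
const once = encodeURIComponent(rawValue); // "hello%20world"
const twice = encodeURIComponent(once);    // "hello%2520world"

// One decode pass (what a server typically does) recovers only the inner encoding.
const serverSees = decodeURIComponent(twice); // "hello%20world"

// Heuristic only: %25 followed by two hex digits hints at double encoding.
function looksDoubleEncoded(s) {
  return /%25[0-9A-Fa-f]{2}/.test(s);
}

console.log(twice, serverSees, looksDoubleEncoded(twice), looksDoubleEncoded(once));
```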

Internationalized domain names are different

A URL like https://例.jp/ is not percent-encoded in the host portion. Hosts use Punycode (RFC 3492), which encodes Unicode as xn-- ASCII strings: 例.jp becomes xn--fsq.jp. This is a totally separate mechanism from percent-encoding and applies only to the host. Path and query stay percent-encoded.

If you're trying to "URL-encode" a domain name and it isn't working, that's why: domain names need IDNA processing, not percent-encoding. Conflating the two will silently produce URLs that load in your test browser and fail to resolve in someone else's.
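The split between the two mechanisms is visible in the WHATWG URL class, assuming a runtime whose URL implementation performs IDNA processing (current browsers and recent Node.js releases do). The path and query in this example are assumptions added for illustration:

```javascript
// The URL parser applies IDNA to the host and percent-encoding to everything else.
const idnUrl = new URL("https://例.jp/例?q=例");

console.log(idnUrl.hostname); // "xn--fsq.jp"
console.log(idnUrl.pathname); // "/%E4%BE%8B"
console.log(idnUrl.search);   // "?q=%E4%BE%8B"

// Percent-encoding the host by hand produces a different string from the IDNA form.
const handEncodedHost = encodeURIComponent("例.jp"); // "%E4%BE%8B.jp"
console.log(handEncodedHost);
```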

Common pitfalls

  • Using encodeURI where you needed encodeURIComponent. The result is a URL where reserved characters in your data (an & in a search query, say) are interpreted as URL structure.
  • Decoding a form-encoded payload with a strict RFC 3986 decoder. The + characters survive as + instead of becoming spaces.
  • Encoding twice. Look for %25 in your inputs.
  • Building URLs by concatenating strings instead of using a URL builder or query-string library. The library knows the rules; your + '&q=' + value does not.
  • Leaving a raw # in a query value and forgetting that the fragment delimiter takes precedence over the query. Parsers split off everything from # onward as the fragment before they parse the query, so a literal # in a query value must be %23.
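The # pitfall is easy to see with the URL class. The endpoint and the search term "C#" below are assumptions chosen for illustration:

```javascript
// Hypothetical endpoint; the user searched for "C#" and the # was left raw.
const rawHashUrl = new URL("https://example.com/search?q=C#&page=2");
console.log(rawHashUrl.search);                   // "?q=C"
console.log(rawHashUrl.hash);                     // "#&page=2"
console.log(rawHashUrl.searchParams.get("page")); // null

// Encoding the value first keeps # as data and the rest of the query intact.
const escapedHashUrl = new URL(
  "https://example.com/search?q=" + encodeURIComponent("C#") + "&page=2"
);
console.log(escapedHashUrl.search);                   // "?q=C%23&page=2"
console.log(escapedHashUrl.searchParams.get("q"));    // "C#"
console.log(escapedHashUrl.searchParams.get("page")); // "2"
```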

Practical rules

  • Use encodeURIComponent for query parameters and path segments. Never encodeURI. Almost never raw concatenation.
  • If a + ends up where you didn't expect, your URL is being treated as form-encoded by something downstream.
  • Encode once. The layer closest to the wire owns it.
  • Hosts use Punycode, not percent-encoding. Two distinct mechanisms.
  • Spaces should be %20 in paths. + is acceptable only in query strings, and only because of HTML-form heritage.
  • Don't reach for percent-encoding to escape data inside JSON or HTML — those have their own escape mechanisms; reusing percent-encoding silently bakes character-set assumptions into your data.
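As one possible way to follow these rules, here is a sketch that hands raw strings to a single builder and lets it own the encoding. The function name buildSearchUrl, the example.com base, and the sample values are all hypothetical.

```javascript
// Hypothetical helper: callers pass raw, unencoded strings and never encode themselves.
function buildSearchUrl(base, pathSegment, params) {
  const url = new URL(base);
  // A path segment is data: encode it on its own so / and spaces stay inside it.
  url.pathname = "/" + encodeURIComponent(pathSegment);
  // searchParams applies form-style encoding (space becomes +, + becomes %2B).
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

const built = buildSearchUrl("https://example.com", "a/b c", { q: "C# & +1" });
console.log(built); // "https://example.com/a%2Fb%20c?q=C%23+%26+%2B1"

// Reading the values back with matching decoders recovers the raw strings.
const parsedBack = new URL(built);
const segmentBack = decodeURIComponent(parsedBack.pathname.slice(1)); // "a/b c"
const queryBack = parsedBack.searchParams.get("q");                   // "C# & +1"
console.log(segmentBack, queryBack);
```

Note that the query ends up form-encoded, with + for spaces, which matches the rule above that + is acceptable only in query strings.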

Try both flavors in the browser

The URL encoder on this site supports both RFC 3986 and form-encoded variants side by side. Useful when you're trying to figure out why your + got read as a space, or why your %20 didn't.
