Encoding 2026-06-08 7 min read

URL Encoding, Properly Explained

Tim Berners-Lee added percent-encoding to URLs in 1991 because the only existing precedent for escaping characters in plain text was shell quoting, and that was a mess. The result has gone through six RFC revisions and is still a leading source of subtle bugs — because two slightly different variants are now both 'standard,' depending on whether you're building a URL or submitting an HTML form.

Tags: URL Encoding, Percent Encoding, RFC 3986, encodeURIComponent, Form URLencoded

What it actually is

Percent-encoding is a way to put any byte into a URL as a printable ASCII triple: % followed by two hex digits. %20 is byte 0x20, the ASCII space. %E4%BD%A0 is three bytes (0xE4 0xBD 0xA0) which together happen to be UTF-8 for "你". The encoding doesn't know or care; it operates on bytes.
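To make the byte-level view concrete, here is a minimal sketch, assuming a runtime with the standard TextEncoder (modern browsers, Node.js). The helper name percentEncodeAllBytes is made up for illustration, and unlike a real encoder it escapes every byte, including ones that would normally be left alone.

```javascript
// Hypothetical helper: percent-encode every UTF-8 byte of a string.
// Real encoders leave unreserved characters alone; this one escapes
// everything so the byte-to-triple mapping is visible.
function percentEncodeAllBytes(text) {
  const bytes = new TextEncoder().encode(text); // text -> UTF-8 bytes
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeAllBytes(" "));       // %20
console.log(percentEncodeAllBytes("你"));      // %E4%BD%A0
console.log(decodeURIComponent("%E4%BD%A0")); // 你
```

The text-to-bytes step (TextEncoder here) is separate from the bytes-to-triples step, which is the whole point: only the second one is percent-encoding.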

Which means percent-encoding by itself doesn't specify how text becomes bytes. RFC 3986 (the current URI spec, from 2005) recommends UTF-8 for new schemes, and most modern URL handlers use it throughout. Older ones sometimes don't, which is why a Latin-1 query parameter that works fine in IE6 becomes garbled in Chrome: the bytes are encoded with one charset and decoded as another. The percent-encoding round-trips perfectly; the interpretation of the resulting bytes is where the bug lives.
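The charset mismatch can be reproduced in a few lines. This is a sketch under assumptions: percentDecodeToBytes is a hypothetical helper that expects well-formed input, and the Latin-1 reading is simulated by mapping each byte straight to the code point with the same number.

```javascript
// Hypothetical helper: undo percent-encoding but stop at raw bytes,
// without deciding which charset they are in. Assumes well-formed input.
function percentDecodeToBytes(input) {
  const bytes = [];
  for (let i = 0; i < input.length; i++) {
    if (input[i] === "%") {
      bytes.push(parseInt(input.slice(i + 1, i + 3), 16));
      i += 2; // skip the two hex digits
    } else {
      bytes.push(input.charCodeAt(i)); // assumes ASCII outside escapes
    }
  }
  return Uint8Array.from(bytes);
}

const encoded = encodeURIComponent("é");     // UTF-8 in: "%C3%A9"
const bytes = percentDecodeToBytes(encoded); // [0xC3, 0xA9]

const asUtf8 = new TextDecoder("utf-8").decode(bytes); // "é"
// ISO-8859-1 maps each byte to the code point with the same number.
const asLatin1 = String.fromCharCode(...bytes);        // "Ã©"

// The reverse mismatch: a Latin-1 encoder sends é as %E9, which is
// not valid UTF-8, so decodeURIComponent throws a URIError.
let latin1Rejected = false;
try {
  decodeURIComponent("%E9");
} catch (err) {
  latin1Rejected = err instanceof URIError;
}

console.log(encoded, asUtf8, asLatin1, latin1Rejected);
```

Both decodes see identical bytes; only the charset applied afterwards differs.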

Reserved vs unreserved

The RFC 3986 alphabet splits characters into two camps:

Unreserved. Letters, digits, and the four characters -, ., _, ~. These never need encoding. An encoder that emits %41 instead of A is going against the spec: RFC 3986 §2.3 says URI producers should not percent-encode unreserved characters, and normalizers should decode them when they find them.

Reserved. Characters with structural meaning in a URL. Two further sub-camps:

gen-delims: :, /, ?, #, [, ], @. These delimit the major URL components.
sub-delims: !, $, &, ', (, ), *, +, ,, ;, =. These have meaning inside specific components.

Whether a reserved character needs encoding depends on where it sits. A ? in a path segment must be encoded, otherwise the parser thinks the query string starts there. Inside the query string, only the first ? acts as the delimiter; RFC 3986 §3.4 allows later ? and / characters to appear there as data. The rules are scope-dependent, which is the part nobody internalizes.
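The scope rule is easy to observe with the standard URL class. A short sketch follows; example.com and the file name are placeholders, not anything from a real system.

```javascript
// Unencoded "?" in a path segment: the parser ends the path there.
const raw = new URL("https://example.com/files/what?.txt");
console.log(raw.pathname); // /files/what
console.log(raw.search);   // ?.txt

// Encoded as a component first, the "?" stays inside the path.
const safe = new URL(
  "https://example.com/files/" + encodeURIComponent("what?.txt")
);
console.log(safe.pathname); // /files/what%3F.txt
console.log(safe.search);   // (empty string)

// Inside the query, a later "?" is just data.
const query = new URL("https://example.com/search?q=why?");
console.log(query.searchParams.get("q")); // why?
```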

This is why JavaScript ships two functions:

encodeURI(): assumes you're encoding an entire URL, so it leaves delimiters such as :, /, ?, #, &, and = alone.
encodeURIComponent(): assumes you're encoding a single component (one path segment, one query value), so it encodes every reserved character except !, ', (, ), and *.

You almost always want encodeURIComponent. The only legitimate use of encodeURI is escaping spaces in a URL you've otherwise hand-built, which is rare and usually a sign you should be using a URL builder library instead.
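A side-by-side run shows what goes wrong when the whole-URL function is applied to a single value. The search term and example.com endpoint below are invented for the illustration.

```javascript
const value = "fish & chips?page=2";

console.log(encodeURI(value));          // fish%20&%20chips?page=2
console.log(encodeURIComponent(value)); // fish%20%26%20chips%3Fpage%3D2

// encodeURI leaves "&" alone, so the value is split into two parameters.
const broken = new URL("https://example.com/search?q=" + encodeURI(value));
console.log(broken.searchParams.get("q")); // "fish " (truncated at the &)

// encodeURIComponent escapes it, so the value survives intact.
const fixed = new URL(
  "https://example.com/search?q=" + encodeURIComponent(value)
);
console.log(fixed.searchParams.get("q")); // fish & chips?page=2
```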

The form-data variant

When HTML forms submit, browsers don't use RFC 3986 percent-encoding. They use a variant called application/x-www-form-urlencoded, whose serializer and parser are defined today in the WHATWG URL Standard. The differences:

Spaces become + instead of %20.
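The character set used to convert text to bytes is the form's accept-charset if one is set, otherwise the document's encoding; often UTF-8 but not guaranteed.
More characters are encoded. Only ASCII letters, digits, *, -, ., and _ pass through, so ~, !, ', (, and ) get escaped even though encodeURIComponent leaves them alone.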

This is why a query string from an HTML form looks like ?q=hello+world and one built by JS using encodeURIComponent looks like ?q=hello%20world. Both are legal, both decode to "hello world" — but only because virtually every server-side parser knows to handle both.
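In JavaScript the form-encoded flavor is what URLSearchParams produces, which makes the two variants easy to compare. A small sketch, assuming a runtime that follows the WHATWG URL Standard (current browsers and Node.js):

```javascript
// Form-encoded serialization versus encodeURIComponent.
const formStyle = new URLSearchParams({ q: "hello world" }).toString();
const rfcStyle = "q=" + encodeURIComponent("hello world");
console.log(formStyle); // q=hello+world
console.log(rfcStyle);  // q=hello%20world

// A form-style parser accepts both spellings of the space.
const fromForm = new URLSearchParams(formStyle).get("q");
const fromRfc = new URLSearchParams(rfcStyle).get("q");
console.log(fromForm === fromRfc); // true, both are "hello world"

// The form variant also escapes characters encodeURIComponent keeps.
const formTilde = new URLSearchParams({ t: "~" }).toString();
console.log(formTilde);               // t=%7E
console.log(encodeURIComponent("~")); // ~
```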

It's also why + in a URL is ambiguous. Inside a query string, it usually means space. Inside a path, it usually doesn't. If you have a literal + you want to preserve in a query value, you must encode it as %2B. This is the source of the perennial "phone numbers in query strings" bug: +1-555-0100 comes back as " 1-555-0100", with a space where the + was, if your decoder follows form-encoded rules.
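The phone-number case can be reproduced directly. In this sketch URLSearchParams stands in for a form-encoded decoder and decodeURIComponent for a strict percent-decoder; the parameter name phone is just an example.

```javascript
// Form-encoded rules: "+" means space.
const formDecoded = new URLSearchParams("phone=+1-555-0100").get("phone");
console.log(JSON.stringify(formDecoded)); // " 1-555-0100"

// Strict percent-decoding: "+" is an ordinary character.
const strictDecoded = decodeURIComponent("+1-555-0100");
console.log(strictDecoded); // +1-555-0100

// Encoding the value first makes it safe under either set of rules.
const escaped = "phone=" + encodeURIComponent("+1-555-0100");
console.log(escaped); // phone=%2B1-555-0100
const roundTripped = new URLSearchParams(escaped).get("phone");
console.log(roundTripped); // +1-555-0100
```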

Double encoding

The single most common bug in this space.

You take ?q=hello world, encode it to ?q=hello%20world, then encode the whole URL again somewhere downstream — %20 becomes %2520, because the % itself got percent-encoded as %25. Your server now sees the literal string hello%20world as the query value, including the percent sign, instead of hello world.

This happens whenever someone encodes once before passing to a library that encodes again. Or when a frontend percent-encodes for display, then a backend percent-encodes for redirection. The signature is %25 showing up where it shouldn't. The fix is figuring out which layer should own encoding and removing it from everywhere else — usually the layer closest to the wire wins, and everything upstream should hand it raw strings.
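Double encoding takes two calls to reproduce. The looksDoubleEncoded check below is a hypothetical heuristic, not a reliable detector: a value that legitimately contained the text %20 before its single encoding would trigger it too.

```javascript
const once = encodeURIComponent("hello world");
const twice = encodeURIComponent(once);
console.log(once);  // hello%20world
console.log(twice); // hello%2520world

// One decode on the server now yields the escape, not the space.
console.log(decodeURIComponent(twice)); // hello%20world

// Hypothetical heuristic: "%25" followed by two hex digits suggests
// that an already-encoded string was encoded again.
function looksDoubleEncoded(text) {
  return /%25[0-9A-Fa-f]{2}/.test(text);
}

console.log(looksDoubleEncoded(once));  // false
console.log(looksDoubleEncoded(twice)); // true
```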

Internationalized domain names are different

A URL like https://例.jp/ is not percent-encoded in the host portion. Hosts use Punycode (RFC 3492), which encodes Unicode as xn-- ASCII strings: 例.jp becomes xn--fsq.jp. This is a totally separate mechanism from percent-encoding and applies only to the host. Path and query stay percent-encoded.

If you're trying to "URL-encode" a domain name and it isn't working, that's why: domain names need IDNA processing, not percent-encoding. Conflating the two will silently produce URLs that resolve in your test browser and fail to resolve in someone else's.
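The standard URL class applies both mechanisms in the right places, which makes the split visible. A brief sketch, assuming a runtime whose URL parser includes IDNA support (browsers and recent Node.js builds):

```javascript
const url = new URL("https://例.jp/例");

console.log(url.hostname); // xn--fsq.jp   (IDNA / Punycode)
console.log(url.pathname); // /%E4%BE%8B   (percent-encoded UTF-8)
console.log(url.href);     // https://xn--fsq.jp/%E4%BE%8B

// Percent-encoding the domain yourself gives a string that is not
// the ASCII form DNS expects.
const wrongHost = encodeURIComponent("例.jp");
console.log(wrongHost); // %E4%BE%8B.jp
```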

Common pitfalls

Using encodeURI where you needed encodeURIComponent. The result is a URL where reserved characters in your data (an & in a search query, say) are interpreted as URL structure.
Decoding a form-encoded payload with a strict RFC 3986 decoder. The + characters survive as + instead of becoming spaces.
Encoding twice. Look for %25 in your inputs.
Building URLs by concatenating strings instead of using a URL builder or query-string library. The library knows the rules; your + '&q=' + value does not.
Leaving a raw # in a query value. The fragment delimiter takes precedence over the query delimiter: parsers treat everything from the first # onward as the fragment before they ever parse the query, so a literal # in a query value must be encoded as %23.
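The concatenation and # pitfalls show up together in a sketch like this one, where the search term and the example.com endpoint are invented. It compares raw string concatenation with the standard URL and URLSearchParams builder.

```javascript
const term = "C# & F#";

// Raw concatenation: the first "#" starts the fragment, so the rest
// of the query, including lang, never reaches the query parser.
const concatenated = new URL(
  "https://example.com/search?q=" + term + "&lang=en"
);
console.log(concatenated.searchParams.get("q"));    // C
console.log(concatenated.searchParams.get("lang")); // null

// Builder: each value is encoded as it is added.
const built = new URL("https://example.com/search");
built.searchParams.set("q", term);
built.searchParams.set("lang", "en");
console.log(built.href);
// https://example.com/search?q=C%23+%26+F%23&lang=en
console.log(built.searchParams.get("q")); // C# & F#
```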

Practical rules

Use encodeURIComponent for query parameters and path segments. Never encodeURI. Almost never raw concatenation.
If a + ends up where you didn't expect, your URL is being treated as form-encoded by something downstream.
Encode once. The layer closest to the wire owns it.
Hosts use Punycode, not percent-encoding. Two distinct mechanisms.
Spaces should be %20 in paths. + is acceptable only in query strings, and only because of HTML-form heritage.
Don't reach for percent-encoding to escape data inside JSON or HTML — those have their own escape mechanisms; reusing percent-encoding silently bakes character-set assumptions into your data.
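For path segments, the first and fifth rules combine into something like the sketch below. joinPath is a hypothetical helper, and it assumes every argument is meant as exactly one segment, so a / inside a value is treated as data.

```javascript
// Hypothetical helper: encode each segment on its own, then join.
function joinPath(...segments) {
  return "/" + segments.map((s) => encodeURIComponent(s)).join("/");
}

const path = joinPath("reports", "Q1 2024/final", "a+b");
console.log(path); // /reports/Q1%202024%2Ffinal/a%2Bb

// The URL parser keeps the escapes, so the segment boundaries hold.
const full = new URL(path, "https://example.com");
console.log(full.pathname.split("/").length); // 4 ("" plus 3 segments)
```

Spaces come out as %20 and the literal + as %2B, so nothing in the path depends on form-encoded conventions.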

Try both flavors in the browser

The URL encoder on this site supports both RFC 3986 and form-encoded variants side by side. Useful when you're trying to figure out why your + got read as a space, or why your %20 didn't.
