Dev 2026-05-27 9 min read

UUIDs, Properly Explained

UUIDs were defined by the OSF in the late 1980s, formalized as RFC 4122 in 2005, and revised as RFC 9562 in May 2024. The 9562 revision added three new versions (6, 7, 8), in part because the dominant choice, random UUID v4, turned out to be a quietly bad fit for databases. The new v7 is what most teams should be reaching for in 2026. Many still aren't, because documentation and library defaults lag the spec by roughly two years.


A UUID is a 128-bit number, written as 32 hexadecimal characters split into five groups: xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx. The number can be generated by any of several algorithms; the algorithm used is encoded in the value itself, in the four-bit M (version) and the high bits of N (variant). That's the format. The interesting part is which algorithm produced the bits, and what consequences that has.
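Reading those two fields takes only a few bit shifts. Below is a minimal Python sketch that pulls the version nibble (M) and the variant bits (top of N) out of a UUID string. The function name is illustrative and not from any library.

```python
import uuid
from typing import Tuple

def version_and_variant(text: str) -> Tuple[int, str]:
    """Return (version, variant) read straight from the bits of a UUID string."""
    hex_digits = text.replace("-", "").lower()
    if len(hex_digits) != 32:
        raise ValueError("expected 32 hex digits")
    value = int(hex_digits, 16)

    version = (value >> 76) & 0xF  # M: the 13th hex digit
    n = (value >> 60) & 0xF        # N: the 17th hex digit

    # The variant is a prefix code in the high bits of N.
    if (n & 0b1000) == 0:
        variant = "NCS (reserved)"        # 0xxx
    elif (n & 0b0100) == 0:
        variant = "RFC 4122/9562"         # 10xx
    elif (n & 0b0010) == 0:
        variant = "Microsoft (reserved)"  # 110x
    else:
        variant = "future (reserved)"     # 111x
    return version, variant

print(version_and_variant("0190b67c-1234-7abc-89de-123456789abc"))  # (7, 'RFC 4122/9562')
print(version_and_variant(str(uuid.uuid4())))                      # (4, 'RFC 4122/9562')
```

Python's own uuid.UUID exposes the same information as .version and .variant. The manual version just makes the bit positions explicit.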

The versions

RFC 4122 defined v1 through v5. RFC 9562 added v6, v7, v8 and a "max" UUID (ffffffff-...-ffff). Of the eight versions, three matter in real systems:

  • v1: timestamp + clock sequence + MAC address. Carries its creation time, but the layout puts the low-order time bits first, so v1 values don't sort by time. It also leaks the producing host's MAC.
  • v4: 122 random bits. The default in most languages and ecosystems for the last decade.
  • v7: 48-bit Unix millisecond timestamp + 74 random bits. Time-ordered and, unlike v1, sortable as bytes or text. No MAC leak, and within a single millisecond the remaining bits are random.

The other versions are niche:

  • v2: DCE Security; almost never used.
  • v3: namespace + name, hashed with MD5. Deterministic for a given input. Used to generate stable IDs from URLs / DNS names.
  • v5: same as v3 but with SHA-1. Prefer v5 over v3 if you're using namespaced UUIDs.
  • v6: v1 with the timestamp fields reordered so it sorts correctly. v7 obsoletes most of its use cases.
  • v8: custom. The format reserves the bits for whatever you want, as long as you set the version and variant fields.
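The v5 recipe is short enough to write out. This is a sketch using Python's standard hashlib and uuid modules. The helper name name_based_v5 is hypothetical, and its result should match uuid.uuid5 for the same inputs.

```python
import hashlib
import uuid

def name_based_v5(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Sketch of the v5 recipe: SHA-1 over namespace bytes + name bytes,
    keep the first 16 bytes, then overwrite the version and variant bits."""
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    b = bytearray(digest[:16])
    b[6] = (b[6] & 0x0F) | 0x50  # version 5 in the high nibble of byte 6
    b[8] = (b[8] & 0x3F) | 0x80  # variant 10xxxxxx in byte 8
    return uuid.UUID(bytes=bytes(b))

first = name_based_v5(uuid.NAMESPACE_URL, "https://example.com/page")
second = name_based_v5(uuid.NAMESPACE_URL, "https://example.com/page")
print(first == second)                                                      # True: same input, same UUID
print(first == uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/page"))  # True: matches the stdlib
```

Any machine that hashes the same namespace and name gets the same UUID, with no coordination. That determinism is the point of v3/v5, and also why they are not unique across systems that hash the same input.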

If you're not maintaining a system from before 2024, your options are basically v4 vs v7, with namespaced v5 for the specific case of "I need the same UUID for the same input."

Why v4 became the default (and shouldn't have)

The case for v4 is straightforward: it's random, you don't need a clock, two generators don't need to coordinate, and the collision probability is astronomically low. RFC 4122 §4.4 specifies the algorithm: set the six version and variant bits and randomize the other 122. The birthday bound supplies the math. In a population of 122-bit random values, you need to generate roughly 2.7 × 10^18 UUIDs (a little over 2^61) before the chance of a single collision exceeds 50%. For practical purposes, v4 collisions don't happen.
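The 50% figure comes from the birthday approximation n ≈ √(2 · ln 2 · 2^bits). Below is a quick check of the arithmetic in Python; the helper names are illustrative.

```python
import math

def fifty_percent_count(random_bits: int) -> float:
    """Birthday approximation: about sqrt(2 * ln 2 * 2**bits) random values
    give a 50% chance that at least two are equal."""
    return math.sqrt(2 * math.log(2) * 2.0 ** random_bits)

def collision_probability(n: float, random_bits: int) -> float:
    """Approximate P(at least one collision) among n uniform random values."""
    return -math.expm1(-n * n / (2 * 2.0 ** random_bits))

n50 = fifty_percent_count(122)
print(f"{n50:.2e}")                               # 2.71e+18, a little over 2**61 (2.31e+18)
print(round(collision_probability(n50, 122), 3))  # 0.5

# For scale: generating one billion v4 UUIDs per second, this many years.
years = n50 / 1e9 / (365.25 * 24 * 3600)
print(round(years))                               # 86
```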

The case against v4, which became visible only as systems scaled, is performance. Specifically: random UUIDs as primary keys in a B-tree-indexed database table cause severe write amplification.

A B-tree index sorts by key. When you insert a row, the new key goes into the leaf page where it sorts. With sequential keys (auto-incrementing integers, time-ordered UUIDs), every insert goes to the same leaf page (the rightmost one), which stays hot in cache. With random keys (UUID v4), every insert goes to an unpredictable leaf page. That means nearly every insert touches a cold page, evicts something else, and writes the page back. The same applies to the rows themselves on a clustered-index database (MySQL InnoDB, SQL Server).
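A toy model makes the locality difference concrete. This sketch assumes a sorted Python list stands in for the index's leaf level and that every 100 consecutive keys form one "page" (a made-up page size). It ignores page splits, fill factor, and buffer-pool size, so treat it as an illustration rather than a benchmark.

```python
import bisect
import random

PAGE_SIZE = 100  # made-up number of keys per leaf page

def same_page_ratio(keys, page_size=PAGE_SIZE):
    """Toy model: a sorted list stands in for the index's leaf level, and
    position // page_size stands in for the leaf page. Returns the fraction
    of inserts that land on the same page as the insert before them."""
    index = []
    same = 0
    prev_page = None
    for key in keys:
        pos = bisect.bisect_left(index, key)
        index.insert(pos, key)
        page = pos // page_size
        if page == prev_page:
            same += 1
        prev_page = page
    return same / (len(keys) - 1)

rng = random.Random(42)
n = 20_000
sequential = list(range(n))                             # auto-increment / v7-like
random_keys = [rng.getrandbits(122) for _ in range(n)]  # v4-like

seq_ratio = same_page_ratio(sequential)
rand_ratio = same_page_ratio(random_keys)
print(f"sequential: {seq_ratio:.2f}")  # 0.99
print(f"random:     {rand_ratio:.2f}")
```

In this model, sequential keys almost always land on the page they just touched, while random keys rarely do. That is the hot-page versus cold-page pattern, in miniature.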

Reported numbers vary by workload. Write-heavy applications commonly report 2x to 10x throughput improvements, sometimes more, from switching the primary key from random UUIDs to time-ordered IDs. PostgreSQL is less affected: its tables are heaps, so the primary key is a separate B-tree rather than a clustered one. Indexes on UUID columns still suffer the same problem.

Why v7 is the modern default

UUIDv7 from RFC 9562:

0190b67c-1234-7abc-89de-123456789abc
^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^
timestamp     random bits, plus version (the "7")
(48-bit ms)   and variant (top bits of the "8")

The first 48 bits are the Unix millisecond timestamp. The remaining 80 bits are 74 random bits plus the 4-bit version and the 2-bit variant. Two consequences:

  • Sortable. A v7 UUID generated at time T sorts before one generated at time T+1ms. B-tree inserts hit hot pages.
  • Time-extractable. You can pull the millisecond timestamp out of any v7 UUID without storing it separately. Useful for debugging ("when was this row created?").
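To make the layout concrete, below is a minimal v7 sketch in Python that follows those bit positions and reads the timestamp back out. It assumes os.urandom as the randomness source and deliberately has no intra-millisecond counter, so IDs from the same millisecond come out in random order. The function names are hypothetical.

```python
import os
import time
import uuid
from typing import Optional

def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Minimal UUIDv7 sketch: 48-bit Unix-ms timestamp, 4-bit version,
    12 random bits (rand_a), 2-bit variant, 62 random bits (rand_b).
    No intra-millisecond counter, so same-ms IDs come out in random order."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits; 74 are used
    value = (unix_ms & 0xFFFF_FFFF_FFFF) << 80     # timestamp in the top 48 bits
    value |= 0x7 << 76                             # version 7
    value |= ((rand >> 62) & 0xFFF) << 64          # rand_a
    value |= 0b10 << 62                            # RFC 9562 variant
    value |= rand & ((1 << 62) - 1)                # rand_b
    return uuid.UUID(int=value)

def uuid7_unix_ms(u: uuid.UUID) -> int:
    """Pull the millisecond timestamp back out of a v7 UUID."""
    if u.version != 7:
        raise ValueError("not a version 7 UUID")
    return u.int >> 80

earlier, later = uuid7(1_700_000_000_000), uuid7(1_700_000_000_001)
print(earlier < later, str(earlier) < str(later))  # True True: sorts by time as value and as text
print(uuid7_unix_ms(earlier))                      # 1700000000000
```

For debugging, datetime.fromtimestamp(uuid7_unix_ms(u) / 1000, tz=timezone.utc) turns the extracted value into a readable creation time.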

The cost compared to v4:

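  • Tiny information leak. Anyone holding a v7 UUID can read its creation time. Usually fine for internal IDs; sometimes not (for example, a public-facing token where revealing the generation time is sensitive).
  • Slightly less random. The collision space is smaller (74 random bits vs 122) but still ample. Only IDs generated in the same millisecond can collide, and you'd need around 2^37 of them within one millisecond for even odds of a collision. At an extreme 1000 IDs/ms, the chance of a collision in any given millisecond is about 2.6 × 10^-17. That works out to roughly a million years of continuous generation before the first expected collision.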
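Here is the v7 collision arithmetic worked through explicitly. It assumes all 74 bits are random (a generator that spends some of them on a counter has fewer) and treats each millisecond as its own birthday problem, since only same-millisecond IDs can collide.

```python
import math

RANDOM_BITS = 74          # random bits in a v7 UUID (assumes no counter bits)
IDS_PER_MS = 1000         # an extreme sustained rate
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000

# Probability that some pair among this millisecond's IDs collides
# (union bound over pairs; accurate when the result is tiny).
pairs_per_ms = IDS_PER_MS * (IDS_PER_MS - 1) / 2
p_per_ms = pairs_per_ms / 2.0 ** RANDOM_BITS

# Expected wait before the first millisecond that contains a collision.
years_to_first_collision = (1 / p_per_ms) / MS_PER_YEAR

# The 2^37-ish figure: IDs needed *within a single millisecond* for 50% odds.
same_ms_fifty_percent = math.sqrt(2 * math.log(2) * 2.0 ** RANDOM_BITS)

print(f"P(collision in a given ms) ~ {p_per_ms:.1e}")                  # 2.6e-17
print(f"years to first collision   ~ {years_to_first_collision:.1e}")  # 1.2e+06
print(f"IDs in one ms for 50%      ~ {same_ms_fifty_percent:.1e}")     # 1.6e+11
```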

For database primary keys, public-facing IDs that don't reveal anything sensitive, and event identifiers β€” v7 is the right choice.

For nonces, session tokens, and anything where unpredictability is the whole point, stay with v4 (or use a dedicated cryptographic-RNG token format).

ULID: the format that almost won

ULID was published in 2016 by Alizain Feerasta, predating UUIDv7 by eight years. It solves the same problem v7 solves, with a slightly different layout and a different encoding:

01ARZ3NDEKTSV4RRFFQ69G5FAV
^^^^^^^^^^~~~~~~~~~~~~~~~~
^ = Crockford-base32 timestamp (10 chars, 48-bit ms)
~ = random (16 chars, 80-bit)

26 characters of Crockford's Base32 (lowercase-friendly, no I/L/O/U to avoid ambiguity), 128 bits total, sortable, time-ordered. Same idea as v7, different bit layout, different encoding.
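Below is a sketch of ULID encoding and timestamp decoding in Python. It follows that layout: a 48-bit timestamp in the top bits and 80 random bits below, written as 26 Crockford base32 characters. The function names are illustrative. The decoder skips the Crockford aliases (I/L read as 1, O as 0) that a full implementation would accept.

```python
import os
import time
from typing import Optional

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford base32: no I, L, O, U

def ulid(unix_ms: Optional[int] = None) -> str:
    """Sketch: 48-bit ms timestamp in the top bits, 80 random bits below,
    written as 26 base32 characters (26 * 5 = 130 bits; the top 2 are zero)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])  # take 5 bits at a time, low end first
        value >>= 5
    return "".join(reversed(chars))

def ulid_unix_ms(text: str) -> int:
    """Decode the first 10 characters (the timestamp); case-insensitive."""
    value = 0
    for ch in text[:10].upper():
        value = value * 32 + CROCKFORD.index(ch)
    return value

print(ulid_unix_ms("01ARZ3NDEKTSV4RRFFQ69G5FAV"))  # 1469922850259 (July 2016)
print(len(ulid()), ulid(1_000) < ulid(1_001))     # 26 True
```

The Crockford alphabet is in ASCII order, so comparing two uppercase ULID strings gives the same order as comparing their timestamps.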

ULID won mindshare in distributed-systems and event-sourcing communities. UUIDv7 won standardization. They're both fine: pick one and stick with it. UUIDv7 has the advantage of being a UUID, so it fits anywhere a UUID is expected, including the uuid column type in databases. ULID has the advantage of a more compact text representation (26 characters vs 36).

If your system is already storing UUIDs and you're migrating to time-ordered IDs, use v7. If you're starting fresh and value the shorter text representation, ULID is reasonable. Don't agonize.

The v1 MAC-address footnote

UUIDv1 includes the producing host's 48-bit MAC address in the last group. This was the original ID-uniqueness mechanism: "if every machine has a unique MAC, every machine produces unique UUIDs." It also leaks the MAC of the issuing machine in every ID.

This was a real privacy issue. In 1999, GUIDs embedded in Microsoft Word documents carried the author's MAC address, and that trail helped investigators identify the creator of the Melissa virus. Modern v1 implementations sometimes randomize the MAC field per session. RFC 4122 § 4.5 explicitly permits this and asks that such random node IDs set the multicast bit, but from a single UUID you can't be sure whether the host part is a real MAC or randomized.
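Reading the node field out of a v1 UUID is trivial, which is the whole problem. Below is a sketch using Python's uuid module; the helper name is hypothetical and the MAC value in the example is made up.

```python
import uuid
from typing import Tuple

def v1_node_info(u: uuid.UUID) -> Tuple[str, bool]:
    """Return a v1 UUID's node field as a MAC-style string, plus whether the
    multicast bit is set (RFC 4122 §4.5 asks randomized node IDs to set it)."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    node = u.node  # the 48-bit node field (last group of the UUID)
    mac = ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))
    multicast = bool((node >> 40) & 0x01)  # least significant bit of the first octet
    return mac, multicast

u = uuid.uuid1(node=0x001A2B3C4D5E)  # made-up MAC for the example
print(v1_node_info(u))               # ('00:1a:2b:3c:4d:5e', False)
```

A clear multicast bit suggests a real hardware address. A set bit suggests, but does not prove, a randomized one.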

Don't generate v1 UUIDs in 2026. v6 is the spec's blessed migration path; v7 is the better choice if you have the freedom.

Storage: don't store UUIDs as strings

A UUID is 128 bits = 16 bytes. The text representation is 36 characters = 36 bytes (or 32 bytes if you strip the dashes). Storing UUIDs as VARCHAR instead of native UUID / BINARY(16) / RAW(16) doubles or triples the storage cost, makes indexes bigger, and loses range-scan performance. For high-volume tables it adds up to real money.

  • Postgres: native uuid type, 16 bytes. Use it.
  • MySQL: BINARY(16) plus helper functions (UUID_TO_BIN / BIN_TO_UUID). Pass 1 as the swap flag to reorder the time fields so v1 values sort by time; leave it at 0 for v7, which is already time-ordered. MySQL 8.0 added these. Before that, you converted by hand with UNHEX/HEX or stored VARCHAR(36) and lived with the cost.
  • SQL Server: native uniqueidentifier, 16 bytes. Note: SQL Server's sort order for uniqueidentifier is unusual. It compares the last six bytes first rather than reading left to right, so v7 ordering is not preserved unless you encode carefully.
  • SQLite: no native type; store as BLOB (16 bytes) for efficiency, or TEXT for readability.
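To see what MySQL's swap flag does, this sketch reproduces in Python the rearrangement MySQL's documentation describes for UUID_TO_BIN(..., 1). It is not MySQL code. fake_v1 is a hypothetical helper that builds a v1 with a chosen timestamp so the example is deterministic.

```python
import uuid

def mysql_swap_order(u: uuid.UUID) -> bytes:
    """Mimic the UUID_TO_BIN(u, 1) layout: time_hi_and_version (bytes 6-7),
    then time_mid (bytes 4-5), then time_low (bytes 0-3); the rest stays put."""
    b = u.bytes
    return b[6:8] + b[4:6] + b[0:4] + b[8:16]

def fake_v1(timestamp_100ns: int) -> uuid.UUID:
    """Hypothetical helper: a v1 UUID with a chosen 60-bit timestamp and a
    fixed clock sequence / node."""
    return uuid.UUID(fields=(
        timestamp_100ns & 0xFFFFFFFF,                  # time_low
        (timestamp_100ns >> 32) & 0xFFFF,              # time_mid
        0x1000 | ((timestamp_100ns >> 48) & 0x0FFF),   # version 1 + time_hi
        0x80, 0x00, 0x001A2B3C4D5E,                    # variant/clock_seq, node
    ))

older = fake_v1((0x1EF0000 << 32) | 0xFFFFFFFF)
newer = fake_v1((0x1EF0001 << 32) | 0x00000000)
print(older.bytes < newer.bytes)                          # False: raw v1 bytes don't sort by time
print(mysql_swap_order(older) < mysql_swap_order(newer))  # True: the swapped layout does

# The same swap scrambles v7: the version nibble and random rand_a bits move
# ahead of the timestamp, which is why the flag stays 0 for v7.
v7_a = uuid.UUID(int=(1 << 80) | (0x7 << 76) | (0xFFF << 64) | (0b10 << 62))
v7_b = uuid.UUID(int=(2 << 80) | (0x7 << 76) | (0x000 << 64) | (0b10 << 62))
print(v7_a.bytes < v7_b.bytes, mysql_swap_order(v7_a) < mysql_swap_order(v7_b))  # True False
```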

The ergonomic answer: store as the native binary type, render to text only at the API layer.
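Below is a small end-to-end sketch of that advice using Python's built-in sqlite3 module. The table stores the 16-byte form in a BLOB column, and the text form only appears where a value crosses the API boundary. The table and column names are made up.

```python
import sqlite3
import uuid

u = uuid.uuid4()
print(len(u.bytes), len(str(u)), len(u.hex))  # 16 36 32

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id BLOB PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO items (id, name) VALUES (?, ?)", (u.bytes, "widget"))

# At the API edge: parse incoming text once, then query with the 16-byte value.
api_id = str(u).upper()  # clients may send any case
row = conn.execute("SELECT id, name FROM items WHERE id = ?",
                   (uuid.UUID(api_id).bytes,)).fetchone()
stored, name = row
print(name, len(stored), uuid.UUID(bytes=stored) == u)  # widget 16 True
```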

Generating UUIDs correctly

For v4: cryptographically random bits. In JavaScript, that's crypto.randomUUID() (browsers, Node 19+) or crypto.getRandomValues. In Python, uuid.uuid4(). In Go, crypto/rand plus the google/uuid package. Don't roll your own from Math.random(); the entropy isn't there for unique IDs.
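In Python, building a v4 by hand from the OS CSPRNG looks like the sketch below (the helper name is ours). CPython's own uuid.uuid4() does essentially the same thing with os.urandom(16).

```python
import secrets
import uuid

def uuid4_from_csprng() -> uuid.UUID:
    """v4 sketch: 16 bytes from the OS CSPRNG, then fix the version/variant bits."""
    b = bytearray(secrets.token_bytes(16))  # secrets uses the OS CSPRNG
    b[6] = (b[6] & 0x0F) | 0x40  # version 4
    b[8] = (b[8] & 0x3F) | 0x80  # variant 10xxxxxx
    return uuid.UUID(bytes=bytes(b))

u = uuid4_from_csprng()
print(u.version, u.variant == uuid.RFC_4122)  # 4 True
```

The bit-setting is the easy part; the randomness source is what matters. General-purpose PRNGs such as Python's random module or JavaScript's Math.random() are not designed to be unpredictable.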

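For v7: the same crypto-random bits, plus the current Unix-ms timestamp, plus the version/variant bit-fiddling. Most modern UUID libraries support v7 directly; if yours doesn't, the layout is in RFC 9562 §5.7. The detail to watch is what happens within a single millisecond. Fresh random bits make duplicates vanishingly unlikely but leave same-millisecond IDs in random order. To keep them strictly increasing, RFC 9562 §6.2 describes spending some of the random bits on a counter. A generator that does neither can produce duplicates in a tight loop. Library implementations vary, so if you rely on ordering within a process, check that yours generates monotonically increasing IDs in tight loops.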
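Below is one way to get monotonic IDs, sketched in Python. It treats the 12-bit rand_a field as a per-millisecond counter, in the spirit of the counter methods RFC 9562 §6.2 describes. The class is hypothetical, not a library API. Two of its choices are ours: starting each millisecond's counter at a random value below 2048, and borrowing the next millisecond on overflow instead of sleeping.

```python
import os
import threading
import time
import uuid

class MonotonicUUID7:
    """Sketch of a monotonic v7 generator that spends rand_a (12 bits) on a
    per-millisecond counter. Not a library API."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last_ms = -1
        self._counter = 0

    def generate(self) -> uuid.UUID:
        with self._lock:
            now_ms = time.time_ns() // 1_000_000
            if now_ms > self._last_ms:
                # New millisecond: start the counter at a random value below
                # 2048, leaving headroom before it overflows 12 bits.
                self._last_ms = now_ms
                self._counter = int.from_bytes(os.urandom(2), "big") & 0x7FF
            else:
                # Same millisecond (or the clock stepped back): count up.
                self._counter += 1
                if self._counter > 0xFFF:
                    # Counter exhausted: this sketch borrows the next
                    # millisecond rather than sleeping until the clock moves.
                    self._last_ms += 1
                    self._counter = 0
            ms, counter = self._last_ms, self._counter
        rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
        value = (ms << 80) | (0x7 << 76) | (counter << 64) | (0b10 << 62) | rand_b
        return uuid.UUID(int=value)

gen = MonotonicUUID7()
ids = [gen.generate() for _ in range(10_000)]
print(all(a < b for a, b in zip(ids, ids[1:])))  # True: strictly increasing
```

The trade-off is the usual one: each counter bit used is a random bit given up. Under sustained bursts above a few thousand IDs per millisecond, the embedded timestamps drift slightly ahead of the wall clock.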

Common pitfalls

  • Random UUIDs as clustered primary keys. Switch to v7 / ULID for write-heavy workloads.
  • Storing UUIDs as VARCHAR(36). Use the native type.
  • Generating v4 with non-crypto random. Math.random() is not a cryptographic RNG; you risk duplicates, and the values are predictable to an attacker.
  • Assuming UUID format = uniqueness. Two systems can generate the same v3/v5 from the same input by design. v1 with a randomized MAC has higher collision risk than v4. v7 can collide within a millisecond on the same machine if the generator isn't implemented carefully.
  • Treating v1 like v7. v1's timestamp is in 100ns intervals since 1582-10-15; v7's is Unix milliseconds since 1970. Different epochs, different units.
  • Embedding sensitive timestamps in public IDs. If a record's creation time is sensitive (usually it isn't), don't put it in a public v7 ID and count on nobody decoding it; use v4 for the public-facing ID.
  • Not seeding the RNG correctly in forking servers. A pre-fork RNG state shared across workers can produce duplicate UUIDs. Linux's /dev/urandom and getrandom(2) handle this correctly because the state lives in the kernel; older srand-style user-space RNGs do not. Postgres' gen_random_uuid() leans on the system's strong random source rather than keeping its own generator state, which is the right answer.
  • Comparing UUIDs as strings. The spec treats hex digits as case-insensitive on input, but many string comparisons are case-sensitive. Parse to the 16-byte value, or normalize to one canonical text form (lowercase, with dashes), before comparing.
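Two of those pitfalls have short fixes, sketched below in Python with hypothetical helper names. Compare parsed values instead of strings. Decode timestamps per version, because v1 counts 100 ns ticks from 1582-10-15 while v7 counts milliseconds from 1970.

```python
import datetime as dt
import uuid

UNIX_EPOCH = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)
GREGORIAN_OFFSET_100NS = 0x01B21DD213814000  # 1582-10-15 to 1970-01-01, in 100 ns ticks

def same_uuid(a: str, b: str) -> bool:
    """Compare by value: parsing ignores case and dashes."""
    return uuid.UUID(a) == uuid.UUID(b)

def embedded_time(u: uuid.UUID) -> dt.datetime:
    """Decode the creation time; v1 and v7 differ in epoch and unit."""
    if u.version == 1:
        ticks = u.time - GREGORIAN_OFFSET_100NS  # 100 ns ticks since 1970
        return UNIX_EPOCH + dt.timedelta(microseconds=ticks // 10)
    if u.version == 7:
        return UNIX_EPOCH + dt.timedelta(milliseconds=u.int >> 80)  # Unix ms
    raise ValueError(f"version {u.version} has no embedded timestamp")

print(same_uuid("0190B67C-1234-7ABC-89DE-123456789ABC",
                "0190b67c12347abc89de123456789abc"))                # True
print(embedded_time(uuid.UUID("0190b67c-1234-7abc-89de-123456789abc")).date())  # 2024-07-15
```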

Migration: from v4 to v7

If you have an existing system using v4 primary keys and you want the v7 performance benefits, the migration is roughly:

  1. Add a new column id_v7 UUID.
  2. Populate it. New rows get a v7 at insert time. For existing rows, either generate a v7 with the row's created_at as the timestamp portion (this preserves time ordering for historical data), or keep their existing v4 value.
  3. Switch the application to use id_v7 as the primary key.
  4. Drop the old id column once nothing references it.
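Steps 1 and 2 can be sketched against an in-memory SQLite table. The table and column names and the uuid7_at helper are hypothetical. The backfill builds each v7 from the row's created_at, so ordering by the new column reproduces creation order.

```python
import datetime as dt
import os
import sqlite3
import uuid

EPOCH = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)

def uuid7_at(created_at: dt.datetime) -> uuid.UUID:
    """Hypothetical helper: a v7 whose timestamp is created_at (timezone-aware);
    the remaining 74 bits are random."""
    unix_ms = (created_at - EPOCH) // dt.timedelta(milliseconds=1)
    rand = int.from_bytes(os.urandom(10), "big")
    value = ((unix_ms << 80) | (0x7 << 76) | (((rand >> 62) & 0xFFF) << 64)
             | (0b10 << 62) | (rand & ((1 << 62) - 1)))
    return uuid.UUID(int=value)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id BLOB PRIMARY KEY, created_at TEXT)")
for day in (3, 1, 2):  # existing rows: random v4 keys, inserted out of order
    created = dt.datetime(2025, 1, day, tzinfo=dt.timezone.utc).isoformat()
    conn.execute("INSERT INTO orders VALUES (?, ?)", (uuid.uuid4().bytes, created))

# Step 1: add the new column.
conn.execute("ALTER TABLE orders ADD COLUMN id_v7 BLOB")

# Step 2: backfill existing rows with a v7 built from created_at.
for old_id, created in conn.execute("SELECT id, created_at FROM orders").fetchall():
    new_id = uuid7_at(dt.datetime.fromisoformat(created)).bytes
    conn.execute("UPDATE orders SET id_v7 = ? WHERE id = ?", (new_id, old_id))

ordered = [c for (c,) in conn.execute("SELECT created_at FROM orders ORDER BY id_v7")]
print(ordered == sorted(ordered))  # True: id_v7 order follows creation time
```

Steps 3 and 4 depend on your database's DDL and foreign-key layout and aren't shown.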

The riskier path is to re-issue IDs and update foreign keys; that's a much bigger change. Most teams accept the indirection of having two ID columns during a transition.

When to pick what

  • Database primary keys, event IDs, anything indexed in a B-tree: v7 (or ULID).
  • Public IDs that shouldn't leak time: v4.
  • Session tokens, nonces, CSRF tokens: v4 (or a purpose-built cryptographic token format with stronger guarantees).
  • Stable IDs derived from a known string (URLs, file hashes, DNS names): v5.
  • Anywhere a uuid is expected by an external API: match what they specify.
  • Internal-only correlation IDs: any time-ordered format. Make sure you can read the time out for debugging.

The 2010s default of "use uuid4 everywhere" was a fine choice given what was standardized at the time. The 2026 default should be uuid7, with uuid4 when unpredictability is the requirement.

Generate UUIDs and ULIDs locally

The UUID tool on this site generates v1, v4, v7, and ULIDs in the browser, with cryptographic randomness from the Web Crypto API. Useful for inspecting timestamps embedded in v1/v7/ULID values, or producing batches for testing. Nothing leaves your browser.
