Hashes, Properly Explained
Cryptographic hash functions are deterministic compressors: any-length input becomes a fixed-length output. That's the whole job. Real hash functions also offer properties (preimage resistance, second-preimage resistance, collision resistance) that make them useful for security. Several hashes still in active use, MD5 and SHA-1 among them, have lost one or more of those properties. Knowing which is which keeps you out of the headlines.
A standing caveat for this whole topic: don't use a fast hash for passwords. That's the single most consequential rule in the field, and it's where most "I learned hashing in school" intuitions go wrong. We'll get to it. First, the basics.
What a hash function actually does
A hash function H takes any-length input and produces a fixed-length output (the "digest"). For cryptographic hashes, three properties define usefulness:
- Preimage resistance. Given a digest `d`, it's computationally infeasible to find any input `x` such that `H(x) = d`. (Hashes shouldn't be reversible.)
- Second-preimage resistance. Given `x`, it's computationally infeasible to find a different `x' ≠ x` such that `H(x') = H(x)`. (A known input can't be swapped for another with the same digest.)
- Collision resistance. It's computationally infeasible to find any two distinct inputs `x ≠ x'` with `H(x) = H(x')`. (Collisions must exist, since inputs outnumber digests, but nobody should be able to find one.)
Collision resistance is the strongest requirement of the three, and the first to fail in practice. When a hash is called "broken," that usually means collision resistance is gone. Such a hash can still be acceptable for some purposes (HMAC keeps working, and fingerprinting files against accidental corruption is mostly safe) but is dangerous in others (digital signatures, certificate validation).
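A minimal sketch of the "any-length in, fixed-length out" behaviour, using Python's standard `hashlib`. The helper name `sha256_hex` and the sample inputs are illustrative assumptions, not part of any API.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()

short = sha256_hex(b"a")
long_ = sha256_hex(b"a" * 1_000_000)   # a megabyte of input
near = sha256_hex(b"b")                # one character changed

# Fixed length: 256 bits = 32 bytes = 64 hex characters, whatever the input size.
assert len(short) == len(long_) == 64
# Deterministic: the same input always produces the same digest.
assert short == sha256_hex(b"a")
# A small change in the input gives an unrelated-looking digest.
differing = sum(x != y for x, y in zip(short, near))
print(short)
print(near)
print(f"{differing} of 64 hex characters differ")
```

None of this demonstrates the three resistance properties; those are claims about what an attacker cannot do, and no short script can show that.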
The hash families
MD family. MD5 (1992) is comprehensively broken. The first practical collision was published in 2004 by Wang, Feng, Lai, and Yu; chosen-prefix attacks followed in 2007 (the Flame malware exploited a variant of this in 2012). MD5 produces a 128-bit digest. Use it only as a non-cryptographic checksum: verifying a file wasn't accidentally corrupted in transit, not verifying it wasn't deliberately altered. Better non-cryptographic options exist (CRC32, xxHash). Stop using MD5 for anything new.
SHA-1. NIST 1995, 160-bit output. Collision resistance is broken. The "SHAttered" result from CWI Amsterdam and Google in February 2017 produced the first known SHA-1 collision (two PDFs with the same hash). SHAmbles in 2020 produced chosen-prefix collisions at a cost within reach of a well-funded attacker. Browsers and certificate authorities deprecated SHA-1 between 2014 and 2017. Git still uses SHA-1, but with collision detection (the SHA1DC algorithm) since 2017. New code should not use SHA-1.
SHA-2 family. NIST 2001. Includes SHA-224, SHA-256, SHA-384, SHA-512, plus the truncated SHA-512/224 and SHA-512/256. All considered secure as of 2026. SHA-256 is the modern workhorse: it is used in Bitcoin, TLS certificates, and Linux package managers, and it is Git's planned migration target. SHA-512 is sometimes faster on 64-bit hardware than SHA-256 because it operates on 64-bit words natively.
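The SHA-2 family is built on the Merkle–Damgård construction, which has a quirk worth knowing about: length extension. Given H(secret || message) and the length of secret, you can compute H(secret || message || padding || extra) without knowing secret. This works against SHA-256 and SHA-512, whose digest is the full internal state; truncated variants such as SHA-384 and SHA-512/256 withhold part of that state and resist it. Length extension breaks naïve "hash the secret with the message to authenticate" schemes. The fix is HMAC; see below.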
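To make length extension concrete, here is a sketch that forges a valid digest without knowing the secret. It reimplements only the SHA-256 compression function and padding (`hashlib` does not expose the internal state), and the names `compress`, `md_padding`, `extend`, plus the secret and message values, are all hypothetical. Treat it as a demonstration under those assumptions, not as attack tooling.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def _primes(count):
    found, n = [], 2
    while len(found) < count:
        if all(n % p for p in found):
            found.append(n)
        n += 1
    return found

def _icbrt(n):
    """Exact integer cube root by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        lo, hi = (mid, hi) if mid ** 3 <= n else (lo, mid - 1)
    return lo

# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]

def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    """Mix one 64-byte block into the 8-word SHA-256 state."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = _rotr(w[t - 15], 7) ^ _rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = _rotr(w[t - 2], 17) ^ _rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        t1 = (h + (_rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
        t2 = ((_rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def md_padding(message_len):
    """The padding SHA-256 appends to a message of message_len bytes."""
    zeros = (55 - message_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", message_len * 8)

def extend(digest, original_len, extra):
    """Return (glue padding, digest of original || glue || extra) from the digest alone."""
    glue = md_padding(original_len)
    state = list(struct.unpack(">8I", digest))       # the digest IS the internal state
    tail = extra + md_padding(original_len + len(glue) + len(extra))
    for i in range(0, len(tail), 64):
        state = compress(state, tail[i:i + 64])
    return glue, struct.pack(">8I", *state)

secret = b"server-side-secret"       # the attacker knows only its length
message = b"user=alice&role=user"
tag = hashlib.sha256(secret + message).digest()      # the naive "MAC"

glue, forged_tag = extend(tag, len(secret) + len(message), b"&role=admin")
forged_message = message + glue + b"&role=admin"

# A server recomputing H(secret || forged_message) would get exactly forged_tag.
assert hashlib.sha256(secret + forged_message).digest() == forged_tag
```

The forged message contains the glue padding bytes in the middle, so whether the attack is useful depends on how the receiver parses the message.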
SHA-3 family. NIST 2015, the standardized winner of NIST's public hash competition (2007 to 2012). Different internal construction (Keccak / sponge), so length extension doesn't apply. SHA3-256 and SHA3-512 are the typical choices. Slower than SHA-2 in software on most platforms, but because the design is unrelated to SHA-2, a structural attack on one is unlikely to carry over to the other. If you're picking from scratch in 2026 with no compatibility constraints, SHA-3 or BLAKE3 (below) are the modern choices; if you're integrating with existing systems, SHA-256 is fine.
BLAKE2 / BLAKE3. BLAKE2 (2012) is the successor to BLAKE, a SHA-3 finalist that lost to Keccak; it is faster than SHA-2 in software and has no known practical attacks. BLAKE3 (2020) is parallelizable and very fast, among the fastest secure hashes on modern hardware. Used in b3sum, some package managers, performance-sensitive systems. Adoption is growing.
The take-home: use SHA-256 for general fingerprinting, SHA-3 or BLAKE3 if you want a modern alternative, and never MD5 or SHA-1 for security-bearing decisions.
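As a quick orientation, the sketch below computes digests with several of the families above through Python's `hashlib`. BLAKE3 is not in Python's standard library, so it is left out here; the sample `data` bytes are a stand-in.

```python
import hashlib

data = b"example payload"  # stand-in bytes for whatever is being fingerprinted

ALGORITHMS = ("sha256", "sha512", "sha3_256", "blake2b")

# Digest size in bits for each algorithm, read from the hash object itself.
digest_bits = {name: hashlib.new(name, data).digest_size * 8 for name in ALGORITHMS}

for name in ALGORITHMS:
    prefix = hashlib.new(name, data).hexdigest()[:16]
    print(f"{name:9} {digest_bits[name]:4} bits  {prefix}...")
```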
HMAC: the right way to combine a key with a hash
If you want to authenticate a message — prove it came from someone with a shared secret — the wrong way is H(secret || message) (vulnerable to length extension on Merkle–Damgård hashes) or H(message || secret) (vulnerable to collision attacks). The right way is HMAC, defined in RFC 2104:
HMAC(K, M) = H((K' XOR opad) || H((K' XOR ipad) || M))
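Here K' is the key padded with zeros to the hash's block size, and ipad and opad are the bytes 0x36 and 0x5c repeated to that length. The nested, double-hashing structure immunizes against length extension. Concretely, HMAC-SHA256(key, message) is what you want for message authentication. Most languages ship it: Python's hmac.new, Node's crypto.createHmac, Java's Mac.getInstance("HmacSHA256").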
HMAC keys can be any length; the algorithm handles short and long keys. A practical note: keys longer than the hash's block size (64 bytes for SHA-256) are first hashed before use, so excessively long keys give you no extra security.
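The sketch below writes the HMAC formula out by hand for SHA-256 and checks it against Python's `hmac` module, including the long-key case. `hmac_sha256_manual` and `verify` are hypothetical helper names and the key and message are made up; in real code, call the library rather than the hand-rolled version.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 block size in bytes

def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 spelled out from the RFC 2104 definition, for illustration only."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()       # over-long keys are hashed first
    k_prime = key.ljust(BLOCK_SIZE, b"\x00")     # then zero-padded to the block size
    ipad = bytes(b ^ 0x36 for b in k_prime)
    opad = bytes(b ^ 0x5C for b in k_prime)
    inner = hashlib.sha256(ipad + message).digest()
    return hashlib.sha256(opad + inner).digest()

def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, received_tag)

key = b"shared-secret-key"         # hypothetical key
message = b"amount=100&to=bob"     # hypothetical message
tag = hmac.new(key, message, hashlib.sha256).digest()

# The hand-written version agrees with the library for short and long keys.
assert hmac.compare_digest(tag, hmac_sha256_manual(key, message))
long_key = b"k" * 100
assert hmac.compare_digest(
    hmac.new(long_key, message, hashlib.sha256).digest(),
    hmac_sha256_manual(long_key, message),
)
```

`verify(key, message, tag)` returns `True`, and changing a single byte of the message makes it return `False`.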
The big confusion: passwords are different
A general-purpose cryptographic hash like SHA-256 is fast on purpose. Hashing a 1 GB file should take a second or two. Hashing a 16-byte password should take microseconds.
That's a problem for password storage. If your database leaks and an attacker has password hashes, "fast" means they can try billions of guesses per second. A single high-end GPU can compute on the order of 10 billion SHA-256 hashes per second. If the hashes are also unsalted, common passwords can be looked up almost instantly in precomputed (rainbow) tables.
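A back-of-the-envelope sketch of what that guess rate means. It assumes the ballpark of 10 billion guesses per second quoted above and uniformly random passwords (real passwords are far more predictable, so these are upper bounds). `seconds_to_exhaust` is a hypothetical helper.

```python
GUESSES_PER_SECOND = 10e9   # ballpark SHA-256 rate for one GPU, as quoted above

def seconds_to_exhaust(alphabet_size: int, length: int,
                       rate: float = GUESSES_PER_SECOND) -> float:
    """Time to try every password of the given length over the given alphabet."""
    return alphabet_size ** length / rate

for label, alphabet, length in [
    ("8 lowercase letters", 26, 8),
    ("8 letters and digits", 62, 8),
    ("12 letters and digits", 62, 12),
]:
    secs = seconds_to_exhaust(alphabet, length)
    print(f"{label:22} {secs:12.3g} s  ({secs / 86400:.3g} days)")
```

Under these assumptions, every 8-character lowercase password falls in about 21 seconds and every 8-character alphanumeric one in about six hours. Passing a much lower `rate`, as a slow password hash would force, scales those times up proportionally.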
Password hashing functions are deliberately slow, in the newer designs deliberately memory-intensive, and parameterized so you can dial up the cost as hardware gets faster. The current options:
- bcrypt (1999). The OG. Configurable cost factor (the "work factor"). Caps at 72-byte passwords (most implementations silently truncate the rest). Still secure for most use cases. A common cost setting in 2026: 12 (roughly 250 ms per hash on a modern server; measure on your own hardware).
- scrypt (2009). Adds significant memory cost. Even GPUs struggle, because each parallel guess needs its own block of memory and a GPU can only hold so many at once. Configurable N, r, p parameters.
- Argon2 (2015). Winner of the Password Hashing Competition. Three variants: Argon2d (data-dependent, GPU-resistant), Argon2i (data-independent, side-channel-resistant), Argon2id (recommended hybrid). The current consensus best choice if available.
- PBKDF2 (RFC 2898). The oldest of the still-acceptable options. Just iterated HMAC. Not memory-hard, so GPUs eat it. Use only when forced (FIPS compliance, legacy systems). Recommended iteration count keeps climbing — OWASP suggested 600,000 for SHA-256 in 2023.
If you're storing passwords today: use Argon2id. The right parameters are documented in the Argon2 RFC (RFC 9106); its lower-memory recommended option is 64 MiB memory, 3 iterations, parallelism 4. If your platform doesn't have Argon2, use bcrypt (cost 12+) or scrypt. Use PBKDF2 only when nothing else is available. Use raw SHA-256 never, ever, for passwords. People do; CVEs follow.
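A sketch of the store-and-verify flow using only Python's standard library. Argon2id needs a third-party binding such as argon2-cffi, so this example swaps in scrypt via `hashlib.scrypt`. The cost parameters, the `$`-separated storage format, and the function names are illustrative assumptions; tune the parameters for your own hardware.

```python
import base64
import hashlib
import hmac
import secrets

# Illustrative scrypt cost parameters (an assumption, not a vetted recommendation).
N, R, P = 2**14, 8, 1

def _b64(raw: bytes) -> str:
    return base64.b64encode(raw).decode()

def hash_password(password: str) -> str:
    """Return one storable string holding the parameters, the salt, and the digest."""
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=32)
    return f"scrypt${N}${R}${P}${_b64(salt)}${_b64(digest)}"

def verify_password(password: str, stored: str) -> bool:
    """Re-derive with the stored salt and parameters, then compare in constant time."""
    _, n, r, p, salt_b64, digest_b64 = stored.split("$")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    candidate = hashlib.scrypt(password.encode(), salt=salt,
                               n=int(n), r=int(r), p=int(p), dklen=len(expected))
    return hmac.compare_digest(candidate, expected)

stored = hash_password("correct horse battery staple")
print(stored)
print(verify_password("correct horse battery staple", stored))
print(verify_password("wrong guess", stored))
```

Storing the parameters next to the digest is what lets you raise the cost later: old records still verify with their own settings and can be re-hashed at the next successful login.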
Salting
Salt is a per-password random value, stored alongside the hash. Its job: prevent precomputation attacks (rainbow tables) and ensure two users with the same password get different hashes.
Key properties:
- Per password, not per system. A site-wide salt doesn't prevent rainbow tables for that site.
- Random and unique. Cryptographic RNG, at least 16 bytes.
- Stored, not secret. The salt can be in the database next to the hash; the security comes from the salt being unique-per-user.
Most library implementations of bcrypt and Argon2, and many scrypt wrappers, generate and embed the salt automatically: the resulting string includes the algorithm parameters, the salt, and the digest in one field. You don't manage salt separately when using these.
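To see what the salt buys you, the sketch below hashes the same password for two hypothetical users, first with an unsalted fast hash and then with a per-password salt. It uses `hashlib.pbkdf2_hmac` because it is always available in the standard library, with the OWASP iteration count mentioned earlier; `derive` and the user names are illustrative.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # the OWASP 2023 figure for PBKDF2-HMAC-SHA256 mentioned earlier

def derive(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)

password = "password123"

# Unsalted fast hash: every user with this password gets the same digest,
# so one precomputed table cracks all of them at once.
alice_unsalted = hashlib.sha256(password.encode()).hexdigest()
bob_unsalted = hashlib.sha256(password.encode()).hexdigest()
assert alice_unsalted == bob_unsalted

# Per-password random salt: same password, unrelated stored digests.
alice_salt, bob_salt = secrets.token_bytes(16), secrets.token_bytes(16)
alice_hash, bob_hash = derive(password, alice_salt), derive(password, bob_salt)
assert alice_hash != bob_hash

# Verification re-derives with the salt stored next to the hash.
assert hmac.compare_digest(derive(password, alice_salt), alice_hash)
```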
A separate concept: pepper — a system-wide secret added to every password hash, stored outside the database (e.g., in an environment variable or HSM). If the database leaks but the pepper doesn't, brute force becomes harder. Useful in defense-in-depth, but the modern recommendation is to lean on Argon2's parameters first; pepper is icing, not foundation.
File integrity vs message authentication
Two superficially similar but distinct use cases:
File integrity. "Did this file get corrupted?" / "Does this file match the published checksum?"
- Use SHA-256.
- Distribute the hash through a separate channel (the publisher's website over HTTPS).
- A naked hash without a trusted source is useless — anyone replacing the file can also replace the hash.
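A sketch of checking a download against a published SHA-256 checksum. The file is read in chunks so large downloads don't have to fit in memory. `sha256_file` and `matches_published` are hypothetical helpers, and the temporary file stands in for a real download.

```python
import hashlib
import hmac
import os
import tempfile

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally, one chunk (1 MiB by default) at a time."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Compare against a checksum obtained over a separate, trusted channel."""
    return hmac.compare_digest(sha256_file(path), published_hex.strip().lower())

# Demo with a throwaway file standing in for the download.
contents = b"example download contents"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(contents)
published = hashlib.sha256(contents).hexdigest()   # stand-in for the publisher's value

ok = matches_published(tmp.name, published)
tampered = matches_published(tmp.name, "0" * 64)
os.remove(tmp.name)
print(ok, tampered)
```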
Message authentication. "Was this message produced by someone with the shared secret, and not modified in transit?"
- Use HMAC-SHA256 (or modern AEAD if you also need confidentiality — see the AES post).
- The key has to be secret. The hash of `key || message` doesn't authenticate; HMAC does.
Digital signatures are a third case — public-key authentication, where the signer holds a private key and verifiers hold the public key. That's not a hash; that's RSA-PSS / ECDSA / EdDSA, with a hash inside as part of the algorithm.
Hash collisions and birthday bounds
For an n-bit hash, expect a collision after ~2^(n/2) random inputs. The "birthday bound." So:
- 128-bit hash (MD5): collision expected at 2^64 ≈ 18 quintillion attempts. That used to feel safe; with a known structural weakness, it's not.
- 256-bit hash (SHA-256): collision expected at 2^128 ≈ 3.4 × 10^38 attempts. Not feasible.
- 512-bit hash: 2^256 attempts. Truly infeasible.
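The birthday bound can be checked numerically. This sketch uses the standard approximation p ≈ 1 − exp(−k² / 2^(n+1)) for k random inputs to an n-bit hash; both function names are hypothetical, and the figures apply only to a hash with no structural weakness.

```python
import math

def collision_probability(bits: int, num_inputs: float) -> float:
    """Approximate chance of at least one collision among num_inputs random digests."""
    # expm1 keeps the result accurate when the probability is extremely small.
    return -math.expm1(-(num_inputs ** 2) / 2.0 ** (bits + 1))

def inputs_for_even_odds(bits: int) -> float:
    """Number of random inputs at which a collision becomes about 50% likely."""
    return math.sqrt(2 * math.log(2)) * 2.0 ** (bits / 2)

for bits in (64, 128, 256):
    print(f"{bits:3}-bit: ~{inputs_for_even_odds(bits):.3g} inputs for 50% odds; "
          f"p(collision) after 1e9 inputs = {collision_probability(bits, 1e9):.3g}")
```

The 64-bit row is the interesting one: a billion items already give a collision chance of a few percent, which is why short truncated digests make poor unique identifiers at scale.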
For intentional collisions in a broken hash, the cost is much lower than the birthday bound. SHA-1 chosen-prefix collisions in SHAmbles (2020) were estimated at ~$45,000 of rented GPU time. MD5 identical-prefix collisions take seconds on a laptop, and chosen-prefix collisions take hours on commodity hardware.
For uses where collisions are merely undesirable (deduplication of files, content-addressable storage), 256 bits is plenty. For uses where collision is catastrophic (signing a contract, verifying a binary that will run with privilege), use 256+ bits and a hash without known weaknesses.
Common pitfalls
- Using SHA-256 (or worse, MD5) for passwords. Use Argon2id / bcrypt / scrypt / PBKDF2.
- `H(secret || message)` for authentication. Use HMAC.
- Reusing a single salt across users. Salts are per-password.
- Site-wide pepper baked into the database (defeats the point — it leaks together).
- Verifying a download against an unsigned hash. The hash needs an authenticated channel.
- Comparing hashes with `==` in user-input contexts. This opens the door to timing attacks. Use `crypto.timingSafeEqual` (Node), `hmac.compare_digest` (Python), or a constant-time compare in any language.
- Truncating digests below 128 bits. Sometimes done for storage; collision resistance drops to half the truncated length in bits. 128 bits is the practical lower bound; 256 is comfortable.
- Assuming "hashed" means safe. A SHA-256 of `password123` matches SHA-256 of `password123` from any other site, because there's no salt. Hash + salt + slow KDF, not just hash.
- Using MD5 in 2026 because it's faster. Use BLAKE3 instead: it's faster and secure.
- Hashing a very long input to get a short identifier and assuming collisions are impossible. They're improbable, not impossible. For 256-bit hashes, "improbable" is good enough; for 64-bit truncations, it isn't.
Choosing a hash
| Use case | Recommended | Acceptable | Avoid |
|---|---|---|---|
| File / data fingerprint | SHA-256 | BLAKE3, SHA-3 | MD5, SHA-1 |
| Message authentication | HMAC-SHA256 | HMAC-BLAKE2/3 | Naïve `H(key \|\| message)` |
| Password storage | Argon2id | bcrypt, scrypt | SHA-256, PBKDF2 (last resort) |
| Digital signatures | ECDSA-P256 / Ed25519 (with internal SHA-2) | RSA-PSS-SHA256 | Anything with SHA-1 inside |
| Non-cryptographic checksum (CRC, dedup) | xxHash, CRC32 | BLAKE3 | MD5 if you need cryptographic integrity |
| Random ID generation | UUIDv4 / v7 from CSPRNG | — | Hash of a counter |
The fast/slow distinction is the one to internalize: fast hashes are for fingerprinting data; slow hashes are for passwords. Use the wrong one and either your verifications are useless (slow hash for a checksum) or your security is decorative (fast hash for a password).
Everything else about hashing — internal construction, padding, output truncation — is detail your library handles. The library decisions to make are: (1) which hash algorithm, (2) for passwords, which KDF and what parameters. Get those two right and the rest takes care of itself.
Compute and compare hashes locally
The hash tool on this site computes MD5, SHA-1, SHA-256, SHA-384, and SHA-512 over text or files using the browser's Web Crypto API. Useful for verifying downloads against published checksums or comparing two files quickly. Nothing leaves your browser.