Crypto 2026-05-26 10 min read

Hashes, Properly Explained

Cryptographic hash functions are deterministic compressors: any-length input becomes a fixed-length output. That's the whole job. Real hash functions also offer properties (preimage resistance, second-preimage resistance, collision resistance) that make them useful for security. Several hashes still in active use, MD5 and SHA-1 among them, have lost one or more of those properties. Knowing which is which keeps you out of the headlines.

Tags: Hash, MD5, SHA-256, SHA-3, HMAC, bcrypt, Argon2

A standing caveat for this whole topic: don't use a fast hash for passwords. That's the single most consequential rule in the field, and it's where most "I learned hashing in school" intuitions go wrong. We'll get to it. First, the basics.

What a hash function actually does

A hash function H takes any-length input and produces a fixed-length output (the "digest"). For cryptographic hashes, three properties define usefulness:

  1. Preimage resistance. Given a digest d, it's computationally infeasible to find any input x such that H(x) = d. (Hashes shouldn't be reversible.)
  2. Second-preimage resistance. Given x, it's computationally infeasible to find a different x' ≠ x such that H(x') = H(x). (Nobody should be able to swap in a substitute for a known input.)
  3. Collision resistance. It's computationally infeasible to find any two distinct inputs x ≠ x' with H(x) = H(x'). (Collisions must exist, because inputs outnumber digests, but nobody should be able to find one.)

Collision resistance is the strongest requirement of the three, and the first to fall in practice, because the attacker gets to choose both inputs. A hash that's "broken" usually means: collision resistance is gone. A hash with broken collision resistance can still be useful for some purposes (HMAC keeps working, fingerprinting files against accidental corruption is mostly safe) but is dangerous in others (digital signatures, certificate validation).
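A quick way to see the basic contract in action, sketched with Python's standard hashlib module (the helper name and the sample inputs are arbitrary choices for illustration):

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()

short = sha256_hex(b"abc")
long_ = sha256_hex(b"a" * 1_000_000)

# Deterministic: hashing the same bytes twice gives the same digest.
assert short == sha256_hex(b"abc")

# Fixed length: 256 bits = 32 bytes = 64 hex characters, whatever the input size.
assert len(short) == len(long_) == 64

# A one-character change in the input gives a different digest.
changed = sha256_hex(b"abd")
differing = sum(x != y for x, y in zip(short, changed))
print(short)
print(changed)
print(f"{differing} of 64 hex characters differ")
```

None of this demonstrates the three resistance properties themselves; those are claims about what an attacker cannot compute, not something a short script can show.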

The hash families

MD family. MD5 (1992) is comprehensively broken. The first practical collision was announced in 2004 by Xiaoyun Wang and colleagues; chosen-prefix attacks followed in 2007 (the Flame malware exploited one in 2012). MD5 produces a 128-bit digest. Use it only as a non-cryptographic checksum: verifying a file wasn't accidentally corrupted in transit, not verifying it wasn't deliberately altered. Better non-cryptographic options exist (CRC32, xxHash). Stop using MD5 for anything new.

SHA-1. NIST 1995, 160-bit output. Collision resistance is broken. The "SHAttered" attack, announced by CWI Amsterdam and Google in February 2017, produced the first known SHA-1 collision (two PDFs with the same hash). SHAmbles in 2020 made chosen-prefix collisions practical at modest cost. Browsers and certificate authorities deprecated SHA-1 between 2014 and 2017. Git still uses SHA-1, but with collision detection (the SHA1DC algorithm) since 2017. New code should not use SHA-1.

SHA-2 family. NIST 2001. Includes SHA-224, SHA-256, SHA-384, SHA-512, plus the truncated SHA-512/224 and SHA-512/256. All considered secure as of 2026. SHA-256 is the modern workhorse: it's used in Bitcoin, TLS certificates, and Linux package managers, and it's Git's planned migration target. SHA-512 is sometimes faster than SHA-256 on 64-bit hardware without dedicated SHA instructions, because it operates on 64-bit words natively.

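The SHA-2 family is built on the Merkle–Damgård construction, which has a quirk worth knowing about: length extension. Given H(secret || message) and the length of secret, you can compute H(secret || message || padding || extra) without knowing secret, because the digest is the full internal state and hashing can simply resume from it. This breaks naïve "hash the secret with the message to authenticate" schemes. (Truncated variants such as SHA-384 and SHA-512/256 withhold part of the state, which blocks the trick; SHA-256 and SHA-512 do not.) The fix is HMAC; see below.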
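To make the length-extension quirk concrete, here is a sketch of the attack against SHA-256 in Python. It assumes a from-scratch compression function written only for illustration, because hashlib does not let you resume from an arbitrary internal state. The helper names, the secret, and the message are all hypothetical. Note that extend() never sees the secret, only the digest and the combined length of secret plus message.

```python
import hashlib
import struct

def _primes(n):
    out, c = [], 2
    while len(out) < n:
        if all(c % p for p in out):
            out.append(c)
        c += 1
    return out

# SHA-256 round constants: fractional bits of the cube roots of the first 64 primes.
K = [int((p ** (1 / 3) % 1) * 2**32) for p in _primes(64)]
MASK = 0xFFFFFFFF

def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def _compress(state, block):
    """One SHA-256 compression step over a 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + (_rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((_rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, [a, b, c, d, e, f, g, h])]

def md_padding(msg_len):
    """Merkle-Damgard padding appended to a message of msg_len bytes."""
    return b"\x80" + b"\x00" * ((55 - msg_len) % 64) + struct.pack(">Q", msg_len * 8)

def extend(digest_hex, known_len, extra):
    """Return (glue, forged_digest) using only the digest and the input length."""
    state = list(struct.unpack(">8I", bytes.fromhex(digest_hex)))  # resume from the digest
    glue = md_padding(known_len)
    tail = extra + md_padding(known_len + len(glue) + len(extra))
    for i in range(0, len(tail), 64):
        state = _compress(state, tail[i:i + 64])
    return glue, struct.pack(">8I", *state).hex()

# Victim side: a naive tag over secret || message.
secret, message = b"hypothetical-key", b"amount=10"
tag = hashlib.sha256(secret + message).hexdigest()

# Attacker side: knows tag, message, and len(secret), but not secret itself.
glue, forged = extend(tag, len(secret) + len(message), b"&amount=9999")
assert forged == hashlib.sha256(secret + message + glue + b"&amount=9999").hexdigest()
```

The forged digest is a valid tag for message || glue || extra, which is exactly the forgery the naïve scheme was supposed to prevent.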

SHA-3 family. NIST 2015, the outcome of NIST's public hash competition. Different internal construction (Keccak / sponge), so length extension doesn't apply. SHA3-256 and SHA3-512 are the typical choices. Slower than SHA-2 in software on most platforms, but structurally unrelated to it, so an attack on one is unlikely to carry over to the other. If you're picking from scratch in 2026 with no compatibility constraints, SHA-3 or BLAKE3 (below) are the modern choices; if you're integrating with existing systems, SHA-256 is fine.

BLAKE2 / BLAKE3. BLAKE was a SHA-3 finalist that lost to Keccak; BLAKE2 (2012) is its successor, faster than the original and with no known practical attacks. BLAKE3 (2020) is parallelizable and very fast, typically outpacing SHA-2 and SHA-3 in software on modern hardware. Used in b3sum, some package managers, performance-sensitive systems. Adoption is growing.

The take-home: use SHA-256 for general fingerprinting, SHA-3 or BLAKE3 if you want a modern alternative, and never MD5 or SHA-1 for security-bearing decisions.
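For orientation, most of these families are one call away in Python's standard library. The sketch below simply reports digest sizes; the algorithm list and the helper name are illustrative choices, and BLAKE2b stands in for the BLAKE family because BLAKE3 is not part of hashlib.

```python
import hashlib

# Algorithms from the families above that ship in Python's hashlib.
ALGORITHMS = ["sha1", "sha256", "sha512", "sha3_256", "blake2b"]

def digest_bits(name: str, data: bytes = b"example input") -> int:
    """Hash data with the named algorithm and return the digest length in bits."""
    return len(hashlib.new(name, data).digest()) * 8

for name in ALGORITHMS:
    print(f"{name:10s} {digest_bits(name):4d} bits")
```

Swapping algorithms is a one-word change in code like this, which is part of why "which hash" is a decision worth making deliberately rather than by habit.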

HMAC: the right way to combine a key with a hash

If you want to authenticate a message — prove it came from someone with a shared secret — the wrong way is H(secret || message) (vulnerable to length extension on Merkle–Damgård hashes) or H(message || secret) (vulnerable to collision attacks). The right way is HMAC, defined in RFC 2104:

HMAC(K, M) = H((K' XOR opad) || H((K' XOR ipad) || M))

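That double-hashing dance immunizes against length extension. Here K' is the key zero-padded to the hash's block size (or hashed first if it's longer), and ipad and opad are the bytes 0x36 and 0x5C repeated to fill a block. Concrete: HMAC-SHA256(key, message) is what you want for message authentication. Most languages ship it: Python's hmac.new, Node's crypto.createHmac, Java's Mac.getInstance("HmacSHA256").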

HMAC keys can be any length; the algorithm handles short and long keys. A practical note: keys longer than the hash's block size (64 bytes for SHA-256) are first hashed before use, so excessively long keys give you no extra security.
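As a sanity check on the formula, the sketch below writes HMAC-SHA256 out by hand and compares it with Python's hmac module. The manual function exists only to show the structure; in real code you would call hmac.new directly. The key and message are hypothetical.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 block size in bytes

def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 written out from the RFC 2104 formula, for illustration only."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()        # long keys are hashed first
    k_prime = key.ljust(BLOCK_SIZE, b"\x00")       # K': key zero-padded to a full block
    inner_key = bytes(b ^ 0x36 for b in k_prime)   # K' XOR ipad
    outer_key = bytes(b ^ 0x5C for b in k_prime)   # K' XOR opad
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

key = b"hypothetical shared secret"
msg = b"amount=10&to=alice"
tag = hmac.new(key, msg, hashlib.sha256).digest()

assert tag == hmac_sha256_manual(key, msg)   # library and formula agree
assert verify(key, msg, tag)
assert not verify(key, msg + b"0", tag)      # any change to the message fails
```

The verification step uses hmac.compare_digest rather than == so that the comparison time does not depend on how many leading bytes match.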

The big confusion: passwords are different

A general-purpose cryptographic hash like SHA-256 is fast on purpose. Hashing a 1 GB file should take a second or two. Hashing a 16-byte password should take microseconds.

That's a problem for password storage. If your database leaks and an attacker has password hashes, "fast" means they can try billions of guesses per second. A single modern GPU manages on the order of 10 billion SHA-256 hashes per second. And if the hashes are unsalted, common passwords can be looked up in precomputed tables (rainbow tables) almost instantly.
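Some back-of-the-envelope arithmetic shows the scale. The sketch below assumes an 8-character password drawn from lowercase letters and digits, the 10-billion-per-second figure above, and, for contrast, a deliberately slow hash at about 250 ms per guess on one core. All three numbers are illustrative assumptions, not measurements.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def seconds_to_exhaust(keyspace: int, guesses_per_second: float) -> float:
    """Worst-case time to try every candidate in the keyspace."""
    return keyspace / guesses_per_second

keyspace = 36 ** 8                                 # 8 chars from a-z and 0-9 (assumed)
fast = seconds_to_exhaust(keyspace, 10e9)          # fast hash on one GPU
slow = seconds_to_exhaust(keyspace, 1 / 0.250)     # slow hash, one core, 250 ms per guess

print(f"fast hash: {fast / 60:.1f} minutes")
print(f"slow hash: {slow / SECONDS_PER_YEAR:,.0f} years")
```

An attacker can throw more hardware at the slow case, but the gap is many orders of magnitude, and that gap is the entire point of a password hashing function.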

Password hashing functions are deliberately slow, deliberately memory-intensive, and parameterized so you can dial up the cost as hardware gets faster. The current options:

  • bcrypt (1999). The OG. Configurable cost factor (the "work factor"). Caps at 72-byte passwords (most implementations silently truncate the rest). Still secure for most use cases. A common cost setting in 2026 is 12 (roughly 250 ms per hash on a modern server; measure on your own hardware).
  • scrypt (2009). Adds significant memory cost, so even GPUs struggle: every parallel guess needs its own chunk of memory. Configurable N, r, p parameters.
  • Argon2 (2015). Winner of the Password Hashing Competition. Three variants: Argon2d (data-dependent, GPU-resistant), Argon2i (data-independent, side-channel-resistant), Argon2id (recommended hybrid). The current consensus best choice if available.
  • PBKDF2 (RFC 2898, since updated by RFC 8018). The oldest of the still-acceptable options. Just iterated HMAC. Not memory-hard, so GPUs eat it. Use only when forced (FIPS compliance, legacy systems). Recommended iteration count keeps climbing: OWASP suggested 600,000 for PBKDF2-HMAC-SHA256 in 2023.

If you're storing passwords today: use Argon2id. The right parameters are documented in the Argon2 RFC (RFC 9106); its option for memory-constrained environments is 64 MiB of memory, 3 iterations, parallelism 4. If your platform doesn't have Argon2, use bcrypt (cost 12+) or scrypt. Use PBKDF2 only when nothing else is available. Use raw SHA-256 never, ever, for passwords. People do; CVEs follow.

Salting

Salt is a per-password random value, stored alongside the hash. Its job: prevent precomputation attacks (rainbow tables) and ensure two users with the same password get different hashes.

Key properties:

  • Per password, not per system. A site-wide salt doesn't prevent rainbow tables for that site.
  • Random and unique. Cryptographic RNG, at least 16 bytes.
  • Stored, not secret. The salt can be in the database next to the hash; the security comes from the salt being unique-per-user.

Most bcrypt and Argon2 libraries generate and embed the salt automatically: the resulting string includes the algorithm parameters, the salt, and the digest in one field, so you don't manage the salt separately. Lower-level APIs, including many scrypt and PBKDF2 bindings, expect you to generate and store the salt yourself.
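To make the salt mechanics visible, here is a sketch that manages the salt by hand with hashlib.scrypt from Python's standard library. scrypt is used only because it ships with Python (on builds linked against a recent OpenSSL); it is not a substitute for the Argon2id recommendation. The cost parameters and the "$"-separated record format are assumptions made up for this example.

```python
import hashlib
import hmac
import secrets

# Illustrative scrypt cost parameters (an assumption; tune for your hardware).
N, R, P = 2**14, 8, 1
MAXMEM = 64 * 1024 * 1024  # allow scrypt up to 64 MiB

def hash_password(password: str) -> str:
    """Return a self-describing record: algorithm, parameters, salt, digest."""
    salt = secrets.token_bytes(16)  # fresh per-password salt from a CSPRNG
    digest = hashlib.scrypt(password.encode(), salt=salt,
                            n=N, r=R, p=P, maxmem=MAXMEM, dklen=32)
    return f"scrypt${N}${R}${P}${salt.hex()}${digest.hex()}"

def verify_password(password: str, stored: str) -> bool:
    """Re-derive the digest with the stored salt and parameters, then compare."""
    _, n, r, p, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.scrypt(password.encode(), salt=bytes.fromhex(salt_hex),
                               n=int(n), r=int(r), p=int(p),
                               maxmem=MAXMEM, dklen=32)
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))

record_a = hash_password("password123")
record_b = hash_password("password123")

assert record_a != record_b                       # same password, different salts
assert verify_password("password123", record_a)
assert not verify_password("password124", record_a)
```

Because the parameters travel with each record, you can raise the cost later and still verify older hashes, rehashing each password the next time its owner logs in.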

A separate concept: pepper — a system-wide secret added to every password hash, stored outside the database (e.g., in an environment variable or HSM). If the database leaks but the pepper doesn't, brute force becomes harder. Useful in defense-in-depth, but the modern recommendation is to lean on Argon2's parameters first; pepper is icing, not foundation.

File integrity vs message authentication

Two superficially similar but distinct use cases:

File integrity. "Did this file get corrupted?" / "Does this file match the published checksum?"

  • Use SHA-256.
  • Distribute the hash through a separate channel (the publisher's website over HTTPS).
  • A naked hash without a trusted source is useless — anyone replacing the file can also replace the hash.
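A minimal sketch of the download check in Python, assuming you have already obtained the published SHA-256 value over a channel you trust. The function names are hypothetical; the file is read in chunks so large downloads don't need to fit in memory.

```python
import hashlib
import hmac
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: Path, published_hex: str) -> bool:
    """Compare the file's digest with a published checksum (case-insensitive)."""
    return hmac.compare_digest(sha256_file(path), published_hex.strip().lower())
```

A call such as matches_published(Path("download.iso"), published_value) returns True only when the bytes on disk hash to the published value; the file name here is a placeholder.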

Message authentication. "Was this message produced by someone with the shared secret, and not modified in transit?"

  • Use HMAC-SHA256 (or modern AEAD if you also need confidentiality — see the AES post).
  • The key has to be secret. The hash of key || message doesn't authenticate; HMAC does.

Digital signatures are a third case — public-key authentication, where the signer holds a private key and verifiers hold the public key. That's not a hash; that's RSA-PSS / ECDSA / EdDSA, with a hash inside as part of the algorithm.

Hash collisions and birthday bounds

For an n-bit hash, expect a collision after roughly 2^(n/2) random inputs. This is the "birthday bound." So:

  • 128-bit hash (MD5): collision expected at 2^64 ≈ 18 quintillion attempts. That used to feel safe; with a known structural weakness, it's not.
  • 256-bit hash (SHA-256): collision expected at 2^128 ≈ 3.4 × 10^38 attempts. Not feasible.
  • 512-bit hash: 2^256 attempts. Truly infeasible.
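The bound is easy to check empirically on a deliberately tiny digest. The sketch below uses the standard approximation p ≈ 1 − exp(−k² / 2^(n+1)) for k random inputs to an n-bit hash, then runs a generic birthday search on SHA-256 truncated to 24 bits. Truncating to 24 bits is an assumption chosen so the search finishes quickly; it says nothing about the strength of full SHA-256.

```python
import hashlib
import math
from itertools import count

def collision_probability(bits: int, attempts: int) -> float:
    """Birthday approximation: p = 1 - exp(-k^2 / 2^(n+1))."""
    return -math.expm1(-(attempts ** 2) / 2 ** (bits + 1))

def find_truncated_collision(bits: int = 24):
    """Find two inputs whose SHA-256 digests share the first `bits` bits."""
    seen = {}
    for i in count():
        data = str(i).encode()
        prefix = hashlib.sha256(data).digest()[: bits // 8]
        if prefix in seen:
            return seen[prefix], data, i + 1   # both inputs and attempts used
        seen[prefix] = data

a, b, attempts = find_truncated_collision(24)
print(f"24-bit collision between {a!r} and {b!r} after {attempts} attempts")
print(f"p(collision) for 256 bits after 2^64 inputs: {collision_probability(256, 2**64):.2e}")
```

For 24 bits the search should succeed after a few thousand attempts, in line with 2^12 = 4096; each extra bit of digest length multiplies the expected work by about 1.4.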

For intentional collisions in a broken hash, the cost is much lower than the birthday bound. SHA-1 chosen-prefix collisions in SHAmbles (2020) cost ~$45,000 of cloud compute. MD5 is far cheaper still: identical-prefix collisions take seconds on a laptop, and chosen-prefix collisions take hours on commodity hardware.

For uses where collisions are merely undesirable (deduplication of files, content-addressable storage), 256 bits is plenty. For uses where collision is catastrophic (signing a contract, verifying a binary that will run with privilege), use 256+ bits and a hash without known weaknesses.

Common pitfalls

  • Using SHA-256 (or worse, MD5) for passwords. Use Argon2id / bcrypt / scrypt / PBKDF2.
  • H(secret || message) for authentication. Use HMAC.
  • Reusing a single salt across users. Per-password.
  • Site-wide pepper baked into the database (defeats the point — it leaks together).
  • Verifying a download against an unsigned hash. The hash needs an authenticated channel.
  • Comparing hashes with == in user-input contexts. Timing attacks. Use crypto.timingSafeEqual (Node), hmac.compare_digest (Python), constant-time compare in any language.
  • Truncating digests below 128 bits. Sometimes done for storage; reduces collision resistance to half the truncated length. 128 bits is the practical lower bound; 256 is comfortable.
  • Assuming "hashed" means safe. A SHA-256 of password123 matches SHA-256 of password123 from any other site, because there's no salt. Hash + salt + slow KDF, not just hash.
  • Using MD5 in 2026 because it's faster. Use BLAKE3 instead — it's faster and secure.
  • Hashing a very long input to get a short identifier and assuming collisions are impossible. They're improbable, not impossible. For 256-bit hashes, "improbable" is good enough; for 64-bit truncations, it isn't.

Choosing a hash

Use case                                 Recommended                                 Acceptable      Avoid
File / data fingerprint                  SHA-256                                     BLAKE3, SHA-3   MD5, SHA-1
Message authentication                   HMAC-SHA256                                 HMAC-BLAKE2/3   Naïve H(key || message)
Password storage                         Argon2id                                    bcrypt, scrypt  SHA-256, PBKDF2 (last resort)
Digital signatures                       ECDSA-P256 / Ed25519 (with internal SHA-2)  RSA-PSS-SHA256  Anything with SHA-1 inside
Non-cryptographic checksum (CRC, dedup)  xxHash, CRC32                               BLAKE3          MD5 if you need cryptographic integrity
Random ID generation                     UUIDv4 / v7 from CSPRNG                     -               Hash of a counter

The fast/slow distinction is the one to internalize: fast hashes are for fingerprinting data; slow hashes are for passwords. Use the wrong one and either your verifications are needlessly slow (slow hash for a checksum) or your security is decorative (fast hash for a password).

Everything else about hashing — internal construction, padding, output truncation — is detail your library handles. The library decisions to make are: (1) which hash algorithm, (2) for passwords, which KDF and what parameters. Get those two right and the rest takes care of itself.

Compute and compare hashes locally

The hash tool on this site computes MD5, SHA-1, SHA-256, SHA-384, and SHA-512 over text or files using the browser's Web Crypto API. Useful for verifying downloads against published checksums or comparing two files quickly. Nothing leaves your browser.
