The key thing about *cryptographic* hashes (not mere checksums) like SHA1 is tha...

dragontamer · on Oct 13, 2021

> Effectively, they're all the same.

No they're not.

The birthday attack says that a perfect 160-bit cryptographic hash will, on average, yield a collision after only about 2^80 attempts (80 bits of work). This means that an 80-bit burst error would probabilistically contain a potential SHA1 collision. (An 80-bit burst error doesn't mean that all the bits are flipped, btw: it means that 80 bits have been randomized.)

In contrast, CRC is designed specifically against burst errors. CRC is "regular" and "tweaked" in such a way that a 160-bit CRC is guaranteed to detect every burst error of up to 160 bits, of any and all kinds!
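The burst-error guarantee can be checked empirically. The following is a minimal sketch, not a proof: it assumes the 32-bit CRC from Python's `zlib` as a stand-in for the hypothetical 160-bit CRC, and it assumes bits are ordered LSB-first within each byte (the order zlib's reflected CRC-32 processes them in). The helper names `inject_burst` and `count_undetected_bursts` are made up for this illustration.

```python
import random
import zlib


def inject_burst(data: bytes, start: int, pattern: int) -> bytes:
    """XOR `pattern` into `data`, beginning at bit position `start`.

    Bit positions are counted LSB-first within each byte (an assumption
    chosen to match the bit order of zlib's reflected CRC-32).
    """
    buf = bytearray(data)
    pos = start
    while pattern:
        if pattern & 1:
            buf[pos // 8] ^= 1 << (pos % 8)
        pattern >>= 1
        pos += 1
    return bytes(buf)


def count_undetected_bursts(trials: int = 20000, seed: int = 0) -> int:
    """Count random bursts of at most 32 bits that CRC-32 fails to notice."""
    rng = random.Random(seed)
    msg_len = 64  # bytes
    undetected = 0
    for _ in range(trials):
        msg = bytes(rng.getrandbits(8) for _ in range(msg_len))
        # Any nonzero error pattern confined to a 32-bit window.
        pattern = rng.getrandbits(32) or 1
        start = rng.randrange(0, msg_len * 8 - 32 + 1)
        corrupted = inject_burst(msg, start, pattern)
        if zlib.crc32(corrupted) == zlib.crc32(msg):
            undetected += 1
    return undetected


if __name__ == "__main__":
    print("undetected bursts:", count_undetected_bursts())
```

Under these assumptions the count should stay at zero no matter how many trials are run, whereas a 32-bit truncation of a random-looking hash would be expected to miss roughly one corruption in 2^32.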

So if you care about burst errors, then CRC is, in fact, better than crypto-level hashes. And burst errors are the primary kind of error that occurs in practice (scratches on a CD-ROM, bad sectors on a hard drive, a lightning storm cutting out a few microseconds of WiFi, etc.)

That is: noise isn't random in the real world. Noise is "clustered" around bursty events in practice.

--------

If burst-error is king, you can do far, far better than random methodologies. CRC is proof of that. That's why error distributions matter.

jiggawatts · on Oct 14, 2021

I did say that the birthday attack doesn't apply!

It only applies if you're comparing a large set of samples against each other. An example would be a "content-based indexing" system where a database primary key is the hash. Every insert then compares the hash against every entry that already exists. If there are 1 billion stored items, each insert can potentially collide with any of the 1 billion.
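The content-indexing case can be sketched in a few lines. This is an illustration under stated assumptions: SHA-1 is truncated to 24 bits so that a collision shows up in a fraction of a second, a plain `dict` stands in for the database, and `truncated_sha1` and `first_collision` are hypothetical names. With a 24-bit key, the first collision is expected after a few thousand inserts (on the order of 2^12), not 2^24.

```python
import hashlib


def truncated_sha1(data: bytes, bits: int) -> int:
    """Return the top `bits` bits of SHA-1(data) as an integer."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


def first_collision(bits: int = 24):
    """Insert distinct items into a hash-keyed store until two keys collide.

    Returns (earlier_item, new_item, number_of_items_hashed).
    """
    store = {}  # hash key -> item; stands in for the content-indexed database
    i = 0
    while True:
        item = str(i).encode()
        key = truncated_sha1(item, bits)
        if key in store:
            # Every insert is checked against all existing entries,
            # which is what makes the birthday bound apply here.
            return store[key], item, i + 1
        store[key] = item
        i += 1


if __name__ == "__main__":
    a, b, count = first_collision(24)
    print(f"{a!r} and {b!r} collide after {count} inserts")
```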

For validation, you have 1 input being compared against 1 valid value (or its hash/crc). There's no "billion inputs" in this scenario... just 1 potentially corrupt vs 1 known good.

Hence, no birthday attack.

It's the difference between two random people meeting and having the same birthday, versus any two people in a room full of people having the same birthday. Not the same scenario!
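The two scenarios can be put side by side numerically. This is a rough sketch that assumes an ideal hash with uniformly distributed outputs and uses the standard approximation 1 - exp(-n(n-1)/(2N)) for the any-pair case; the function names are made up for this example.

```python
import math


def p_single_match(space: int) -> float:
    """Chance that one specific value matches one other specific value."""
    return 1 / space


def p_any_pair(n: int, space: int) -> float:
    """Approximate chance that any two of n uniform values coincide.

    Uses 1 - exp(-n(n-1) / (2 * space)); expm1 keeps tiny results accurate.
    """
    return -math.expm1(-(n * (n - 1)) / (2 * space))


if __name__ == "__main__":
    # Birthdays: two people meeting vs. a room of 23 people.
    print("two people:", p_single_match(365))
    print("room of 23:", p_any_pair(23, 365))

    # 160-bit hash: one corrupt input vs. one known-good value,
    # compared with a billion items all checked against each other.
    print("validation:", p_single_match(2**160))
    print("1e9-item index:", p_any_pair(10**9, 2**160))
```

The validation case stays at 1 in 2^160 regardless of how much data exists elsewhere, while the any-pair probability grows roughly with the square of the number of stored items.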

In practice, cryptographic hashes are always superior to checksums once both have more than 128 bits. They're both strong enough, but the cryptographic hash is resistant to deliberate attacks. The CRC won't be.