The Defence for MD5

A few days ago, I tried to reset my password on PR.com, the press releases website. I entered my email, and they sent me the username and password in plain text. That’s right, in plain text.

Screenshot of email

The problem with this method of password storage is that if anyone gets access to your database, they can literally just see the passwords. This is why hashing is used, which converts the plain text password to a fixed-length “hashed” version that is, in an ideal world, impossible to reverse. (Hashing is one-way, not encryption, so there is no key that “decrypts” it.) The catch is really about how hashing fundamentally works: since arbitrarily long inputs are squeezed into a fixed-size output, collisions must exist, i.e., multiple strings can have the same hash.

For example, if the hash function converts all vowels to “X”, then the hash of “Hello” is “HXllX” and the hash of “Hille” is also “HXllX”, even though the original strings are definitely distinct. Of course, real-world hash functions are mathematically complex and accidental collisions are extremely unlikely, but for some algorithms researchers have found ways to construct collisions on purpose. This is why MD5 and, more recently, SHA-1 aren’t recommended for security uses, while larger ones such as SHA-256, which don’t have any known collisions so far, are.
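The toy scheme above is easy to try out. Here is a minimal Python sketch of it; `toy_hash` is a name made up for this illustration and has nothing to do with any real hash function.

```python
def toy_hash(text: str) -> str:
    """Toy 'hash' from the example: replace every vowel with 'X'."""
    return "".join("X" if ch in "aeiouAEIOU" else ch for ch in text)


# Two distinct inputs collapse to the same output: a collision.
print(toy_hash("Hello"))  # HXllX
print(toy_hash("Hille"))  # HXllX
print(toy_hash("Hello") == toy_hash("Hille"))  # True
```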

These two byte strings (written here in hexadecimal) have the same MD5 hash:

String 1: 4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2
String 2: 4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2
Hash:     008ee33a9d58b51cfeb425b0959121c9
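If you want to check this yourself, something like the following Python sketch should do it. It assumes the two strings are hex-encoded bytes (so they are decoded before hashing rather than hashed as text) and uses the standard `hashlib` module; the variable names are just for illustration.

```python
import hashlib

# The two inputs quoted above, as hex. They differ in only two places.
HEX_1 = "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2"
HEX_2 = "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2"
QUOTED_HASH = "008ee33a9d58b51cfeb425b0959121c9"


def md5_of_hex(hex_string: str) -> str:
    """Decode a hex string to raw bytes and return the MD5 digest as hex."""
    return hashlib.md5(bytes.fromhex(hex_string)).hexdigest()


digest_1 = md5_of_hex(HEX_1)
digest_2 = md5_of_hex(HEX_2)

print("inputs differ:      ", HEX_1 != HEX_2)
print("digests match:      ", digest_1 == digest_2)
print("matches quoted hash:", digest_1 == QUOTED_HASH)
```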

The next step to safe password storage is called salting. Salting is essentially adding extra characters to the string before hashing it. “Hello” can become “H1e2l3lo” if you insert the characters of “123” one at a time after each of the first three letters. In practice, the salt is usually a random value generated for each user, joined to the password before hashing, and stored next to the hash. This means that the hashed file is now much more secure: an intruder can no longer look hashes up in a precomputed table of common passwords, and two users with the same password end up with different hashes.
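As a rough illustration of salting, here is a minimal Python sketch. It assumes a random per-user salt that is prepended to the password and stored alongside the digest; `hash_password` and `verify_password` are hypothetical helper names, and SHA-256 is used only because it is the algorithm named above as recommended (`hashlib.md5` would slot in the same way).

```python
import hashlib
import hmac
import secrets


def hash_password(password: str) -> tuple[str, str]:
    """Return (salt, digest) for a new password, using a fresh random salt."""
    salt = secrets.token_hex(16)  # 16 random bytes, hex-encoded
    digest = hashlib.sha256((salt + password).encode("utf-8")).hexdigest()
    return salt, digest


def verify_password(password: str, salt: str, expected_digest: str) -> bool:
    """Recompute the salted digest and compare it to the stored one."""
    digest = hashlib.sha256((salt + password).encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, expected_digest)


# The same password hashed twice gets two different salts and digests.
salt_a, digest_a = hash_password("Hello")
salt_b, digest_b = hash_password("Hello")
print(digest_a != digest_b)
print(verify_password("Hello", salt_a, digest_a))
print(verify_password("Hille", salt_a, digest_a))
```

The salt is stored in the clear here; it does its job by being different for every user, not by being secret.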

Now, even though collisions can be constructed for MD5, it’s still much, much better at storing sensitive information than plain text. Since intruders usually just match your hashed file against hashes of common passwords, dictionary words, combinations, etc., if you have a nice, long password, the brute force method becomes inefficient.
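The matching described above can be sketched in a few lines of Python. The word list and both example passwords are made up for this illustration; a real attacker would use a list with millions of entries, but the idea is the same.

```python
import hashlib
from typing import Iterable, Optional


def md5_hex(text: str) -> str:
    """MD5 digest of a text string, as hex."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the first one matching target_hash."""
    for candidate in candidates:
        if md5_hex(candidate) == target_hash:
            return candidate
    return None


# Hypothetical (tiny) list of common passwords.
COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein"]

weak_hash = md5_hex("password")
strong_hash = md5_hex("a long passphrase that is on nobody's list")

print(dictionary_attack(weak_hash, COMMON_PASSWORDS))    # password
print(dictionary_attack(strong_hash, COMMON_PASSWORDS))  # None
```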

This is why, as long as passwords are lengthy and therefore relatively secure, “outdated” hashing algorithms such as MD5 are also actually not a bad choice if it’s as simple as md5($string) vs $string when storing the password. I have a nice long Facebook password, and I’ve decided to make its MD5 hash public to prove my point:

cf7dd0b01c061029778c72facdc14451

Even though it’s just MD5, I don’t think anyone can crack it. Not for 573 quadrillion years, at least.
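For a feel of how such estimates come about, here is a back-of-the-envelope Python sketch. It does not reproduce the figure above: the alphabet size, password length, and guess rate are all assumptions picked for illustration, and `brute_force_years` is a hypothetical helper.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def brute_force_years(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Worst-case years to try every password of exactly `length` characters."""
    search_space = alphabet_size ** length
    return search_space / guesses_per_second / SECONDS_PER_YEAR


# Assumed: 95 printable ASCII characters, 10 billion MD5 guesses per second.
for length in (8, 12, 16, 20):
    years = brute_force_years(95, length, 1e10)
    print(f"{length:2d} characters: about {years:.3g} years")
```

Every extra character multiplies the search space by the alphabet size, which is why length matters so much more than the speed of the hash.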

Footnote: I’m not saying that we should use MD5 to sign TLS certificates, that’s crazy talk. All I’m saying is that (a) MD5 is better than plain text, and (b) it works for practical purposes, as long as there’s no sensitive data to be accessed and the user has a long, non-dictionary password.