While rereading my earlier post, Generate MD5 Value from Big Data, I realized I should have mentioned MD5 collisions. The issue lies in the standard algorithm, not in SQL Server itself. Collisions rarely happen when hashing character-based data, so MD5 is still a very reliable algorithm for change detection, but you need to be aware of the risk. Here is a code snippet that demonstrates an MD5 collision.

declare @a varbinary(130)  = 0xd131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70
declare @b varbinary(130)  = 0xd131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f8955ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5bd8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70 
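-- Both values have the same length
select LEN(@a) LengthA, LEN(@b) LengthB
-- The two values are not equal
select case when @a = @b then 1 else 0 end [@a = @b ?]
-- Yet both produce the same MD5 hash
select HASHBYTES('MD5', @a) [Hash from @a], HASHBYTES('MD5', @b) [hash from @b];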
/*
LengthA     LengthB
----------- -----------
128         128

(1 row(s) affected)

@a = @b ?
-----------
0

(1 row(s) affected)

Hash from @a                       hash from @b
---------------------------------- ----------------------------------
0x79054025255FB1A26E4BC422AEF54EB4 0x79054025255FB1A26E4BC422AEF54EB4

(1 row(s) affected)

*/
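If a collision would matter for your change detection, one option is to hash with a stronger algorithm. The sketch below reuses the same two values and compares them under both MD5 and SHA2_256. It assumes SQL Server 2012 or later, where HASHBYTES accepts 'SHA2_256'; the column aliases are just illustrative names.

```sql
-- Sketch only: assumes SQL Server 2012+ (HASHBYTES supports 'SHA2_256').
declare @a varbinary(130)  = 0xd131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70
declare @b varbinary(130)  = 0xd131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f8955ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5bd8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70

-- 1 means the two hashes match, 0 means they differ
select
    case when HASHBYTES('MD5', @a) = HASHBYTES('MD5', @b)
         then 1 else 0 end [MD5 equal ?],
    case when HASHBYTES('SHA2_256', @a) = HASHBYTES('SHA2_256', @b)
         then 1 else 0 end [SHA2_256 equal ?];
```

The MD5 column should come back as 1, matching the result above, while the SHA2_256 column should come back as 0 for this pair. The trade-off is a 32-byte hash instead of a 16-byte one, which matters if you store the hash on every row.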

4 thoughts on “MD5 Collision”

  1. I knew that collisions could happen but I hadn’t seen an actual example before so this was interesting. I had an application where I had to detect duplicate email attachments so I don’t load the same data twice just because someone did a reply all. I chose SHA1 rather than MD5 and it hasn’t let me down. However, my risk isn’t great as it isn’t a security issue if a collision did occur and we would get a followup if a collision caused my application to ignore something that wasn’t a duplicate. This is something we have to keep an eye on as you can see here:

    http://www.rsa.com/rsalabs/node.asp?id=2738

  2. Wow, John, interesting post. I assume you found this example collision by brute force looping through sample data? Or did you use some other method? I’m just wondering what the statistical likelihood of a collision is for any given set of data…

    1. The MD5 algorithm is broken (wiki); you can pretty easily find even 2-block collisions (2^18), or generate a file with the same hash as some other file has (common prefix attack). Example of some other collisions.

  3. I don’t think I could find two different binary values with the same MD5 in my lifetime by brute-force looping ;) I found this pair a long time ago and saved it on my hard drive. I think someone else used a mathematical approach to figure it out.
    You might need to check Google to find the statistical likelihood of a collision. In the database world, getting a collision is very unlikely, except when comparing small images.
