MD5 Collision

While I am reading back to my blog Generate MD5 Value from Big Data, I should have mention MD5 collision. This issue is within the standard algorithm not SQL Server itself. It happens rarely for character based hashing. It’s still a very reliable algorithm for change detecting but you need to be aware of this. Here is the code snippet of MD5 collision.

declare @a varbinary(130)  = 0xd131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70
declare @b varbinary(130)  = 0xd131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f8955ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5bd8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70 
select LEN(@a) LengthA, LEN(@b) LengthB
select case when @a = @b then 1 else 0 end [@a = @b ?]
select HASHBYTES('MD5', @a) [Hash from @a], HASHBYTES('MD5', @b) [hash from @b];
LengthA     LengthB
----------- -----------
128         128

(1 row(s) affected)

@a = @b ?

(1 row(s) affected)

Hash from @a                       hash from @b
---------------------------------- ----------------------------------
0x79054025255FB1A26E4BC422AEF54EB4 0x79054025255FB1A26E4BC422AEF54EB4

(1 row(s) affected)


Generate MD5 Value from Big Data

How do you generate MD5 hash in SQL Server? You might immediately tell that it can be generated MD5 by calling HASHBYTES built-in function. That’s true, however, it only accepts generating MD5 hashes from variables which has less than 8000 bytes. How would you generate MD5 hash from big variables?

Continue reading “Generate MD5 Value from Big Data”