MD5 Collision

While I am reading back to my blog Generate MD5 Value from Big Data, I should have mention MD5 collision. This issue is within the standard algorithm not SQL Server itself. It happens rarely for character based hashing. It’s still a very reliable algorithm for change detecting but you need to be aware of this. Here is the code snippet of MD5 collision.

declare @a varbinary(130)  = 0xd131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70
declare @b varbinary(130)  = 0xd131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f8955ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5bd8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70 
select LEN(@a) LengthA, LEN(@b) LengthB
select case when @a = @b then 1 else 0 end [@a = @b ?]
select HASHBYTES('MD5', @a) [Hash from @a], HASHBYTES('MD5', @b) [hash from @b];
/*
LengthA     LengthB
----------- -----------
128         128

(1 row(s) affected)

@a = @b ?
-----------
0

(1 row(s) affected)

Hash from @a                       hash from @b
---------------------------------- ----------------------------------
0x79054025255FB1A26E4BC422AEF54EB4 0x79054025255FB1A26E4BC422AEF54EB4

(1 row(s) affected)

*/
		

Generate MD5 Value from Big Data

How do you generate MD5 hash in SQL Server? You might immediately tell that it can be generated MD5 by calling HASHBYTES built-in function. That’s true, however, it only accepts generating MD5 hashes from variables which has less than 8000 bytes. How would you generate MD5 hash from big variables?

Continue reading “Generate MD5 Value from Big Data”