How do you generate MD5 hash in SQL Server? You might immediately tell that it can be generated MD5 by calling HASHBYTES built-in function. That’s true, however, it only accepts generating MD5 hashes from variables which has less than 8000 bytes. How would you generate MD5 hash from big variables?

First Let’s do a quick test here. We will generate MD5 from a string below

declare @value varbinary(max)
select @value  = cast(replicate('a', 8000) as varbinary(max))
select LEN(@value)
select HASHBYTES ('md5',  @value)
/*
--------------------
8000

(1 row(s) affected)


------------------------------------
0x030D638F1246092F18DB00C1C29A915D

(1 row(s) affected)
*/

The lengh of the string is 8000. We successfully get MD5 of it. Now let’s add one more bytes to @values

declare @value varbinary(max)
select @value  = cast(replicate('a', 8000) as varbinary(max)) + 0x60
select LEN(@value)
select HASHBYTES ('md5',  @value)
/*

--------------------
8001

(1 row(s) affected)


-------------------------------------------
Msg 8152, Level 16, State 10, Line 4
String or binary data would be truncated.
*/

In this case, you get error message 8152, “String or binary data would be truncated.” This is because HashBytes only accept 8000 characters.

MD5 hash is one of the fastest hash algorithm in SQL Server one-way encryption. Many data warehouse applications use it to detect row changes. For instance, a table has a row with 50 fields with varchar(max) or image field in the row. Now the updates are delivered by ETL application from the source systems. If the values in a row before the updates are the same as the values after the updates, then the updates should be ignored. To implement this, one option is that you can put all fields into the predicate(where clause) in your update statement. If no changes, then don’t perform the update. However, not all the data types support comparison, such as image data type. For resolving such issues, you can employ MD5 hash.

Before the data is inserted to the table, calculate MD5 on all fields combined and save the MD5 value to the row as a field. Before updating the row, generate MD5 for the new row and then compare the new MD5 with existing MD5. If they are the same, then discard the changes, otherwise, perform update with the new MD5. With the limitatioin as stated above of HASHBYTES system function, you are also limiting the size of the row. You can always develop a CLR function to call .net library to perfrom MD5 for big field. But actually, SQL Server already provides such functionality in master database. The procedure name is called sys.fn_repl_hash_binary. It can only perform MD5 hash from a binary, which is not as diverse as HASHBYTES. However, it simplifies a lot of work for you, such as development, test, deployment, of CLR etc.

declare @value varbinary(max)
select @value  = cast(replicate('a', 8000) as varbinary(max)) + 0x60
select LEN(@value)
select master.sys.fn_repl_hash_binary(@value)
/*

--------------------
8001

(1 row(s) affected)


----------------------------------
0x5475642AF686A60B809A8959FE04A3E8

(1 row(s) affected)


*/
Generate MD5 Value from Big Data

You May Also Like

13 thoughts on “Generate MD5 Value from Big Data

  1. This doesn’t work for strings over the 8000 character limit. If I run the following the hashes are the same, when they should differ because of the final character over the 8000 character limit. If you replicate one less time and repeat, the hashes are different, so only the first 8000 characters are considered when calculating the hash

    declare @value varbinary(max)
    select @value = cast(replicate(‘a’, 8000) + ‘a’ as varbinary(max)) + 0x60

    select master.sys.fn_repl_hash_binary(@value)

    select @value = cast(replicate(‘a’, 8000) + ‘b’ as varbinary(max)) + 0x60

    select master.sys.fn_repl_hash_binary(@value)

    1. Paul, it’s not the limitation of hash_binary. It’s the limitation of replicate function. try this

      declare @value varbinary(max)
      select @value = cast(replicate(cast('a' as varchar(max)), 8000) + 'a' as varbinary(max)) + 0x60
      select master.sys.fn_repl_hash_binary(@value)
      select @value = cast(replicate(cast('a' as varchar(max)), 8000) + 'b' as varbinary(max)) + 0x60
      select master.sys.fn_repl_hash_binary(@value)
  2. Noticed it won’t persist in a computed column, which is an issue if you want to index. Still cool, though.

  3. This function is available with no need to compile, etc in the SQL# library (which I am the author of). There are two versions of it, Util_Hash which returns a string version of the bytes and Util_HashBinary which returns a binary version of the bytes. These functions support the following algorithms: MD5, SHA1, SHA256, SHA384, and SHA512.

  4. In SQL Server 2017 the 8000 character limit on HASHBYTES has been lifted, so it can be used for any data.

  5. To bring things up to date, note that SQL Server 2017 removes the 8000 character limit on HASHBYTES.

Leave a Reply to Arnold Lieberman Cancel reply

Your email address will not be published. Required fields are marked *

C# | HTML | Plain Text | SQL | XHTML | XML | XSLT |

This site uses Akismet to reduce spam. Learn how your comment data is processed.