Jan 162012
 

How do you generate MD5 hash in SQL Server? You might immediately tell that it can be generated MD5 by calling HASHBYTES built-in function. That’s true, however, it only accepts generating MD5 hashes from variables which has less than 8000 bytes. How would you generate MD5 hash from big variables?

First Let’s do a quick test here. We will generate MD5 from a string below

declare @value varbinary(max)
select @value  = cast(replicate('a', 8000) as varbinary(max))
select LEN(@value)
select HASHBYTES ('md5',  @value)
/*
--------------------
8000

(1 row(s) affected)


------------------------------------
0x030D638F1246092F18DB00C1C29A915D

(1 row(s) affected)
*/

The lengh of the string is 8000. We successfully get MD5 of it. Now let’s add one more bytes to @values

declare @value varbinary(max)
select @value  = cast(replicate('a', 8000) as varbinary(max)) + 0x60
select LEN(@value)
select HASHBYTES ('md5',  @value)
/*

--------------------
8001

(1 row(s) affected)


-------------------------------------------
Msg 8152, Level 16, State 10, Line 4
String or binary data would be truncated.
*/

In this case, you get error message 8152, “String or binary data would be truncated.” This is because HashBytes only accept 8000 characters.

MD5 hash is one of the fastest hash algorithm in SQL Server one-way encryption. Many data warehouse applications use it to detect row changes. For instance, a table has a row with 50 fields with varchar(max) or image field in the row. Now the updates are delivered by ETL application from the source systems. If the values in a row before the updates are the same as the values after the updates, then the updates should be ignored. To implement this, one option is that you can put all fields into the predicate(where clause) in your update statement. If no changes, then don’t perform the update. However, not all the data types support comparison, such as image data type. For resolving such issues, you can employ MD5 hash.

Before the data is inserted to the table, calculate MD5 on all fields combined and save the MD5 value to the row as a field. Before updating the row, generate MD5 for the new row and then compare the new MD5 with existing MD5. If they are the same, then discard the changes, otherwise, perform update with the new MD5. With the limitatioin as stated above of HASHBYTES system function, you are also limiting the size of the row. You can always develop a CLR function to call .net library to perfrom MD5 for big field. But actually, SQL Server already provides such functionality in master database. The procedure name is called sys.fn_repl_hash_binary. It can only perform MD5 hash from a binary, which is not as diverse as HASHBYTES. However, it simplifies a lot of work for you, such as development, test, deployment, of CLR etc.

declare @value varbinary(max)
select @value  = cast(replicate('a', 8000) as varbinary(max)) + 0x60
select LEN(@value)
select master.sys.fn_repl_hash_binary(@value)
/*

--------------------
8001

(1 row(s) affected)


----------------------------------
0x5475642AF686A60B809A8959FE04A3E8

(1 row(s) affected)


*/
Share/Bookmark

  9 Responses to “Generate MD5 Value from Big Data”

  1. Great article! Thank you very much!

  2. Is this function: fn_repl_hash_binary new in Sql Server 2008?
    I can’t find it in SQL Server 2005.

  3. This doesn’t work for strings over the 8000 character limit. If I run the following the hashes are the same, when they should differ because of the final character over the 8000 character limit. If you replicate one less time and repeat, the hashes are different, so only the first 8000 characters are considered when calculating the hash

    declare @value varbinary(max)
    select @value = cast(replicate(‘a’, 8000) + ‘a’ as varbinary(max)) + 0x60

    select master.sys.fn_repl_hash_binary(@value)

    select @value = cast(replicate(‘a’, 8000) + ‘b’ as varbinary(max)) + 0x60

    select master.sys.fn_repl_hash_binary(@value)

    • Paul, it’s not the limitation of hash_binary. It’s the limitation of replicate function. try this

      declare @value varbinary(max)
      select @value = cast(replicate(cast('a' as varchar(max)), 8000) + 'a' as varbinary(max)) + 0x60
      select master.sys.fn_repl_hash_binary(@value)
      select @value = cast(replicate(cast('a' as varchar(max)), 8000) + 'b' as varbinary(max)) + 0x60
      select master.sys.fn_repl_hash_binary(@value)
  4. Noticed it won’t persist in a computed column, which is an issue if you want to index. Still cool, though.

  5. So, is there a similar function for SHA1 of strings > 8000 bytes in SQL?

Add Comment Register



 Leave a Reply

(required)

(required)

C# | HTML | Plain Text | SQL | XHTML | XML | XSLT |

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="">

Follow

Get every new post on this blog delivered to your Inbox.

Join other followers: