27 July 2006

The hash race

Posted by Barnabas under: Programming .

On my earlier post this week about coding for speed, I said that reinventing the framework is bad. That is to say, it’s unlikely that some home-grown bit of code will outperform similar code that’s already been written by finer minds. There are always exceptions. I present to you a case in point: some experimental code on hashing binary data.

Hashing is useful when you need to check to see if any data has changed (a checksum), or if you need to generate a unique number from some arbitrary content. The standard solution to this problem is to generate an MD5 hash. Another more secure option is to generate a SHA hash. These are built into the .NET framework, so you don’t have to know anything other than you can feed a byte array in one end and get a small byte array out the other end. You can then take the byte array and turn it into a number or string or whatever. Another option is to do an XOR hash over the whole thing. A 32-bit example XOR hash function in VB.NET will follow this post.

I ran a short test to compare how long it would take to hash 1 million random bytes 1,000 times using MD5, SHA, and three versions of XOR: 64-bit, 32-bit, or 16-bit.

Hash race chart

In this case, 64-bit XOR hashing was half as slow (or twice as fast?) as MD5, just as MD5 was half the speed of SHA. The lesson is: be familiar with the framework to know your options, but code algorithms yourself when there are obvious speed advantages.

I found the original code for this on this forum, but I’ve since punched it up a bit. Here’s the 32-bit version; the 16 and 64-bit versions are obvious extrapolations.

Private Shared Function ComputeInt32XORHash(ByVal Seed As Int32, _
  ByVal Buffer() As Byte, ByVal Index As Integer, _
  ByVal Length As Integer) As Int32

Dim HashCode As Int32 = Seed

While Index <= Length - 4
    HashCode = HashCode Xor BitConverter.ToInt32(Buffer, Index)
    Index += 4
  End While

If Index < Length Then
    Dim Offset As Integer = 0
    Dim LastBlock(3) As Byte

For Offset = 0 To Length - Index - 1
      LastBlock(Offset) = Buffer(Index + Offset)
    Next
    HashCode = HashCode Xor BitConverter.ToInt32(LastBlock, 0)
  End If

Return HashCode
End Function

Leave a Reply

Categories

Archives

Links

Meta