27 July 2006
The hash race
Posted by Barnabas under: Programming .
On my earlier post this week about coding for speed, I said that reinventing the framework is bad. That is to say, it’s unlikely that some home-grown bit of code will outperform similar code that’s already been written by finer minds. There are always exceptions. I present to you a case in point: some experimental code on hashing binary data.
Hashing is useful when you need to check to see if any data has changed (a checksum), or if you need to generate a unique number from some arbitrary content. The standard solution to this problem is to generate an MD5 hash. Another more secure option is to generate a SHA hash. These are built into the .NET framework, so you don’t have to know anything other than you can feed a byte array in one end and get a small byte array out the other end. You can then take the byte array and turn it into a number or string or whatever. Another option is to do an XOR hash over the whole thing. A 32-bit example XOR hash function in VB.NET will follow this post.
I ran a short test to compare how long it would take to hash 1 million random bytes 1,000 times using MD5, SHA, and three versions of XOR: 64-bit, 32-bit, or 16-bit.

In this case, 64-bit XOR hashing was half as slow (or twice as fast?) as MD5, just as MD5 was half the speed of SHA. The lesson is: be familiar with the framework to know your options, but code algorithms yourself when there are obvious speed advantages.
I found the original code for this on this forum, but I’ve since punched it up a bit. Here’s the 32-bit version; the 16 and 64-bit versions are obvious extrapolations.
Private Shared Function ComputeInt32XORHash(ByVal Seed As Int32, _
ByVal Buffer() As Byte, ByVal Index As Integer, _
ByVal Length As Integer) As Int32
Dim HashCode As Int32 = Seed
While Index <= Length - 4
HashCode = HashCode Xor BitConverter.ToInt32(Buffer, Index)
Index += 4
End While
If Index < Length Then
Dim Offset As Integer = 0
Dim LastBlock(3) As Byte
For Offset = 0 To Length - Index - 1
LastBlock(Offset) = Buffer(Index + Offset)
Next
HashCode = HashCode Xor BitConverter.ToInt32(LastBlock, 0)
End If
Return HashCode
End Function