RSS Feed

Stupid Hash Functions

I’m getting very tired of reading about people implementing Hashable using the following (this is in Swift):

var hashValue : Int {
    get {
        return x ^ y   // x and y are Int
    }
}

Okay, first, let’s examine why this is wrong. The point of a hash function is to “create random data from the nonrandom data” (Knuth, Sorting and Searching) and since x ^ y is equal to y ^ x, it can hardly be considered a method of creating random data. Let us use some real world numbers (represented as hex)

  1. 0x2A ^ 0x40 = 0x6A
  2. 0x40 ^ 0x2A = 0x6A
  3. 0x00 ^ 0x6A = 0x6A
  4. 0x6A ^ 0x00 = 0x6A

A decent hash function? Hardly.

“Oh, but that is what Equitable is for” people have murmured. No, it’s because you slept during class about hashing. Hashing isn’t fun and yes it is hard to create a good hash function, but you didn’t even bother trying.

I don’t like to write hash functions either, but I have at least a basic fundamental understanding of the problems inherent in hash functions. It didn’t take me but a few seconds to come up with four test cases that produced exactly the same hash. Worse, x and y can be swapped and result in the same hash. Position should influence the result of a hash function.

But let us extend the problem to a buffer of four items, w, x, y, z, if you exclusive-or’ed them, you would get the same result had you done it z, y, x, w (go ahead, try it). The problem is that XOR is a communative operation exactly like addition. In fact, in hardware, XOR used to be called “half-adding” because the carry never influenced the next position.

  1. 0x2B ^ 0x5A ^ 0x11 = 0x60
  2. 0x2B ^ 0x11 ^ 0x5A = 0x60
  3. 0x11 ^ 0x5A ^ 0x2B = 0x60
  4. 0x11 ^ 0x2B ^ 0x5A = 0x60
  5. 0x5A ^ 0x11 ^ 0x2B = 0x60
  6. 0x5A ^ 0x2B ^ 0x11 = 0x60

As you can see, three bytes in different order produce exactly the same value. Exclusive or is obviously a poor way to create a hash value.

So this time, instead of pulling something out of our nether regions, let’s try something that makes a little more sense:

var hashValue : Int {
    get {
        return (x * 0x21) ^ y   // x and y are Int
    }
}

This is better. Why? Because we have now made sure that the values are based on their position. This becomes really important when you have a large buffer that needs to be hashed. So, using our previous values we have:

  1. (0x2A * 0x21) ^ 0x40 = 0x52A
  2. (0x40 * 0x21) ^ 0x2A = 0x86A
  3. (0x00 * 0x21) ^ 0x6A = 0x06A
  4. (0x6A * 0x21) ^ 0x00 = 0xDAA

This is a lot closer to creating random data from nonrandom data. Possibly the only minor irritant is the third case where anything with zero will result in the y value as the hash. We can correct this by using a simplified version of Daniel J. Bernstein’s hashing method:

var hashValue : Int {
    get {
        var hash = 5381        
        hash = (hash * 33) + x
        hash = (hash * 33) + y
        return hash
    }
}

Again, we assured that x and y are positional and that zero now affects the resultant hash.

I have hoped I have shown why hashing is not a simple science and why just exclusive-or’ing will result in a horrible hash.

Advertisements

About Catalina Feloneous

Catalina Feloneous (“Call me Cat, I’m not a felon”) has been on Final Fantasy 14 since the initial beta. She is very much into fashion, humor, and games. As a gamer (lower case g) she always plays on Super Easy much to dismay of her friends. She lives on a ranch in Texas with her husband, one horses and two cats. “Feloneous” is misspelled intentionally.

2 responses »

  1. Matthias Zenger

    In Swift it’s actually dangerous to use + and * for computing hash values. Usage of these operators might lead to overflows which terminate the application. I’d suggest using &+ and &* instead.

    Reply
    • Good point!

      The article was addressing the fact that most people don’t put in the time and effort to understand why a good hash function is important.

      But overflows are just as important and are all a part of good software design. Thanks for pointing that out!

      Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: