Class MurmurHash3


  • public final class MurmurHash3
    extends java.lang.Object
    Implementation of the MurmurHash3 32-bit and 128-bit hash functions.

    MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop. Unlike cryptographic hash functions, it is not specifically designed to be difficult to reverse by an adversary, making it unsuitable for cryptographic purposes.

    This contains a Java port of the 32-bit hash function MurmurHash3_x86_32 and the 128-bit hash function MurmurHash3_x64_128 from Austin Appleby's original C++ code in SMHasher.

    This is public domain code with no copyrights. From the home page of SMHasher:

    "All MurmurHash versions are public domain software, and the author disclaims all copyright to their code."

    Originally adapted from Apache Hive. That adaptation contains hash64 methods that are not part of the original MurmurHash3 code. Using these methods is not recommended; they will be removed in a future release. To obtain a 64-bit hash, use half of the bits from the hash128x64 methods with the input data converted to bytes.

    Since:
    1.13
    See Also:
    MurmurHash, Original MurmurHash3 C++ code, Apache Hive Murmur3
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int DEFAULT_SEED
      A default seed to use for the murmur hash algorithm.
      static long NULL_HASHCODE
      Deprecated.
      This is not used internally and will be removed in a future release.
    • Method Summary

      Modifier and Type Method Description
      static long[] hash128​(byte[] data)
      Generates 128-bit hash from the byte array with a default seed.
      static long[] hash128​(byte[] data, int offset, int length, int seed)
      Deprecated.
      Use hash128x64(byte[], int, int, int). This corrects the seed initialization.
      static long[] hash128​(java.lang.String data)
      Deprecated.
      Use hash128x64(byte[]) with the bytes returned from String.getBytes(java.nio.charset.Charset).
      static long[] hash128x64​(byte[] data)
      Generates 128-bit hash from the byte array with a seed of zero.
      static long[] hash128x64​(byte[] data, int offset, int length, int seed)
      Generates 128-bit hash from the byte array with the given offset, length and seed.
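      static int hash32​(byte[] data)
      Deprecated.
      Use hash32x86(byte[], int, int, int). This corrects the processing of trailing bytes.
      static int hash32​(byte[] data, int length)
      Deprecated.
      Use hash32x86(byte[], int, int, int). This corrects the processing of trailing bytes.
      static int hash32​(byte[] data, int length, int seed)
      Deprecated.
      Use hash32x86(byte[], int, int, int). This corrects the processing of trailing bytes.
      static int hash32​(byte[] data, int offset, int length, int seed)
      Deprecated.
      Use hash32x86(byte[], int, int, int). This corrects the processing of trailing bytes.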
      static int hash32​(long data)
      Generates 32-bit hash from a long with a default seed value.
      static int hash32​(long data, int seed)
      Generates 32-bit hash from a long with the given seed.
      static int hash32​(long data1, long data2)
      Generates 32-bit hash from two longs with a default seed value.
      static int hash32​(long data1, long data2, int seed)
      Generates 32-bit hash from two longs with the given seed.
      static int hash32​(java.lang.String data)
      Deprecated.
      Use hash32x86(byte[], int, int, int) with the bytes returned from String.getBytes(java.nio.charset.Charset).
      static int hash32x86​(byte[] data)
      Generates 32-bit hash from the byte array with a seed of zero.
      static int hash32x86​(byte[] data, int offset, int length, int seed)
      Generates 32-bit hash from the byte array with the given offset, length and seed.
      static long hash64​(byte[] data)
      Deprecated.
      Not part of the MurmurHash3 implementation.
      static long hash64​(byte[] data, int offset, int length)
      Deprecated.
      Not part of the MurmurHash3 implementation.
      static long hash64​(byte[] data, int offset, int length, int seed)
      Deprecated.
      Not part of the MurmurHash3 implementation.
      static long hash64​(int data)
      Deprecated.
      Not part of the MurmurHash3 implementation.
      static long hash64​(long data)
      Deprecated.
      Not part of the MurmurHash3 implementation.
      static long hash64​(short data)
      Deprecated.
      Not part of the MurmurHash3 implementation.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • NULL_HASHCODE

        @Deprecated
        public static final long NULL_HASHCODE
        Deprecated.
        This is not used internally and will be removed in a future release.
        A random number to use for a hash code.
        See Also:
        Constant Field Values
      • DEFAULT_SEED

        public static final int DEFAULT_SEED
        A default seed to use for the murmur hash algorithm. Has the value 104729.
        See Also:
        Constant Field Values
    • Method Detail

      • hash32

        public static int hash32​(long data1,
                                 long data2)
        Generates 32-bit hash from two longs with a default seed value. This is a helper method that will produce the same result as:
         int offset = 0;
         int seed = 104729;
         int hash = MurmurHash3.hash32x86(ByteBuffer.allocate(16)
                                                    .putLong(data1)
                                                    .putLong(data2)
                                                    .array(), offset, 16, seed);
         
        Parameters:
        data1 - The first long to hash
        data2 - The second long to hash
        Returns:
        The 32-bit hash
        See Also:
        hash32x86(byte[], int, int, int)
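The equivalence above depends on how ByteBuffer lays out the two longs: a new ByteBuffer is big-endian by default, so each long is written most significant byte first. The following is a minimal sketch of that conversion only, using the JDK alone; the class and method names (LongPairBytes, toBytes) are illustrative and not part of MurmurHash3.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative helper (hypothetical name): builds the 16-byte array that the
// equivalent hash32x86 call in the snippet above would receive.
final class LongPairBytes {

    /** Packs two longs into 16 bytes using ByteBuffer's default big-endian order. */
    static byte[] toBytes(long data1, long data2) {
        return ByteBuffer.allocate(16)
                         .putLong(data1)
                         .putLong(data2)
                         .array();
    }

    public static void main(String[] args) {
        // The low-order byte of each long ends up last in its 8-byte group.
        // prints [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2]
        System.out.println(Arrays.toString(toBytes(1L, 2L)));
    }
}
```

Changing the buffer's byte order would change the bytes, and therefore the hash, so the default order matters for the stated equivalence.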
      • hash32

        public static int hash32​(long data1,
                                 long data2,
                                 int seed)
        Generates 32-bit hash from two longs with the given seed. This is a helper method that will produce the same result as:
         int offset = 0;
         int hash = MurmurHash3.hash32x86(ByteBuffer.allocate(16)
                                                    .putLong(data1)
                                                    .putLong(data2)
                                                    .array(), offset, 16, seed);
         
        Parameters:
        data1 - The first long to hash
        data2 - The second long to hash
        seed - The initial seed value
        Returns:
        The 32-bit hash
        See Also:
        hash32x86(byte[], int, int, int)
      • hash32

        public static int hash32​(long data)
        Generates 32-bit hash from a long with a default seed value. This is a helper method that will produce the same result as:
         int offset = 0;
         int seed = 104729;
         int hash = MurmurHash3.hash32x86(ByteBuffer.allocate(8)
                                                    .putLong(data)
                                                    .array(), offset, 8, seed);
         
        Parameters:
        data - The long to hash
        Returns:
        The 32-bit hash
        See Also:
        hash32x86(byte[], int, int, int)
      • hash32

        public static int hash32​(long data,
                                 int seed)
        Generates 32-bit hash from a long with the given seed. This is a helper method that will produce the same result as:
         int offset = 0;
         int hash = MurmurHash3.hash32x86(ByteBuffer.allocate(8)
                                                    .putLong(data)
                                                    .array(), offset, 8, seed);
         
        Parameters:
        data - The long to hash
        seed - The initial seed value
        Returns:
        The 32-bit hash
        See Also:
        hash32x86(byte[], int, int, int)
      • hash32

        @Deprecated
        public static int hash32​(byte[] data)
        Deprecated.
        Use hash32x86(byte[], int, int, int). This corrects the processing of trailing bytes.
        Generates 32-bit hash from the byte array with a default seed. This is a helper method that will produce the same result as:
         int offset = 0;
         int seed = 104729;
         int hash = MurmurHash3.hash32(data, offset, data.length, seed);
         

        This implementation contains a sign-extension bug in the finalization step of any bytes left over from dividing the length by 4. This manifests if any of these bytes are negative.

        Parameters:
        data - The input byte array
        Returns:
        The 32-bit hash
        See Also:
        hash32(byte[], int, int, int)
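To illustrate the kind of sign extension described above: in Java a byte is signed, so a negative trailing byte that is widened to int without masking fills the upper bits with ones. The sketch below contrasts the unmasked and masked ways of combining up to three trailing bytes. It is an illustration under that assumption, not the library's source, and the names (TailSignExtension, tailUnmasked, tailMasked) are hypothetical.

```java
// Illustrative only: shows why negative trailing bytes change the result
// when they are not masked with 0xff before being combined.
final class TailSignExtension {

    /** Combines three trailing bytes without masking; negative bytes are sign-extended. */
    static int tailUnmasked(byte b0, byte b1, byte b2) {
        return (b2 << 16) ^ (b1 << 8) ^ b0;
    }

    /** Combines three trailing bytes as unsigned values in little-endian order. */
    static int tailMasked(byte b0, byte b1, byte b2) {
        return ((b2 & 0xff) << 16) ^ ((b1 & 0xff) << 8) ^ (b0 & 0xff);
    }

    public static void main(String[] args) {
        byte negative = (byte) 0x80;
        // prints ffffff80
        System.out.println(String.format("%08x", tailUnmasked(negative, (byte) 0, (byte) 0)));
        // prints 00000080
        System.out.println(String.format("%08x", tailMasked(negative, (byte) 0, (byte) 0)));
    }
}
```

For non-negative bytes both forms agree, which is why the difference only shows up when a trailing byte is negative.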
      • hash32

        @Deprecated
        public static int hash32​(java.lang.String data)
        Deprecated.
        Use hash32x86(byte[], int, int, int) with the bytes returned from String.getBytes(java.nio.charset.Charset). This corrects the processing of trailing bytes.
        Generates 32-bit hash from a string with a default seed.

        Before 1.14 the string was converted using default encoding. Since 1.14 the string is converted to bytes using UTF-8 encoding.

        This is a helper method that will produce the same result as:
         int offset = 0;
         int seed = 104729;
         byte[] bytes = data.getBytes(StandardCharsets.UTF_8);
         int hash = MurmurHash3.hash32(bytes, offset, bytes.length, seed);
         

        This implementation contains a sign-extension bug in the finalization step of any bytes left over from dividing the length by 4. This manifests if any of these bytes are negative.

        Parameters:
        data - The input string
        Returns:
        The 32-bit hash
        See Also:
        hash32(byte[], int, int, int)
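The encoding note above matters because the hash is computed over bytes, and the same string can encode to different bytes under different charsets. The sketch below shows this with the JDK alone; the class and method names (CharsetBytes, utf8, latin1) are illustrative, and ISO-8859-1 is used only as an example of a charset other than UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: the same string yields different byte arrays,
// and therefore different hash input, depending on the charset.
final class CharsetBytes {

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] latin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = "caf\u00e9"; // "café"; the last character is outside ASCII
        System.out.println(utf8(s).length);   // prints 5 (é takes two bytes in UTF-8)
        System.out.println(latin1(s).length); // prints 4 (é takes one byte in ISO-8859-1)
        System.out.println(Arrays.equals(utf8(s), latin1(s))); // prints false
    }
}
```

Passing an explicit charset to String.getBytes keeps the hash input independent of the platform default.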
      • hash32

        @Deprecated
        public static int hash32​(byte[] data,
                                 int length)
        Deprecated.
        Use hash32x86(byte[], int, int, int). This corrects the processing of trailing bytes.
        Generates 32-bit hash from the byte array with the given length and a default seed. This is a helper method that will produce the same result as:
         int offset = 0;
         int seed = 104729;
         int hash = MurmurHash3.hash32(data, offset, length, seed);
         

        This implementation contains a sign-extension bug in the finalization step of any bytes left over from dividing the length by 4. This manifests if any of these bytes are negative.

        Parameters:
        data - The input byte array
        length - The length of array
        Returns:
        The 32-bit hash
        See Also:
        hash32(byte[], int, int, int)
      • hash32

        @Deprecated
        public static int hash32​(byte[] data,
                                 int length,
                                 int seed)
        Deprecated.
        Use hash32x86(byte[], int, int, int). This corrects the processing of trailing bytes.
        Generates 32-bit hash from the byte array with the given length and seed. This is a helper method that will produce the same result as:
         int offset = 0;
         int hash = MurmurHash3.hash32(data, offset, length, seed);
         

        This implementation contains a sign-extension bug in the finalization step of any bytes left over from dividing the length by 4. This manifests if any of these bytes are negative.

        Parameters:
        data - The input byte array
        length - The length of array
        seed - The initial seed value
        Returns:
        The 32-bit hash
        See Also:
        hash32(byte[], int, int, int)
      • hash32

        @Deprecated
        public static int hash32​(byte[] data,
                                 int offset,
                                 int length,
                                 int seed)
        Deprecated.
        Use hash32x86(byte[], int, int, int). This corrects the processing of trailing bytes.
        Generates 32-bit hash from the byte array with the given offset, length and seed.

        This is an implementation of the 32-bit hash function MurmurHash3_x86_32 from Austin Appleby's original MurmurHash3 C++ code in SMHasher.

        This implementation contains a sign-extension bug in the finalization step of any bytes left over from dividing the length by 4. This manifests if any of these bytes are negative.

        Parameters:
        data - The input byte array
        offset - The offset of data
        length - The length of array
        seed - The initial seed value
        Returns:
        The 32-bit hash
      • hash32x86

        public static int hash32x86​(byte[] data)
        Generates 32-bit hash from the byte array with a seed of zero. This is a helper method that will produce the same result as:
         int offset = 0;
         int seed = 0;
         int hash = MurmurHash3.hash32x86(data, offset, data.length, seed);
         
        Parameters:
        data - The input byte array
        Returns:
        The 32-bit hash
        Since:
        1.14
        See Also:
        hash32x86(byte[], int, int, int)
      • hash32x86

        public static int hash32x86​(byte[] data,
                                    int offset,
                                    int length,
                                    int seed)
        Generates 32-bit hash from the byte array with the given offset, length and seed.

        This is an implementation of the 32-bit hash function MurmurHash3_x86_32 from Austin Appleby's original MurmurHash3 C++ code in SMHasher.

        Parameters:
        data - The input byte array
        offset - The offset of data
        length - The length of array
        seed - The initial seed value
        Returns:
        The 32-bit hash
        Since:
        1.14
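For readers who want to see the shape of the algorithm, the following is a minimal, self-contained sketch of the MurmurHash3_x86_32 steps (4-byte little-endian blocks, a tail of up to 3 bytes, then a finalization mix). It is written from the publicly documented algorithm and is not the library's source; the class name Murmur3x86Sketch is hypothetical, and the method name is reused only to mirror the signature documented above.

```java
// A sketch of MurmurHash3_x86_32; not the Commons Codec implementation.
final class Murmur3x86Sketch {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    static int hash32x86(byte[] data, int offset, int length, int seed) {
        int hash = seed;
        int nblocks = length >> 2;

        // Body: mix each 4-byte block, read as a little-endian int.
        for (int i = 0; i < nblocks; i++) {
            int idx = offset + (i << 2);
            int k = (data[idx] & 0xff)
                  | ((data[idx + 1] & 0xff) << 8)
                  | ((data[idx + 2] & 0xff) << 16)
                  | ((data[idx + 3] & 0xff) << 24);
            hash ^= mixK(k);
            hash = Integer.rotateLeft(hash, 13) * 5 + 0xe6546b64;
        }

        // Tail: 0 to 3 remaining bytes, masked so negative bytes are not sign-extended.
        int tail = offset + (nblocks << 2);
        int rem = length & 3;
        int k1 = 0;
        for (int i = rem - 1; i >= 0; i--) {
            k1 = (k1 << 8) | (data[tail + i] & 0xff);
        }
        if (rem > 0) {
            hash ^= mixK(k1);
        }

        // Finalization: fold in the length and force the bits to avalanche.
        hash ^= length;
        hash ^= hash >>> 16;
        hash *= 0x85ebca6b;
        hash ^= hash >>> 13;
        hash *= 0xc2b2ae35;
        hash ^= hash >>> 16;
        return hash;
    }

    private static int mixK(int k) {
        k *= C1;
        k = Integer.rotateLeft(k, 15);
        k *= C2;
        return k;
    }

    public static void main(String[] args) {
        // Empty input with seed 1; prints 514e28b7
        System.out.println(String.format("%08x", hash32x86(new byte[0], 0, 0, 1)));
    }
}
```

With an empty input and a seed of zero every step leaves the state at zero, so the sketch returns 0 in that case.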
      • hash64

        @Deprecated
        public static long hash64​(long data)
        Deprecated.
        Not part of the MurmurHash3 implementation. Use half of the hash bytes from hash128x64(byte[]) with the bytes from the long.
        Generates 64-bit hash from a long with a default seed.

        This is not part of the original MurmurHash3 C++ implementation.

        This is a Murmur3-like 64-bit variant. The method does not produce the same result as either half of the hash bytes from hash128x64(byte[]) with the same byte data from the long. This method will be removed in a future release.

        Note: The sign extension bug in hash64(byte[], int, int, int) does not affect this result as the default seed is positive.

        This is a helper method that will produce the same result as:

         int offset = 0;
         int seed = 104729;
         long hash = MurmurHash3.hash64(ByteBuffer.allocate(8)
                                                  .putLong(data)
                                                  .array(), offset, 8, seed);
         
        Parameters:
        data - The long to hash
        Returns:
        The 64-bit hash
        See Also:
        hash64(byte[], int, int, int)
      • hash64

        @Deprecated
        public static long hash64​(int data)
        Deprecated.
        Not part of the MurmurHash3 implementation. Use half of the hash bytes from hash128x64(byte[]) with the bytes from the int.
        Generates 64-bit hash from an int with a default seed.

        This is not part of the original MurmurHash3 C++ implementation.

        This is a Murmur3-like 64-bit variant. The method does not produce the same result as either half of the hash bytes from hash128x64(byte[]) with the same byte data from the int. This method will be removed in a future release.

        Note: The sign extension bug in hash64(byte[], int, int, int) does not affect this result as the default seed is positive.

        This is a helper method that will produce the same result as:

         int offset = 0;
         int seed = 104729;
         long hash = MurmurHash3.hash64(ByteBuffer.allocate(4)
                                                  .putInt(data)
                                                  .array(), offset, 4, seed);
         
        Parameters:
        data - The int to hash
        Returns:
        The 64-bit hash
        See Also:
        hash64(byte[], int, int, int)
      • hash64

        @Deprecated
        public static long hash64​(short data)
        Deprecated.
        Not part of the MurmurHash3 implementation. Use half of the hash bytes from hash128x64(byte[]) with the bytes from the short.
        Generates 64-bit hash from a short with a default seed.

        This is not part of the original MurmurHash3 C++ implementation.

        This is a Murmur3-like 64-bit variant. The method does not produce the same result as either half of the hash bytes from hash128x64(byte[]) with the same byte data from the short. This method will be removed in a future release.

        Note: The sign extension bug in hash64(byte[], int, int, int) does not affect this result as the default seed is positive.

        This is a helper method that will produce the same result as:

         int offset = 0;
         int seed = 104729;
         long hash = MurmurHash3.hash64(ByteBuffer.allocate(2)
                                                  .putShort(data)
                                                  .array(), offset, 2, seed);
         
        Parameters:
        data - The short to hash
        Returns:
        The 64-bit hash
        See Also:
        hash64(byte[], int, int, int)
      • hash64

        @Deprecated
        public static long hash64​(byte[] data)
        Deprecated.
        Not part of the MurmurHash3 implementation. Use half of the hash bytes from hash128x64(byte[]).
        Generates 64-bit hash from a byte array with a default seed.

        This is not part of the original MurmurHash3 C++ implementation.

        This is a Murmur3-like 64-bit variant. The method does not produce the same result as either half of the hash bytes from hash128x64(byte[]) with the same byte data. This method will be removed in a future release.

        Note: The sign extension bug in hash64(byte[], int, int, int) does not affect this result as the default seed is positive.

        This is a helper method that will produce the same result as:

         int offset = 0;
         int seed = 104729;
         long hash = MurmurHash3.hash64(data, offset, data.length, seed);
         
        Parameters:
        data - The input byte array
        Returns:
        The 64-bit hash
        See Also:
        hash64(byte[], int, int, int)
      • hash64

        @Deprecated
        public static long hash64​(byte[] data,
                                  int offset,
                                  int length)
        Deprecated.
        Not part of the MurmurHash3 implementation. Use half of the hash bytes from hash128x64(byte[], int, int, int).
        Generates 64-bit hash from a byte array with the given offset and length and a default seed.

        This is not part of the original MurmurHash3 C++ implementation.

        This is a Murmur3-like 64-bit variant. The method does not produce the same result as either half of the hash bytes from hash128x64(byte[]) with the same byte data. This method will be removed in a future release.

        Note: The sign extension bug in hash64(byte[], int, int, int) does not affect this result as the default seed is positive.

        This is a helper method that will produce the same result as:

         int seed = 104729;
         long hash = MurmurHash3.hash64(data, offset, length, seed);
         
        Parameters:
        data - The input byte array
        offset - The offset of data
        length - The length of array
        Returns:
        The 64-bit hash
        See Also:
        hash64(byte[], int, int, int)
      • hash64

        @Deprecated
        public static long hash64​(byte[] data,
                                  int offset,
                                  int length,
                                  int seed)
        Deprecated.
        Not part of the MurmurHash3 implementation. Use half of the hash bytes from hash128x64(byte[], int, int, int).
        Generates 64-bit hash from a byte array with the given offset, length and seed.

        This is not part of the original MurmurHash3 C++ implementation.

        This is a Murmur3-like 64-bit variant. This method will be removed in a future release.

        This implementation contains a sign-extension bug in the seed initialization. This manifests if the seed is negative.

        This algorithm processes 8-byte chunks of data in a manner similar to the 16-byte chunks of data processed in the MurmurHash3 MurmurHash3_x64_128 method. However, the hash is not mixed with a hash chunk from the next 8 bytes of data. The method will not return the same value as the first or second 64 bits of the function hash128(byte[], int, int, int).

        Use of this method is not advised. Use the first long returned from hash128x64(byte[], int, int, int).

        Parameters:
        data - The input byte array
        offset - The offset of data
        length - The length of array
        seed - The initial seed value
        Returns:
        The 64-bit hash
      • hash128

        public static long[] hash128​(byte[] data)
        Generates 128-bit hash from the byte array with a default seed. This is a helper method that will produce the same result as:
         int offset = 0;
         int seed = 104729;
         long[] hash = MurmurHash3.hash128(data, offset, data.length, seed);
         

        Note: The sign extension bug in hash128(byte[], int, int, int) does not affect this result as the default seed is positive.

        Parameters:
        data - The input byte array
        Returns:
        The 128-bit hash (2 longs)
        See Also:
        hash128(byte[], int, int, int)
      • hash128x64

        public static long[] hash128x64​(byte[] data)
        Generates 128-bit hash from the byte array with a seed of zero. This is a helper method that will produce the same result as:
         int offset = 0;
         int seed = 0;
         long[] hash = MurmurHash3.hash128x64(data, offset, data.length, seed);
         
        Parameters:
        data - The input byte array
        Returns:
        The 128-bit hash (2 longs)
        Since:
        1.14
        See Also:
        hash128x64(byte[], int, int, int)
      • hash128

        @Deprecated
        public static long[] hash128​(java.lang.String data)
        Deprecated.
        Use hash128x64(byte[]) with the bytes returned from String.getBytes(java.nio.charset.Charset).
        Generates 128-bit hash from a string with a default seed.

        Before 1.14 the string was converted using default encoding. Since 1.14 the string is converted to bytes using UTF-8 encoding.

        This is a helper method that will produce the same result as:
         int offset = 0;
         int seed = 104729;
         byte[] bytes = data.getBytes(StandardCharsets.UTF_8);
         long[] hash = MurmurHash3.hash128(bytes, offset, bytes.length, seed);
         

        Note: The sign extension bug in hash128(byte[], int, int, int) does not affect this result as the default seed is positive.

        Parameters:
        data - The input String
        Returns:
        The 128-bit hash (2 longs)
        See Also:
        hash128(byte[], int, int, int)
      • hash128

        @Deprecated
        public static long[] hash128​(byte[] data,
                                     int offset,
                                     int length,
                                     int seed)
        Deprecated.
        Use hash128x64(byte[], int, int, int). This corrects the seed initialization.
        Generates 128-bit hash from the byte array with the given offset, length and seed.

        This is an implementation of the 128-bit hash function MurmurHash3_x64_128 from Austin Appleby's original MurmurHash3 C++ code in SMHasher.

        This implementation contains a sign-extension bug in the seed initialization. This manifests if the seed is negative.

        Parameters:
        data - The input byte array
        offset - The first element of array
        length - The length of array
        seed - The initial seed value
        Returns:
        The 128-bit hash (2 longs)
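The seed issue described above can be pictured as the difference between widening an int seed to a long as a signed value and as an unsigned 32-bit value (the reference C++ code takes the seed as an unsigned 32-bit integer). The sketch below shows only that widening step; it is an illustration, not the library's source, and the names (SeedSignExtension, widenSigned, widenUnsigned) are hypothetical.

```java
// Illustrative only: a negative int seed widens to different long values
// depending on whether it is treated as signed or unsigned.
final class SeedSignExtension {

    /** Plain widening conversion: the sign bit is copied into the upper 32 bits. */
    static long widenSigned(int seed) {
        return seed;
    }

    /** Treats the seed as an unsigned 32-bit value: the upper 32 bits stay zero. */
    static long widenUnsigned(int seed) {
        return seed & 0xffffffffL;
    }

    public static void main(String[] args) {
        // prints ffffffffffffffff
        System.out.println(String.format("%016x", widenSigned(-1)));
        // prints 00000000ffffffff
        System.out.println(String.format("%016x", widenUnsigned(-1)));
        // A positive seed such as 104729 widens identically either way; prints true
        System.out.println(widenSigned(104729) == widenUnsigned(104729));
    }
}
```

This is consistent with the notes in this class that the default seed, being positive, is not affected.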
      • hash128x64

        public static long[] hash128x64​(byte[] data,
                                        int offset,
                                        int length,
                                        int seed)
        Generates 128-bit hash from the byte array with the given offset, length and seed.

        This is an implementation of the 128-bit hash function MurmurHash3_x64_128 from Austin Appleby's original MurmurHash3 C++ code in SMHasher.

        Parameters:
        data - The input byte array
        offset - The first element of array
        length - The length of array
        seed - The initial seed value
        Returns:
        The 128-bit hash (2 longs)
        Since:
        1.14
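As a rough picture of what a MurmurHash3_x64_128 computation involves, and of how a 64-bit value can be taken from the 128-bit result as this class recommends, here is a self-contained sketch using the JDK alone. It follows the publicly documented algorithm (16-byte little-endian blocks, a tail of up to 15 bytes, then finalization) and is not the library's source. The class name Murmur3x64Sketch and the helper first64 are hypothetical, and the sketch has not been checked against the library's output.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// A sketch of MurmurHash3_x64_128; not the Commons Codec implementation.
final class Murmur3x64Sketch {
    private static final long C1 = 0x87c37b91114253d5L;
    private static final long C2 = 0x4cf5ad432745937fL;

    static long[] hash128x64(byte[] data, int offset, int length, int seed) {
        // Treat the seed as an unsigned 32-bit value.
        long h1 = seed & 0xffffffffL;
        long h2 = h1;
        ByteBuffer buf = ByteBuffer.wrap(data, offset, length).order(ByteOrder.LITTLE_ENDIAN);

        // Body: two little-endian longs per 16-byte block.
        while (buf.remaining() >= 16) {
            long k1 = buf.getLong();
            long k2 = buf.getLong();
            h1 ^= mixK1(k1);
            h1 = Long.rotateLeft(h1, 27) + h2;
            h1 = h1 * 5 + 0x52dce729;
            h2 ^= mixK2(k2);
            h2 = Long.rotateLeft(h2, 31) + h1;
            h2 = h2 * 5 + 0x38495ab5;
        }

        // Tail: up to 15 bytes; the first 8 go to k1, the rest to k2.
        long k1 = 0;
        long k2 = 0;
        int rem = buf.remaining();
        for (int i = 0; i < rem; i++) {
            long b = buf.get() & 0xffL;
            if (i < 8) {
                k1 |= b << (8 * i);
            } else {
                k2 |= b << (8 * (i - 8));
            }
        }
        if (rem > 8) {
            h2 ^= mixK2(k2);
        }
        if (rem > 0) {
            h1 ^= mixK1(k1);
        }

        // Finalization.
        h1 ^= length;
        h2 ^= length;
        h1 += h2;
        h2 += h1;
        h1 = fmix64(h1);
        h2 = fmix64(h2);
        h1 += h2;
        h2 += h1;
        return new long[] {h1, h2};
    }

    /** Hypothetical helper: takes the first long of the 128-bit result as a 64-bit hash. */
    static long first64(byte[] data) {
        return hash128x64(data, 0, data.length, 0)[0];
    }

    private static long mixK1(long k) {
        return Long.rotateLeft(k * C1, 31) * C2;
    }

    private static long mixK2(long k) {
        return Long.rotateLeft(k * C2, 33) * C1;
    }

    private static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    public static void main(String[] args) {
        byte[] bytes = "example".getBytes(StandardCharsets.UTF_8);
        System.out.println(Long.toHexString(first64(bytes)));
    }
}
```

In real code the library's hash128x64 methods would be called instead of the sketch, with the first element of the returned array used as the 64-bit value.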