Clojure

hash is inconsistent with = for some BigInteger and floating point values

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Declined
  • Affects Version/s: Release 1.4, Release 1.5
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None
  • Patch:
    Code and Test
  • Approval:
    Triaged

Description

hash is documented to be consistent with = but Util/hasheq returns different hash values for some pairs of numbers that are =

user> (apply = [-1 -1N (biginteger -1)])
true
user> (map hash [-1 -1N (biginteger -1)])
(0 0 -1)

user> (apply = [(Float. 1e9) (Double. 1e9)])
true
user> (map hash [(Float. 1e9) (Double. 1e9)])
(1315859240 1104006501)

Consequences include incorrect behavior for hash-maps containing keys that are =, but have different hash values:

;; Incorrect return value with multiple keys = to each other
user> (assoc (hash-map -1N :should-be-replaced) (biginteger -1) :new-val)
{-1N :should-be-replaced, -1 :new-val}

;; array-map gives correct value, since it uses =, not hash
user> (assoc (array-map -1N :should-be-replaced) (biginteger -1) :new-val)
{-1N :new-val}

Patch: clj-1036-hasheq-for-biginteger-patch-v4.txt

Approach:

The only BigInteger values that have inconsistent hash values should be those in the range of a long. BigInteger and BigInt values outside the range of a long already both return BigInteger.hashCode().

All integer values will return consistent hash codes if we add a new case to Numbers.hasheq(Number) for BigIntegers that lie in the range of a long, returning the same hash that such a long value does.

For floating point values, the patch makes their hashes consistent by converting floats to doubles and then hashing.

One alternate approach would be to convert all double values to floats and hash float values only. However, this throws away half of the bits of the double value before hashing, leading to many undesirable hash collisions between different double values.

Activity

Andy Fingerhut made changes -
Field Original Value New Value
Attachment clj-1036-hasheq-for-biginteger-patch-v1.txt [ 11524 ]
Andy Fingerhut made changes -
Patch Code and Test [ 10002 ]
Stuart Halloway made changes -
Approval Screened [ 10004 ]
Rich Hickey made changes -
Approval Screened [ 10004 ] Not Approved [ 10008 ]
Andy Fingerhut made changes -
Attachment clj-1036-hasheq-for-biginteger-patch-v2.txt [ 11674 ]
Andy Fingerhut made changes -
Attachment clj-1036-hasheq-for-biginteger-patch-v1.txt [ 11524 ]
Andy Fingerhut made changes -
Approval Not Approved [ 10008 ] Vetted [ 10003 ]
Andy Fingerhut made changes -
Description The doc string for {{hash}} states that it defines a hash code function that is consistent with {{=}}, but for java.math.BigInteger {{hash}} is not consistent with {{=}}.

{noformat}
user=> (apply = [(Long. -1) -1N (biginteger -1)])
true
user=> (map hash [(Long. -1) -1N (biginteger -1)])
(0 0 -1)
user=>
{noformat}

It is possible to have a PHM with two key/value pairs where the keys are equal, and the hash codes are different:

{noformat}
user=> (assoc clojure.lang.PersistentHashMap/EMPTY (biginteger -1) :oops! -1N :one)
{-1N :one, -1 :oops!}
{noformat}

The expected behavior is the same as PersistentArrayMap, which does not have this issue, because it does not hash its keys:

{noformat}
user=> (assoc clojure.lang.PersistentArrayMap/EMPTY (biginteger -1) :oops! -1N :one)
{-1 :one}
{noformat}
The doc string for {{hash}} states that it defines a hash code function that is consistent with {{=}}, but for java.math.BigInteger {{hash}} is not consistent with {{=}}.

{noformat}
user=> (apply = [(Long. -1) -1N (biginteger -1)])
true
user=> (map hash [(Long. -1) -1N (biginteger -1)])
(0 0 -1)
user=>
{noformat}

It is possible to have a PHM with two key/value pairs where the keys are equal, and the hash codes are different:

{noformat}
user=> (assoc clojure.lang.PersistentHashMap/EMPTY (biginteger -1) :oops! -1N :one)
{-1N :one, -1 :oops!}
{noformat}

The expected behavior is the same as PersistentArrayMap, which does not have this issue, because it does not hash its keys:

{noformat}
user=> (assoc clojure.lang.PersistentArrayMap/EMPTY (biginteger -1) :oops! -1N :one)
{-1 :one}
{noformat}

This same misbehavior also occurs for Doubles and Floats:

{noformat}
thalia.core=> (apply = [(Float. 1e9) (Double. 1e9)])
true
thalia.core=> (map hash [(Float. 1e9) (Double. 1e9)])
(1315859240 1104006501)
{noformat}

That leads to the same difference in array-map and hash-map behavior as above for BigInteger and BigInt.
Rich Hickey made changes -
Approval Vetted [ 10003 ] Not Approved [ 10008 ]
Alex Miller made changes -
Approval Not Approved [ 10008 ]
Andy Fingerhut made changes -
Description The doc string for {{hash}} states that it defines a hash code function that is consistent with {{=}}, but for java.math.BigInteger {{hash}} is not consistent with {{=}}.

{noformat}
user=> (apply = [(Long. -1) -1N (biginteger -1)])
true
user=> (map hash [(Long. -1) -1N (biginteger -1)])
(0 0 -1)
user=>
{noformat}

It is possible to have a PHM with two key/value pairs where the keys are equal, and the hash codes are different:

{noformat}
user=> (assoc clojure.lang.PersistentHashMap/EMPTY (biginteger -1) :oops! -1N :one)
{-1N :one, -1 :oops!}
{noformat}

The expected behavior is the same as PersistentArrayMap, which does not have this issue, because it does not hash its keys:

{noformat}
user=> (assoc clojure.lang.PersistentArrayMap/EMPTY (biginteger -1) :oops! -1N :one)
{-1 :one}
{noformat}

This same misbehavior also occurs for Doubles and Floats:

{noformat}
thalia.core=> (apply = [(Float. 1e9) (Double. 1e9)])
true
thalia.core=> (map hash [(Float. 1e9) (Double. 1e9)])
(1315859240 1104006501)
{noformat}

That leads to the same difference in array-map and hash-map behavior as above for BigInteger and BigInt.
Problem: {{hash}} is documented to be consistent with {{=}} but {{Util/hasheq}} returns different hash values for some pairs of numbers that are {{=}}

Demonstration:

{noformat}
user> (apply = [-1 -1N (biginteger -1)])
true
user> (map hash [-1 -1N (biginteger -1)])
(0 0 -1)

user> (apply = [(Float. 1e9) (Double. 1e9)])
true
user> (map hash [(Float. 1e9) (Double. 1e9)])
(1315859240 1104006501)
{noformat}

Consequences include incorrect behavior for hash-maps containing keys that are {{=}}, but have different hash values:

{noformat}
;; Incorrect return value with multiple keys = to each other
user> (assoc (hash-map -1N :should-be-replaced) (biginteger -1) :new-val)
{-1N :should-be-replaced, -1 :new-val}

;; array-map gives correct value, since it uses =, not hash
user> (assoc (array-map -1N :should-be-replaced) (biginteger -1) :new-val)
{-1N :new-val}
{noformat}


Current patch: clj-1036-hasheq-for-biginteger-patch-v3.txt

Approach of current patch:

The only BigInteger values that have inconsistent hash values should be those in the range of a long. BigInteger and BigInt values outside the range of a long already both return BigInteger.hashCode().

All integer values will return consistent hash codes if we add a new case to Numbers.hasheq(Number) for BigIntegers that lie in the range of a long, returning the same hash that such a long value does.

The approach taken to make hash consistent for numerically equal float and double values is to convert double values to the nearest float value x using Double.floatValue()), and then return the hashCode() for x.

One alternate approach would be to convert all float values to doubles.
Affects Version/s Release 1.5 [ 10150 ]
Attachment clj-1036-hasheq-for-biginteger-patch-v3.txt [ 12094 ]
Summary Util/hasheq should be hashing a BigInteger to the same values as Long, and BigInt hash is inconsistent with = for some BigInteger and floating point values
Andy Fingerhut made changes -
Attachment clj-1036-hasheq-for-biginteger-patch-v2.txt [ 11674 ]
Alex Miller made changes -
Approval Triaged [ 10120 ]
Description Problem: {{hash}} is documented to be consistent with {{=}} but {{Util/hasheq}} returns different hash values for some pairs of numbers that are {{=}}

Demonstration:

{noformat}
user> (apply = [-1 -1N (biginteger -1)])
true
user> (map hash [-1 -1N (biginteger -1)])
(0 0 -1)

user> (apply = [(Float. 1e9) (Double. 1e9)])
true
user> (map hash [(Float. 1e9) (Double. 1e9)])
(1315859240 1104006501)
{noformat}

Consequences include incorrect behavior for hash-maps containing keys that are {{=}}, but have different hash values:

{noformat}
;; Incorrect return value with multiple keys = to each other
user> (assoc (hash-map -1N :should-be-replaced) (biginteger -1) :new-val)
{-1N :should-be-replaced, -1 :new-val}

;; array-map gives correct value, since it uses =, not hash
user> (assoc (array-map -1N :should-be-replaced) (biginteger -1) :new-val)
{-1N :new-val}
{noformat}


Current patch: clj-1036-hasheq-for-biginteger-patch-v3.txt

Approach of current patch:

The only BigInteger values that have inconsistent hash values should be those in the range of a long. BigInteger and BigInt values outside the range of a long already both return BigInteger.hashCode().

All integer values will return consistent hash codes if we add a new case to Numbers.hasheq(Number) for BigIntegers that lie in the range of a long, returning the same hash that such a long value does.

The approach taken to make hash consistent for numerically equal float and double values is to convert double values to the nearest float value x using Double.floatValue()), and then return the hashCode() for x.

One alternate approach would be to convert all float values to doubles.
{{hash}} is documented to be consistent with {{=}} but {{Util/hasheq}} returns different hash values for some pairs of numbers that are {{=}}

{code}
user> (apply = [-1 -1N (biginteger -1)])
true
user> (map hash [-1 -1N (biginteger -1)])
(0 0 -1)

user> (apply = [(Float. 1e9) (Double. 1e9)])
true
user> (map hash [(Float. 1e9) (Double. 1e9)])
(1315859240 1104006501)
{code}

Consequences include incorrect behavior for hash-maps containing keys that are {{=}}, but have different hash values:

{code}
;; Incorrect return value with multiple keys = to each other
user> (assoc (hash-map -1N :should-be-replaced) (biginteger -1) :new-val)
{-1N :should-be-replaced, -1 :new-val}

;; array-map gives correct value, since it uses =, not hash
user> (assoc (array-map -1N :should-be-replaced) (biginteger -1) :new-val)
{-1N :new-val}
{code}

*Patch:* clj-1036-hasheq-for-biginteger-patch-v3.txt

*Approach:*

The only BigInteger values that have inconsistent hash values should be those in the range of a long. BigInteger and BigInt values outside the range of a long already both return BigInteger.hashCode().

All integer values will return consistent hash codes if we add a new case to Numbers.hasheq(Number) for BigIntegers that lie in the range of a long, returning the same hash that such a long value does.

The approach taken to make hash consistent for numerically equal float and double values is to convert double values to the nearest float value x using Double.floatValue()), and then return the hashCode() for x.

One alternate approach would be to convert all float values to doubles.
Andy Fingerhut made changes -
Andy Fingerhut made changes -
Attachment clj-1036-hasheq-for-biginteger-patch-v3.txt [ 12094 ]
Andy Fingerhut made changes -
Description {{hash}} is documented to be consistent with {{=}} but {{Util/hasheq}} returns different hash values for some pairs of numbers that are {{=}}

{code}
user> (apply = [-1 -1N (biginteger -1)])
true
user> (map hash [-1 -1N (biginteger -1)])
(0 0 -1)

user> (apply = [(Float. 1e9) (Double. 1e9)])
true
user> (map hash [(Float. 1e9) (Double. 1e9)])
(1315859240 1104006501)
{code}

Consequences include incorrect behavior for hash-maps containing keys that are {{=}}, but have different hash values:

{code}
;; Incorrect return value with multiple keys = to each other
user> (assoc (hash-map -1N :should-be-replaced) (biginteger -1) :new-val)
{-1N :should-be-replaced, -1 :new-val}

;; array-map gives correct value, since it uses =, not hash
user> (assoc (array-map -1N :should-be-replaced) (biginteger -1) :new-val)
{-1N :new-val}
{code}

*Patch:* clj-1036-hasheq-for-biginteger-patch-v3.txt

*Approach:*

The only BigInteger values that have inconsistent hash values should be those in the range of a long. BigInteger and BigInt values outside the range of a long already both return BigInteger.hashCode().

All integer values will return consistent hash codes if we add a new case to Numbers.hasheq(Number) for BigIntegers that lie in the range of a long, returning the same hash that such a long value does.

The approach taken to make hash consistent for numerically equal float and double values is to convert double values to the nearest float value x using Double.floatValue()), and then return the hashCode() for x.

One alternate approach would be to convert all float values to doubles.
{{hash}} is documented to be consistent with {{=}} but {{Util/hasheq}} returns different hash values for some pairs of numbers that are {{=}}

{code}
user> (apply = [-1 -1N (biginteger -1)])
true
user> (map hash [-1 -1N (biginteger -1)])
(0 0 -1)

user> (apply = [(Float. 1e9) (Double. 1e9)])
true
user> (map hash [(Float. 1e9) (Double. 1e9)])
(1315859240 1104006501)
{code}

Consequences include incorrect behavior for hash-maps containing keys that are {{=}}, but have different hash values:

{code}
;; Incorrect return value with multiple keys = to each other
user> (assoc (hash-map -1N :should-be-replaced) (biginteger -1) :new-val)
{-1N :should-be-replaced, -1 :new-val}

;; array-map gives correct value, since it uses =, not hash
user> (assoc (array-map -1N :should-be-replaced) (biginteger -1) :new-val)
{-1N :new-val}
{code}

*Patch:* clj-1036-hasheq-for-biginteger-patch-v4.txt

*Approach:*

The only BigInteger values that have inconsistent hash values should be those in the range of a long. BigInteger and BigInt values outside the range of a long already both return BigInteger.hashCode().

All integer values will return consistent hash codes if we add a new case to Numbers.hasheq(Number) for BigIntegers that lie in the range of a long, returning the same hash that such a long value does.

For floating point values, the patch makes their hashes consistent by converting floats to doubles and then hashing.

One alternate approach would be to convert all double values to floats and hash float values only. However, this throws away half of the bits of the double value before hashing, leading to many undesirable hash collisions between different double values.
Rich Hickey made changes -
Resolution Declined [ 2 ]
Status Open [ 1 ] Closed [ 6 ]

People

Vote (0)
Watch (1)

Dates

  • Created:
    Updated:
    Resolved: