Clojure

hash is inconsistent with = for some BigInteger and floating point values

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Declined
  • Affects Version/s: Release 1.4, Release 1.5
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None
  • Patch:
    Code and Test
  • Approval:
    Triaged

Description

hash is documented to be consistent with = but Util/hasheq returns different hash values for some pairs of numbers that are =

user> (apply = [-1 -1N (biginteger -1)])
true
user> (map hash [-1 -1N (biginteger -1)])
(0 0 -1)

user> (apply = [(Float. 1e9) (Double. 1e9)])
true
user> (map hash [(Float. 1e9) (Double. 1e9)])
(1315859240 1104006501)

Consequences include incorrect behavior for hash-maps containing keys that are =, but have different hash values:

;; Incorrect return value with multiple keys = to each other
user> (assoc (hash-map -1N :should-be-replaced) (biginteger -1) :new-val)
{-1N :should-be-replaced, -1 :new-val}

;; array-map gives correct value, since it uses =, not hash
user> (assoc (array-map -1N :should-be-replaced) (biginteger -1) :new-val)
{-1N :new-val}

Patch: clj-1036-hasheq-for-biginteger-patch-v4.txt

Approach:

The only BigInteger values that have inconsistent hash values should be those in the range of a long. BigInteger and BigInt values outside the range of a long already both return BigInteger.hashCode().

All integer values will return consistent hash codes if we add a new case to Numbers.hasheq(Number) for BigIntegers that lie in the range of a long, returning the same hash that such a long value does.

For floating point values, the patch makes their hashes consistent by converting floats to doubles and then hashing.

One alternate approach would be to convert all double values to floats and hash float values only. However, this throws away half of the bits of the double value before hashing, leading to many undesirable hash collisions between different double values.

Activity

Hide
Andy Fingerhut added a comment -

It seems to have become in scope to fix this for Java type BigInteger with this commit: https://github.com/clojure/clojure/commit/96e72517615cd2ccdb4fdbbeb6ffba5ad99dbdac

Show
Andy Fingerhut added a comment - It seems to have become in scope to fix this for Java type BigInteger with this commit: https://github.com/clojure/clojure/commit/96e72517615cd2ccdb4fdbbeb6ffba5ad99dbdac
Hide
Rich Hickey added a comment -

It is out of scope for Clojure to fix this for Java types Float/Double/BigInteger

Show
Rich Hickey added a comment - It is out of scope for Clojure to fix this for Java types Float/Double/BigInteger
Hide
Andy Fingerhut added a comment -

Add new patch clj-1036-hasheq-for-biginteger-patch-v4.txt dated Aug 8 2013 that handles floating point hashes better. Description will be updated to match.

Show
Andy Fingerhut added a comment - Add new patch clj-1036-hasheq-for-biginteger-patch-v4.txt dated Aug 8 2013 that handles floating point hashes better. Description will be updated to match.
Hide
Andy Fingerhut added a comment -

Updated summary to mention floating point types, and completely revamped description so that it should be self-contained, without having to read any comments.

Show
Andy Fingerhut added a comment - Updated summary to mention floating point types, and completely revamped description so that it should be self-contained, without having to read any comments.
Hide
Rich Hickey added a comment -

Patches and tickets need to be better than this. Talks about BigInteger, changes hash for doubles. Lists problem but not approach, need to trawl through comments and code to see what's going on, etc.

Show
Rich Hickey added a comment - Patches and tickets need to be better than this. Talks about BigInteger, changes hash for doubles. Lists problem but not approach, need to trawl through comments and code to see what's going on, etc.
Hide
Andy Fingerhut added a comment -

Presumptuously changing status from Not Approved to Vetted, since patch clj-1036-hasheq-for-biginteger-patch-v2.txt should address the reasons that Rich marked the previous patch as Not Approved. Changing it to Vetted on the assumption that if Stuart Halloway marked the previous patch as Screened, the ticket itself is good enough to be Vetted.

Show
Andy Fingerhut added a comment - Presumptuously changing status from Not Approved to Vetted, since patch clj-1036-hasheq-for-biginteger-patch-v2.txt should address the reasons that Rich marked the previous patch as Not Approved. Changing it to Vetted on the assumption that if Stuart Halloway marked the previous patch as Screened, the ticket itself is good enough to be Vetted.
Hide
Andy Fingerhut added a comment -

I can open another ticket for the Float/Double issue if that is what people would prefer.

I think what is happening in the test case you give, Paul, is that the hash values for (Float. -1.0) and (Double. -1.0) happen to be the same in their least significant 20 bits, and PHM isn't using the upper bits where the hash values differ.

Clojure 1.5.0-beta without patch:
user=> (map #(format "%x" %) (map hash [(Float. -1.0) (Double. -1.0)]))
("bf800000" "bff00000")

There are other Float/Double values where this lucky accident doesn't happen, e.g.

Clojure 1.5.0-beta1 without patch:

user=> (= (Float. 1e9) (Double. 1e9))
true
user=> (map hash [(Float. 1e9) (Double. 1e9)])
(1315859240 1104006501)
user=> (assoc clojure.lang.PersistentHashMap/EMPTY (Float. 1e9) :oops! (Double. 1e9) :one)

{1.0E9 :one, 1.0E9 :oops!}

With 1.5.0-beta1 plus patch clj-1036-hasheq-for-biginteger-patch-v2.txt:

user=> (= (Float. 1e9) (Double. 1e9))
true
user=> (map hash [(Float. 1e9) (Double. 1e9)])
(1315859240 1315859240)
user=> (assoc clojure.lang.PersistentHashMap/EMPTY (Float. 1e9) :oops! (Double. 1e9) :one)

{1.0E9 :one}
Show
Andy Fingerhut added a comment - I can open another ticket for the Float/Double issue if that is what people would prefer. I think what is happening in the test case you give, Paul, is that the hash values for (Float. -1.0) and (Double. -1.0) happen to be the same in their least significant 20 bits, and PHM isn't using the upper bits where the hash values differ. Clojure 1.5.0-beta without patch: user=> (map #(format "%x" %) (map hash [(Float. -1.0) (Double. -1.0)])) ("bf800000" "bff00000") There are other Float/Double values where this lucky accident doesn't happen, e.g. Clojure 1.5.0-beta1 without patch: user=> (= (Float. 1e9) (Double. 1e9)) true user=> (map hash [(Float. 1e9) (Double. 1e9)]) (1315859240 1104006501) user=> (assoc clojure.lang.PersistentHashMap/EMPTY (Float. 1e9) :oops! (Double. 1e9) :one) {1.0E9 :one, 1.0E9 :oops!} With 1.5.0-beta1 plus patch clj-1036-hasheq-for-biginteger-patch-v2.txt: user=> (= (Float. 1e9) (Double. 1e9)) true user=> (map hash [(Float. 1e9) (Double. 1e9)]) (1315859240 1315859240) user=> (assoc clojure.lang.PersistentHashMap/EMPTY (Float. 1e9) :oops! (Double. 1e9) :one) {1.0E9 :one}
Hide
Paul Stadig added a comment -

Although you are correct that Double and Float are =, but have different hashes:

user=> (apply = [(Double. -1.0) (Float. -1.0)])
true
user=> (map hash [(Double. -1.0) (Float. -1.0)])
(-1074790400 -1082130432)

I could not get the same errant behavior out of PHM:

user=> (assoc clojure.lang.PersistentHashMap/EMPTY (Float. -1.0) :oops! (Double. -1.0) :one)
{-1.0 :one}

I haven't taken the time to investigate exactly what is happening here, but either way I think this ticket is very specifically about BigInteger and the Float/Double issue could be explored in another ticket.

Show
Paul Stadig added a comment - Although you are correct that Double and Float are =, but have different hashes:
user=> (apply = [(Double. -1.0) (Float. -1.0)])
true
user=> (map hash [(Double. -1.0) (Float. -1.0)])
(-1074790400 -1082130432)
I could not get the same errant behavior out of PHM:
user=> (assoc clojure.lang.PersistentHashMap/EMPTY (Float. -1.0) :oops! (Double. -1.0) :one)
{-1.0 :one}
I haven't taken the time to investigate exactly what is happening here, but either way I think this ticket is very specifically about BigInteger and the Float/Double issue could be explored in another ticket.
Hide
Andy Fingerhut added a comment -

clj-1036-hasheq-for-biginteger-patch-v2.txt dated Nov 13 2012 is identical to clj-1036-hasheq-for-biginteger-patch-v1.txt except that it addresses Rich's comment that for BigInt's and BigInteger values that don't fit in a long, their entire value must be hashed.

The rationale for the changes to hasheq for Float and Double types is the same as the rationale for the change for BigInteger: without that change, Float and Double types that are = can have different hasheq values.

Show
Andy Fingerhut added a comment - clj-1036-hasheq-for-biginteger-patch-v2.txt dated Nov 13 2012 is identical to clj-1036-hasheq-for-biginteger-patch-v1.txt except that it addresses Rich's comment that for BigInt's and BigInteger values that don't fit in a long, their entire value must be hashed. The rationale for the changes to hasheq for Float and Double types is the same as the rationale for the change for BigInteger: without that change, Float and Double types that are = can have different hasheq values.
Hide
Rich Hickey added a comment -

You can't just consider only the lower long of bigints. Also, what's the rationale for the float stuff?

Show
Rich Hickey added a comment - You can't just consider only the lower long of bigints. Also, what's the rationale for the float stuff?
Hide
Andy Fingerhut added a comment -

clj-1036-hasheq-for-biginteger-patch-v1.txt dated Sep 26 2012 makes BigInteger's return equal hash values for values considered equal by =.

It does the same for Float and Double types, which before returned different hash values for values considered equal by =

I went ahead and changed the :added metadata on bigint and biginteger, although I can see that without my change, the person who did that may have meant for the :added to go with the behavior of the function, not with the name. Paul's suggested change that I have in the patch is for the :added metadata to go with the name, not the function behavior. It is easy to remove that part of the patch if that change is not desired.

Show
Andy Fingerhut added a comment - clj-1036-hasheq-for-biginteger-patch-v1.txt dated Sep 26 2012 makes BigInteger's return equal hash values for values considered equal by =. It does the same for Float and Double types, which before returned different hash values for values considered equal by = I went ahead and changed the :added metadata on bigint and biginteger, although I can see that without my change, the person who did that may have meant for the :added to go with the behavior of the function, not with the name. Paul's suggested change that I have in the patch is for the :added metadata to go with the name, not the function behavior. It is easy to remove that part of the patch if that change is not desired.
Hide
Paul Stadig added a comment -

Also, the biginteger function has metadata saying that it has been added since 1.0, but it was actually added in 1.3. The bigint function has metadata saying that it has been added since 1.3, but it has been added since 1.0.

I think during the work to implement BigInt someone renamed the existing bigint function (which used to return a BigInteger) to biginteger, and the metadata got carried with it, then a new bigint function was added with :since 1.3 metadata even though that function name has existed since 1.0.

Show
Paul Stadig added a comment - Also, the biginteger function has metadata saying that it has been added since 1.0, but it was actually added in 1.3. The bigint function has metadata saying that it has been added since 1.3, but it has been added since 1.0. I think during the work to implement BigInt someone renamed the existing bigint function (which used to return a BigInteger) to biginteger, and the metadata got carried with it, then a new bigint function was added with :since 1.3 metadata even though that function name has existed since 1.0.

People

Vote (0)
Watch (1)

Dates

  • Created:
    Updated:
    Resolved: