Clojure

Util/hasheq should be hashing a BigInteger to the same values as Long, and BigInt

Details

  • Type: Defect Defect
  • Status: Open Open
  • Priority: Major Major
  • Resolution: Unresolved
  • Affects Version/s: Release 1.4
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None
  • Patch:
    Code and Test
  • Approval:
    Not Approved

Description

The doc string for hash states that it defines a hash code function that is consistent with =, but for java.math.BigInteger hash is not consistent with =.

user=> (apply = [(Long. -1) -1N (biginteger -1)])
true
user=> (map hash [(Long. -1) -1N (biginteger -1)])
(0 0 -1)
user=>

It is possible to have a PHM with two key/value pairs where the keys are equal, and the hash codes are different:

user=> (assoc clojure.lang.PersistentHashMap/EMPTY (biginteger -1) :oops! -1N :one)
{-1N :one, -1 :oops!}

The expected behavior is the same as PersistentArrayMap, which does not have this issue, because it does not hash its keys:

user=> (assoc clojure.lang.PersistentArrayMap/EMPTY (biginteger -1) :oops! -1N :one)
{-1 :one}

This same misbehavior also occurs for Doubles and Floats:

thalia.core=> (apply = [(Float. 1e9) (Double. 1e9)])
true
thalia.core=> (map hash [(Float. 1e9) (Double. 1e9)])
(1315859240 1104006501)

That leads to the same difference in array-map and hash-map behavior as above for BigInteger and BigInt.

Activity

Hide
Paul Stadig added a comment -

Also, the biginteger function has metadata saying that it has been added since 1.0, but it was actually added in 1.3. The bigint function has metadata saying that it has been added since 1.3, but it has been added since 1.0.

I think during the work to implement BigInt someone renamed the existing bigint function (which used to return a BigInteger) to biginteger, and the metadata got carried with it, then a new bigint function was added with :since 1.3 metadata even though that function name has existed since 1.0.

Show
Paul Stadig added a comment - Also, the biginteger function has metadata saying that it has been added since 1.0, but it was actually added in 1.3. The bigint function has metadata saying that it has been added since 1.3, but it has been added since 1.0. I think during the work to implement BigInt someone renamed the existing bigint function (which used to return a BigInteger) to biginteger, and the metadata got carried with it, then a new bigint function was added with :since 1.3 metadata even though that function name has existed since 1.0.
Hide
Andy Fingerhut added a comment -

clj-1036-hasheq-for-biginteger-patch-v1.txt dated Sep 26 2012 makes BigInteger's return equal hash values for values considered equal by =.

It does the same for Float and Double types, which before returned different hash values for values considered equal by =

I went ahead and changed the :added metadata on bigint and biginteger, although I can see that without my change, the person who did that may have meant for the :added to go with the behavior of the function, not with the name. Paul's suggested change that I have in the patch is for the :added metadata to go with the name, not the function behavior. It is easy to remove that part of the patch if that change is not desired.

Show
Andy Fingerhut added a comment - clj-1036-hasheq-for-biginteger-patch-v1.txt dated Sep 26 2012 makes BigInteger's return equal hash values for values considered equal by =. It does the same for Float and Double types, which before returned different hash values for values considered equal by = I went ahead and changed the :added metadata on bigint and biginteger, although I can see that without my change, the person who did that may have meant for the :added to go with the behavior of the function, not with the name. Paul's suggested change that I have in the patch is for the :added metadata to go with the name, not the function behavior. It is easy to remove that part of the patch if that change is not desired.
Hide
Rich Hickey added a comment -

You can't just consider only the lower long of bigints. Also, what's the rationale for the float stuff?

Show
Rich Hickey added a comment - You can't just consider only the lower long of bigints. Also, what's the rationale for the float stuff?
Hide
Andy Fingerhut added a comment -

clj-1036-hasheq-for-biginteger-patch-v2.txt dated Nov 13 2012 is identical to clj-1036-hasheq-for-biginteger-patch-v1.txt except that it addresses Rich's comment that for BigInt's and BigInteger values that don't fit in a long, their entire value must be hashed.

The rationale for the changes to hasheq for Float and Double types is the same as the rationale for the change for BigInteger: without that change, Float and Double types that are = can have different hasheq values.

Show
Andy Fingerhut added a comment - clj-1036-hasheq-for-biginteger-patch-v2.txt dated Nov 13 2012 is identical to clj-1036-hasheq-for-biginteger-patch-v1.txt except that it addresses Rich's comment that for BigInt's and BigInteger values that don't fit in a long, their entire value must be hashed. The rationale for the changes to hasheq for Float and Double types is the same as the rationale for the change for BigInteger: without that change, Float and Double types that are = can have different hasheq values.
Hide
Paul Stadig added a comment -

Although you are correct that Double and Float are =, but have different hashes:

user=> (apply = [(Double. -1.0) (Float. -1.0)])
true
user=> (map hash [(Double. -1.0) (Float. -1.0)])
(-1074790400 -1082130432)

I could not get the same errant behavior out of PHM:

user=> (assoc clojure.lang.PersistentHashMap/EMPTY (Float. -1.0) :oops! (Double. -1.0) :one)
{-1.0 :one}

I haven't taken the time to investigate exactly what is happening here, but either way I think this ticket is very specifically about BigInteger and the Float/Double issue could be explored in another ticket.

Show
Paul Stadig added a comment - Although you are correct that Double and Float are =, but have different hashes:
user=> (apply = [(Double. -1.0) (Float. -1.0)])
true
user=> (map hash [(Double. -1.0) (Float. -1.0)])
(-1074790400 -1082130432)
I could not get the same errant behavior out of PHM:
user=> (assoc clojure.lang.PersistentHashMap/EMPTY (Float. -1.0) :oops! (Double. -1.0) :one)
{-1.0 :one}
I haven't taken the time to investigate exactly what is happening here, but either way I think this ticket is very specifically about BigInteger and the Float/Double issue could be explored in another ticket.
Hide
Andy Fingerhut added a comment -

I can open another ticket for the Float/Double issue if that is what people would prefer.

I think what is happening in the test case you give, Paul, is that the hash values for (Float. -1.0) and (Double. -1.0) happen to be the same in their least significant 20 bits, and PHM isn't using the upper bits where the hash values differ.

Clojure 1.5.0-beta without patch:
user=> (map #(format "%x" %) (map hash [(Float. -1.0) (Double. -1.0)]))
("bf800000" "bff00000")

There are other Float/Double values where this lucky accident doesn't happen, e.g.

Clojure 1.5.0-beta1 without patch:

user=> (= (Float. 1e9) (Double. 1e9))
true
user=> (map hash [(Float. 1e9) (Double. 1e9)])
(1315859240 1104006501)
user=> (assoc clojure.lang.PersistentHashMap/EMPTY (Float. 1e9) :oops! (Double. 1e9) :one)

{1.0E9 :one, 1.0E9 :oops!}

With 1.5.0-beta1 plus patch clj-1036-hasheq-for-biginteger-patch-v2.txt:

user=> (= (Float. 1e9) (Double. 1e9))
true
user=> (map hash [(Float. 1e9) (Double. 1e9)])
(1315859240 1315859240)
user=> (assoc clojure.lang.PersistentHashMap/EMPTY (Float. 1e9) :oops! (Double. 1e9) :one)

{1.0E9 :one}
Show
Andy Fingerhut added a comment - I can open another ticket for the Float/Double issue if that is what people would prefer. I think what is happening in the test case you give, Paul, is that the hash values for (Float. -1.0) and (Double. -1.0) happen to be the same in their least significant 20 bits, and PHM isn't using the upper bits where the hash values differ. Clojure 1.5.0-beta without patch: user=> (map #(format "%x" %) (map hash [(Float. -1.0) (Double. -1.0)])) ("bf800000" "bff00000") There are other Float/Double values where this lucky accident doesn't happen, e.g. Clojure 1.5.0-beta1 without patch: user=> (= (Float. 1e9) (Double. 1e9)) true user=> (map hash [(Float. 1e9) (Double. 1e9)]) (1315859240 1104006501) user=> (assoc clojure.lang.PersistentHashMap/EMPTY (Float. 1e9) :oops! (Double. 1e9) :one) {1.0E9 :one, 1.0E9 :oops!} With 1.5.0-beta1 plus patch clj-1036-hasheq-for-biginteger-patch-v2.txt: user=> (= (Float. 1e9) (Double. 1e9)) true user=> (map hash [(Float. 1e9) (Double. 1e9)]) (1315859240 1315859240) user=> (assoc clojure.lang.PersistentHashMap/EMPTY (Float. 1e9) :oops! (Double. 1e9) :one) {1.0E9 :one}
Hide
Andy Fingerhut added a comment -

Presumptuously changing status from Not Approved to Vetted, since patch clj-1036-hasheq-for-biginteger-patch-v2.txt should address the reasons that Rich marked the previous patch as Not Approved. Changing it to Vetted on the assumption that if Stuart Halloway marked the previous patch as Screened, the ticket itself is good enough to be Vetted.

Show
Andy Fingerhut added a comment - Presumptuously changing status from Not Approved to Vetted, since patch clj-1036-hasheq-for-biginteger-patch-v2.txt should address the reasons that Rich marked the previous patch as Not Approved. Changing it to Vetted on the assumption that if Stuart Halloway marked the previous patch as Screened, the ticket itself is good enough to be Vetted.
Hide
Rich Hickey added a comment -

Patches and tickets need to be better than this. Talks about BigInteger, changes hash for doubles. Lists problem but not approach, need to trawl through comments and code to see what's going on, etc.

Show
Rich Hickey added a comment - Patches and tickets need to be better than this. Talks about BigInteger, changes hash for doubles. Lists problem but not approach, need to trawl through comments and code to see what's going on, etc.

People

Vote (0)
Watch (1)

Dates

  • Created:
    Updated: