[CLJ-1036] Util/hasheq should be hashing a BigInteger to the same values as Long, and BigInt Created: 02/Aug/12 Updated: 12/Apr/13 |
|
| Status: | Open |
| Project: | Clojure |
| Component/s: | None |
| Affects Version/s: | Release 1.4 |
| Fix Version/s: | None |
| Type: | Defect | Priority: | Major |
| Reporter: | Paul Stadig | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Patch: | Code and Test |
| Approval: | Not Approved |
| Description |
|
The doc string for hash states that it defines a hash code function that is consistent with =, but for java.math.BigInteger hash is not consistent with =. user=> (apply = [(Long. -1) -1N (biginteger -1)]) true user=> (map hash [(Long. -1) -1N (biginteger -1)]) (0 0 -1) user=> It is possible to have a PHM with two key/value pairs where the keys are equal, and the hash codes are different: user=> (assoc clojure.lang.PersistentHashMap/EMPTY (biginteger -1) :oops! -1N :one)
{-1N :one, -1 :oops!}
The expected behavior is the same as PersistentArrayMap, which does not have this issue, because it does not hash its keys: user=> (assoc clojure.lang.PersistentArrayMap/EMPTY (biginteger -1) :oops! -1N :one)
{-1 :one}
This same misbehavior also occurs for Doubles and Floats: thalia.core=> (apply = [(Float. 1e9) (Double. 1e9)]) true thalia.core=> (map hash [(Float. 1e9) (Double. 1e9)]) (1315859240 1104006501) That leads to the same difference in array-map and hash-map behavior as above for BigInteger and BigInt. |
| Comments |
| Comment by Paul Stadig [ 02/Aug/12 9:55 AM ] |
|
Also, the biginteger function has metadata saying that it has been added since 1.0, but it was actually added in 1.3. The bigint function has metadata saying that it has been added since 1.3, but it has been added since 1.0. I think during the work to implement BigInt someone renamed the existing bigint function (which used to return a BigInteger) to biginteger, and the metadata got carried with it, then a new bigint function was added with :since 1.3 metadata even though that function name has existed since 1.0. |
| Comment by Andy Fingerhut [ 26/Sep/12 11:59 AM ] |
|
clj-1036-hasheq-for-biginteger-patch-v1.txt dated Sep 26 2012 makes BigInteger's return equal hash values for values considered equal by =. It does the same for Float and Double types, which before returned different hash values for values considered equal by = I went ahead and changed the :added metadata on bigint and biginteger, although I can see that without my change, the person who did that may have meant for the :added to go with the behavior of the function, not with the name. Paul's suggested change that I have in the patch is for the :added metadata to go with the name, not the function behavior. It is easy to remove that part of the patch if that change is not desired. |
| Comment by Rich Hickey [ 13/Nov/12 3:29 PM ] |
|
You can't just consider only the lower long of bigints. Also, what's the rationale for the float stuff? |
| Comment by Andy Fingerhut [ 13/Nov/12 9:44 PM ] |
|
clj-1036-hasheq-for-biginteger-patch-v2.txt dated Nov 13 2012 is identical to clj-1036-hasheq-for-biginteger-patch-v1.txt except that it addresses Rich's comment that for BigInt's and BigInteger values that don't fit in a long, their entire value must be hashed. The rationale for the changes to hasheq for Float and Double types is the same as the rationale for the change for BigInteger: without that change, Float and Double types that are = can have different hasheq values. |
| Comment by Paul Stadig [ 14/Nov/12 5:18 AM ] |
|
Although you are correct that Double and Float are =, but have different hashes: user=> (apply = [(Double. -1.0) (Float. -1.0)]) true user=> (map hash [(Double. -1.0) (Float. -1.0)]) (-1074790400 -1082130432) I could not get the same errant behavior out of PHM: user=> (assoc clojure.lang.PersistentHashMap/EMPTY (Float. -1.0) :oops! (Double. -1.0) :one)
{-1.0 :one}
I haven't taken the time to investigate exactly what is happening here, but either way I think this ticket is very specifically about BigInteger and the Float/Double issue could be explored in another ticket. |
| Comment by Andy Fingerhut [ 14/Nov/12 10:08 AM ] |
|
I can open another ticket for the Float/Double issue if that is what people would prefer. I think what is happening in the test case you give, Paul, is that the hash values for (Float. -1.0) and (Double. -1.0) happen to be the same in their least significant 20 bits, and PHM isn't using the upper bits where the hash values differ. Clojure 1.5.0-beta without patch: There are other Float/Double values where this lucky accident doesn't happen, e.g. Clojure 1.5.0-beta1 without patch: user=> (= (Float. 1e9) (Double. 1e9)) With 1.5.0-beta1 plus patch clj-1036-hasheq-for-biginteger-patch-v2.txt: user=> (= (Float. 1e9) (Double. 1e9)) |
| Comment by Andy Fingerhut [ 01/Jan/13 11:30 AM ] |
|
Presumptuously changing status from Not Approved to Vetted, since patch clj-1036-hasheq-for-biginteger-patch-v2.txt should address the reasons that Rich marked the previous patch as Not Approved. Changing it to Vetted on the assumption that if Stuart Halloway marked the previous patch as Screened, the ticket itself is good enough to be Vetted. |
| Comment by Rich Hickey [ 12/Apr/13 8:48 AM ] |
|
Patches and tickets need to be better than this. Talks about BigInteger, changes hash for doubles. Lists problem but not approach, need to trawl through comments and code to see what's going on, etc. |