Details

Type: Defect

Status: Closed

Priority: Major

Resolution: Declined

Affects Version/s: Release 1.4, Release 1.5

Fix Version/s: None

Component/s: None

Labels:None

Patch:Code and Test

Approval:Triaged
Description
hash is documented to be consistent with = but Util/hasheq returns different hash values for some pairs of numbers that are =
user> (apply = [1 1N (biginteger 1)]) true user> (map hash [1 1N (biginteger 1)]) (0 0 1) user> (apply = [(Float. 1e9) (Double. 1e9)]) true user> (map hash [(Float. 1e9) (Double. 1e9)]) (1315859240 1104006501)
Consequences include incorrect behavior for hashmaps containing keys that are =, but have different hash values:
;; Incorrect return value with multiple keys = to each other user> (assoc (hashmap 1N :shouldbereplaced) (biginteger 1) :newval) {1N :shouldbereplaced, 1 :newval} ;; arraymap gives correct value, since it uses =, not hash user> (assoc (arraymap 1N :shouldbereplaced) (biginteger 1) :newval) {1N :newval}
Patch: clj1036hasheqforbigintegerpatchv4.txt
Approach:
The only BigInteger values that have inconsistent hash values should be those in the range of a long. BigInteger and BigInt values outside the range of a long already both return BigInteger.hashCode().
All integer values will return consistent hash codes if we add a new case to Numbers.hasheq(Number) for BigIntegers that lie in the range of a long, returning the same hash that such a long value does.
For floating point values, the patch makes their hashes consistent by converting floats to doubles and then hashing.
One alternate approach would be to convert all double values to floats and hash float values only. However, this throws away half of the bits of the double value before hashing, leading to many undesirable hash collisions between different double values.
Attachments
Activity
Field  Original Value  New Value 

Attachment  clj1036hasheqforbigintegerpatchv1.txt [ 11524 ] 
Patch  Code and Test [ 10002 ] 
Approval  Screened [ 10004 ] 
Approval  Screened [ 10004 ]  Not Approved [ 10008 ] 
Attachment  clj1036hasheqforbigintegerpatchv2.txt [ 11674 ] 
Attachment  clj1036hasheqforbigintegerpatchv1.txt [ 11524 ] 
Approval  Not Approved [ 10008 ]  Vetted [ 10003 ] 
Description 
The doc string for {{hash}} states that it defines a hash code function that is consistent with {{=}}, but for java.math.BigInteger {{hash}} is not consistent with {{=}}.
{noformat} user=> (apply = [(Long. 1) 1N (biginteger 1)]) true user=> (map hash [(Long. 1) 1N (biginteger 1)]) (0 0 1) user=> {noformat} It is possible to have a PHM with two key/value pairs where the keys are equal, and the hash codes are different: {noformat} user=> (assoc clojure.lang.PersistentHashMap/EMPTY (biginteger 1) :oops! 1N :one) {1N :one, 1 :oops!} {noformat} The expected behavior is the same as PersistentArrayMap, which does not have this issue, because it does not hash its keys: {noformat} user=> (assoc clojure.lang.PersistentArrayMap/EMPTY (biginteger 1) :oops! 1N :one) {1 :one} {noformat} 
The doc string for {{hash}} states that it defines a hash code function that is consistent with {{=}}, but for java.math.BigInteger {{hash}} is not consistent with {{=}}.
{noformat} user=> (apply = [(Long. 1) 1N (biginteger 1)]) true user=> (map hash [(Long. 1) 1N (biginteger 1)]) (0 0 1) user=> {noformat} It is possible to have a PHM with two key/value pairs where the keys are equal, and the hash codes are different: {noformat} user=> (assoc clojure.lang.PersistentHashMap/EMPTY (biginteger 1) :oops! 1N :one) {1N :one, 1 :oops!} {noformat} The expected behavior is the same as PersistentArrayMap, which does not have this issue, because it does not hash its keys: {noformat} user=> (assoc clojure.lang.PersistentArrayMap/EMPTY (biginteger 1) :oops! 1N :one) {1 :one} {noformat} This same misbehavior also occurs for Doubles and Floats: {noformat} thalia.core=> (apply = [(Float. 1e9) (Double. 1e9)]) true thalia.core=> (map hash [(Float. 1e9) (Double. 1e9)]) (1315859240 1104006501) {noformat} That leads to the same difference in arraymap and hashmap behavior as above for BigInteger and BigInt. 
Approval  Vetted [ 10003 ]  Not Approved [ 10008 ] 
Approval  Not Approved [ 10008 ] 
Description 
The doc string for {{hash}} states that it defines a hash code function that is consistent with {{=}}, but for java.math.BigInteger {{hash}} is not consistent with {{=}}.
{noformat} user=> (apply = [(Long. 1) 1N (biginteger 1)]) true user=> (map hash [(Long. 1) 1N (biginteger 1)]) (0 0 1) user=> {noformat} It is possible to have a PHM with two key/value pairs where the keys are equal, and the hash codes are different: {noformat} user=> (assoc clojure.lang.PersistentHashMap/EMPTY (biginteger 1) :oops! 1N :one) {1N :one, 1 :oops!} {noformat} The expected behavior is the same as PersistentArrayMap, which does not have this issue, because it does not hash its keys: {noformat} user=> (assoc clojure.lang.PersistentArrayMap/EMPTY (biginteger 1) :oops! 1N :one) {1 :one} {noformat} This same misbehavior also occurs for Doubles and Floats: {noformat} thalia.core=> (apply = [(Float. 1e9) (Double. 1e9)]) true thalia.core=> (map hash [(Float. 1e9) (Double. 1e9)]) (1315859240 1104006501) {noformat} That leads to the same difference in arraymap and hashmap behavior as above for BigInteger and BigInt. 
Problem: {{hash}} is documented to be consistent with {{=}} but {{Util/hasheq}} returns different hash values for some pairs of numbers that are {{=}}
Demonstration: {noformat} user> (apply = [1 1N (biginteger 1)]) true user> (map hash [1 1N (biginteger 1)]) (0 0 1) user> (apply = [(Float. 1e9) (Double. 1e9)]) true user> (map hash [(Float. 1e9) (Double. 1e9)]) (1315859240 1104006501) {noformat} Consequences include incorrect behavior for hashmaps containing keys that are {{=}}, but have different hash values: {noformat} ;; Incorrect return value with multiple keys = to each other user> (assoc (hashmap 1N :shouldbereplaced) (biginteger 1) :newval) {1N :shouldbereplaced, 1 :newval} ;; arraymap gives correct value, since it uses =, not hash user> (assoc (arraymap 1N :shouldbereplaced) (biginteger 1) :newval) {1N :newval} {noformat} Current patch: clj1036hasheqforbigintegerpatchv3.txt Approach of current patch: The only BigInteger values that have inconsistent hash values should be those in the range of a long. BigInteger and BigInt values outside the range of a long already both return BigInteger.hashCode(). All integer values will return consistent hash codes if we add a new case to Numbers.hasheq(Number) for BigIntegers that lie in the range of a long, returning the same hash that such a long value does. The approach taken to make hash consistent for numerically equal float and double values is to convert double values to the nearest float value x using Double.floatValue()), and then return the hashCode() for x. One alternate approach would be to convert all float values to doubles. 
Affects Version/s  Release 1.5 [ 10150 ]  
Attachment  clj1036hasheqforbigintegerpatchv3.txt [ 12094 ]  
Summary  Util/hasheq should be hashing a BigInteger to the same values as Long, and BigInt  hash is inconsistent with = for some BigInteger and floating point values 
Attachment  clj1036hasheqforbigintegerpatchv2.txt [ 11674 ] 
Approval  Triaged [ 10120 ]  
Description 
Problem: {{hash}} is documented to be consistent with {{=}} but {{Util/hasheq}} returns different hash values for some pairs of numbers that are {{=}}
Demonstration: {noformat} user> (apply = [1 1N (biginteger 1)]) true user> (map hash [1 1N (biginteger 1)]) (0 0 1) user> (apply = [(Float. 1e9) (Double. 1e9)]) true user> (map hash [(Float. 1e9) (Double. 1e9)]) (1315859240 1104006501) {noformat} Consequences include incorrect behavior for hashmaps containing keys that are {{=}}, but have different hash values: {noformat} ;; Incorrect return value with multiple keys = to each other user> (assoc (hashmap 1N :shouldbereplaced) (biginteger 1) :newval) {1N :shouldbereplaced, 1 :newval} ;; arraymap gives correct value, since it uses =, not hash user> (assoc (arraymap 1N :shouldbereplaced) (biginteger 1) :newval) {1N :newval} {noformat} Current patch: clj1036hasheqforbigintegerpatchv3.txt Approach of current patch: The only BigInteger values that have inconsistent hash values should be those in the range of a long. BigInteger and BigInt values outside the range of a long already both return BigInteger.hashCode(). All integer values will return consistent hash codes if we add a new case to Numbers.hasheq(Number) for BigIntegers that lie in the range of a long, returning the same hash that such a long value does. The approach taken to make hash consistent for numerically equal float and double values is to convert double values to the nearest float value x using Double.floatValue()), and then return the hashCode() for x. One alternate approach would be to convert all float values to doubles. 
{{hash}} is documented to be consistent with {{=}} but {{Util/hasheq}} returns different hash values for some pairs of numbers that are {{=}}
{code} user> (apply = [1 1N (biginteger 1)]) true user> (map hash [1 1N (biginteger 1)]) (0 0 1) user> (apply = [(Float. 1e9) (Double. 1e9)]) true user> (map hash [(Float. 1e9) (Double. 1e9)]) (1315859240 1104006501) {code} Consequences include incorrect behavior for hashmaps containing keys that are {{=}}, but have different hash values: {code} ;; Incorrect return value with multiple keys = to each other user> (assoc (hashmap 1N :shouldbereplaced) (biginteger 1) :newval) {1N :shouldbereplaced, 1 :newval} ;; arraymap gives correct value, since it uses =, not hash user> (assoc (arraymap 1N :shouldbereplaced) (biginteger 1) :newval) {1N :newval} {code} *Patch:* clj1036hasheqforbigintegerpatchv3.txt *Approach:* The only BigInteger values that have inconsistent hash values should be those in the range of a long. BigInteger and BigInt values outside the range of a long already both return BigInteger.hashCode(). All integer values will return consistent hash codes if we add a new case to Numbers.hasheq(Number) for BigIntegers that lie in the range of a long, returning the same hash that such a long value does. The approach taken to make hash consistent for numerically equal float and double values is to convert double values to the nearest float value x using Double.floatValue()), and then return the hashCode() for x. One alternate approach would be to convert all float values to doubles. 
Attachment  clj1036hasheqforbigintegerpatchv4.txt [ 12095 ] 
Attachment  clj1036hasheqforbigintegerpatchv3.txt [ 12094 ] 
Description 
{{hash}} is documented to be consistent with {{=}} but {{Util/hasheq}} returns different hash values for some pairs of numbers that are {{=}}
{code} user> (apply = [1 1N (biginteger 1)]) true user> (map hash [1 1N (biginteger 1)]) (0 0 1) user> (apply = [(Float. 1e9) (Double. 1e9)]) true user> (map hash [(Float. 1e9) (Double. 1e9)]) (1315859240 1104006501) {code} Consequences include incorrect behavior for hashmaps containing keys that are {{=}}, but have different hash values: {code} ;; Incorrect return value with multiple keys = to each other user> (assoc (hashmap 1N :shouldbereplaced) (biginteger 1) :newval) {1N :shouldbereplaced, 1 :newval} ;; arraymap gives correct value, since it uses =, not hash user> (assoc (arraymap 1N :shouldbereplaced) (biginteger 1) :newval) {1N :newval} {code} *Patch:* clj1036hasheqforbigintegerpatchv3.txt *Approach:* The only BigInteger values that have inconsistent hash values should be those in the range of a long. BigInteger and BigInt values outside the range of a long already both return BigInteger.hashCode(). All integer values will return consistent hash codes if we add a new case to Numbers.hasheq(Number) for BigIntegers that lie in the range of a long, returning the same hash that such a long value does. The approach taken to make hash consistent for numerically equal float and double values is to convert double values to the nearest float value x using Double.floatValue()), and then return the hashCode() for x. One alternate approach would be to convert all float values to doubles. 
{{hash}} is documented to be consistent with {{=}} but {{Util/hasheq}} returns different hash values for some pairs of numbers that are {{=}}
{code} user> (apply = [1 1N (biginteger 1)]) true user> (map hash [1 1N (biginteger 1)]) (0 0 1) user> (apply = [(Float. 1e9) (Double. 1e9)]) true user> (map hash [(Float. 1e9) (Double. 1e9)]) (1315859240 1104006501) {code} Consequences include incorrect behavior for hashmaps containing keys that are {{=}}, but have different hash values: {code} ;; Incorrect return value with multiple keys = to each other user> (assoc (hashmap 1N :shouldbereplaced) (biginteger 1) :newval) {1N :shouldbereplaced, 1 :newval} ;; arraymap gives correct value, since it uses =, not hash user> (assoc (arraymap 1N :shouldbereplaced) (biginteger 1) :newval) {1N :newval} {code} *Patch:* clj1036hasheqforbigintegerpatchv4.txt *Approach:* The only BigInteger values that have inconsistent hash values should be those in the range of a long. BigInteger and BigInt values outside the range of a long already both return BigInteger.hashCode(). All integer values will return consistent hash codes if we add a new case to Numbers.hasheq(Number) for BigIntegers that lie in the range of a long, returning the same hash that such a long value does. For floating point values, the patch makes their hashes consistent by converting floats to doubles and then hashing. One alternate approach would be to convert all double values to floats and hash float values only. However, this throws away half of the bits of the double value before hashing, leading to many undesirable hash collisions between different double values. 
Resolution  Declined [ 2 ]  
Status  Open [ 1 ]  Closed [ 6 ] 