Reported by wolfe.a.jason, Jan 07, 2009
What (small set of) steps will reproduce the problem?
(list (.equals [1 2] (seq [1 2])) (.hashCode [1 2]) (.hashCode '(1 2)))
What is the expected output? What do you see instead?
I see (true 994 -1919631597)
I think I should see (true i i), or perhaps (false i j) for some ints i, j
What version are you using?
svn 1162
Was this discussed on the group? If so, please provide a link to the
discussion:
http:
Nobody asked me to add this here so I hope this isn't out of line ...
Please provide any additional information below.
The crux is, the Java API says:
"If two objects are equal according to the equals(Object) method, then
calling the hashCode method on each of the two objects must produce the
same integer result. "
As you can see here:
user> (doseq [s ['(1 2) (seq '(1 2)) [1 2] (seq [1 2]) (ArrayList. [1
2])]]
(print "\n" (.hashCode s))
(doseq [t ['(1 2) (seq '(1 2)) [1 2] (seq [1 2]) (ArrayList. [1
2])]]
(print "\t" (.equals s t))))
-1919631597 true true true true false
-1919631597 true true true true false
994 true true true true true
-1919631597 true true true true false
994 false false true false true
this contract is violated by Clojure collections. In particular,
(.equals [1 2] (seq [1 2]) but they have different .hashCode()s.
Some issues this can cause
- Clojure hash-map and array-map have different behaviors
- (get (hash-map {(vector &args) true} (list &args)) will return true
iff there is a hash collision between the vector and list (?)
- Unexpected pain in unknown places when using Clojure objects in Java
code that expects the contract to hold
-Jason
Comment 1 by wolfe.a.jason, Jan 07, 2009
P.S. I realized that the reason for the discrepancy might be that java.util.list
hashCodes (used by vectors) are defined in a front-to-back manner. This makes it
perhaps non-obvious how to efficiently compute them recursively back-to-front, as one
might like to to for seqs (so that each rest-seq can cache its hash value). This
code shows that this can in fact be done efficiently.
(defn forward-hash "How hashCode() is defined in java.util.List" [s]
(loop [s (seq s) h (int 1)]
(if s
(recur (rest s) (int (unchecked-add (unchecked-multiply h 31) (hash (first s)))))
h)))
(def int-pow
(memoize
(fn [x n]
(let [x (int x) n (int n)]
(cond (zero? n) 1
(even? n) (let [h (int-pow x (/ n 2))] (unchecked-multiply h h))
:else (unchecked-multiply x (int-pow x (dec n))))))))
(defn backward-hash "Compute the same value as forward-hash, but back-to-front." [s]
(let [s (seq s) p (int-pow 31 (count (rest s)))]
(if s
(unchecked-add (unchecked-multiply 30 p)
(unchecked-add (int (backward-hash (rest s)))
(unchecked-multiply (int (hash (first s))) p)))
1)))
user> (map #(% (vec (range 1000))) [forward-hash backward-hash #(.hashCode %)])
(133786869 133786869 133786869)
HTH, Jason
P.P.S. I'll put my Contributor Agreement in the mail today.
Comment 2 by wolfe.a.jason, Jan 07, 2009
Note on a small special case: Java null effectively has hash code 0 (I think) whereas
java.util.List hashCode() must be 1 for empty Lists. Fortunately, the REPL says:
user> (.equals [1] (seq [1]))
true
user> (.equals [] (seq []))
false
so I guess the empty vector is not .equal to the empty seq anyway (as mandated by the
Java API). So, giving non-empty seqs java.util.List hashCodes would in fact be the
right thing to do, it seems to me.
Comment 3 by richhickey, Jan 15, 2009
Fixed - svn 1215 - thanks for the report
Status: Fixed
Converted from http://www.assembla.com/spaces/clojure/tickets/41