GC Issue 37: .hashCode and .equals for Clojure vectors/seqs break Java contract.


  • Type: Defect Defect
  • Status: Closed Closed
  • Resolution: Completed
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:


Reported by wolfe.a.jason, Jan 07, 2009
What (small set of) steps will reproduce the problem?

(list (.equals [1 2] (seq [1 2])) (.hashCode [1 2]) (.hashCode '(1 2)))

What is the expected output? What do you see instead?

I see (true 994 -1919631597)
I think I should see (true i i), or perhaps (false i j) for some ints i, j

What version are you using?

svn 1162

Was this discussed on the group? If so, please provide a link to the

Nobody asked me to add this here so I hope this isn't out of line ... 

Please provide any additional information below.

The crux is, the Java API says:

"If two objects are equal according to the equals(Object)  method, then
calling the hashCode method on each of the two objects must produce the
same integer result. "

As you can see here:

user> (doseq [s ['(1 2) (seq '(1 2)) [1 2] (seq [1 2]) (ArrayList. [1
  (print "\n" (.hashCode s))
  (doseq [t ['(1 2) (seq '(1 2)) [1 2] (seq [1 2]) (ArrayList. [1
    (print "\t" (.equals s t))))

 -1919631597     true    true    true    true    false
 -1919631597     true    true    true    true    false
 994             true    true    true    true    true
 -1919631597     true    true    true    true    false
 994             false   false   true    false   true 

this contract is violated by Clojure collections.  In particular, 
(.equals [1 2] (seq [1 2]) but they have different .hashCode()s.  

Some issues this can cause
  - Clojure hash-map and array-map have different behaviors
  - (get (hash-map {(vector &args) true} (list &args)) will return true
    iff there is a hash collision between the vector and list (?) 
  - Unexpected pain in unknown places when using Clojure objects in Java
code that expects the contract to hold


Comment 1 by wolfe.a.jason, Jan 07, 2009
P.S. I realized that the reason for the discrepancy might be that java.util.list
hashCodes (used by vectors) are defined in a front-to-back manner.  This makes it
perhaps non-obvious how to efficiently compute them recursively back-to-front, as one
might like to to for seqs (so that each rest-seq can cache its hash value).  This
code shows that this can in fact be done efficiently.

(defn forward-hash "How hashCode() is defined in java.util.List"  [s]
  (loop [s (seq s) h (int 1)]
    (if s 
        (recur (rest s) (int (unchecked-add (unchecked-multiply h 31) (hash (first s)))))

(def int-pow 
      (fn [x n]
 (let [x (int x) n (int n)]
   (cond (zero? n) 1
  (even? n) (let [h (int-pow x (/ n 2))] (unchecked-multiply h h))
  :else     (unchecked-multiply x (int-pow x (dec n))))))))

(defn backward-hash "Compute the same value as forward-hash, but back-to-front."  [s]
  (let [s (seq s) p (int-pow 31 (count (rest s)))]
    (if s 
        (unchecked-add (unchecked-multiply 30 p)
         (unchecked-add (int (backward-hash (rest s)))
          (unchecked-multiply (int (hash (first s))) p)))

user> (map #(% (vec (range 1000))) [forward-hash backward-hash #(.hashCode %)])
(133786869 133786869 133786869)

HTH, Jason

P.P.S. I'll put my Contributor Agreement in the mail today.   
Comment 2 by wolfe.a.jason, Jan 07, 2009
Note on a small special case: Java null effectively has hash code 0 (I think) whereas
java.util.List hashCode() must be 1 for empty Lists.  Fortunately, the REPL says:

user> (.equals [1] (seq [1]))
user> (.equals [] (seq []))

so I guess the empty vector is not .equal to the empty seq anyway (as mandated by the
Java API).  So, giving non-empty seqs java.util.List hashCodes would in fact be the
right thing to do, it seems to me.  

Comment 3 by richhickey, Jan 15, 2009
Fixed - svn 1215 - thanks for the report

Status: Fixed


Assembla Importer added a comment -
Assembla Importer added a comment - Converted from


  • Assignee:
Vote (0)
Watch (0)


  • Created: