Clojure’s hashing strategy for numbers, sequences/vectors, sets, and maps mimics Java’s. In Clojure, however, it is far more common than in Java to use longs, vectors, sets, maps, and compound objects composed of those components (e.g., a map from vectors of longs to sets) as keys in other hash maps. It appears that Java’s hash strategy is not well-tuned for this kind of usage. Clojure’s hash functions for longs, vectors, sets, and maps each suffer from weaknesses that can multiply together to create a crippling number of collisions.
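A minimal Java sketch of how these weaknesses compound. It assumes the JDK's own `Long.hashCode`, `List.hashCode`, and `Set.hashCode` as stand-ins for the Clojure behavior described above; the class and helper names are illustrative only. Small longs hash to themselves, a list combines its elements as 31*h + e, and a set sums its elements' hashes, so different sets of small-integer pairs can easily land on the same value:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashWeaknessDemo {
    // Build a [row col] pair as a List<Long>.
    // List.hashCode starts at 1 and folds each element in as 31*h + e.hashCode().
    static List<Long> pair(long row, long col) {
        return Arrays.asList(row, col);
    }

    // Build a set of pairs.
    // Set.hashCode (AbstractSet) is the plain sum of the element hash codes.
    @SafeVarargs
    static Set<List<Long>> setOf(List<Long>... pairs) {
        return new HashSet<>(Arrays.asList(pairs));
    }

    public static void main(String[] args) {
        // Long.hashCode XORs the high and low 32 bits, so 0 and -1 collide.
        System.out.println(Long.hashCode(0L) == Long.hashCode(-1L)); // true

        // Two different pairs with equal hashes: 31*(31+0)+31 == 31*(31+1)+0.
        System.out.println(pair(0, 31).hashCode() == pair(1, 0).hashCode()); // true

        // Two different sets whose element hashes sum to the same value.
        Set<List<Long>> a = setOf(pair(0, 1), pair(1, 0));
        Set<List<Long>> b = setOf(pair(0, 0), pair(1, 1));
        System.out.println(a.equals(b));                  // false
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

Because the set hash is a sum, any rearrangement of rows and columns that keeps the totals the same produces a collision, and nesting such sets inside other collections carries the collisions along.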
For example, Paul Butcher wrote a simple Clojure program that produces a set of solutions to a chess problem. Each solution in the set was itself a set of vectors of the form [piece-keyword [row-int col-int]]. Clojure 1.5.1's hash function hashed about 20 million different solutions to only about 20 thousand different hash values, an average of about 1000 solutions per unique hash value. With that many collisions, PersistentHashSet and PersistentHashMap must perform long linear searches when testing set/map membership or adding new elements/keys.
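To get a feel for the scale of the problem, the following sketch counts distinct hash values for a simplified stand-in for the chess data: every 3-element set of [row col] pairs on an 8×8 board, hashed with Java's `List.hashCode` and `Set.hashCode`. It omits the piece keyword and is not Paul Butcher's program; the board size, set size, and class name are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DistinctHashCount {
    // Returns {number of sets, number of distinct hash codes} over all
    // 3-element sets of [row col] pairs on an n x n board.
    static int[] count(int n) {
        int cells = n * n;
        Set<Integer> hashes = new HashSet<>();
        int total = 0;
        // Enumerate every combination of three distinct cells (i < j < k).
        for (int i = 0; i < cells; i++) {
            for (int j = i + 1; j < cells; j++) {
                for (int k = j + 1; k < cells; k++) {
                    Set<List<Long>> s = new HashSet<>();
                    for (int cell : new int[] {i, j, k}) {
                        // Convert the cell index to a [row col] pair of longs.
                        s.add(Arrays.asList((long) (cell / n), (long) (cell % n)));
                    }
                    hashes.add(s.hashCode());
                    total++;
                }
            }
        }
        return new int[] {total, hashes.size()};
    }

    public static void main(String[] args) {
        int[] r = count(8);
        System.out.println(r[0] + " sets, " + r[1] + " distinct hash values");
    }
}
```

On an 8×8 board there are 41,664 such sets. Each pair hashes to 961 + 31·row + col, which lies between 961 and 1185, so the sum of three pair hashes can take no more than 673 distinct values, however many sets there are.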
Mark Engelberg's document on Clojure's hash function, its behavior, and potential improvements is here:
Andy Fingerhut's modified version of Paul Butcher's N-queens solver includes extra code for printing statistics with several different hash functions. The README has instructions for retrieving and locally installing a version of Clojure modified with one of Mark's proposed alternate hash functions: