Clojure

Inconsistent hash with java collections

Details

  • Type: Defect Defect
  • Status: Open Open
  • Priority: Major Major
  • Resolution: Unresolved
  • Affects Version/s: Release 1.6
  • Fix Version/s: None
  • Component/s: None
  • Environment:
    1.6.0 master
  • Patch:
    Code and Test
  • Approval:
    Triaged

Description

c.c/hash always use hashCode for java collections, which is incompatible when comparing with Clojure collections, which use Murmur3.

user=> (== (hash (java.util.ArrayList. [1 2 3])) (hash [1 2 3]))
false
user=> (= (java.util.ArrayList. [1 2 3]) [1 2 3])
true

One way to fix it is to add a special case in Util/hasheq for java.util.Collections, as it is now for Strings.

Link to a discussion of this topic in the Clojure group: https://groups.google.com/forum/#!topic/clojure/dQhdwZsyIEw

Activity

Hide
Jozef Wagner added a comment -

Same problem for maps, so hasheq should have a special case for java.util.Map too.

Show
Jozef Wagner added a comment - Same problem for maps, so hasheq should have a special case for java.util.Map too.
Hide
Jozef Wagner added a comment -

Added patch with fix for j.u. Map, Set and List.

Show
Jozef Wagner added a comment - Added patch with fix for j.u. Map, Set and List.
Hide
Andy Fingerhut added a comment -

Add patch clj-1372-2.diff that is identical to Jozef Wagner's clj-1372.diff, except it also adds some new tests that fail without his changes, and pass with them.

Show
Andy Fingerhut added a comment - Add patch clj-1372-2.diff that is identical to Jozef Wagner's clj-1372.diff, except it also adds some new tests that fail without his changes, and pass with them.
Hide
Alex Miller added a comment -

I think the contract on equiv/hasheq is more narrowly scoped than this and only applies if both collections are IPersistentCollection. In other words, I don't think this is wanted or required.

Note that the Java .equals/.hashCode contract is maintained here - these collections will compare as .equals() and do have the same .hashCode().

Show
Alex Miller added a comment - I think the contract on equiv/hasheq is more narrowly scoped than this and only applies if both collections are IPersistentCollection. In other words, I don't think this is wanted or required. Note that the Java .equals/.hashCode contract is maintained here - these collections will compare as .equals() and do have the same .hashCode().
Hide
Jozef Wagner added a comment -

Without the patch the following statement is not valid: "If two objects are equal with c.c/=, than their hash returned by c.c/hash is the same number". We can say that this is valid only iff both objects are 'clojure' objects, but this goes against clojures interop principles (interop is easy, fast, no surprises).

Show
Jozef Wagner added a comment - Without the patch the following statement is not valid: "If two objects are equal with c.c/=, than their hash returned by c.c/hash is the same number". We can say that this is valid only iff both objects are 'clojure' objects, but this goes against clojures interop principles (interop is easy, fast, no surprises).
Hide
Jozef Wagner added a comment -

Manifestation of this bug

user=> (assoc (hash-map [1 2 3] :foo) (java.util.ArrayList. [1 2 3]) :bar)
{[1 2 3] :bar, [1 2 3] :foo}
user=> (get (hash-map [1 2 3] :foo) (java.util.ArrayList. [1 2 3]))
nil
Show
Jozef Wagner added a comment - Manifestation of this bug
user=> (assoc (hash-map [1 2 3] :foo) (java.util.ArrayList. [1 2 3]) :bar)
{[1 2 3] :bar, [1 2 3] :foo}
user=> (get (hash-map [1 2 3] :foo) (java.util.ArrayList. [1 2 3]))
nil
Hide
Alex Miller added a comment - - edited

I agree that would be a nice thing to say without qualification.

There is a real cost to adding more branches in hasheq - adding those collection checks affects every hasheq. Running a full Clojure build, I see the following set of classes with >100 occurences where this happens (note that exactly 0 of these are the Java collections - this case doesn't exist in the Clojure build itself):

clojure.lang.Var 107001502
java.lang.Class 2651389
java.lang.Character 2076322 
java.util.UUID 435235 
java.util.Date 430956
clojure.lang.Compiler$LocalBinding 116830
java.lang.Boolean 112361
java.util.regex.Pattern 325

We'd be adding 4 more instanceof checks in the path of every one of those hasheqs. This would also likely blow any JVM inlining.

Rich says "all bets should be off for hasheq/equiv of non-values" where Java collections obviously fall into the class of "non-values".

Show
Alex Miller added a comment - - edited I agree that would be a nice thing to say without qualification. There is a real cost to adding more branches in hasheq - adding those collection checks affects every hasheq. Running a full Clojure build, I see the following set of classes with >100 occurences where this happens (note that exactly 0 of these are the Java collections - this case doesn't exist in the Clojure build itself):
clojure.lang.Var 107001502
java.lang.Class 2651389
java.lang.Character 2076322 
java.util.UUID 435235 
java.util.Date 430956
clojure.lang.Compiler$LocalBinding 116830
java.lang.Boolean 112361
java.util.regex.Pattern 325
We'd be adding 4 more instanceof checks in the path of every one of those hasheqs. This would also likely blow any JVM inlining. Rich says "all bets should be off for hasheq/equiv of non-values" where Java collections obviously fall into the class of "non-values".
Hide
Andy Fingerhut added a comment - - edited

Would a doc patch be considered? Say one that modified the doc of clojure.core/hash to include a phrase indicating that it is only promised to be consistent with clojure.core/= for immutable values? It could even perhaps mention that Floats are out, too: see CLJ-1036

Show
Andy Fingerhut added a comment - - edited Would a doc patch be considered? Say one that modified the doc of clojure.core/hash to include a phrase indicating that it is only promised to be consistent with clojure.core/= for immutable values? It could even perhaps mention that Floats are out, too: see CLJ-1036
Hide
Alex Miller added a comment -

I think it would be preferred to do any detailed docs about hash at http://clojure.org/data_structures rather than in the docstring. Although the docstring on hash probably could use an update and a pointer to the web site after the latest changes.

Show
Alex Miller added a comment - I think it would be preferred to do any detailed docs about hash at http://clojure.org/data_structures rather than in the docstring. Although the docstring on hash probably could use an update and a pointer to the web site after the latest changes.
Hide
Jozef Wagner added a comment -

Neverthless it is a breaking change from 1.5, and it should be mentioned in changelog. What still bugs me is that c.c/= is supported in such cases but the c.c/hash is not. If supporting c.c/hash is expensive, isn't it better to drop support for c.c/= in such cases? It will eliminate surprises such as:

user=> (apply distinct? (hash-set [1 2 3] (java.util.Collections/unmodifiableList [1 2 3])))
false
Show
Jozef Wagner added a comment - Neverthless it is a breaking change from 1.5, and it should be mentioned in changelog. What still bugs me is that c.c/= is supported in such cases but the c.c/hash is not. If supporting c.c/hash is expensive, isn't it better to drop support for c.c/= in such cases? It will eliminate surprises such as:
user=> (apply distinct? (hash-set [1 2 3] (java.util.Collections/unmodifiableList [1 2 3])))
false
Hide
Alex Miller added a comment -

I'm not sure it's a "breaking" change if something not considered to be guaranteed changes. But I take your point.

I don't think it's feasible to drop = support for Clojure and Java collections - that seems important and useful. And if it were free to do so, I would like to be able to say without qualification that if equiv=true, then hasheq is the same.

It's unclear to me that the examples listed on this ticket are actually real problems people are likely to encounter. The main users of hasheq are hash map and hash set. So to manifest, you would need to be putting a mixture of Clojure and Java collections into one of those, in particular a mixture of collections that compare as equal.

Still thinking about it.

Show
Alex Miller added a comment - I'm not sure it's a "breaking" change if something not considered to be guaranteed changes. But I take your point. I don't think it's feasible to drop = support for Clojure and Java collections - that seems important and useful. And if it were free to do so, I would like to be able to say without qualification that if equiv=true, then hasheq is the same. It's unclear to me that the examples listed on this ticket are actually real problems people are likely to encounter. The main users of hasheq are hash map and hash set. So to manifest, you would need to be putting a mixture of Clojure and Java collections into one of those, in particular a mixture of collections that compare as equal. Still thinking about it.
Hide
Jozef Wagner added a comment -

Sorry for spamming but there may be another option, to not fallback into hashCode in hasheq, but to instead throw in cases where hasheq is requested for non-values. This will lead to a cleaner separation of hash types. Of course it will prevent putting non-values into hash-set.

Show
Jozef Wagner added a comment - Sorry for spamming but there may be another option, to not fallback into hashCode in hasheq, but to instead throw in cases where hasheq is requested for non-values. This will lead to a cleaner separation of hash types. Of course it will prevent putting non-values into hash-set.
Hide
Alex Miller added a comment -

There is no simple check for "valueness" though?

Show
Alex Miller added a comment - There is no simple check for "valueness" though?
Hide
Andy Fingerhut added a comment -

An idea, for what it might be worth: Add one test for instance of java.util.Collection in Util.hasheq method instead of 3 separate tests for Set, List, and Map. It doesn't cover Map.Entry.

Show
Andy Fingerhut added a comment - An idea, for what it might be worth: Add one test for instance of java.util.Collection in Util.hasheq method instead of 3 separate tests for Set, List, and Map. It doesn't cover Map.Entry.
Hide
Alex Miller added a comment -

Map doesn't extend Collection either.

Show
Alex Miller added a comment - Map doesn't extend Collection either.
Hide
Stuart Halloway added a comment -

I think this needs more consideration and should not hold up 1.6.

Show
Stuart Halloway added a comment - I think this needs more consideration and should not hold up 1.6.
Hide
Andy Fingerhut added a comment -

Both patches clj-1372.diff and clj-1372-2.diff fail to apply cleanly as of latest Clojure master on Mar 20 2014. They did apply cleanly before the Mar 19 2014 commit, I believe, and the only issue appears to be a changed line of diff context. Given the discussion about whether such a change is desired, it sounds like more thought is needed before deciding what change should be made, if any.

Show
Andy Fingerhut added a comment - Both patches clj-1372.diff and clj-1372-2.diff fail to apply cleanly as of latest Clojure master on Mar 20 2014. They did apply cleanly before the Mar 19 2014 commit, I believe, and the only issue appears to be a changed line of diff context. Given the discussion about whether such a change is desired, it sounds like more thought is needed before deciding what change should be made, if any.
Hide
Mike Anderson added a comment -

This is a pretty bad defect. It absolutely needs to be fixed. It's not really about whether using a mix of Clojure and Java collections is a likely use case or not (it probably isn't...), it's about providing consistent guarantees that people can rely upon.

For example, now I'm really unsure about whether some of the library functions I have that use sets or maps are broken or not. I'd be particularly worried about anything that implements object caches / memoisation / interning based on hashed values. Such code may now have some really nasty subtle defects.

Since they are library functions, I can't guarantee what kind of objects are passed in so the code has to work with all possible inputs (either that or I need to write a clear docstring and throw an exception if the input is not supported).

Show
Mike Anderson added a comment - This is a pretty bad defect. It absolutely needs to be fixed. It's not really about whether using a mix of Clojure and Java collections is a likely use case or not (it probably isn't...), it's about providing consistent guarantees that people can rely upon. For example, now I'm really unsure about whether some of the library functions I have that use sets or maps are broken or not. I'd be particularly worried about anything that implements object caches / memoisation / interning based on hashed values. Such code may now have some really nasty subtle defects. Since they are library functions, I can't guarantee what kind of objects are passed in so the code has to work with all possible inputs (either that or I need to write a clear docstring and throw an exception if the input is not supported).
Hide
Michał Marczyk added a comment - - edited

This patch (0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M.patch) makes hasheq consistent with = for java.util.{List,Map,Map.Entry,Set}. Additionally it extends the special treatment of String (return hasheq of hashCode) to all types not otherwise handled (see below for a comment on this).

It is also available here:

https://github.com/michalmarczyk/clojure/tree/alien-hasheq-2

An earlier version is available here:

https://github.com/michalmarczyk/clojure/tree/alien-hasheq

If I understand correctly, what needs to be benchmarked is primarily the "dispatch time" for clojure.lang.Util/hasheq given a Clojure type. So, I ran a Criterium benchmark repeatedly hashing the same persistent hash map, on the theory that this will measure just the dispatch time on IHashEq instances. I then ran a separate benchmark hashing a PHM, a string and a long and adding up the results with unchecked-add. Hopefully this is a good start; I've no doubt additional benchmarks would be useful.

The results are somewhat surprising to me: hasheq on PHM is actually slightly faster in this benchmark on my build than on 1.6.0; the "add three hasheqs" benchmark is slightly faster on 1.6.0.

;;; 1.6.0

;;; NB. j.u.HM benchmark irrelevant
user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (clojure.lang.Util/hasheq phm)) (c/bench (clojure.lang.Util/hasheq juhm)))
WARNING: Final GC required 1.24405836928592 % of runtime
Evaluation count : 5549560980 in 60 samples of 92492683 calls.
             Execution time mean : 9.229881 ns
    Execution time std-deviation : 0.156716 ns
   Execution time lower quantile : 8.985994 ns ( 2.5%)
   Execution time upper quantile : 9.574039 ns (97.5%)
                   Overhead used : 1.741068 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 6.2652 % Variance is slightly inflated by outliers
Evaluation count : 35647680 in 60 samples of 594128 calls.
             Execution time mean : 1.695145 µs
    Execution time std-deviation : 20.186554 ns
   Execution time lower quantile : 1.670049 µs ( 2.5%)
   Execution time upper quantile : 1.740329 µs (97.5%)
                   Overhead used : 1.741068 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.028614538339401 % of runtime
Evaluation count : 1029948300 in 60 samples of 17165805 calls.
             Execution time mean : 56.797488 ns
    Execution time std-deviation : 0.732221 ns
   Execution time lower quantile : 55.856731 ns ( 2.5%)
   Execution time upper quantile : 58.469940 ns (97.5%)
                   Overhead used : 1.836671 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

;;; patch applied

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] (assert (= (hash phm) (hash juhm))) (c/bench (clojure.lang.Util/hasheq phm)) (c/bench (clojure.lang.Util/hasheq juhm)))
Evaluation count : 5537698680 in 60 samples of 92294978 calls.
             Execution time mean : 8.973200 ns
    Execution time std-deviation : 0.157079 ns
   Execution time lower quantile : 8.733544 ns ( 2.5%)
   Execution time upper quantile : 9.289350 ns (97.5%)
                   Overhead used : 1.744772 ns
Evaluation count : 2481600 in 60 samples of 41360 calls.
             Execution time mean : 24.287800 µs
    Execution time std-deviation : 288.124326 ns
   Execution time lower quantile : 23.856445 µs ( 2.5%)
   Execution time upper quantile : 24.774097 µs (97.5%)
                   Overhead used : 1.744772 ns
nil

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.298136122909759 % of runtime
Evaluation count : 954751500 in 60 samples of 15912525 calls.
             Execution time mean : 61.681794 ns
    Execution time std-deviation : 0.712110 ns
   Execution time lower quantile : 60.622003 ns ( 2.5%)
   Execution time upper quantile : 62.904801 ns (97.5%)
                   Overhead used : 1.744772 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

As a side note, the earlier version of the patch available on the other branch doesn't have a separate branch for String. This made hasheq faster for objects implementing IHashEq, but slowed down the "three hashes" benchmark roughly by a factor of 2.

Show
Michał Marczyk added a comment - - edited This patch (0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M.patch) makes hasheq consistent with = for java.util.{List,Map,Map.Entry,Set}. Additionally it extends the special treatment of String (return hasheq of hashCode) to all types not otherwise handled (see below for a comment on this). It is also available here: https://github.com/michalmarczyk/clojure/tree/alien-hasheq-2 An earlier version is available here: https://github.com/michalmarczyk/clojure/tree/alien-hasheq If I understand correctly, what needs to be benchmarked is primarily the "dispatch time" for clojure.lang.Util/hasheq given a Clojure type. So, I ran a Criterium benchmark repeatedly hashing the same persistent hash map, on the theory that this will measure just the dispatch time on IHashEq instances. I then ran a separate benchmark hashing a PHM, a string and a long and adding up the results with unchecked-add. Hopefully this is a good start; I've no doubt additional benchmarks would be useful. The results are somewhat surprising to me: hasheq on PHM is actually slightly faster in this benchmark on my build than on 1.6.0; the "add three hasheqs" benchmark is slightly faster on 1.6.0.
;;; 1.6.0

;;; NB. j.u.HM benchmark irrelevant
user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (clojure.lang.Util/hasheq phm)) (c/bench (clojure.lang.Util/hasheq juhm)))
WARNING: Final GC required 1.24405836928592 % of runtime
Evaluation count : 5549560980 in 60 samples of 92492683 calls.
             Execution time mean : 9.229881 ns
    Execution time std-deviation : 0.156716 ns
   Execution time lower quantile : 8.985994 ns ( 2.5%)
   Execution time upper quantile : 9.574039 ns (97.5%)
                   Overhead used : 1.741068 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 6.2652 % Variance is slightly inflated by outliers
Evaluation count : 35647680 in 60 samples of 594128 calls.
             Execution time mean : 1.695145 µs
    Execution time std-deviation : 20.186554 ns
   Execution time lower quantile : 1.670049 µs ( 2.5%)
   Execution time upper quantile : 1.740329 µs (97.5%)
                   Overhead used : 1.741068 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.028614538339401 % of runtime
Evaluation count : 1029948300 in 60 samples of 17165805 calls.
             Execution time mean : 56.797488 ns
    Execution time std-deviation : 0.732221 ns
   Execution time lower quantile : 55.856731 ns ( 2.5%)
   Execution time upper quantile : 58.469940 ns (97.5%)
                   Overhead used : 1.836671 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

;;; patch applied

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] (assert (= (hash phm) (hash juhm))) (c/bench (clojure.lang.Util/hasheq phm)) (c/bench (clojure.lang.Util/hasheq juhm)))
Evaluation count : 5537698680 in 60 samples of 92294978 calls.
             Execution time mean : 8.973200 ns
    Execution time std-deviation : 0.157079 ns
   Execution time lower quantile : 8.733544 ns ( 2.5%)
   Execution time upper quantile : 9.289350 ns (97.5%)
                   Overhead used : 1.744772 ns
Evaluation count : 2481600 in 60 samples of 41360 calls.
             Execution time mean : 24.287800 µs
    Execution time std-deviation : 288.124326 ns
   Execution time lower quantile : 23.856445 µs ( 2.5%)
   Execution time upper quantile : 24.774097 µs (97.5%)
                   Overhead used : 1.744772 ns
nil

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.298136122909759 % of runtime
Evaluation count : 954751500 in 60 samples of 15912525 calls.
             Execution time mean : 61.681794 ns
    Execution time std-deviation : 0.712110 ns
   Execution time lower quantile : 60.622003 ns ( 2.5%)
   Execution time upper quantile : 62.904801 ns (97.5%)
                   Overhead used : 1.744772 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
As a side note, the earlier version of the patch available on the other branch doesn't have a separate branch for String. This made hasheq faster for objects implementing IHashEq, but slowed down the "three hashes" benchmark roughly by a factor of 2.
Hide
Alex Miller added a comment -

Just for clarity, please refer to patches attached here by name so as time goes on we don't have to correlate attachment time with comment time.

I'm not particularly worried about the cost of things that implement IHashEq as they should be unaffected other than potential inlining issues. I am curious about the cost of hasheq for objects that fall through to the end of the cases and pay the cost for all of the checks. The list farther up in the comments is a good place to start - things like Class, Character, and Var (which could possibly be addressed in Var).

Show
Alex Miller added a comment - Just for clarity, please refer to patches attached here by name so as time goes on we don't have to correlate attachment time with comment time. I'm not particularly worried about the cost of things that implement IHashEq as they should be unaffected other than potential inlining issues. I am curious about the cost of hasheq for objects that fall through to the end of the cases and pay the cost for all of the checks. The list farther up in the comments is a good place to start - things like Class, Character, and Var (which could possibly be addressed in Var).
Hide
Michał Marczyk added a comment -

Good point, I've edited the above comment to include the patch name.

Thanks for the benchmarking suggestions – I'll post some new results in ~6 minutes.

Show
Michał Marczyk added a comment - Good point, I've edited the above comment to include the patch name. Thanks for the benchmarking suggestions – I'll post some new results in ~6 minutes.
Hide
Michał Marczyk added a comment -

First, for completeness, here's a new patch (0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M-alternative.patch) which doesn't do the extra murmuring for types not otherwise handled. It's slower for the single PHM case; see below for details. Also, here's the branch on GitHub:

https://github.com/michalmarczyk/clojure/tree/alien-hasheq-3

As for the new results, the perf hit is quite large, I'm afraid:

;;; with patch (murmur hashCode for default version)
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
WARNING: Final GC required 1.409118084170768 % of runtime
Evaluation count : 655363680 in 60 samples of 10922728 calls.
             Execution time mean : 96.459888 ns
    Execution time std-deviation : 1.019817 ns
   Execution time lower quantile : 95.079086 ns ( 2.5%)
   Execution time upper quantile : 98.684168 ns (97.5%)
                   Overhead used : 1.708347 ns
Evaluation count : 675919140 in 60 samples of 11265319 calls.
             Execution time mean : 88.965959 ns
    Execution time std-deviation : 0.825226 ns
   Execution time lower quantile : 87.817159 ns ( 2.5%)
   Execution time upper quantile : 90.755688 ns (97.5%)
                   Overhead used : 1.708347 ns
Evaluation count : 574987680 in 60 samples of 9583128 calls.
             Execution time mean : 103.881498 ns
    Execution time std-deviation : 1.103615 ns
   Execution time lower quantile : 102.257474 ns ( 2.5%)
   Execution time upper quantile : 106.071144 ns (97.5%)
                   Overhead used : 1.708347 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

;;; 1.6.0
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
WARNING: Final GC required 1.3353133083866688 % of runtime
Evaluation count : 1829305260 in 60 samples of 30488421 calls.
             Execution time mean : 34.205701 ns
    Execution time std-deviation : 0.379106 ns
   Execution time lower quantile : 33.680636 ns ( 2.5%)
   Execution time upper quantile : 34.990138 ns (97.5%)
                   Overhead used : 1.718257 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 1 (1.6667 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 1858100340 in 60 samples of 30968339 calls.
             Execution time mean : 30.401309 ns
    Execution time std-deviation : 0.213878 ns
   Execution time lower quantile : 30.095976 ns ( 2.5%)
   Execution time upper quantile : 30.871497 ns (97.5%)
                   Overhead used : 1.718257 ns
Evaluation count : 1592932200 in 60 samples of 26548870 calls.
             Execution time mean : 36.292934 ns
    Execution time std-deviation : 0.333512 ns
   Execution time lower quantile : 35.795063 ns ( 2.5%)
   Execution time upper quantile : 36.918183 ns (97.5%)
                   Overhead used : 1.718257 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

One PHM and Class/Character/Var results with the new patch (no extra murmur step in the default case):

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.258952964663877 % of runtime
Evaluation count : 1007768460 in 60 samples of 16796141 calls.
             Execution time mean : 58.195608 ns
    Execution time std-deviation : 0.482804 ns
   Execution time lower quantile : 57.655857 ns ( 2.5%)
   Execution time upper quantile : 59.154655 ns (97.5%)
                   Overhead used : 1.567532 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
Evaluation count : 647944080 in 60 samples of 10799068 calls.
             Execution time mean : 91.275863 ns
    Execution time std-deviation : 0.659943 ns
   Execution time lower quantile : 90.330980 ns ( 2.5%)
   Execution time upper quantile : 92.711120 ns (97.5%)
                   Overhead used : 1.567532 ns
Evaluation count : 699506160 in 60 samples of 11658436 calls.
             Execution time mean : 84.564131 ns
    Execution time std-deviation : 0.517071 ns
   Execution time lower quantile : 83.765607 ns ( 2.5%)
   Execution time upper quantile : 85.569206 ns (97.5%)
                   Overhead used : 1.567532 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 594919980 in 60 samples of 9915333 calls.
             Execution time mean : 100.336792 ns
    Execution time std-deviation : 0.811312 ns
   Execution time lower quantile : 99.313490 ns ( 2.5%)
   Execution time upper quantile : 102.167675 ns (97.5%)
                   Overhead used : 1.567532 ns

Found 3 outliers in 60 samples (5.0000 %)
	low-severe	 3 (5.0000 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
Show
Michał Marczyk added a comment - First, for completeness, here's a new patch (0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M-alternative.patch) which doesn't do the extra murmuring for types not otherwise handled. It's slower for the single PHM case; see below for details. Also, here's the branch on GitHub: https://github.com/michalmarczyk/clojure/tree/alien-hasheq-3 As for the new results, the perf hit is quite large, I'm afraid:
;;; with patch (murmur hashCode for default version)
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
WARNING: Final GC required 1.409118084170768 % of runtime
Evaluation count : 655363680 in 60 samples of 10922728 calls.
             Execution time mean : 96.459888 ns
    Execution time std-deviation : 1.019817 ns
   Execution time lower quantile : 95.079086 ns ( 2.5%)
   Execution time upper quantile : 98.684168 ns (97.5%)
                   Overhead used : 1.708347 ns
Evaluation count : 675919140 in 60 samples of 11265319 calls.
             Execution time mean : 88.965959 ns
    Execution time std-deviation : 0.825226 ns
   Execution time lower quantile : 87.817159 ns ( 2.5%)
   Execution time upper quantile : 90.755688 ns (97.5%)
                   Overhead used : 1.708347 ns
Evaluation count : 574987680 in 60 samples of 9583128 calls.
             Execution time mean : 103.881498 ns
    Execution time std-deviation : 1.103615 ns
   Execution time lower quantile : 102.257474 ns ( 2.5%)
   Execution time upper quantile : 106.071144 ns (97.5%)
                   Overhead used : 1.708347 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

;;; 1.6.0
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
WARNING: Final GC required 1.3353133083866688 % of runtime
Evaluation count : 1829305260 in 60 samples of 30488421 calls.
             Execution time mean : 34.205701 ns
    Execution time std-deviation : 0.379106 ns
   Execution time lower quantile : 33.680636 ns ( 2.5%)
   Execution time upper quantile : 34.990138 ns (97.5%)
                   Overhead used : 1.718257 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 1 (1.6667 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 1858100340 in 60 samples of 30968339 calls.
             Execution time mean : 30.401309 ns
    Execution time std-deviation : 0.213878 ns
   Execution time lower quantile : 30.095976 ns ( 2.5%)
   Execution time upper quantile : 30.871497 ns (97.5%)
                   Overhead used : 1.718257 ns
Evaluation count : 1592932200 in 60 samples of 26548870 calls.
             Execution time mean : 36.292934 ns
    Execution time std-deviation : 0.333512 ns
   Execution time lower quantile : 35.795063 ns ( 2.5%)
   Execution time upper quantile : 36.918183 ns (97.5%)
                   Overhead used : 1.718257 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
One PHM and Class/Character/Var results with the new patch (no extra murmur step in the default case):
user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.258952964663877 % of runtime
Evaluation count : 1007768460 in 60 samples of 16796141 calls.
             Execution time mean : 58.195608 ns
    Execution time std-deviation : 0.482804 ns
   Execution time lower quantile : 57.655857 ns ( 2.5%)
   Execution time upper quantile : 59.154655 ns (97.5%)
                   Overhead used : 1.567532 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
Evaluation count : 647944080 in 60 samples of 10799068 calls.
             Execution time mean : 91.275863 ns
    Execution time std-deviation : 0.659943 ns
   Execution time lower quantile : 90.330980 ns ( 2.5%)
   Execution time upper quantile : 92.711120 ns (97.5%)
                   Overhead used : 1.567532 ns
Evaluation count : 699506160 in 60 samples of 11658436 calls.
             Execution time mean : 84.564131 ns
    Execution time std-deviation : 0.517071 ns
   Execution time lower quantile : 83.765607 ns ( 2.5%)
   Execution time upper quantile : 85.569206 ns (97.5%)
                   Overhead used : 1.567532 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 594919980 in 60 samples of 9915333 calls.
             Execution time mean : 100.336792 ns
    Execution time std-deviation : 0.811312 ns
   Execution time lower quantile : 99.313490 ns ( 2.5%)
   Execution time upper quantile : 102.167675 ns (97.5%)
                   Overhead used : 1.567532 ns

Found 3 outliers in 60 samples (5.0000 %)
	low-severe	 3 (5.0000 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
Hide
Michał Marczyk added a comment -

Here's a new patch (0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M-substring.patch) that takes the outrageous approach of replacing the Iterable/Map/Entry test with a .startsWith("java.util.") on the class name. (I experimented with .getClass().getPackage(), but the performance of that was terrible.) The branch is here:

https://github.com/michalmarczyk/clojure/tree/alien-hasheq-4

Hash perf on the "fall-through" cases with this patch seems to be very good:

user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
WARNING: Final GC required 1.31690036780011 % of runtime
Evaluation count : 1661453640 in 60 samples of 27690894 calls.
             Execution time mean : 35.099750 ns
    Execution time std-deviation : 0.422800 ns
   Execution time lower quantile : 34.454839 ns ( 2.5%)
   Execution time upper quantile : 35.953584 ns (97.5%)
                   Overhead used : 1.556642 ns
Evaluation count : 1630167600 in 60 samples of 27169460 calls.
             Execution time mean : 35.487409 ns
    Execution time std-deviation : 0.309872 ns
   Execution time lower quantile : 35.083030 ns ( 2.5%)
   Execution time upper quantile : 36.190015 ns (97.5%)
                   Overhead used : 1.556642 ns

Found 4 outliers in 60 samples (6.6667 %)
	low-severe	 3 (5.0000 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 1440434700 in 60 samples of 24007245 calls.
             Execution time mean : 40.894457 ns
    Execution time std-deviation : 0.529510 ns
   Execution time lower quantile : 40.055991 ns ( 2.5%)
   Execution time upper quantile : 41.990985 ns (97.5%)
                   Overhead used : 1.556642 ns
nil
Show
Michał Marczyk added a comment - Here's a new patch (0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M-substring.patch) that takes the outrageous approach of replacing the Iterable/Map/Entry test with a .startsWith("java.util.") on the class name. (I experimented with .getClass().getPackage(), but the performance of that was terrible.) The branch is here: https://github.com/michalmarczyk/clojure/tree/alien-hasheq-4 Hash perf on the "fall-through" cases with this patch seems to be very good:
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
WARNING: Final GC required 1.31690036780011 % of runtime
Evaluation count : 1661453640 in 60 samples of 27690894 calls.
             Execution time mean : 35.099750 ns
    Execution time std-deviation : 0.422800 ns
   Execution time lower quantile : 34.454839 ns ( 2.5%)
   Execution time upper quantile : 35.953584 ns (97.5%)
                   Overhead used : 1.556642 ns
Evaluation count : 1630167600 in 60 samples of 27169460 calls.
             Execution time mean : 35.487409 ns
    Execution time std-deviation : 0.309872 ns
   Execution time lower quantile : 35.083030 ns ( 2.5%)
   Execution time upper quantile : 36.190015 ns (97.5%)
                   Overhead used : 1.556642 ns

Found 4 outliers in 60 samples (6.6667 %)
	low-severe	 3 (5.0000 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 1440434700 in 60 samples of 24007245 calls.
             Execution time mean : 40.894457 ns
    Execution time std-deviation : 0.529510 ns
   Execution time lower quantile : 40.055991 ns ( 2.5%)
   Execution time upper quantile : 41.990985 ns (97.5%)
                   Overhead used : 1.556642 ns
nil
Hide
Michał Marczyk added a comment -

The new patch (...-substring.patch) returns hashCode for java.util.** classes other than List, Map, Map.Entry and Set, of course, so no behaviour change there.

Here are the benchmarks for repeated PHM lookups (slightly slower than 1.6.0 apparently, though within 1 ns) and the "add three hasheqs" benchmark (66 ns with patch vs. 57 ns without):

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] (assert (= (hash phm) (hash juhm))) (c/bench (clojure.lang.Util/hasheq phm)) (c/bench (clojure.lang.Util/hasheq juhm)))
Evaluation count : 5183841240 in 60 samples of 86397354 calls.
             Execution time mean : 10.076893 ns
    Execution time std-deviation : 0.182592 ns
   Execution time lower quantile : 9.838456 ns ( 2.5%)
   Execution time upper quantile : 10.481086 ns (97.5%)
                   Overhead used : 1.565749 ns
Evaluation count : 3090420 in 60 samples of 51507 calls.
             Execution time mean : 19.596627 µs
    Execution time std-deviation : 224.380257 ns
   Execution time lower quantile : 19.288347 µs ( 2.5%)
   Execution time upper quantile : 20.085620 µs (97.5%)
                   Overhead used : 1.565749 ns
nil

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.418253438197936 % of runtime
Evaluation count : 879210900 in 60 samples of 14653515 calls.
             Execution time mean : 66.939309 ns
    Execution time std-deviation : 0.747984 ns
   Execution time lower quantile : 65.667310 ns ( 2.5%)
   Execution time upper quantile : 68.155046 ns (97.5%)
                   Overhead used : 1.724002 ns
nil

It is important to note that I have obtained the no-patch result for the "three hasheqs" benchmarks on a fresh JVM when benchmarking 1.6.0, so that's also how I repeated the benchmark with the patch applied. Hashing many different types changes the results noticeably – presumably HotSpot backs off from some optimizations after seeing several different types passed in to hasheq?

Show
Michał Marczyk added a comment - The new patch (...-substring.patch) returns hashCode for java.util.** classes other than List, Map, Map.Entry and Set, of course, so no behaviour change there. Here are the benchmarks for repeated PHM lookups (slightly slower than 1.6.0 apparently, though within 1 ns) and the "add three hasheqs" benchmark (66 ns with patch vs. 57 ns without):
user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] (assert (= (hash phm) (hash juhm))) (c/bench (clojure.lang.Util/hasheq phm)) (c/bench (clojure.lang.Util/hasheq juhm)))
Evaluation count : 5183841240 in 60 samples of 86397354 calls.
             Execution time mean : 10.076893 ns
    Execution time std-deviation : 0.182592 ns
   Execution time lower quantile : 9.838456 ns ( 2.5%)
   Execution time upper quantile : 10.481086 ns (97.5%)
                   Overhead used : 1.565749 ns
Evaluation count : 3090420 in 60 samples of 51507 calls.
             Execution time mean : 19.596627 µs
    Execution time std-deviation : 224.380257 ns
   Execution time lower quantile : 19.288347 µs ( 2.5%)
   Execution time upper quantile : 20.085620 µs (97.5%)
                   Overhead used : 1.565749 ns
nil

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.418253438197936 % of runtime
Evaluation count : 879210900 in 60 samples of 14653515 calls.
             Execution time mean : 66.939309 ns
    Execution time std-deviation : 0.747984 ns
   Execution time lower quantile : 65.667310 ns ( 2.5%)
   Execution time upper quantile : 68.155046 ns (97.5%)
                   Overhead used : 1.724002 ns
nil
It is important to note that I have obtained the no-patch result for the "three hasheqs" benchmarks on a fresh JVM when benchmarking 1.6.0, so that's also how I repeated the benchmark with the patch applied. Hashing many different types changes the results noticeably – presumably HotSpot backs off from some optimizations after seeing several different types passed in to hasheq?
Hide
Michał Marczyk added a comment -

Here's a new patch (0005-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M.patch) that introduces a new isAlien static method that checks for instanceof Map/Map.Entry/Iterable and uses this method to test for "alien collection".

Initial benchmarking results are promising:

;;; "fall-through" benchmark
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
WARNING: Final GC required 1.258979068087473 % of runtime
Evaluation count : 1598432100 in 60 samples of 26640535 calls.
             Execution time mean : 36.358882 ns
    Execution time std-deviation : 0.566925 ns
   Execution time lower quantile : 35.718889 ns ( 2.5%)
   Execution time upper quantile : 37.414722 ns (97.5%)
                   Overhead used : 1.823120 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 1626362460 in 60 samples of 27106041 calls.
             Execution time mean : 35.426993 ns
    Execution time std-deviation : 0.294517 ns
   Execution time lower quantile : 35.047064 ns ( 2.5%)
   Execution time upper quantile : 36.058667 ns (97.5%)
                   Overhead used : 1.823120 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 1461423180 in 60 samples of 24357053 calls.
             Execution time mean : 39.541873 ns
    Execution time std-deviation : 0.423707 ns
   Execution time lower quantile : 38.943560 ns ( 2.5%)
   Execution time upper quantile : 40.499433 ns (97.5%)
                   Overhead used : 1.823120 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

;;; "three hasheqs" benchmark
user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.5536755331464491 % of runtime
Evaluation count : 820376460 in 60 samples of 13672941 calls.
             Execution time mean : 71.999365 ns
    Execution time std-deviation : 0.746588 ns
   Execution time lower quantile : 70.869739 ns ( 2.5%)
   Execution time upper quantile : 73.565908 ns (97.5%)
                   Overhead used : 1.738155 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
Show
Michał Marczyk added a comment - Here's a new patch (0005-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M.patch) that introduces a new isAlien static method that checks for instanceof Map/Map.Entry/Iterable and uses this method to test for "alien collection". Initial benchmarking results are promising:
;;; "fall-through" benchmark
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
WARNING: Final GC required 1.258979068087473 % of runtime
Evaluation count : 1598432100 in 60 samples of 26640535 calls.
             Execution time mean : 36.358882 ns
    Execution time std-deviation : 0.566925 ns
   Execution time lower quantile : 35.718889 ns ( 2.5%)
   Execution time upper quantile : 37.414722 ns (97.5%)
                   Overhead used : 1.823120 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 1626362460 in 60 samples of 27106041 calls.
             Execution time mean : 35.426993 ns
    Execution time std-deviation : 0.294517 ns
   Execution time lower quantile : 35.047064 ns ( 2.5%)
   Execution time upper quantile : 36.058667 ns (97.5%)
                   Overhead used : 1.823120 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 1461423180 in 60 samples of 24357053 calls.
             Execution time mean : 39.541873 ns
    Execution time std-deviation : 0.423707 ns
   Execution time lower quantile : 38.943560 ns ( 2.5%)
   Execution time upper quantile : 40.499433 ns (97.5%)
                   Overhead used : 1.823120 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

;;; "three hasheqs" benchmark
user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.5536755331464491 % of runtime
Evaluation count : 820376460 in 60 samples of 13672941 calls.
             Execution time mean : 71.999365 ns
    Execution time std-deviation : 0.746588 ns
   Execution time lower quantile : 70.869739 ns ( 2.5%)
   Execution time upper quantile : 73.565908 ns (97.5%)
                   Overhead used : 1.738155 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
Hide
Michał Marczyk added a comment -

Ah, I left out the repeated phm hasheq lookup + hasheq of a java.util.HashMap instance pair of benchmarks from the above – here it is for completeness (no surprises though):

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] (assert (= (hash phm) (hash juhm))) (c/bench (clojure.lang.Util/hasheq phm)) (c/bench (clojure.lang.Util/hasheq juhm)))
WARNING: Final GC required 1.260853406580491 % of runtime
Evaluation count : 5369135760 in 60 samples of 89485596 calls.
             Execution time mean : 10.380464 ns
    Execution time std-deviation : 3.407284 ns
   Execution time lower quantile : 9.510624 ns ( 2.5%)
   Execution time upper quantile : 11.461485 ns (97.5%)
                   Overhead used : 1.566301 ns

Found 5 outliers in 60 samples (8.3333 %)
	low-severe	 3 (5.0000 %)
	low-mild	 2 (3.3333 %)
 Variance from outliers : 96.4408 % Variance is severely inflated by outliers
Evaluation count : 3078180 in 60 samples of 51303 calls.
             Execution time mean : 19.717981 µs
    Execution time std-deviation : 209.896848 ns
   Execution time lower quantile : 19.401811 µs ( 2.5%)
   Execution time upper quantile : 20.180163 µs (97.5%)
                   Overhead used : 1.566301 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
Show
Michał Marczyk added a comment - Ah, I left out the repeated phm hasheq lookup + hasheq of a java.util.HashMap instance pair of benchmarks from the above – here it is for completeness (no surprises though):
user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] (assert (= (hash phm) (hash juhm))) (c/bench (clojure.lang.Util/hasheq phm)) (c/bench (clojure.lang.Util/hasheq juhm)))
WARNING: Final GC required 1.260853406580491 % of runtime
Evaluation count : 5369135760 in 60 samples of 89485596 calls.
             Execution time mean : 10.380464 ns
    Execution time std-deviation : 3.407284 ns
   Execution time lower quantile : 9.510624 ns ( 2.5%)
   Execution time upper quantile : 11.461485 ns (97.5%)
                   Overhead used : 1.566301 ns

Found 5 outliers in 60 samples (8.3333 %)
	low-severe	 3 (5.0000 %)
	low-mild	 2 (3.3333 %)
 Variance from outliers : 96.4408 % Variance is severely inflated by outliers
Evaluation count : 3078180 in 60 samples of 51303 calls.
             Execution time mean : 19.717981 µs
    Execution time std-deviation : 209.896848 ns
   Execution time lower quantile : 19.401811 µs ( 2.5%)
   Execution time upper quantile : 20.180163 µs (97.5%)
                   Overhead used : 1.566301 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
Hide
Alex Miller added a comment -

Please don't submit any patches that change hashcode for anything other than making Java collections match Clojure collections - any other change is out of scope of this ticket.

In general, I would prefer just the execution time mean report for the moment rather than everything - the full criterium output makes these comments much harder to read and compare.

Show
Alex Miller added a comment - Please don't submit any patches that change hashcode for anything other than making Java collections match Clojure collections - any other change is out of scope of this ticket. In general, I would prefer just the execution time mean report for the moment rather than everything - the full criterium output makes these comments much harder to read and compare.
Hide
Alex Miller added a comment -

Could I get a summary of approaches, and a timing of 1.6.0 vs each patch for a consistent set of tests - say time of hash for Long, PHM, juHM, Class, and the "three hasheqs" test?

Show
Alex Miller added a comment - Could I get a summary of approaches, and a timing of 1.6.0 vs each patch for a consistent set of tests - say time of hash for Long, PHM, juHM, Class, and the "three hasheqs" test?
Hide
Rich Hickey added a comment -

"Hashing many different types changes the results noticeably – presumably HotSpot backs off from some optimizations after seeing several different types passed in to hasheq?"

Right - if your benchmarks do not treat this site as megamorphic you will get all sorts of distorted results.

Show
Rich Hickey added a comment - "Hashing many different types changes the results noticeably – presumably HotSpot backs off from some optimizations after seeing several different types passed in to hasheq?" Right - if your benchmarks do not treat this site as megamorphic you will get all sorts of distorted results.
Hide
Michał Marczyk added a comment -

Ok, I have what I think is an improved microbenchmark for this: xor of hasheqs for a long, a double, a string, a class, a character and a PHM (single instance, so it'll be a hash lookup). The results are not very encouraging.

Single form including the require to make it convenient to run; also bundled is a j.u.HashMap (128 entries) hasheq benchmark:

(do
  (require '[criterium.core :as c])
  (let [l    41235125123
        d    123.456
        s    "asdf;lkjh"
        k    BigInteger
        c    \S
        phm  (apply hash-map (interleave (range 128) (range 128)))
        juhm (java.util.HashMap. phm)
        f    (fn f []
               (-> (clojure.lang.Util/hasheq l)
                   (bit-xor (clojure.lang.Util/hasheq d))
                   (bit-xor (clojure.lang.Util/hasheq s))
                   (bit-xor (clojure.lang.Util/hasheq k))
                   (bit-xor (clojure.lang.Util/hasheq c))
                   (bit-xor (clojure.lang.Util/hasheq phm))))]
    (c/bench (f))
    (c/bench (hash juhm))))

Mean execution time as reported by Criterium:

version xor (ns) j.u.HM (µs)
unpatched 1.6.0 148.128748 1.701640
0005 patch 272.039667 21.201178
original patch 268.670316 21.169436
-alternative patch 271.747043 20.755397

The substring patch is broken (see below), so I skipped it. The patch I'm describing as the "original" one is attached as 0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M.patch.

Decisions common to all the patches:

1. One extra if statement in hasheq just above the default return with a three-way instanceof check.

2. The types tested for are j.u.Iterable, j.u.Map.Entry and j.u.Map.

3. Murmur3.hashOrdered takes Iterable, so that's why it's on the list. Map does not extend Iterable, so it's listed separately. Map.Entry is on the list, because ultimately the way to hash maps is to iterate over and hash their entries.

4. The actual hashing of the "alien" / host types is done by a separate static method – clojure.lang.Util.doalienhasheq – on the theory that this will permit hasheq to be inlined more aggressively and limit the worst of the perf hit to alien collections.

5. doalienhasheq checks for Map, Map.Entry, Set and List; entries are converted to lists for hashing, maps are hashed through entry sets and lists and sets are passed directly to Murmur3.

6. There is also a default case for other Iterable types – we must return hashCode or the result of composing some other function with hashCode for these, since we use equals to test them for equivalence.

The 0005 patch has hasheq call a separate private static method to perform the three-way type check, whereas the others put the check directly in the actual if test. The -alternative patch and the 0005 patch return hashCode in the default case, whereas the original patch composes Murmur3.hashInt with hashCode.

The substring patch only works for java.util.** classes and so doesn't solve the problem (it wouldn't correctly hash Guava collections, for example).

All of the patches change c.l.Util.hasheq and add one or two new static methods to clojure.lang.Util that act as helpers for hasheq. None of them changes anything else. Murmuring hashCode was a performance experiment that appeared to have a slight positive impact on some of the "fast cases" (in fact it's still the best performer among the current three patches in the microbenchmark presented above, although the margin of victory is of course extremely tiny). Thus I think all the current patches are in fact limited in scope to changes directly relevant to the ticket; the -alternative patch and the 0005 patch certainly are.

Show
Michał Marczyk added a comment - Ok, I have what I think is an improved microbenchmark for this: xor of hasheqs for a long, a double, a string, a class, a character and a PHM (single instance, so it'll be a hash lookup). The results are not very encouraging. Single form including the require to make it convenient to run; also bundled is a j.u.HashMap (128 entries) hasheq benchmark:
(do
  (require '[criterium.core :as c])
  (let [l    41235125123
        d    123.456
        s    "asdf;lkjh"
        k    BigInteger
        c    \S
        phm  (apply hash-map (interleave (range 128) (range 128)))
        juhm (java.util.HashMap. phm)
        f    (fn f []
               (-> (clojure.lang.Util/hasheq l)
                   (bit-xor (clojure.lang.Util/hasheq d))
                   (bit-xor (clojure.lang.Util/hasheq s))
                   (bit-xor (clojure.lang.Util/hasheq k))
                   (bit-xor (clojure.lang.Util/hasheq c))
                   (bit-xor (clojure.lang.Util/hasheq phm))))]
    (c/bench (f))
    (c/bench (hash juhm))))
Mean execution time as reported by Criterium:
version xor (ns) j.u.HM (µs)
unpatched 1.6.0 148.128748 1.701640
0005 patch 272.039667 21.201178
original patch 268.670316 21.169436
-alternative patch 271.747043 20.755397
The substring patch is broken (see below), so I skipped it. The patch I'm describing as the "original" one is attached as 0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M.patch. Decisions common to all the patches: 1. One extra if statement in hasheq just above the default return with a three-way instanceof check. 2. The types tested for are j.u.Iterable, j.u.Map.Entry and j.u.Map. 3. Murmur3.hashOrdered takes Iterable, so that's why it's on the list. Map does not extend Iterable, so it's listed separately. Map.Entry is on the list, because ultimately the way to hash maps is to iterate over and hash their entries. 4. The actual hashing of the "alien" / host types is done by a separate static method – clojure.lang.Util.doalienhasheq – on the theory that this will permit hasheq to be inlined more aggressively and limit the worst of the perf hit to alien collections. 5. doalienhasheq checks for Map, Map.Entry, Set and List; entries are converted to lists for hashing, maps are hashed through entry sets and lists and sets are passed directly to Murmur3. 6. There is also a default case for other Iterable types – we must return hashCode or the result of composing some other function with hashCode for these, since we use equals to test them for equivalence. The 0005 patch has hasheq call a separate private static method to perform the three-way type check, whereas the others put the check directly in the actual if test. The -alternative patch and the 0005 patch return hashCode in the default case, whereas the original patch composes Murmur3.hashInt with hashCode. The substring patch only works for java.util.** classes and so doesn't solve the problem (it wouldn't correctly hash Guava collections, for example). All of the patches change c.l.Util.hasheq and add one or two new static methods to clojure.lang.Util that act as helpers for hasheq. None of them changes anything else. Murmuring hashCode was a performance experiment that appeared to have a slight positive impact on some of the "fast cases" (in fact it's still the best performer among the current three patches in the microbenchmark presented above, although the margin of victory is of course extremely tiny). Thus I think all the current patches are in fact limited in scope to changes directly relevant to the ticket; the -alternative patch and the 0005 patch certainly are.
Hide
Michał Marczyk added a comment -

For completeness, branching on Map, Set etc. directly in hasheq, as with Jozef's original patch, results in the following timings in the microbenchmark introduced in my previous comment:

xor 315.866626 ns
juhm 18.520133 µs
Show
Michał Marczyk added a comment - For completeness, branching on Map, Set etc. directly in hasheq, as with Jozef's original patch, results in the following timings in the microbenchmark introduced in my previous comment:
xor 315.866626 ns
juhm 18.520133 µs
Hide
Michał Marczyk added a comment - - edited

New patch (0006) that leaves out the Map.Entry check; instead, two methods are introduced in the Murmur3 class to handle j.u.maps.

Java map entries aren't really integrated into Clojure – you can't use them like vectors, can't call seq on them etc. – so I don't think they need to match Clojure map entries in hasheq as long as j.u.maps do.

Timings:

xor 233.341689 ns
juhm 9.104637 µs
Show
Michał Marczyk added a comment - - edited New patch (0006) that leaves out the Map.Entry check; instead, two methods are introduced in the Murmur3 class to handle j.u.maps. Java map entries aren't really integrated into Clojure – you can't use them like vectors, can't call seq on them etc. – so I don't think they need to match Clojure map entries in hasheq as long as j.u.maps do. Timings:
xor 233.341689 ns
juhm 9.104637 µs
Hide
Michał Marczyk added a comment -

Checking for Map/Iterable in-line doesn't seem to affect xor benchmark results very much, but makes juhm hashing quicker. This is rather surprising to me. In any case, here's a new patch (0007) and the timings:

xor 233.062337 ns
juhm 8.629149 µs
Show
Michał Marczyk added a comment - Checking for Map/Iterable in-line doesn't seem to affect xor benchmark results very much, but makes juhm hashing quicker. This is rather surprising to me. In any case, here's a new patch (0007) and the timings:
xor 233.062337 ns
juhm 8.629149 µs
Hide
Alex Miller added a comment -

What are equivalent timings without the patch?

Show
Alex Miller added a comment - What are equivalent timings without the patch?
Hide
Michał Marczyk added a comment -

They're listed in the table in the comment introducing the benchmark – 148.128748 ns for xor, 1.701640 µs for juhm.

Show
Michał Marczyk added a comment - They're listed in the table in the comment introducing the benchmark – 148.128748 ns for xor, 1.701640 µs for juhm.
Hide
Alex Miller added a comment -

What if we override hasheq for different types instead of using instanceof?

Show
Alex Miller added a comment - What if we override hasheq for different types instead of using instanceof?
Hide
Michał Marczyk added a comment -

Overloaded methods are resolved statically, so there's no avoiding testing for type in the Object overload.

A more specific overload could be used to speed up hashing for its parameter type given a type hint or for literals, since the compiler would emit calls to that overload given appropriate compile-time information. There wouldn't be any speed-up in "implicit" hashing during hash map / set ops, however.

Show
Michał Marczyk added a comment - Overloaded methods are resolved statically, so there's no avoiding testing for type in the Object overload. A more specific overload could be used to speed up hashing for its parameter type given a type hint or for literals, since the compiler would emit calls to that overload given appropriate compile-time information. There wouldn't be any speed-up in "implicit" hashing during hash map / set ops, however.
Hide
Daniel Compton added a comment -

This hit me when upgrading Factual/skuld from 1.5.1 to 1.6. clojure.data.fressian serialises c.l.PersistentHashSet sets into java.util.HashSet. This breaks equality checking in https://github.com/Factual/skuld/blob/b720feb142e6d274e85be208dc1d6d8634801719/test/skuld/net_test.clj#L8-L29 as we are comparing a set of maps where the original set contains a PersistentSet and the serialised and deserialised one contains a HashSet.

Show
Daniel Compton added a comment - This hit me when upgrading Factual/skuld from 1.5.1 to 1.6. clojure.data.fressian serialises c.l.PersistentHashSet sets into java.util.HashSet. This breaks equality checking in https://github.com/Factual/skuld/blob/b720feb142e6d274e85be208dc1d6d8634801719/test/skuld/net_test.clj#L8-L29 as we are comparing a set of maps where the original set contains a PersistentSet and the serialised and deserialised one contains a HashSet.
Hide
Daniel Compton added a comment - - edited

This has come up again for me, details are in http://dev.clojure.org/jira/browse/DFRS-7

Show
Daniel Compton added a comment - - edited This has come up again for me, details are in http://dev.clojure.org/jira/browse/DFRS-7

People

Vote (7)
Watch (9)

Dates

  • Created:
    Updated: