<< Back to previous view

[CLJS-754] Assess Murmur3 for Collections Created: 29/Jan/14  Updated: 01/Jul/14  Resolved: 01/Jul/14

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: David Nolen Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: None


 Description   

We should assess the performance implications of adopting Murmur3 for hashing - http://github.com/clojure/clojure/commit/dff9600387b962f16fc78e6477e10e34651fd366



 Comments   
Comment by Francis Avila [ 19/Feb/14 5:19 AM ]

I have a (lightly tested, probably untrustworthy) gist of clj.lang.murmur3 ported to javascript. The two other murmur3 js libs I saw both had flaws which I attempt to address.

Found a jsperf comparing murmur3 libs and added a goog.string.hashCode case as a baseline. This jsperf test isn't very good because:

  1. it only tests a tiny string
  2. the murmur libraries are both flawed:
    1. one of them assumes strings are 8-bit only (consumes 4 chars at a time for hashing)
    2. the other one treats strings as utf-16, but does not account for integer overflows.

However, the results are still informative--string hashing with murmur3 in js will probably be half as fast as goog.string.hashCode, except on firefox where it is about the same.

Not nearly enough data for a true assessment, but I hope it helps.

Comment by Francis Avila [ 19/Feb/14 10:20 AM ]

It occurs to me that the slowdown is probably entirely due to integer multiplication, and that spidermonkey's jit is detecting and optimizing that pattern of code to Math.imul while the others are not. Math.imul is still experimental but supported on many browsers already--it might be better to polyfill it instead of unconditionally inlining the bit-shifting. Will try later.

Comment by David Nolen [ 19/Feb/14 2:23 PM ]

There's very little need to write this logic in JavaScript - it should just be done in ClojureScript. I think modern JS engines can probably handle Murmur3, here are some perf numbers for simulating multiplication of 2 unsigned 32bit ints that I found on StackOverflow http://jsperf.com/bit-multiply

Comment by Francis Avila [ 19/Feb/14 3:14 PM ]

I agree re: implementing in clojurescript. It looked hairier when I started.

You're also right about the bit multiply. I extended that case with larger input values and a comparison against Math.imul and it seems to be compiling to the same code. Maybe inlining the bit-multiply is confusing the jits, or maybe it's something else entirely. I'll give it another shot when I get a chance.

Any advice on setting up tests for this that would work in the clojurescript project's environment? Probably what we want is a generative test which ensures the hash value in cljs is the same as in clj?

Comment by David Nolen [ 19/Feb/14 4:34 PM ]

Generative tests that guarantee the hashes are the same in both CLJS & CLJ is a nice to have and not really a priority. Murmur3 is pretty straightforward looking, I'm sure we can get it right

Comment by David Nolen [ 20/Feb/14 9:57 AM ]

One of the V8 engineer did a more accurate version of the benchmark. It looks like Math.imul is the way to go and we should shim in something for older browsers.

http://jsperf.com/bit-multiply/5

Comment by Francis Avila [ 22/Feb/14 2:20 AM ]

I thought those results looked too good. 32-bit maths in js is always tricky, hence desire for tests, especially if one of the use cases is comparing compile-time and run-time hashes.

Cleaned up js implementation of murmur3 just to get something up quickly to assess murmur3 performance. It uses Math.imul or a shim for integer multiplication. Then created a pair of jsperfs:

In both tests, setup code is the same: pretty-printed closure-advanced-compiled code from my js-murmur3 implementation, and copy-pasted code from cljs's current number and string hashing.

Results are...interesting. So interesting that I am suspicious that something is wrong with my benchmarks or code. Perhaps you can have Mr. Egorov take a look? In summary:

  • murmur3 int hash is about an 8% performance regression in firefox and safari.
  • murmur3 string hash is more than double the speed of goog.string.hashCode on ff and safari for small strings, and even better for large ones.
  • Except in chrome, where murmur3 is abysmal--about an order of magnitude regression on both ints and strings!

All these browsers have a native Math.imul, and Chrome's imul is definitely working. There must be something else making chrome's jit cranky.

I did not expect murmur3 to perform so well or so badly!

Comment by Francis Avila [ 26/Feb/14 3:37 AM ]

Maintaining a fork with murmur3 hashing (see murmur3 branch). So far numbers, strings, and collections (ordered and unordered) hash identically to clojure 1.6-beta1, but symbols and keywords do not. I suspect integer-math differences in cljs's hash-combine vs clojure.lang.Util/hashCombine, but I have not isolated the problem yet.

Comment by Francis Avila [ 26/Feb/14 3:06 PM ]

Nevermind, cljs and clj reverse the order of ns+name hashing and I just didn't notice.

This branch appears to produce hashes identical to clojure 1.6 for the following types:

  • strings
  • keywords and symbols
  • vectors, lists, persistentqueue, seqs
  • maps, sets
  • integers representable with doubles in js (i.e., about 53 bits of signed integer)

Types which do not hash the same (likely incomplete list):

  • uuids
  • instances
  • floating-point types. Clojure 1.6 still uses Java's native hashCode for these, and checking for a non-int is a bit involved in js, so I just hash all js numbers as if they were longs. This might be a bad idea if we have a collection of "real" doubles.
Comment by David Nolen [ 08/Jun/14 12:15 PM ]

Murmur3 is now implemented in the 1.6.0 branch http://github.com/clojure/clojurescript/tree/1.6.0

Comment by David Nolen [ 01/Jul/14 9:25 PM ]

fixed in master

Generated at Wed Nov 26 05:29:09 CST 2014 using JIRA 4.4#649-r158309.