<< Back to previous view

[CLJS-178] PersistentHashMap implementation Created: 11/Apr/12  Updated: 27/Jul/13  Resolved: 19/Apr/12

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Michał Marczyk Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: None

Attachments: Text File 0001-PersistentHashMap-improvements-TransientHashMap-Tran.patch     Text File 0001-PersistentHashMap-ported-from-Clojure.patch     Text File 0001-PersistentHashMap-ported-from-Clojure.patch     Text File 0001-PersistentHashMap-ported-from-Clojure.patch     Text File 0001-PersistentHashMap-ported-from-Clojure.patch     Text File 0001-PersistentHashMap-ported-from-Clojure.patch     Text File 0002-Fix-BitmapIndexedNode-s-inode-find-impl-with-not-fou.patch     Text File 0002-Have-cljs.core-hash-map-and-compiler-s-emit-map-use-.patch     Text File 0002-Transient-support-for-PersistentHashMap-ported-from-.patch     Text File 0002-Transient-support-for-PersistentHashMap-ported-from-.patch     Text File 0003-Remove-most-uses-of-js-in-PersistentHashMap-add-bit-.patch     Text File 0004-Remove-INode-proto-impl-methods-via-Object.patch     Text File 0005-PersistentHashMap-rename-added-leaf-to-added-leaf.patch     Text File 0006-Remove-remaining-uses-of-js-in-PersistentHashMap.patch    
Patch: Code

 Description   

0001-PersistentHashMap-ported-from-Clojure.patch contains a port of Clojure's PersistentHashMap.

0002-...-.patch rewires cljs.core/hash-map and cljs.compiler/emit's :map method to use cljs.core.PersistentHashMap. With that in place, the regular cljs test suite runs fine.

As a quick sanity check,

(loop [m cljs.core.PersistentHashMap/EMPTY i 0]
  (if (< i 10000)
    (recur (assoc m i :foo) (inc i))
    (loop [i 0]
      (if (< i 10000)
        (do (assert (= :foo (get m i)))
            (recur (inc i)))))))

runs fine under Rhino. Timing this loop produces the expected results: it takes 1.1s on my machine as written above and 10.5s with cljs.core.HashMap/EMPTY bound to m. With 100000 iterations, PersistentHashMap takes 11.8s, whereas HashMap takes forever.

TODO:

  • iron out any remaining bugs, add in missing protocols (if any) etc.,
  • test HashCollisionNode and related code,
  • use other map types for small maps and switch at some threshold.

See https://groups.google.com/d/topic/clojure-dev/DMcV4QwApuY/discussion.



 Comments   
Comment by Michał Marczyk [ 12/Apr/12 7:50 AM ]

Note that I'm developing this code on a branch at

https://github.com/michalmarczyk/clojurescript/tree/persistent-hash-map

The branch may be ahead of the patch here (as it is at the time of this writing); I'll update the patch periodically.

The rewiring gets tested on a separate branch:

https://github.com/michalmarczyk/clojurescript/tree/persistent-hash-map-tests

Comment by David Nolen [ 12/Apr/12 10:08 AM ]

Some notes after a quick look. Don't use js*. We have (make-array n) now.

Don't create a protocol for the node interface. Protocols are nice but introduce overhead. Just add those methods via Object.

Comment by Michał Marczyk [ 12/Apr/12 12:14 PM ]

Thanks! Updated both branches with these changes, plus a new bit-shift-right-zero-fill function in core (with accompanying compiler macro).

Comment by David Nolen [ 12/Apr/12 2:44 PM ]

Please attach new patches here with the changes - thanks.

Comment by David Nolen [ 12/Apr/12 2:53 PM ]

I also notice that you're using js* in a couple of other places to inline array constructors. But array is already a compiler macro - again no need for js*.

Comment by Michał Marczyk [ 12/Apr/12 3:00 PM ]

These patches bring master in line with my persistent-hash-map branch; the "rewiring" are coming in a moment.

Comment by Michał Marczyk [ 12/Apr/12 3:05 PM ]

Ah, actually the original rewiring patch still works.

@David Nolen: Thanks for the review! No js* left in PHM now.

Incidentally, I've run a couple of tests of hash collision handling – seems to work fine so far. Also, the way sets are implemented in cljs, they too are persistent with the PHM impl wired for use in cljs.core/hash-map.

One more thing: PHM needs a function to count bits set in a given (presumably integral) number, which I've introduced under the name of cljs.core/bit-count. It is currently not marked private, but it resides in the PHM section of core.cljs. I think it might be generally useful, so I'm inclined to add a docstring and add it to the bit ops section; otherwise it should probably be marked private.

Comment by David Nolen [ 12/Apr/12 3:08 PM ]

Cool. Can you please construct a single patch with all the changes, thanks.

Comment by Michał Marczyk [ 12/Apr/12 3:31 PM ]

@David Nolen: Sure. The new patch includes all changes (in particular it wires up the compiler and hash-map to use PHM) and places bit-count in the bit-ops section (with a docstring).

Comment by David Nolen [ 12/Apr/12 3:37 PM ]

Please remove any code behind #_. Looking more closely at your code, why are you using atoms? Is the added-leaf field only set once per node? If so why not a ^:mutable field for those deftypes.

Comment by Michał Marczyk [ 12/Apr/12 3:46 PM ]

added-leaf? is set at most once per an `assoc` to a PHM to signal that the new PHM needs to have an incremented count in relation to the original. Clojure's impl uses mutable Boxes for this purpose. Looking at this again, I guess a singleton array would make more sense than an atom; I'll make the change.

Comment by David Nolen [ 12/Apr/12 3:52 PM ]

Sounds good.

Comment by Michał Marczyk [ 12/Apr/12 4:38 PM ]

All in one.

Comment by David Nolen [ 12/Apr/12 4:56 PM ]

Looks good! I ran some quick perf tests - access times on small maps vs. ObjMap is actually a significant performance hit. I'll review the patch more thoroughly. I do note that update performance on V8 is pretty stellar.

Fantastic work! I think we're getting very close.

Comment by Michał Marczyk [ 12/Apr/12 11:16 PM ]

I've got an updated version on a branch here:

https://github.com/michalmarczyk/clojurescript/tree/phm-wip-2

Unfortunately I've got to dash right now, so I'll post a new monolithic patch sometime later. I'm closing in on a transient hash map implementation too – it should be ready sometime very soon now.

Many thanks for the review and all the helpful tips, David! The latest branch incorporates some of the changes you suggested, with others due to land next time I'm at the keyboard.

Comment by Michał Marczyk [ 13/Apr/12 7:20 AM ]

The latest changes are available in two branches on my fork, phm and phm-transient.

phm is just the old monolithic commit. phm-transient, on the other hand, introduces TransientHashMap, TransientObjMap, various performance-related fixes (as per David's suggestions; one thing I haven't got around to yet is the alternative bit-count impl – will do that soon) and I believe even a bug fix or two. It's available for viewing at

https://github.com/michalmarczyk/clojurescript/tree/phm-transient

I'll post the patch here in a moment. I'll also rip it apart into single-concern commits if appropriate – I'm afraid I got into the flow on this during a longish train ride, so it's another monolithic commit for now.

Comment by Michał Marczyk [ 13/Apr/12 7:21 AM ]

Meant to be applied on top of the previous monolithic patch.

Comment by David Nolen [ 13/Apr/12 7:26 AM ]

Yes please move the transient work into a separate patch, thanks!

Comment by Michał Marczyk [ 13/Apr/12 7:36 AM ]

Ok, no problem. Will do that in the next batch sometime soon, since again I need to dash right now. Should any bugs surface in the meantime, I'll fix them as soon as possible too.

Comment by Michał Marczyk [ 16/Apr/12 9:32 AM ]

PHM without the transient stuff available at

https://github.com/michalmarczyk/clojurescript/tree/phm2

Will attach patch in a moment.

Comment by Michał Marczyk [ 16/Apr/12 9:35 AM ]

Here's the patch. Applies cleanly onto the current master.

Comment by David Nolen [ 16/Apr/12 9:43 AM ]

Before moving forward with this it would be useful to get a jsperf benchmark comparing performance to ObjMap and HashMap for big/small, access/update, string/non-string keys. Mind putting this together? Once we have that we can pick a strategy that won't slow down existing programs.

Comment by Michał Marczyk [ 16/Apr/12 11:01 AM ]

Here's a first cut at some tests... but I must have something or other mixed up, because the results are crazy. Any hints will be greatly appreciated.

This is the code used by the test cases:

https://github.com/michalmarczyk/cljs-persistent-hash-map-test

And here is the current batch of tests:

http://jsperf.com/cljs-persistent-hash-map-tiny-assoc
http://jsperf.com/cljs-persistent-hash-map-small-assoc
http://jsperf.com/cljs-persistent-hash-map-large-assoc
http://jsperf.com/cljs-persistent-hash-map-access

Comment by Michał Marczyk [ 16/Apr/12 12:12 PM ]

A further pair of test cases for associng non-string keys:

http://jsperf.com/cljs-persistent-hash-map-assoc-non-string
http://jsperf.com/cljs-persistent-hash-map-large-assoc-non-string

Also, to make the earlier comment clearer – the results are crazy, because in Chrome PHM just totally trounces everything, even at small map sizes.

Comment by Michał Marczyk [ 16/Apr/12 1:39 PM ]

I have a new branch based off of the latest PHM code and including transient support for PHM (but not ObjMap – leaving that on the previous phm-transient branch for now).

Here's the branch:

https://github.com/michalmarczyk/clojurescript/tree/phm2-transient

And here's a jsPerf test (with transient/assoc!/persistent! being twice as fast to build a PHM of 10000 entries than assoc in Chrome):

http://jsperf.com/cljs-persistent-hash-map-transient-support

Comment by Michał Marczyk [ 16/Apr/12 1:49 PM ]

Here's the patch (applies cleanly on top of the previous PHM patch).

I've updated the test with slightly modified preparation code (I used a version with cljs.core/hash-map not using transient – no difference in this case, but still, now it's using the actual code from the patch).

Comment by David Nolen [ 16/Apr/12 1:49 PM ]

Please a create a new ticket and put the transient patch there, thanks!

Comment by Michał Marczyk [ 16/Apr/12 1:51 PM ]

Makes sense, will do right now.

Comment by Michał Marczyk [ 16/Apr/12 1:56 PM ]

Done:

http://dev.clojure.org/jira/browse/CLJS-181

Comment by Michał Marczyk [ 16/Apr/12 6:45 PM ]

There is an issue which hasn't occurred to me previously – a lot of code (including TwitterBuzz) will simply break with PHM wired in due to the way in which people use -strobj. Of course nobody ever promised -strobj to work... But a replacement would be nice.

For the sort of uses of -strobj I see at first glance, making js-obj accept varargs like hash-map does (and array!) and construct a JS object with appropriate entries would be an adequate solution; the rest could be expected to use a conversion function.

If breaking -strobj proves to be a major problem, though, I do have code ready (on an older branch previously linked to from here, phm-transient) which causes ObjMaps to be constructed for small maps with string keys (then a PHM is built using a transient when a non-string key is encountered or the OM grows too large).

Comment by Michał Marczyk [ 19/Apr/12 7:18 AM ]

Added a test case for 6 key `assoc` @ http://jsperf.com/cljs-persistent-hash-map-miniscule-assoc. PHM still wins in Chrome and FF1 and loses by a reasonable margin to ObjMap on Iceweasel.

Also, for completeness, here's a 20 key, "mutable is fine" `assoc` / `aset` test with JS object included: http://jsperf.com/cljs-persistent-hash-map-vs-js-object-small, where unsurprisingly JS objects dominate. (Note that transients are a lot faster than PHM for building large maps – see the test linked to in a previous comment, http://jsperf.com/cljs-persistent-hash-map-transient-support.)

Comment by Michał Marczyk [ 19/Apr/12 7:48 AM ]

4 key `assoc`: http://jsperf.com/cljs-persistent-hash-map-minuscule-assoc.
8 key `assoc`: http://jsperf.com/cljs-persistent-hash-map-teeny-assoc.

Same results, basically.

Comment by David Nolen [ 19/Apr/12 8:54 AM ]

This great. For completeness can we compare against defrecord w/ 4 keys? Thanks!

Comment by Michał Marczyk [ 19/Apr/12 9:54 AM ]

Sure, here's a test:

http://jsperf.com/cljs-persistent-hash-map-vs-records-minuscule

Initial test run suggests that a record is significantly faster for {assoc} onto fields and significantly slower if using {_extmap} (NB. here the {_extmap} is itself a PHM).

Comment by David Nolen [ 19/Apr/12 10:41 PM ]

fixed, https://github.com/clojure/clojurescript/commit/ef0d52541b02bcf767f2f54f497c8808c6afa91d

Generated at Sun Sep 21 23:38:27 CDT 2014 using JIRA 4.4#649-r158309.