ClojureScript

PersistentHashMap implementation

Details

  • Type: Enhancement Enhancement
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Completed
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None
  • Patch:
    Code

Description

0001-PersistentHashMap-ported-from-Clojure.patch contains a port of Clojure's PersistentHashMap.

0002-...-.patch rewires cljs.core/hash-map and cljs.compiler/emit's :map method to use cljs.core.PersistentHashMap. With that in place, the regular cljs test suite runs fine.

As a quick sanity check,

(loop [m cljs.core.PersistentHashMap/EMPTY i 0]
  (if (< i 10000)
    (recur (assoc m i :foo) (inc i))
    (loop [i 0]
      (if (< i 10000)
        (do (assert (= :foo (get m i)))
            (recur (inc i)))))))

runs fine under Rhino. Timing this loop produces the expected results: it takes 1.1s on my machine as written above and 10.5s with cljs.core.HashMap/EMPTY bound to m. With 100000 iterations, PersistentHashMap takes 11.8s, whereas HashMap takes forever.

TODO:

  • iron out any remaining bugs, add in missing protocols (if any) etc.,
  • test HashCollisionNode and related code,
  • use other map types for small maps and switch at some threshold.

See https://groups.google.com/d/topic/clojure-dev/DMcV4QwApuY/discussion.

Activity

Hide
Michał Marczyk added a comment -

Note that I'm developing this code on a branch at

https://github.com/michalmarczyk/clojurescript/tree/persistent-hash-map

The branch may be ahead of the patch here (as it is at the time of this writing); I'll update the patch periodically.

The rewiring gets tested on a separate branch:

https://github.com/michalmarczyk/clojurescript/tree/persistent-hash-map-tests

Show
Michał Marczyk added a comment - Note that I'm developing this code on a branch at https://github.com/michalmarczyk/clojurescript/tree/persistent-hash-map The branch may be ahead of the patch here (as it is at the time of this writing); I'll update the patch periodically. The rewiring gets tested on a separate branch: https://github.com/michalmarczyk/clojurescript/tree/persistent-hash-map-tests
Hide
David Nolen added a comment -

Some notes after a quick look. Don't use js*. We have (make-array n) now.

Don't create a protocol for the node interface. Protocols are nice but introduce overhead. Just add those methods via Object.

Show
David Nolen added a comment - Some notes after a quick look. Don't use js*. We have (make-array n) now. Don't create a protocol for the node interface. Protocols are nice but introduce overhead. Just add those methods via Object.
Hide
Michał Marczyk added a comment -

Thanks! Updated both branches with these changes, plus a new bit-shift-right-zero-fill function in core (with accompanying compiler macro).

Show
Michał Marczyk added a comment - Thanks! Updated both branches with these changes, plus a new bit-shift-right-zero-fill function in core (with accompanying compiler macro).
Hide
David Nolen added a comment -

Please attach new patches here with the changes - thanks.

Show
David Nolen added a comment - Please attach new patches here with the changes - thanks.
Hide
David Nolen added a comment -

I also notice that you're using js* in a couple of other places to inline array constructors. But array is already a compiler macro - again no need for js*.

Show
David Nolen added a comment - I also notice that you're using js* in a couple of other places to inline array constructors. But array is already a compiler macro - again no need for js*.
Hide
Michał Marczyk added a comment -

These patches bring master in line with my persistent-hash-map branch; the "rewiring" are coming in a moment.

Show
Michał Marczyk added a comment - These patches bring master in line with my persistent-hash-map branch; the "rewiring" are coming in a moment.
Hide
Michał Marczyk added a comment -

Ah, actually the original rewiring patch still works.

@David Nolen: Thanks for the review! No js* left in PHM now.

Incidentally, I've run a couple of tests of hash collision handling – seems to work fine so far. Also, the way sets are implemented in cljs, they too are persistent with the PHM impl wired for use in cljs.core/hash-map.

One more thing: PHM needs a function to count bits set in a given (presumably integral) number, which I've introduced under the name of cljs.core/bit-count. It is currently not marked private, but it resides in the PHM section of core.cljs. I think it might be generally useful, so I'm inclined to add a docstring and add it to the bit ops section; otherwise it should probably be marked private.

Show
Michał Marczyk added a comment - Ah, actually the original rewiring patch still works. @David Nolen: Thanks for the review! No js* left in PHM now. Incidentally, I've run a couple of tests of hash collision handling – seems to work fine so far. Also, the way sets are implemented in cljs, they too are persistent with the PHM impl wired for use in cljs.core/hash-map. One more thing: PHM needs a function to count bits set in a given (presumably integral) number, which I've introduced under the name of cljs.core/bit-count. It is currently not marked private, but it resides in the PHM section of core.cljs. I think it might be generally useful, so I'm inclined to add a docstring and add it to the bit ops section; otherwise it should probably be marked private.
Hide
David Nolen added a comment -

Cool. Can you please construct a single patch with all the changes, thanks.

Show
David Nolen added a comment - Cool. Can you please construct a single patch with all the changes, thanks.
Hide
Michał Marczyk added a comment -

@David Nolen: Sure. The new patch includes all changes (in particular it wires up the compiler and hash-map to use PHM) and places bit-count in the bit-ops section (with a docstring).

Show
Michał Marczyk added a comment - @David Nolen: Sure. The new patch includes all changes (in particular it wires up the compiler and hash-map to use PHM) and places bit-count in the bit-ops section (with a docstring).
Hide
David Nolen added a comment -

Please remove any code behind #_. Looking more closely at your code, why are you using atoms? Is the added-leaf field only set once per node? If so why not a ^:mutable field for those deftypes.

Show
David Nolen added a comment - Please remove any code behind #_. Looking more closely at your code, why are you using atoms? Is the added-leaf field only set once per node? If so why not a ^:mutable field for those deftypes.
Hide
Michał Marczyk added a comment -

added-leaf? is set at most once per an `assoc` to a PHM to signal that the new PHM needs to have an incremented count in relation to the original. Clojure's impl uses mutable Boxes for this purpose. Looking at this again, I guess a singleton array would make more sense than an atom; I'll make the change.

Show
Michał Marczyk added a comment - added-leaf? is set at most once per an `assoc` to a PHM to signal that the new PHM needs to have an incremented count in relation to the original. Clojure's impl uses mutable Boxes for this purpose. Looking at this again, I guess a singleton array would make more sense than an atom; I'll make the change.
Hide
David Nolen added a comment -

Sounds good.

Show
David Nolen added a comment - Sounds good.
Hide
Michał Marczyk added a comment -

All in one.

Show
Michał Marczyk added a comment - All in one.
Hide
David Nolen added a comment -

Looks good! I ran some quick perf tests - access times on small maps vs. ObjMap is actually a significant performance hit. I'll review the patch more thoroughly. I do note that update performance on V8 is pretty stellar.

Fantastic work! I think we're getting very close.

Show
David Nolen added a comment - Looks good! I ran some quick perf tests - access times on small maps vs. ObjMap is actually a significant performance hit. I'll review the patch more thoroughly. I do note that update performance on V8 is pretty stellar. Fantastic work! I think we're getting very close.
Hide
Michał Marczyk added a comment -

I've got an updated version on a branch here:

https://github.com/michalmarczyk/clojurescript/tree/phm-wip-2

Unfortunately I've got to dash right now, so I'll post a new monolithic patch sometime later. I'm closing in on a transient hash map implementation too – it should be ready sometime very soon now.

Many thanks for the review and all the helpful tips, David! The latest branch incorporates some of the changes you suggested, with others due to land next time I'm at the keyboard.

Show
Michał Marczyk added a comment - I've got an updated version on a branch here: https://github.com/michalmarczyk/clojurescript/tree/phm-wip-2 Unfortunately I've got to dash right now, so I'll post a new monolithic patch sometime later. I'm closing in on a transient hash map implementation too – it should be ready sometime very soon now. Many thanks for the review and all the helpful tips, David! The latest branch incorporates some of the changes you suggested, with others due to land next time I'm at the keyboard.
Hide
Michał Marczyk added a comment -

The latest changes are available in two branches on my fork, phm and phm-transient.

phm is just the old monolithic commit. phm-transient, on the other hand, introduces TransientHashMap, TransientObjMap, various performance-related fixes (as per David's suggestions; one thing I haven't got around to yet is the alternative bit-count impl – will do that soon) and I believe even a bug fix or two. It's available for viewing at

https://github.com/michalmarczyk/clojurescript/tree/phm-transient

I'll post the patch here in a moment. I'll also rip it apart into single-concern commits if appropriate – I'm afraid I got into the flow on this during a longish train ride, so it's another monolithic commit for now.

Show
Michał Marczyk added a comment - The latest changes are available in two branches on my fork, phm and phm-transient. phm is just the old monolithic commit. phm-transient, on the other hand, introduces TransientHashMap, TransientObjMap, various performance-related fixes (as per David's suggestions; one thing I haven't got around to yet is the alternative bit-count impl – will do that soon) and I believe even a bug fix or two. It's available for viewing at https://github.com/michalmarczyk/clojurescript/tree/phm-transient I'll post the patch here in a moment. I'll also rip it apart into single-concern commits if appropriate – I'm afraid I got into the flow on this during a longish train ride, so it's another monolithic commit for now.
Hide
Michał Marczyk added a comment -

Meant to be applied on top of the previous monolithic patch.

Show
Michał Marczyk added a comment - Meant to be applied on top of the previous monolithic patch.
Hide
David Nolen added a comment -

Yes please move the transient work into a separate patch, thanks!

Show
David Nolen added a comment - Yes please move the transient work into a separate patch, thanks!
Hide
Michał Marczyk added a comment -

Ok, no problem. Will do that in the next batch sometime soon, since again I need to dash right now. Should any bugs surface in the meantime, I'll fix them as soon as possible too.

Show
Michał Marczyk added a comment - Ok, no problem. Will do that in the next batch sometime soon, since again I need to dash right now. Should any bugs surface in the meantime, I'll fix them as soon as possible too.
Hide
Michał Marczyk added a comment -

PHM without the transient stuff available at

https://github.com/michalmarczyk/clojurescript/tree/phm2

Will attach patch in a moment.

Show
Michał Marczyk added a comment - PHM without the transient stuff available at https://github.com/michalmarczyk/clojurescript/tree/phm2 Will attach patch in a moment.
Hide
Michał Marczyk added a comment -

Here's the patch. Applies cleanly onto the current master.

Show
Michał Marczyk added a comment - Here's the patch. Applies cleanly onto the current master.
Hide
David Nolen added a comment - - edited

Before moving forward with this it would be useful to get a jsperf benchmark comparing performance to ObjMap and HashMap for big/small, access/update, string/non-string keys. Mind putting this together? Once we have that we can pick a strategy that won't slow down existing programs.

Show
David Nolen added a comment - - edited Before moving forward with this it would be useful to get a jsperf benchmark comparing performance to ObjMap and HashMap for big/small, access/update, string/non-string keys. Mind putting this together? Once we have that we can pick a strategy that won't slow down existing programs.
Hide
Michał Marczyk added a comment -

Here's a first cut at some tests... but I must have something or other mixed up, because the results are crazy. Any hints will be greatly appreciated.

This is the code used by the test cases:

https://github.com/michalmarczyk/cljs-persistent-hash-map-test

And here is the current batch of tests:

http://jsperf.com/cljs-persistent-hash-map-tiny-assoc
http://jsperf.com/cljs-persistent-hash-map-small-assoc
http://jsperf.com/cljs-persistent-hash-map-large-assoc
http://jsperf.com/cljs-persistent-hash-map-access

Show
Michał Marczyk added a comment - Here's a first cut at some tests... but I must have something or other mixed up, because the results are crazy. Any hints will be greatly appreciated. This is the code used by the test cases: https://github.com/michalmarczyk/cljs-persistent-hash-map-test And here is the current batch of tests: http://jsperf.com/cljs-persistent-hash-map-tiny-assoc http://jsperf.com/cljs-persistent-hash-map-small-assoc http://jsperf.com/cljs-persistent-hash-map-large-assoc http://jsperf.com/cljs-persistent-hash-map-access
Hide
Michał Marczyk added a comment -

A further pair of test cases for associng non-string keys:

http://jsperf.com/cljs-persistent-hash-map-assoc-non-string
http://jsperf.com/cljs-persistent-hash-map-large-assoc-non-string

Also, to make the earlier comment clearer – the results are crazy, because in Chrome PHM just totally trounces everything, even at small map sizes.

Show
Michał Marczyk added a comment - A further pair of test cases for associng non-string keys: http://jsperf.com/cljs-persistent-hash-map-assoc-non-string http://jsperf.com/cljs-persistent-hash-map-large-assoc-non-string Also, to make the earlier comment clearer – the results are crazy, because in Chrome PHM just totally trounces everything, even at small map sizes.
Hide
Michał Marczyk added a comment -

I have a new branch based off of the latest PHM code and including transient support for PHM (but not ObjMap – leaving that on the previous phm-transient branch for now).

Here's the branch:

https://github.com/michalmarczyk/clojurescript/tree/phm2-transient

And here's a jsPerf test (with transient/assoc!/persistent! being twice as fast to build a PHM of 10000 entries than assoc in Chrome):

http://jsperf.com/cljs-persistent-hash-map-transient-support

Show
Michał Marczyk added a comment - I have a new branch based off of the latest PHM code and including transient support for PHM (but not ObjMap – leaving that on the previous phm-transient branch for now). Here's the branch: https://github.com/michalmarczyk/clojurescript/tree/phm2-transient And here's a jsPerf test (with transient/assoc!/persistent! being twice as fast to build a PHM of 10000 entries than assoc in Chrome): http://jsperf.com/cljs-persistent-hash-map-transient-support
Hide
Michał Marczyk added a comment -

Here's the patch (applies cleanly on top of the previous PHM patch).

I've updated the test with slightly modified preparation code (I used a version with cljs.core/hash-map not using transient – no difference in this case, but still, now it's using the actual code from the patch).

Show
Michał Marczyk added a comment - Here's the patch (applies cleanly on top of the previous PHM patch). I've updated the test with slightly modified preparation code (I used a version with cljs.core/hash-map not using transient – no difference in this case, but still, now it's using the actual code from the patch).
Hide
David Nolen added a comment -

Please a create a new ticket and put the transient patch there, thanks!

Show
David Nolen added a comment - Please a create a new ticket and put the transient patch there, thanks!
Hide
Michał Marczyk added a comment -

Makes sense, will do right now.

Show
Michał Marczyk added a comment - Makes sense, will do right now.
Show
Michał Marczyk added a comment - Done: http://dev.clojure.org/jira/browse/CLJS-181
Hide
Michał Marczyk added a comment -

There is an issue which hasn't occurred to me previously – a lot of code (including TwitterBuzz) will simply break with PHM wired in due to the way in which people use -strobj. Of course nobody ever promised -strobj to work... But a replacement would be nice.

For the sort of uses of -strobj I see at first glance, making js-obj accept varargs like hash-map does (and array!) and construct a JS object with appropriate entries would be an adequate solution; the rest could be expected to use a conversion function.

If breaking -strobj proves to be a major problem, though, I do have code ready (on an older branch previously linked to from here, phm-transient) which causes ObjMaps to be constructed for small maps with string keys (then a PHM is built using a transient when a non-string key is encountered or the OM grows too large).

Show
Michał Marczyk added a comment - There is an issue which hasn't occurred to me previously – a lot of code (including TwitterBuzz) will simply break with PHM wired in due to the way in which people use -strobj. Of course nobody ever promised -strobj to work... But a replacement would be nice. For the sort of uses of -strobj I see at first glance, making js-obj accept varargs like hash-map does (and array!) and construct a JS object with appropriate entries would be an adequate solution; the rest could be expected to use a conversion function. If breaking -strobj proves to be a major problem, though, I do have code ready (on an older branch previously linked to from here, phm-transient) which causes ObjMaps to be constructed for small maps with string keys (then a PHM is built using a transient when a non-string key is encountered or the OM grows too large).
Hide
Michał Marczyk added a comment -

Added a test case for 6 key `assoc` @ http://jsperf.com/cljs-persistent-hash-map-miniscule-assoc. PHM still wins in Chrome and FF1 and loses by a reasonable margin to ObjMap on Iceweasel.

Also, for completeness, here's a 20 key, "mutable is fine" `assoc` / `aset` test with JS object included: http://jsperf.com/cljs-persistent-hash-map-vs-js-object-small, where unsurprisingly JS objects dominate. (Note that transients are a lot faster than PHM for building large maps – see the test linked to in a previous comment, http://jsperf.com/cljs-persistent-hash-map-transient-support.)

Show
Michał Marczyk added a comment - Added a test case for 6 key `assoc` @ http://jsperf.com/cljs-persistent-hash-map-miniscule-assoc. PHM still wins in Chrome and FF1 and loses by a reasonable margin to ObjMap on Iceweasel. Also, for completeness, here's a 20 key, "mutable is fine" `assoc` / `aset` test with JS object included: http://jsperf.com/cljs-persistent-hash-map-vs-js-object-small, where unsurprisingly JS objects dominate. (Note that transients are a lot faster than PHM for building large maps – see the test linked to in a previous comment, http://jsperf.com/cljs-persistent-hash-map-transient-support.)
Hide
Michał Marczyk added a comment -
Show
Michał Marczyk added a comment - 4 key `assoc`: http://jsperf.com/cljs-persistent-hash-map-minuscule-assoc. 8 key `assoc`: http://jsperf.com/cljs-persistent-hash-map-teeny-assoc. Same results, basically.
Hide
David Nolen added a comment -

This great. For completeness can we compare against defrecord w/ 4 keys? Thanks!

Show
David Nolen added a comment - This great. For completeness can we compare against defrecord w/ 4 keys? Thanks!
Hide
Michał Marczyk added a comment -

Sure, here's a test:

http://jsperf.com/cljs-persistent-hash-map-vs-records-minuscule

Initial test run suggests that a record is significantly faster for {assoc} onto fields and significantly slower if using {_extmap} (NB. here the {_extmap} is itself a PHM).

Show
Michał Marczyk added a comment - Sure, here's a test: http://jsperf.com/cljs-persistent-hash-map-vs-records-minuscule Initial test run suggests that a record is significantly faster for {assoc} onto fields and significantly slower if using {_extmap} (NB. here the {_extmap} is itself a PHM).

People

Vote (0)
Watch (2)

Dates

  • Created:
    Updated:
    Resolved: