[CLJ-1410] Optimization: allow `set`/`vec` to pass through colls that satisfy `set?`/`vector?` Created: 26/Apr/14 Updated: 05/May/14
Attachments: benchmarks.clj, clj1410-bench.txt, CLJ-1410.patch
set and vec currently reconstruct their inputs even when they are already of the requested type. Since it's a pretty common pattern to call set/vec on an input to ensure its type, this seems like an easy performance win in a common case.
Proposed: check for `set?` in `set` and `vector?` in `vec`, and return the coll as-is if it is already of the requested type.
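A minimal sketch of the proposed short-circuit, assuming standalone helper names `set'`/`vec'` (hypothetical; the attached CLJ-1410.patch adds the checks inside `clojure.core/set` and `clojure.core/vec` themselves):

```clojure
;; Hypothetical standalone versions of the proposed change; the actual
;; patch puts these predicate checks inside the core functions.
(defn set'
  "Like clojure.core/set, but passes a set through unchanged."
  [coll]
  (if (set? coll)
    coll                ; already a set: no reconstruction needed
    (into #{} coll)))

(defn vec'
  "Like clojure.core/vec, but passes a vector through unchanged."
  [coll]
  (if (vector? coll)
    coll                ; already a vector: no reconstruction needed
    (into [] coll)))
```

With the short-circuit, `(identical? s (set' s))` holds for any set `s`, so the already-correct case costs only the `set?` check instead of a full rebuild.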
See the attached clj1410-bench.txt for test details:
As expected, when an instance of the correct type is passed, the difference is large (with bigger savings for sets, which do more work for duplicate checking). When the type is different, there is an extra `instance?` check (which appears to be JIT'ed away or negligible). The only slowdown observed is when passing a small vector to `set` - 3% slower (35 ns). The benefit seems greater than the cost for this change.
Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY
Comment by Alex Miller [26/Apr/14 10:18 AM]
I don't think there is any question that relying on the abstractions via `set?`/`vector?` is better than referring to concrete types.
Comment by Alex Miller [26/Apr/14 10:20 AM]
Please add the performance difference info to the description. Please also combine the patches into a single patch.
Comment by Peter Taoussanis [26/Apr/14 10:52 AM]
Combined earlier patches, removed docstring changes.
Comment by Peter Taoussanis [26/Apr/14 11:39 AM]
Attached some simple benchmarks. These were run with HotSpot enabled, after a 100k lap warmup.
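The warmup-then-time approach can be sketched roughly like this (a hypothetical harness with assumed names `bench-ns` and iteration counts; the actual measurements are in the attached benchmarks.clj):

```clojure
;; Hypothetical micro-benchmark harness: warm the JIT with 100k laps,
;; then time 1M calls and report the mean time per call.
(defn bench-ns [f x]
  (dotimes [_ 100000] (f x))             ; warmup laps for HotSpot
  (let [t0 (System/nanoTime)]
    (dotimes [_ 1000000] (f x))
    ;; total ns / 1e6 iterations = mean ns per call
    (/ (- (System/nanoTime) t0) 1e6)))

(comment
  (let [s (set (range 10))]
    ;; current behavior: rebuilds the set on every call
    (bench-ns set s)
    ;; proposed behavior: passes the set through unchanged
    (bench-ns #(if (set? %) % (set %)) s)))
```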
Google Doc times: http://goo.gl/W7EACR
The `set` benefit can be substantial, and the overhead in non-beneficial cases is negligible.
The effect on `vec` is subtler: the benefit is relatively smaller and the overhead relatively larger.
Comment by Reid McKenzie [04/May/14 12:01 PM]
Patch looks good to me, and I've reproduced the claimed low "worst case" overhead and significant potential savings numbers to within 1ms. +1.
Comment by Alex Miller [05/May/14 10:21 AM]
I added a more extensive set of tests performed using Criterium which should give better insight into single operation performance.