<< Back to previous view

[CLJ-1410] Optimization: allow `set`/`vec` to pass through colls that satisfy `set?`/`vector?` Created: 26/Apr/14  Updated: 05/May/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Peter Taoussanis Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: performance

Attachments: File benchmarks.clj     Text File clj1410-bench.txt     Text File CLJ-1410.patch    
Patch: Code
Approval: Triaged

 Description   

set and vec currently reconstruct their inputs even when they are already of the requested type. Since it's a pretty common pattern to call set/vec on an input to ensure its type, this seems like an easy performance win in a common case.

Proposed: Check for set? in set and vec? in vec and return the coll as is if already of the requested type.

Performance:

See attached clj1410-bench.txt for test details :

Input/size Function Original Patched Comment
set/10 set 1.452 µs 0.002 µs 726x faster
set/1000 set 248.842 µs 0.006 µs 41473x faster
vector/10 set 1.288 µs 1.323 µs slightly slower
vector/1000 set 222.992 µs 221.116 µs ~same
set/10 vec 0.614 µs 0.592 µs ~same
set/1000 vec 56.876 µs 55.920 µs ~same
vector/10 vec 0.252 µs 0.007 µs 36x faster
vector/1000 vec 24.428 µs 0.007 µs 3500x faster

As expected, if an instance of the correct type is passed, then the difference is large (with bigger savings for sets which do more work for dupe checking). In cases where the type is different, there is an extra instance? check (which looks to be jit'ed away or negligible). We only see a slower time in the case of passing a small vector to the set function - 3% slower (35 ns). The benefit seems greater than the cost for this change.

Screened by:

More info:

Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY



 Comments   
Comment by Alex Miller [ 26/Apr/14 10:18 AM ]

Re:

*Open question*
Would it be better to pass-through arguments that satisfy the general (`set?`,`vec?`) or concrete (`PersistentHashSet`,`PersistentVector`) type?

I don't think there is any question that relying on abstractions via set?/vec? is better than referring to concrete types.

Comment by Alex Miller [ 26/Apr/14 10:20 AM ]

Please add perf difference info in the description. Please also combine the patches into a single patch.

Comment by Peter Taoussanis [ 26/Apr/14 10:52 AM ]

Combined earlier patches, removed docstring changes.

Comment by Peter Taoussanis [ 26/Apr/14 11:39 AM ]

Attached some simple benchmarks. These were run with HotSpot enabled, after a 100k lap warmup.

Google Doc times: http://goo.gl/W7EACR

The `set` benefit can be substantial, and the overhead in non-benefitial cases is negligible.

The effect on `vec` is subtler: the benefit is relatively smaller and the overhead relatively larger.

Comment by Reid McKenzie [ 04/May/14 12:01 PM ]

Patch looks good to me, and I've reproduced the claimed low "worst case" overhead and significant potential savings numbers to within 1ms. +1.

Comment by Alex Miller [ 05/May/14 10:21 AM ]

I added a more extensive set of tests performed using Criterium which should give better insight into single operation performance.

Generated at Sat Dec 20 21:28:31 CST 2014 using JIRA 4.4#649-r158309.