Clojure

Optimization: allow `set`/`vec` to pass through colls that satisfy `set?`/`vector?`

Details

  • Type: Enhancement Enhancement
  • Status: Open Open
  • Priority: Minor Minor
  • Resolution: Unresolved
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
  • Patch:
    Code
  • Approval:
    Triaged

Description

set and vec currently reconstruct their inputs even when they are already of the requested type. Since it's a pretty common pattern to call set/vec on an input to ensure its type, this seems like an easy performance win in a common case.

Proposed: Check for set? in set and vec? in vec and return the coll as is if already of the requested type.

Performance:

See attached clj1410-bench.txt for test details :

Input/size Function Original Patched Comment
set/10 set 1.452 µs 0.002 µs 726x faster
set/1000 set 248.842 µs 0.006 µs 41473x faster
vector/10 set 1.288 µs 1.323 µs slightly slower
vector/1000 set 222.992 µs 221.116 µs ~same
set/10 vec 0.614 µs 0.592 µs ~same
set/1000 vec 56.876 µs 55.920 µs ~same
vector/10 vec 0.252 µs 0.007 µs 36x faster
vector/1000 vec 24.428 µs 0.007 µs 3500x faster

As expected, if an instance of the correct type is passed, then the difference is large (with bigger savings for sets which do more work for dupe checking). In cases where the type is different, there is an extra instance? check (which looks to be jit'ed away or negligible). We only see a slower time in the case of passing a small vector to the set function - 3% slower (35 ns). The benefit seems greater than the cost for this change.

Screened by:

More info:

Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY

  1. benchmarks.clj
    26/Apr/14 11:39 AM
    1 kB
    Peter Taoussanis
  2. CLJ-1410.patch
    26/Apr/14 10:52 AM
    2 kB
    Peter Taoussanis
  3. clj1410-bench.txt
    05/May/14 10:21 AM
    0.7 kB
    Alex Miller

Activity

Hide
Alex Miller added a comment -

Re:

*Open question*
Would it be better to pass-through arguments that satisfy the general (`set?`,`vec?`) or concrete (`PersistentHashSet`,`PersistentVector`) type?

I don't think there is any question that relying on abstractions via set?/vec? is better than referring to concrete types.

Show
Alex Miller added a comment - Re: *Open question* Would it be better to pass-through arguments that satisfy the general (`set?`,`vec?`) or concrete (`PersistentHashSet`,`PersistentVector`) type? I don't think there is any question that relying on abstractions via set?/vec? is better than referring to concrete types.
Hide
Alex Miller added a comment -

Please add perf difference info in the description. Please also combine the patches into a single patch.

Show
Alex Miller added a comment - Please add perf difference info in the description. Please also combine the patches into a single patch.
Hide
Peter Taoussanis added a comment -

Combined earlier patches, removed docstring changes.

Show
Peter Taoussanis added a comment - Combined earlier patches, removed docstring changes.
Hide
Peter Taoussanis added a comment -

Attached some simple benchmarks. These were run with HotSpot enabled, after a 100k lap warmup.

Google Doc times: http://goo.gl/W7EACR

The `set` benefit can be substantial, and the overhead in non-benefitial cases is negligible.

The effect on `vec` is subtler: the benefit is relatively smaller and the overhead relatively larger.

Show
Peter Taoussanis added a comment - Attached some simple benchmarks. These were run with HotSpot enabled, after a 100k lap warmup. Google Doc times: http://goo.gl/W7EACR The `set` benefit can be substantial, and the overhead in non-benefitial cases is negligible. The effect on `vec` is subtler: the benefit is relatively smaller and the overhead relatively larger.
Hide
Reid McKenzie added a comment -

Patch looks good to me, and I've reproduced the claimed low "worst case" overhead and significant potential savings numbers to within 1ms. +1.

Show
Reid McKenzie added a comment - Patch looks good to me, and I've reproduced the claimed low "worst case" overhead and significant potential savings numbers to within 1ms. +1.
Hide
Alex Miller added a comment -

I added a more extensive set of tests performed using Criterium which should give better insight into single operation performance.

Show
Alex Miller added a comment - I added a more extensive set of tests performed using Criterium which should give better insight into single operation performance.

People

Vote (1)
Watch (2)

Dates

  • Created:
    Updated: