Clojure

Optimization: allow `set`/`vec` to pass through colls that satisfy `set?`/`vector?`

Details

  • Type: Enhancement Enhancement
  • Status: Open Open
  • Priority: Minor Minor
  • Resolution: Unresolved
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
  • Patch:
    Code
  • Approval:
    Triaged

Description

set and vec currently reconstruct their inputs even when they are already of the requested type. Since it's a pretty common pattern to call set/vec on an input to ensure its type, this seems like an easy performance win in a common case.

Proposed: Check for set? in set and vec? in vec and return the coll as is if already of the requested type.

Performance:

See attached clj1410-bench.txt for test details :

Input/size Function Original Patched Comment
set/10 set 1.452 µs 0.002 µs 726x faster
set/1000 set 248.842 µs 0.006 µs 41473x faster
vector/10 set 1.288 µs 1.323 µs slightly slower
vector/1000 set 222.992 µs 221.116 µs ~same
set/10 vec 0.614 µs 0.592 µs ~same
set/1000 vec 56.876 µs 55.920 µs ~same
vector/10 vec 0.252 µs 0.007 µs 36x faster
vector/1000 vec 24.428 µs 0.007 µs 3500x faster

As expected, if an instance of the correct type is passed, then the difference is large (with bigger savings for sets which do more work for dupe checking). In cases where the type is different, there is an extra instance? check (which looks to be jit'ed away or negligible). We only see a slower time in the case of passing a small vector to the set function - 3% slower (35 ns). The benefit seems greater than the cost for this change.

Screened by:

More info:

Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY

  1. benchmarks.clj
    26/Apr/14 11:39 AM
    1 kB
    Peter Taoussanis
  2. CLJ-1410.patch
    26/Apr/14 10:52 AM
    2 kB
    Peter Taoussanis
  3. clj1410-bench.txt
    05/May/14 10:21 AM
    0.7 kB
    Alex Miller

Activity

Peter Taoussanis made changes -
Field Original Value New Value
Description Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY

`set` and `vector` currently reconstruct their inputs even when they are already of the relevant type. Since it's a pretty common pattern to call `set`/`vector` on an input to ensure its type, this seems like an easy performance win in a common case.

**Open question**
Would it be better to pass-through arguments that satisfy the general (`set?`,`vector?`) or concrete (`PersistentHashSet`,`PersistentVector`) type?
Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY

`set` and `vec` currently reconstruct their inputs even when they are already of the relevant type. Since it's a pretty common pattern to call `set`/`vec` on an input to ensure its type, this seems like an easy performance win in a common case.

**Open question**
Would it be better to pass-through arguments that satisfy the general (`set?`,`vec?`) or concrete (`PersistentHashSet`,`PersistentVector`) type?
Attachment CLJ-1410-vec.patch [ 12958 ]
Attachment CLJ-1410-set.patch [ 12957 ]
Peter Taoussanis made changes -
Summary Optimization: allow `set`/`vector` to pass through colls that satisfy `set?`/`vector?` Optimization: allow `set`/`vec` to pass through colls that satisfy `set?`/`vector?`
Hide
Alex Miller added a comment -

Re:

*Open question*
Would it be better to pass-through arguments that satisfy the general (`set?`,`vec?`) or concrete (`PersistentHashSet`,`PersistentVector`) type?

I don't think there is any question that relying on abstractions via set?/vec? is better than referring to concrete types.

Show
Alex Miller added a comment - Re: *Open question* Would it be better to pass-through arguments that satisfy the general (`set?`,`vec?`) or concrete (`PersistentHashSet`,`PersistentVector`) type? I don't think there is any question that relying on abstractions via set?/vec? is better than referring to concrete types.
Alex Miller made changes -
Description Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY

`set` and `vec` currently reconstruct their inputs even when they are already of the relevant type. Since it's a pretty common pattern to call `set`/`vec` on an input to ensure its type, this seems like an easy performance win in a common case.

**Open question**
Would it be better to pass-through arguments that satisfy the general (`set?`,`vec?`) or concrete (`PersistentHashSet`,`PersistentVector`) type?
`set` and `vec` currently reconstruct their inputs even when they are already of the relevant type. Since it's a pretty common pattern to call `set`/`vec` on an input to ensure its type, this seems like an easy performance win in a common case.

*Proposed:* Check for set? and vec? and return the coll as is if already of the requested type.

*Performance:*

- TODO: demonstrate perf difference here

*Screened by:*

*More info:*

Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY
Hide
Alex Miller added a comment -

Please add perf difference info in the description. Please also combine the patches into a single patch.

Show
Alex Miller added a comment - Please add perf difference info in the description. Please also combine the patches into a single patch.
Alex Miller made changes -
Patch Code [ 10001 ]
Approval Triaged [ 10120 ]
Peter Taoussanis made changes -
Attachment CLJ-1410-set.patch [ 12957 ]
Peter Taoussanis made changes -
Attachment CLJ-1410-vec.patch [ 12958 ]
Hide
Peter Taoussanis added a comment -

Combined earlier patches, removed docstring changes.

Show
Peter Taoussanis added a comment - Combined earlier patches, removed docstring changes.
Peter Taoussanis made changes -
Attachment CLJ-1410.patch [ 12959 ]
Hide
Peter Taoussanis added a comment -

Attached some simple benchmarks. These were run with HotSpot enabled, after a 100k lap warmup.

Google Doc times: http://goo.gl/W7EACR

The `set` benefit can be substantial, and the overhead in non-benefitial cases is negligible.

The effect on `vec` is subtler: the benefit is relatively smaller and the overhead relatively larger.

Show
Peter Taoussanis added a comment - Attached some simple benchmarks. These were run with HotSpot enabled, after a 100k lap warmup. Google Doc times: http://goo.gl/W7EACR The `set` benefit can be substantial, and the overhead in non-benefitial cases is negligible. The effect on `vec` is subtler: the benefit is relatively smaller and the overhead relatively larger.
Peter Taoussanis made changes -
Attachment benchmarks.clj [ 12960 ]
Hide
Reid McKenzie added a comment -

Patch looks good to me, and I've reproduced the claimed low "worst case" overhead and significant potential savings numbers to within 1ms. +1.

Show
Reid McKenzie added a comment - Patch looks good to me, and I've reproduced the claimed low "worst case" overhead and significant potential savings numbers to within 1ms. +1.
Alex Miller made changes -
Description `set` and `vec` currently reconstruct their inputs even when they are already of the relevant type. Since it's a pretty common pattern to call `set`/`vec` on an input to ensure its type, this seems like an easy performance win in a common case.

*Proposed:* Check for set? and vec? and return the coll as is if already of the requested type.

*Performance:*

- TODO: demonstrate perf difference here

*Screened by:*

*More info:*

Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY
{{set}} and {{vec}} currently reconstruct their inputs even when they are already of the requested type. Since it's a pretty common pattern to call {{set}}/{{vec}} on an input to ensure its type, this seems like an easy performance win in a common case.

*Proposed:* Check for {{set?}} in {{set}} and {{vec?}} in {{vec}} and return the coll as is if already of the requested type.

*Performance:*

{{set}}:

|*Size*|*Original (10k reps)*|*Input {{not set?}}*|*Input is {{set?}}*|
|10|30 ms|30 ms|1.4 ms|
|1000|4600 ms|4600 ms|1.9 ms|

{{vec}}:

|*Size*|*Original (10k reps)*|*Input {{not vec?}}*|*Input is {{vec?}}*|
|10|4.5 ms|6 ms|1.4 ms|
|1000|370 ms|380 ms|1.8 ms|

Both {{vec}} and {{set}} now have an additional instance? check so they are slightly slower when not passed a vector or a set, but are O(n) faster when they are passed the requested type.

*Screened by:*

*More info:*

Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY
Alex Miller made changes -
Description {{set}} and {{vec}} currently reconstruct their inputs even when they are already of the requested type. Since it's a pretty common pattern to call {{set}}/{{vec}} on an input to ensure its type, this seems like an easy performance win in a common case.

*Proposed:* Check for {{set?}} in {{set}} and {{vec?}} in {{vec}} and return the coll as is if already of the requested type.

*Performance:*

{{set}}:

|*Size*|*Original (10k reps)*|*Input {{not set?}}*|*Input is {{set?}}*|
|10|30 ms|30 ms|1.4 ms|
|1000|4600 ms|4600 ms|1.9 ms|

{{vec}}:

|*Size*|*Original (10k reps)*|*Input {{not vec?}}*|*Input is {{vec?}}*|
|10|4.5 ms|6 ms|1.4 ms|
|1000|370 ms|380 ms|1.8 ms|

Both {{vec}} and {{set}} now have an additional instance? check so they are slightly slower when not passed a vector or a set, but are O(n) faster when they are passed the requested type.

*Screened by:*

*More info:*

Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY
{{set}} and {{vec}} currently reconstruct their inputs even when they are already of the requested type. Since it's a pretty common pattern to call {{set}}/{{vec}} on an input to ensure its type, this seems like an easy performance win in a common case.

*Proposed:* Check for {{set?}} in {{set}} and {{vec?}} in {{vec}} and return the coll as is if already of the requested type.

*Performance:*

See attached clj1410-bench.txt for test details :

|*Input/size*|*Function*|*Original*|*Patched*|*Comment*|
|set/10|{{set}}|1.452 µs|0.002 µs|726x faster|
|set/1000|{{set}}|248.842 µs|0.006 µs|41473x faster|
|vector/10|{{set}}|1.288 µs|*1.323 µs*|slightly slower|
|vector/1000|{{set}}|222.992 µs|221.116 µs|~same|
|set/10|{{vec}}|0.614 µs|0.592 µs|~same|
|set/1000|{{vec}}|56.876 µs|55.920 µs|~same|
|vector/10|{{vec}}|0.252 µs|0.007 µs|36x faster|
|vector/1000|{{vec}}|24.428 µs|0.007 µs|3500x faster|

As expected, if an instance of the correct type is passed, then the difference is large (with bigger savings for sets which do more work for dupe checking). In cases where the type is different, there is an extra instance? check (which looks to be jit'ed away or negligible). We only see a slower time in the case of passing a small vector to the set function - 3% slower (35 ns). The benefit seems greater than the cost for this change.

*Screened by:*

*More info:*

Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY
Attachment clj1410-bench.txt [ 12972 ]
Hide
Alex Miller added a comment -

I added a more extensive set of tests performed using Criterium which should give better insight into single operation performance.

Show
Alex Miller added a comment - I added a more extensive set of tests performed using Criterium which should give better insight into single operation performance.

People

Vote (2)
Watch (3)

Dates

  • Created:
    Updated: