Clojure

Optimization: allow `set`/`vec` to pass through colls that satisfy `set?`/`vector?`

Details

  • Type: Enhancement Enhancement
  • Status: Open Open
  • Priority: Minor Minor
  • Resolution: Unresolved
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
  • Patch:
    Code
  • Approval:
    Triaged

Description

set and vec currently reconstruct their inputs even when they are already of the requested type. Since it's a pretty common pattern to call set/vec on an input to ensure its type, this seems like an easy performance win in a common case.

Proposed: Check for set? in set and vec? in vec and return the coll as is if already of the requested type.

Performance:

See attached clj1410-bench.txt for test details :

Input/size Function Original Patched Comment
set/10 set 1.452 µs 0.002 µs 726x faster
set/1000 set 248.842 µs 0.006 µs 41473x faster
vector/10 set 1.288 µs 1.323 µs slightly slower
vector/1000 set 222.992 µs 221.116 µs ~same
set/10 vec 0.614 µs 0.592 µs ~same
set/1000 vec 56.876 µs 55.920 µs ~same
vector/10 vec 0.252 µs 0.007 µs 36x faster
vector/1000 vec 24.428 µs 0.007 µs 3500x faster

As expected, if an instance of the correct type is passed, then the difference is large (with bigger savings for sets which do more work for dupe checking). In cases where the type is different, there is an extra instance? check (which looks to be jit'ed away or negligible). We only see a slower time in the case of passing a small vector to the set function - 3% slower (35 ns). The benefit seems greater than the cost for this change.

Screened by:

More info:

Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY

  1. benchmarks.clj
    26/Apr/14 11:39 AM
    1 kB
    Peter Taoussanis
  2. CLJ-1410.patch
    26/Apr/14 10:52 AM
    2 kB
    Peter Taoussanis
  3. clj1410-bench.txt
    05/May/14 10:21 AM
    0.7 kB
    Alex Miller

Activity

Peter Taoussanis made changes -
Field Original Value New Value
Description Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY

`set` and `vector` currently reconstruct their inputs even when they are already of the relevant type. Since it's a pretty common pattern to call `set`/`vector` on an input to ensure its type, this seems like an easy performance win in a common case.

**Open question**
Would it be better to pass-through arguments that satisfy the general (`set?`,`vector?`) or concrete (`PersistentHashSet`,`PersistentVector`) type?
Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY

`set` and `vec` currently reconstruct their inputs even when they are already of the relevant type. Since it's a pretty common pattern to call `set`/`vec` on an input to ensure its type, this seems like an easy performance win in a common case.

**Open question**
Would it be better to pass-through arguments that satisfy the general (`set?`,`vec?`) or concrete (`PersistentHashSet`,`PersistentVector`) type?
Attachment CLJ-1410-vec.patch [ 12958 ]
Attachment CLJ-1410-set.patch [ 12957 ]
Peter Taoussanis made changes -
Summary Optimization: allow `set`/`vector` to pass through colls that satisfy `set?`/`vector?` Optimization: allow `set`/`vec` to pass through colls that satisfy `set?`/`vector?`
Alex Miller made changes -
Description Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY

`set` and `vec` currently reconstruct their inputs even when they are already of the relevant type. Since it's a pretty common pattern to call `set`/`vec` on an input to ensure its type, this seems like an easy performance win in a common case.

**Open question**
Would it be better to pass-through arguments that satisfy the general (`set?`,`vec?`) or concrete (`PersistentHashSet`,`PersistentVector`) type?
`set` and `vec` currently reconstruct their inputs even when they are already of the relevant type. Since it's a pretty common pattern to call `set`/`vec` on an input to ensure its type, this seems like an easy performance win in a common case.

*Proposed:* Check for set? and vec? and return the coll as is if already of the requested type.

*Performance:*

- TODO: demonstrate perf difference here

*Screened by:*

*More info:*

Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY
Alex Miller made changes -
Patch Code [ 10001 ]
Approval Triaged [ 10120 ]
Peter Taoussanis made changes -
Attachment CLJ-1410-set.patch [ 12957 ]
Peter Taoussanis made changes -
Attachment CLJ-1410-vec.patch [ 12958 ]
Peter Taoussanis made changes -
Attachment CLJ-1410.patch [ 12959 ]
Peter Taoussanis made changes -
Attachment benchmarks.clj [ 12960 ]
Alex Miller made changes -
Description `set` and `vec` currently reconstruct their inputs even when they are already of the relevant type. Since it's a pretty common pattern to call `set`/`vec` on an input to ensure its type, this seems like an easy performance win in a common case.

*Proposed:* Check for set? and vec? and return the coll as is if already of the requested type.

*Performance:*

- TODO: demonstrate perf difference here

*Screened by:*

*More info:*

Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY
{{set}} and {{vec}} currently reconstruct their inputs even when they are already of the requested type. Since it's a pretty common pattern to call {{set}}/{{vec}} on an input to ensure its type, this seems like an easy performance win in a common case.

*Proposed:* Check for {{set?}} in {{set}} and {{vec?}} in {{vec}} and return the coll as is if already of the requested type.

*Performance:*

{{set}}:

|*Size*|*Original (10k reps)*|*Input {{not set?}}*|*Input is {{set?}}*|
|10|30 ms|30 ms|1.4 ms|
|1000|4600 ms|4600 ms|1.9 ms|

{{vec}}:

|*Size*|*Original (10k reps)*|*Input {{not vec?}}*|*Input is {{vec?}}*|
|10|4.5 ms|6 ms|1.4 ms|
|1000|370 ms|380 ms|1.8 ms|

Both {{vec}} and {{set}} now have an additional instance? check so they are slightly slower when not passed a vector or a set, but are O(n) faster when they are passed the requested type.

*Screened by:*

*More info:*

Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY
Alex Miller made changes -
Description {{set}} and {{vec}} currently reconstruct their inputs even when they are already of the requested type. Since it's a pretty common pattern to call {{set}}/{{vec}} on an input to ensure its type, this seems like an easy performance win in a common case.

*Proposed:* Check for {{set?}} in {{set}} and {{vec?}} in {{vec}} and return the coll as is if already of the requested type.

*Performance:*

{{set}}:

|*Size*|*Original (10k reps)*|*Input {{not set?}}*|*Input is {{set?}}*|
|10|30 ms|30 ms|1.4 ms|
|1000|4600 ms|4600 ms|1.9 ms|

{{vec}}:

|*Size*|*Original (10k reps)*|*Input {{not vec?}}*|*Input is {{vec?}}*|
|10|4.5 ms|6 ms|1.4 ms|
|1000|370 ms|380 ms|1.8 ms|

Both {{vec}} and {{set}} now have an additional instance? check so they are slightly slower when not passed a vector or a set, but are O(n) faster when they are passed the requested type.

*Screened by:*

*More info:*

Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY
{{set}} and {{vec}} currently reconstruct their inputs even when they are already of the requested type. Since it's a pretty common pattern to call {{set}}/{{vec}} on an input to ensure its type, this seems like an easy performance win in a common case.

*Proposed:* Check for {{set?}} in {{set}} and {{vec?}} in {{vec}} and return the coll as is if already of the requested type.

*Performance:*

See attached clj1410-bench.txt for test details :

|*Input/size*|*Function*|*Original*|*Patched*|*Comment*|
|set/10|{{set}}|1.452 µs|0.002 µs|726x faster|
|set/1000|{{set}}|248.842 µs|0.006 µs|41473x faster|
|vector/10|{{set}}|1.288 µs|*1.323 µs*|slightly slower|
|vector/1000|{{set}}|222.992 µs|221.116 µs|~same|
|set/10|{{vec}}|0.614 µs|0.592 µs|~same|
|set/1000|{{vec}}|56.876 µs|55.920 µs|~same|
|vector/10|{{vec}}|0.252 µs|0.007 µs|36x faster|
|vector/1000|{{vec}}|24.428 µs|0.007 µs|3500x faster|

As expected, if an instance of the correct type is passed, then the difference is large (with bigger savings for sets which do more work for dupe checking). In cases where the type is different, there is an extra instance? check (which looks to be jit'ed away or negligible). We only see a slower time in the case of passing a small vector to the set function - 3% slower (35 ns). The benefit seems greater than the cost for this change.

*Screened by:*

*More info:*

Group discussion: https://groups.google.com/forum/#!topic/clojure-dev/fg4wtqzu0eY
Attachment clj1410-bench.txt [ 12972 ]

People

Vote (1)
Watch (2)

Dates

  • Created:
    Updated: