Clojure

transducer arities for map-indexed, distinct, and interpose

Details

  • Type: Enhancement Enhancement
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Completed
  • Affects Version/s: None
  • Fix Version/s: Release 1.7
  • Component/s: None
  • Labels:
  • Patch:
    Code and Test
  • Approval:
    Ok

Description

  • with generative tests
  • with examples demonstrating performance

Performance: Details in comments, summary:

(def v (vec (concat (range 1000) (range 1000))))
(into [] (distinct v))            ;; 821.3 µs
(into [] (distinct) v)            ;; 388.2 µs
(into [] (interpose nil v))       ;; 316.0 µs
(into [] (interpose nil) v)       ;; 35.5 µs
(into [] (map-indexed vector v))  ;; 76.8 µs
(into [] (map-indexed vector) v)  ;; 49.4 µs

Patch: clj-1601-4.patch

Screening note: We could use transients to improve performance of the distinct impl, except checking containment in a transient set is broken per CLJ-700 (which is not currently in 1.7). I have a new patch and direction on CLJ-700 that could provide a way to solve that if we want to move it back and push this further. Or we could just wait and refactor when CLJ-700 does go in.

Screened by:

  1. clj-1601.patch
    25/Nov/14 4:22 PM
    5 kB
    Alex Miller
  2. clj-1601-2.patch
    25/Nov/14 4:59 PM
    8 kB
    Alex Miller
  3. clj-1601-3.patch
    17/Dec/14 1:50 PM
    8 kB
    Alex Miller
  4. clj-1601-4.patch
    09/Jan/15 10:27 AM
    8 kB
    Alex Miller
  5. clj-1601-transient-distinct.patch
    21/Dec/14 6:13 AM
    2 kB
    Nikita Prokopov

Activity

Hide
Alex Miller added a comment -

working on this

Show
Alex Miller added a comment - working on this
Hide
Alex Miller added a comment -

Initial patch with impls. Tests and perf still to do.

Show
Alex Miller added a comment - Initial patch with impls. Tests and perf still to do.
Hide
Alex Miller added a comment -

Perf tests, summarized in description:

user=> (use 'criterium.core)
nil
user=> (def v (vec (concat (range 1000) (range 1000))))
#'user/v
user=> (quick-bench (into [] (distinct v)))
WARNING: Final GC required 10.433088780213309 % of runtime
Evaluation count : 744 in 6 samples of 124 calls.
             Execution time mean : 821.339608 µs
    Execution time std-deviation : 11.351053 µs
   Execution time lower quantile : 811.901435 µs ( 2.5%)
   Execution time upper quantile : 837.972000 µs (97.5%)
                   Overhead used : 1.794010 ns
nil
user=> (quick-bench (into [] (distinct) v))
WARNING: Final GC required 10.78492057474076 % of runtime
Evaluation count : 14028 in 6 samples of 2338 calls.
             Execution time mean : 43.630656 µs
    Execution time std-deviation : 170.185825 ns
   Execution time lower quantile : 43.433193 µs ( 2.5%)
   Execution time upper quantile : 43.853959 µs (97.5%)
                   Overhead used : 1.794010 ns
				   
user=> (quick-bench (into [] (interpose nil v)))
WARNING: Final GC required 10.79555726490133 % of runtime
Evaluation count : 1914 in 6 samples of 319 calls.
             Execution time mean : 316.024853 µs
    Execution time std-deviation : 9.077484 µs
   Execution time lower quantile : 310.139273 µs ( 2.5%)
   Execution time upper quantile : 330.917486 µs (97.5%)
                   Overhead used : 1.794010 ns

Found 1 outliers in 6 samples (16.6667 %)
	low-severe	 1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil
user=> (quick-bench (into [] (interpose nil) v))
WARNING: Final GC required 10.70401297525592 % of runtime
Evaluation count : 17022 in 6 samples of 2837 calls.
             Execution time mean : 35.592672 µs
    Execution time std-deviation : 560.066138 ns
   Execution time lower quantile : 35.252348 µs ( 2.5%)
   Execution time upper quantile : 36.553414 µs (97.5%)
                   Overhead used : 1.794010 ns

Found 1 outliers in 6 samples (16.6667 %)
	low-severe	 1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil

user=> (quick-bench (into [] (map-indexed vector v)))
WARNING: Final GC required 12.45755646853723 % of runtime
Evaluation count : 7338 in 6 samples of 1223 calls.
             Execution time mean : 76.807691 µs
    Execution time std-deviation : 381.019170 ns
   Execution time lower quantile : 76.433202 µs ( 2.5%)
   Execution time upper quantile : 77.170733 µs (97.5%)
                   Overhead used : 1.794010 ns
nil
user=> (quick-bench (into [] (map-indexed vector) v))
WARNING: Final GC required 11.38700971837483 % of runtime
Evaluation count : 12474 in 6 samples of 2079 calls.
             Execution time mean : 49.458043 µs
    Execution time std-deviation : 620.716737 ns
   Execution time lower quantile : 48.995801 µs ( 2.5%)
   Execution time upper quantile : 50.229507 µs (97.5%)
                   Overhead used : 1.794010 ns
Show
Alex Miller added a comment - Perf tests, summarized in description:
user=> (use 'criterium.core)
nil
user=> (def v (vec (concat (range 1000) (range 1000))))
#'user/v
user=> (quick-bench (into [] (distinct v)))
WARNING: Final GC required 10.433088780213309 % of runtime
Evaluation count : 744 in 6 samples of 124 calls.
             Execution time mean : 821.339608 µs
    Execution time std-deviation : 11.351053 µs
   Execution time lower quantile : 811.901435 µs ( 2.5%)
   Execution time upper quantile : 837.972000 µs (97.5%)
                   Overhead used : 1.794010 ns
nil
user=> (quick-bench (into [] (distinct) v))
WARNING: Final GC required 10.78492057474076 % of runtime
Evaluation count : 14028 in 6 samples of 2338 calls.
             Execution time mean : 43.630656 µs
    Execution time std-deviation : 170.185825 ns
   Execution time lower quantile : 43.433193 µs ( 2.5%)
   Execution time upper quantile : 43.853959 µs (97.5%)
                   Overhead used : 1.794010 ns
				   
user=> (quick-bench (into [] (interpose nil v)))
WARNING: Final GC required 10.79555726490133 % of runtime
Evaluation count : 1914 in 6 samples of 319 calls.
             Execution time mean : 316.024853 µs
    Execution time std-deviation : 9.077484 µs
   Execution time lower quantile : 310.139273 µs ( 2.5%)
   Execution time upper quantile : 330.917486 µs (97.5%)
                   Overhead used : 1.794010 ns

Found 1 outliers in 6 samples (16.6667 %)
	low-severe	 1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil
user=> (quick-bench (into [] (interpose nil) v))
WARNING: Final GC required 10.70401297525592 % of runtime
Evaluation count : 17022 in 6 samples of 2837 calls.
             Execution time mean : 35.592672 µs
    Execution time std-deviation : 560.066138 ns
   Execution time lower quantile : 35.252348 µs ( 2.5%)
   Execution time upper quantile : 36.553414 µs (97.5%)
                   Overhead used : 1.794010 ns

Found 1 outliers in 6 samples (16.6667 %)
	low-severe	 1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil

user=> (quick-bench (into [] (map-indexed vector v)))
WARNING: Final GC required 12.45755646853723 % of runtime
Evaluation count : 7338 in 6 samples of 1223 calls.
             Execution time mean : 76.807691 µs
    Execution time std-deviation : 381.019170 ns
   Execution time lower quantile : 76.433202 µs ( 2.5%)
   Execution time upper quantile : 77.170733 µs (97.5%)
                   Overhead used : 1.794010 ns
nil
user=> (quick-bench (into [] (map-indexed vector) v))
WARNING: Final GC required 11.38700971837483 % of runtime
Evaluation count : 12474 in 6 samples of 2079 calls.
             Execution time mean : 49.458043 µs
    Execution time std-deviation : 620.716737 ns
   Execution time lower quantile : 48.995801 µs ( 2.5%)
   Execution time upper quantile : 50.229507 µs (97.5%)
                   Overhead used : 1.794010 ns
Hide
Alex Miller added a comment -

Updated based on comment from Christophe Grand that java.util.HashSet used in distinct impl had different hash/equality semantics than the set used with sequences.

Show
Alex Miller added a comment - Updated based on comment from Christophe Grand that java.util.HashSet used in distinct impl had different hash/equality semantics than the set used with sequences.
Hide
Nikita Prokopov added a comment - - edited

This can be further improved by using transient set instead of persistent one in distinct:

distinct with persistent set, w/o transducers:  904.932406 µs
distinct with transient set,  w/o transducers:  755.338598 µs
distinct with persistent set, with transducers: 452.170600 µs
distinct with transient set,  with transducers: 293.258473 µs

Only caveat is that transient sets do not support contains? for now (see CLJ-700). This can be solved by using (.contains ^clojure.lang.ITransientSet set key)

I’m not sure what’s the best way to attach patch to this, for now attaching a patch that can be applied on top of Alex changes (clj-1601-transient-distinct.patch).

Show
Nikita Prokopov added a comment - - edited This can be further improved by using transient set instead of persistent one in distinct:
distinct with persistent set, w/o transducers:  904.932406 µs
distinct with transient set,  w/o transducers:  755.338598 µs
distinct with persistent set, with transducers: 452.170600 µs
distinct with transient set,  with transducers: 293.258473 µs
Only caveat is that transient sets do not support contains? for now (see CLJ-700). This can be solved by using (.contains ^clojure.lang.ITransientSet set key) I’m not sure what’s the best way to attach patch to this, for now attaching a patch that can be applied on top of Alex changes (clj-1601-transient-distinct.patch).
Hide
Alex Miller added a comment -

Hey Nikita, I'd rather fix CLJ-700 and use the normal functions rather than what you've done in the patch, which is why I hadn't done this before. I'm waiting to check with Rich whether we'll do that in 1.7 or wait till next release.

Show
Alex Miller added a comment - Hey Nikita, I'd rather fix CLJ-700 and use the normal functions rather than what you've done in the patch, which is why I hadn't done this before. I'm waiting to check with Rich whether we'll do that in 1.7 or wait till next release.
Hide
Michael Blume added a comment -

1601-3 no longer applies cleanly to master, I've got a reroll that does, is it ok to attach it even though the ticket is marked 'screened'?

Show
Michael Blume added a comment - 1601-3 no longer applies cleanly to master, I've got a reroll that does, is it ok to attach it even though the ticket is marked 'screened'?
Hide
Alex Miller added a comment -

Updated tests to apply cleanly to current master in -4 patch.

Show
Alex Miller added a comment - Updated tests to apply cleanly to current master in -4 patch.
Hide
Alex Miller added a comment -

Ha, didn't see your comment Michael! I was working on the same thing.

Show
Alex Miller added a comment - Ha, didn't see your comment Michael! I was working on the same thing.

People

Vote (0)
Watch (1)

Dates

  • Created:
    Updated:
    Resolved: