<< Back to previous view

[CLJ-1601] transducer arities for map-indexed, distinct, and interpose Created: 25/Nov/14  Updated: 09/Jan/15  Resolved: 09/Jan/15

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.7

Type: Enhancement Priority: Major
Reporter: Stuart Halloway Assignee: Alex Miller
Resolution: Completed Votes: 0
Labels: transducers

Attachments: Text File clj-1601-2.patch     Text File clj-1601-3.patch     Text File clj-1601-4.patch     Text File clj-1601.patch     Text File clj-1601-transient-distinct.patch    
Patch: Code and Test
Approval: Ok

 Description   
  • with generative tests
  • with examples demonstrating performance

Performance: Details in comments, summary:

(def v (vec (concat (range 1000) (range 1000))))
(into [] (distinct v))            ;; 821.3 µs
(into [] (distinct) v)            ;; 388.2 µs
(into [] (interpose nil v))       ;; 316.0 µs
(into [] (interpose nil) v)       ;; 35.5 µs
(into [] (map-indexed vector v))  ;; 76.8 µs
(into [] (map-indexed vector) v)  ;; 49.4 µs

Patch: clj-1601-4.patch

Screening note: We could use transients to improve performance of the distinct impl, except checking containment in a transient set is broken per CLJ-700 (which is not currently in 1.7). I have a new patch and direction on CLJ-700 that could provide a way to solve that if we want to move it back and push this further. Or we could just wait and refactor when CLJ-700 does go in.

Screened by:



 Comments   
Comment by Alex Miller [ 25/Nov/14 11:54 AM ]

working on this

Comment by Alex Miller [ 25/Nov/14 4:22 PM ]

Initial patch with impls. Tests and perf still to do.

Comment by Alex Miller [ 27/Nov/14 7:09 AM ]

Perf tests, summarized in description:

user=> (use 'criterium.core)
nil
user=> (def v (vec (concat (range 1000) (range 1000))))
#'user/v
user=> (quick-bench (into [] (distinct v)))
WARNING: Final GC required 10.433088780213309 % of runtime
Evaluation count : 744 in 6 samples of 124 calls.
             Execution time mean : 821.339608 µs
    Execution time std-deviation : 11.351053 µs
   Execution time lower quantile : 811.901435 µs ( 2.5%)
   Execution time upper quantile : 837.972000 µs (97.5%)
                   Overhead used : 1.794010 ns
nil
user=> (quick-bench (into [] (distinct) v))
WARNING: Final GC required 10.78492057474076 % of runtime
Evaluation count : 14028 in 6 samples of 2338 calls.
             Execution time mean : 43.630656 µs
    Execution time std-deviation : 170.185825 ns
   Execution time lower quantile : 43.433193 µs ( 2.5%)
   Execution time upper quantile : 43.853959 µs (97.5%)
                   Overhead used : 1.794010 ns
				   
user=> (quick-bench (into [] (interpose nil v)))
WARNING: Final GC required 10.79555726490133 % of runtime
Evaluation count : 1914 in 6 samples of 319 calls.
             Execution time mean : 316.024853 µs
    Execution time std-deviation : 9.077484 µs
   Execution time lower quantile : 310.139273 µs ( 2.5%)
   Execution time upper quantile : 330.917486 µs (97.5%)
                   Overhead used : 1.794010 ns

Found 1 outliers in 6 samples (16.6667 %)
	low-severe	 1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil
user=> (quick-bench (into [] (interpose nil) v))
WARNING: Final GC required 10.70401297525592 % of runtime
Evaluation count : 17022 in 6 samples of 2837 calls.
             Execution time mean : 35.592672 µs
    Execution time std-deviation : 560.066138 ns
   Execution time lower quantile : 35.252348 µs ( 2.5%)
   Execution time upper quantile : 36.553414 µs (97.5%)
                   Overhead used : 1.794010 ns

Found 1 outliers in 6 samples (16.6667 %)
	low-severe	 1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil

user=> (quick-bench (into [] (map-indexed vector v)))
WARNING: Final GC required 12.45755646853723 % of runtime
Evaluation count : 7338 in 6 samples of 1223 calls.
             Execution time mean : 76.807691 µs
    Execution time std-deviation : 381.019170 ns
   Execution time lower quantile : 76.433202 µs ( 2.5%)
   Execution time upper quantile : 77.170733 µs (97.5%)
                   Overhead used : 1.794010 ns
nil
user=> (quick-bench (into [] (map-indexed vector) v))
WARNING: Final GC required 11.38700971837483 % of runtime
Evaluation count : 12474 in 6 samples of 2079 calls.
             Execution time mean : 49.458043 µs
    Execution time std-deviation : 620.716737 ns
   Execution time lower quantile : 48.995801 µs ( 2.5%)
   Execution time upper quantile : 50.229507 µs (97.5%)
                   Overhead used : 1.794010 ns
Comment by Alex Miller [ 17/Dec/14 1:50 PM ]

Updated based on comment from Christophe Grand that java.util.HashSet used in distinct impl had different hash/equality semantics than the set used with sequences.

Comment by Nikita Prokopov [ 21/Dec/14 6:13 AM ]

This can be further improved by using transient set instead of persistent one in distinct:

distinct with persistent set, w/o transducers:  904.932406 µs
distinct with transient set,  w/o transducers:  755.338598 µs
distinct with persistent set, with transducers: 452.170600 µs
distinct with transient set,  with transducers: 293.258473 µs

Only caveat is that transient sets do not support contains? for now (see CLJ-700). This can be solved by using (.contains ^clojure.lang.ITransientSet set key)

I’m not sure what’s the best way to attach patch to this, for now attaching a patch that can be applied on top of Alex changes (clj-1601-transient-distinct.patch).

Comment by Alex Miller [ 22/Dec/14 8:32 AM ]

Hey Nikita, I'd rather fix CLJ-700 and use the normal functions rather than what you've done in the patch, which is why I hadn't done this before. I'm waiting to check with Rich whether we'll do that in 1.7 or wait till next release.

Comment by Michael Blume [ 09/Jan/15 10:23 AM ]

1601-3 no longer applies cleanly to master, I've got a reroll that does, is it ok to attach it even though the ticket is marked 'screened'?

Comment by Alex Miller [ 09/Jan/15 10:27 AM ]

Updated tests to apply cleanly to current master in -4 patch.

Comment by Alex Miller [ 09/Jan/15 10:33 AM ]

Ha, didn't see your comment Michael! I was working on the same thing.

Generated at Fri Apr 19 21:24:37 CDT 2019 using JIRA 4.4#649-r158309.