Clojure

partition-by reducer

Details

  • Type: Enhancement Enhancement
  • Status: Open Open
  • Priority: Minor Minor
  • Resolution: Unresolved
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
  • Patch:
    Code and Test
  1. reducer-partition-by.diff
    10/May/12 8:08 PM
    3 kB
    Kevin Downey
  2. reducer-partition-by2.diff
    15/Aug/12 2:11 AM
    3 kB
    Kevin Downey
  3. reducer-partition-by3.diff
    15/Aug/12 2:34 PM
    3 kB
    Kevin Downey
  4. reducer-partition-by4.diff
    20/Oct/12 7:07 PM
    3 kB
    Devin Walters

Activity

Hide
Rich Hickey added a comment -

I'd like to see something much faster than this.

Show
Rich Hickey added a comment - I'd like to see something much faster than this.
Hide
Kevin Downey added a comment -

For reference here is a benchmark of a non-reducers (seq based) process that uses partition-by

user=> (def x (vec (range 1e6)))
#'user/x
user=> (bench (reduce + (map count (partition-by #(or (zero? (mod % 3)) (zero? (mod % 5))) x))))
Evaluation count             : 60
             Execution time mean : 1.072157 sec  95.0% CI: (1.070606 sec, 1.073381 sec)
    Execution time std-deviation : 165.818282 ms  95.0% CI: (163.873585 ms, 168.271261 ms)
         Execution time lower ci : 972.562000 ms  95.0% CI: (972.562000 ms, 973.301850 ms)
         Execution time upper ci : 1.419148 sec  95.0% CI: (1.419148 sec, 1.419148 sec)

Found 7 outliers in 60 samples (11.6667 %)
	low-severe	 2 (3.3333 %)
	low-mild	 5 (8.3333 %)
 Variance from outliers : 85.8489 % Variance is severely inflated by outliers
nil
user=>

Same again using r/partition-by from reducer-partition-by.diff

user=> (bench (r/reduce + (r/map count (r/partition-by #(or (zero? (mod % 3)) (zero? (mod % 5))) x))))
Evaluation count             : 60
             Execution time mean : 1.418350 sec  95.0% CI: (1.417738 sec, 1.418948 sec)
    Execution time std-deviation : 66.736477 ms  95.0% CI: (66.186568 ms, 67.610777 ms)
         Execution time lower ci : 1.370419 sec  95.0% CI: (1.370419 sec, 1.370419 sec)
         Execution time upper ci : 1.544151 sec  95.0% CI: (1.544151 sec, 1.544156 sec)

Found 10 outliers in 60 samples (16.6667 %)
	low-severe	 2 (3.3333 %)
	low-mild	 8 (13.3333 %)
 Variance from outliers : 33.5591 % Variance is moderately inflated by outliers
nil
user=> 

(using https://github.com/hugoduncan/criterium for benchmarking)

Show
Kevin Downey added a comment - For reference here is a benchmark of a non-reducers (seq based) process that uses partition-by
user=> (def x (vec (range 1e6)))
#'user/x
user=> (bench (reduce + (map count (partition-by #(or (zero? (mod % 3)) (zero? (mod % 5))) x))))
Evaluation count             : 60
             Execution time mean : 1.072157 sec  95.0% CI: (1.070606 sec, 1.073381 sec)
    Execution time std-deviation : 165.818282 ms  95.0% CI: (163.873585 ms, 168.271261 ms)
         Execution time lower ci : 972.562000 ms  95.0% CI: (972.562000 ms, 973.301850 ms)
         Execution time upper ci : 1.419148 sec  95.0% CI: (1.419148 sec, 1.419148 sec)

Found 7 outliers in 60 samples (11.6667 %)
	low-severe	 2 (3.3333 %)
	low-mild	 5 (8.3333 %)
 Variance from outliers : 85.8489 % Variance is severely inflated by outliers
nil
user=>
Same again using r/partition-by from reducer-partition-by.diff
user=> (bench (r/reduce + (r/map count (r/partition-by #(or (zero? (mod % 3)) (zero? (mod % 5))) x))))
Evaluation count             : 60
             Execution time mean : 1.418350 sec  95.0% CI: (1.417738 sec, 1.418948 sec)
    Execution time std-deviation : 66.736477 ms  95.0% CI: (66.186568 ms, 67.610777 ms)
         Execution time lower ci : 1.370419 sec  95.0% CI: (1.370419 sec, 1.370419 sec)
         Execution time upper ci : 1.544151 sec  95.0% CI: (1.544151 sec, 1.544156 sec)

Found 10 outliers in 60 samples (16.6667 %)
	low-severe	 2 (3.3333 %)
	low-mild	 8 (13.3333 %)
 Variance from outliers : 33.5591 % Variance is moderately inflated by outliers
nil
user=> 
(using https://github.com/hugoduncan/criterium for benchmarking)
Hide
Kevin Downey added a comment -

same again for r/partition-by from reducers-partition-by2.diff

user=> (bench (r/reduce + (r/map count (r/partition-by #(or (zero? (mod % 3)) (zero? (mod % 5))) x))))
Evaluation count             : 180
             Execution time mean : 307.596806 ms  95.0% CI: (307.271339 ms, 307.961550 ms)
    Execution time std-deviation : 34.060809 ms  95.0% CI: (33.613169 ms, 34.416837 ms)
         Execution time lower ci : 285.339333 ms  95.0% CI: (285.339333 ms, 285.339333 ms)
         Execution time upper ci : 385.087950 ms  95.0% CI: (385.087950 ms, 385.087950 ms)

Found 10 outliers in 60 samples (16.6667 %)
	low-severe	 4 (6.6667 %)
	low-mild	 6 (10.0000 %)
 Variance from outliers : 73.8053 % Variance is severely inflated by outliers
nil
user=> 

same again driven using r/fold (for grins) instead of r/reduce

user=> (bench (r/fold + (r/map count (r/partition-by #(or (zero? (mod % 3)) (zero? (mod % 5))) x))))
Evaluation count             : 360
             Execution time mean : 215.214486 ms  95.0% CI: (214.915417 ms, 215.664236 ms)
    Execution time std-deviation : 36.588464 ms  95.0% CI: (36.305548 ms, 36.847846 ms)
         Execution time lower ci : 185.575000 ms  95.0% CI: (185.575000 ms, 185.575000 ms)
         Execution time upper ci : 287.605175 ms  95.0% CI: (286.547833 ms, 287.605175 ms)

Found 6 outliers in 60 samples (10.0000 %)
	low-severe	 3 (5.0000 %)
	low-mild	 3 (5.0000 %)
 Variance from outliers : 87.6303 % Variance is severely inflated by outliers
nil
user=> 

reducers-partition-by2.diff is faster, but I am not wild about introducing a type and a protocol. also not sure about the CollFold impl.

Show
Kevin Downey added a comment - same again for r/partition-by from reducers-partition-by2.diff
user=> (bench (r/reduce + (r/map count (r/partition-by #(or (zero? (mod % 3)) (zero? (mod % 5))) x))))
Evaluation count             : 180
             Execution time mean : 307.596806 ms  95.0% CI: (307.271339 ms, 307.961550 ms)
    Execution time std-deviation : 34.060809 ms  95.0% CI: (33.613169 ms, 34.416837 ms)
         Execution time lower ci : 285.339333 ms  95.0% CI: (285.339333 ms, 285.339333 ms)
         Execution time upper ci : 385.087950 ms  95.0% CI: (385.087950 ms, 385.087950 ms)

Found 10 outliers in 60 samples (16.6667 %)
	low-severe	 4 (6.6667 %)
	low-mild	 6 (10.0000 %)
 Variance from outliers : 73.8053 % Variance is severely inflated by outliers
nil
user=> 
same again driven using r/fold (for grins) instead of r/reduce
user=> (bench (r/fold + (r/map count (r/partition-by #(or (zero? (mod % 3)) (zero? (mod % 5))) x))))
Evaluation count             : 360
             Execution time mean : 215.214486 ms  95.0% CI: (214.915417 ms, 215.664236 ms)
    Execution time std-deviation : 36.588464 ms  95.0% CI: (36.305548 ms, 36.847846 ms)
         Execution time lower ci : 185.575000 ms  95.0% CI: (185.575000 ms, 185.575000 ms)
         Execution time upper ci : 287.605175 ms  95.0% CI: (286.547833 ms, 287.605175 ms)

Found 6 outliers in 60 samples (10.0000 %)
	low-severe	 3 (5.0000 %)
	low-mild	 3 (5.0000 %)
 Variance from outliers : 87.6303 % Variance is severely inflated by outliers
nil
user=> 
reducers-partition-by2.diff is faster, but I am not wild about introducing a type and a protocol. also not sure about the CollFold impl.
Hide
Rich Hickey added a comment -

Let's leave fold out for this first patch, please.

Show
Rich Hickey added a comment - Let's leave fold out for this first patch, please.
Hide
Kevin Downey added a comment -

reducer-partition-by3.diff is a cleaned up version of reducer-partition-by2.diff without fold

Show
Kevin Downey added a comment - reducer-partition-by3.diff is a cleaned up version of reducer-partition-by2.diff without fold
Hide
Devin Walters added a comment -

Per Andy Fingerhut's email reducer-partition-by3.diff was failing to apply. This patch should apply cleanly to current master.

Show
Devin Walters added a comment - Per Andy Fingerhut's email reducer-partition-by3.diff was failing to apply. This patch should apply cleanly to current master.
Hide
Andy Fingerhut added a comment -

Presumptuously changing Approval from Incomplete to None, since the reason for its being marked Incomplete seems to have been addressed with the latest patch.

Show
Andy Fingerhut added a comment - Presumptuously changing Approval from Incomplete to None, since the reason for its being marked Incomplete seems to have been addressed with the latest patch.
Hide
Kevin Downey added a comment -

should this be assigned to someone?

Show
Kevin Downey added a comment - should this be assigned to someone?

People

Vote (0)
Watch (2)

Dates

  • Created:
    Updated: