Clojure

Move LazyTransformer to an iterator strategy, extend eduction capabilities

Details

  • Type: Enhancement Enhancement
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Completed
  • Affects Version/s: Release 1.7
  • Fix Version/s: Release 1.7
  • Component/s: None
  • Labels:
  • Patch:
    Code and Test
  • Approval:
    Ok

Description

  • LazyTransformer does a lot of work to be a seq. Instead, switch to creating a transforming iterator.
  • Change sequence to wrap iterator-seq around the transforming iterator.
  • Change the iterator-seq implementation to be chunked. IteratorSeq will no longer be used but is left in case of regressions for now.
  • Change Eduction to provide iteration directly via the transforming iterator.
  • Extend eduction to support multiple xforms.

Performance:

Compared:

  • alpha5 == before any change
  • alpha6 == after clj-1669-6.patch was applied
  • beta3 == latest, includes range enhancement, expanding mapcat enhancement, etc

;; using java 1.8
(use 'criterium.core)
(def s (range 1000))
(def v (vec s))
(def s50 (range 50))
(def v50 (vec s50))

expr alpha5 s alpha5 v alpha6 s alpha6 v beta3 s beta3 v
non-chunking transform            
(into [] (->> s (interpose 5) (partition-all 2))) 432 us 437 us 413 us 411 us 353 us 414 us
(into [] (->> s (eduction (interpose 5) (partition-all 2)))) * 117 us 118 us 117 us 113 us 116 us 113 us
1 chunking transform            
(into [] (map inc s)) 43 us 45 us 35 us 31 us 32 us 36 us
(into [] (map inc) s) 19 us 19 us 18 us 18 us 18 us 16 us
(into [] (sequence (map inc) s)) 100 us 54 us 97 us 65 us 66 us 64 us
(into [] (eduction (map inc) s)) 24 us 19 us 24 us 20 us 20 us 19 us
(doall (map inc (eduction (map inc) s))) 219 us 203 us 98 us 78 us 93 us 89 us
2 chunking transforms        
(into [] (map inc (map inc s))) 53 us 56 us 53 us 54 us 61 us 58 us
(into [] (comp (map inc) (map inc)) s) 13 us 26 us 30 us 26 us 31 us 31 us
(into [] (sequence (comp (map inc) (map inc)) s)) 111 us 64 us 98 us 73 us 83 us 80 us
(into [] (eduction (map inc) (map inc) s)) * 58 us 31 us 58 us 31 us 30 us 31 us
(doall (map inc (eduction (map inc) (map inc) s))) * 240 us 212 us 114 us 93 us 105 us 102 us
expand transform            
(into [] (mapcat range (map inc s50))) 74 us 73 us 67 us 68 us 37 us 39 us
(into [] (sequence (comp (map inc) (mapcat range)) s50)) 111 us 102 us 166 us 159 us 99 us 98 us
(into [] (eduction (map inc) (mapcat range) s50)) * 65 us 64 us 57 us 56 us 27 us 27 us
materialized eduction            
(sort (eduction (map inc) s)) ERR ERR 99 us 77 us 77 us 77 us
(->> s (filter odd?) (map str) (sort-by last)) 1.10 ms 1.25 ms 1.15 ms 1.19 ms 1.14 ms 1.13 ms
(->> s (eduction (filter odd?) (map str)) (sort-by last)) ERR ERR 1.18 ms 1.15 ms 1.13 ms 1.13 ms
  • used comp to combine xforms as eduction only took one in the before case

Patch: clj-1669-6.patch

Screened by:

  1. clj-1669.patch
    04/Mar/15 11:43 AM
    19 kB
    Alex Miller
  2. clj-1669-2.patch
    06/Mar/15 8:33 AM
    19 kB
    Alex Miller
  3. clj-1669-3.patch
    06/Mar/15 11:37 AM
    20 kB
    Alex Miller
  4. clj-1669-4.patch
    21/Mar/15 7:47 AM
    21 kB
    Alex Miller
  5. clj-1669-5.patch
    21/Mar/15 8:45 AM
    22 kB
    Alex Miller
  6. clj-1669-6.patch
    27/Mar/15 10:15 AM
    22 kB
    Alex Miller

Activity

Hide
Michael Blume added a comment -

Nice, I like the direction on this.

CLJ-1515 currently breaks this patch (LongRange cannot be converted to Iterable), but I imagine that'll get better when it absorbs the changes from CLJ-1603

Show
Michael Blume added a comment - Nice, I like the direction on this. CLJ-1515 currently breaks this patch (LongRange cannot be converted to Iterable), but I imagine that'll get better when it absorbs the changes from CLJ-1603
Hide
Alex Miller added a comment -

Yeah. colls should be mapped through RT.iter() to catch more cases.

Show
Alex Miller added a comment - Yeah. colls should be mapped through RT.iter() to catch more cases.
Hide
Alex Miller added a comment -

To do:

  • remove Seqable from Eduction
  • support Iterable in RT.toArray()
  • more eduction pipeline tests that require realization at end
Show
Alex Miller added a comment - To do:
  • remove Seqable from Eduction
  • support Iterable in RT.toArray()
  • more eduction pipeline tests that require realization at end
Hide
Alex Miller added a comment -

Perf numbers show pretty worse results from sequence, will dig in further.

Show
Alex Miller added a comment - Perf numbers show pretty worse results from sequence, will dig in further.
Hide
Alex Miller added a comment - - edited

For the s timings, we've actually introduced more steps into the stack:

OLD reduce with s:

LazyTransformer
   seq (range) - every transformation is another layer here

NEW reduce with s:

IteratorSeq 
  TransformingIterator (handles N transformations in 1 step)
    SeqIterator
      seq (range)
Show
Alex Miller added a comment - - edited For the s timings, we've actually introduced more steps into the stack: OLD reduce with s:
LazyTransformer
   seq (range) - every transformation is another layer here
NEW reduce with s:
IteratorSeq 
  TransformingIterator (handles N transformations in 1 step)
    SeqIterator
      seq (range)
Hide
Alex Miller added a comment -

Look at perf for:

  • ->> eduction transformation
  • transformation comparison that doesn't support chunking
  • more into vector iteration case
Show
Alex Miller added a comment - Look at perf for:
  • ->> eduction transformation
  • transformation comparison that doesn't support chunking
  • more into vector iteration case
Hide
Alex Miller added a comment -

The -5 patch is same -3 except all uses of IteratorSeq have been replaced with a ChunkedCons that is effectively a chunked version of the old IteratorSeq. While no one calls it, I left IteratorSeq in the code base in case of regression.

Generally, the chunked iterator seq reduces the cost in a number of the worst cases and also is a clear benefit in making seqs over a result of eduction or sequence faster to traverse (as they are now chunked).

I think the one potential issue is that seqs over iterators are now chunked when they were not before which could change programs that expect their stateful iterator to be traversed one at a time. This change could be isolated to just to sequence and seq-iterator and mitigated by not changing RT.seqFrom() and seq-iterator to use the new chunking behavior only in sequence and/or with a new chunked-iterator-seq to make it more explicit. The sequence over xf is new so no possible regression there, everything else would just be opt-in.

Show
Alex Miller added a comment - The -5 patch is same -3 except all uses of IteratorSeq have been replaced with a ChunkedCons that is effectively a chunked version of the old IteratorSeq. While no one calls it, I left IteratorSeq in the code base in case of regression. Generally, the chunked iterator seq reduces the cost in a number of the worst cases and also is a clear benefit in making seqs over a result of eduction or sequence faster to traverse (as they are now chunked). I think the one potential issue is that seqs over iterators are now chunked when they were not before which could change programs that expect their stateful iterator to be traversed one at a time. This change could be isolated to just to sequence and seq-iterator and mitigated by not changing RT.seqFrom() and seq-iterator to use the new chunking behavior only in sequence and/or with a new chunked-iterator-seq to make it more explicit. The sequence over xf is new so no possible regression there, everything else would just be opt-in.
Hide
Rich Hickey added a comment -

push as is but leave unresolved, for perf tweaks

Show
Rich Hickey added a comment - push as is but leave unresolved, for perf tweaks
Hide
Alex Miller added a comment -

clj-1669-6 is identical to clj-1669-5 but removes two commented out debugging lines that were inadvertently included.

Show
Alex Miller added a comment - clj-1669-6 is identical to clj-1669-5 but removes two commented out debugging lines that were inadvertently included.
Hide
Alex Miller added a comment -

updated for beta3 numbers

Show
Alex Miller added a comment - updated for beta3 numbers

People

Vote (0)
Watch (2)

Dates

  • Created:
    Updated:
    Resolved: