Clojure

range should return something that implements IReduce

Details

  • Type: Enhancement Enhancement
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Completed
  • Affects Version/s: Release 1.7
  • Fix Version/s: Release 1.7
  • Component/s: None
  • Labels:
    None
  • Patch:
    Code and Test
  • Approval:
    Ok

Description

range should implement IReduce for fast reduction.

Patch: clj-1515-14.patch - Java range with long optimizations

Approach: The clj-1515-14 patch revives the unused clojure.lang.Range class. This class is similar in implementation to ChunkedCons. It differs by lazily constructing the first chunk (this is done in the existing impl by using a LazySeq wrapper) and by caching the next reference like other seq impls. The latter is done because range is frequently used in both chunked and unchunked traversal.

Additionally, 1515-14 contains an optimized version of range (LongRange) for the extremely common case where start, end, and step are all longs. This version uses primitive longs and primitive math and a customized version of ArrayChunk for greater performance.

The special case of (range) is just handled with (iterate inc' 0) (which was further optimized for reduce in CLJ-1603).

Alternatives:

  • CLJ-1515-deftype3.patch took the approach of using deftype to create a seqable, reducible entity that was not actually a seq. Based on work for CLJ-1603, it was established that due to historical requirements, this is not viable and the entity returned for range should implement the ISeq and Collection interfaces directly.
  • clj-1515-11.patch uses the approach of a "split" implementation - a LazySeq that has a fast reduce path. This is a minimal patch that provides no perf difference for sequence usage but a moderate improvement for reduce paths. One important difference vs clj-1515-12 is that the fast reduce path is only obtained on the initial range. Once you walk off the head, the performance will be the same as prior (seq perf).

Performance

criterium quick-bench with java 1.8.0-b132

code 1.7.0-master Java clj-1515-14 split clj-1515-11
(count (range (* 1024 1024))) 63 ms 0 ms 58 ms
(reduce + (map inc (range (* 1024 1024)))) 50 ms 33 ms 50 ms
(reduce + (map inc (map inc (range (* 1024 1024))))) 68 ms 52 ms 67 ms
(count (keep odd? (range (* 1024 1024)))) 73 ms 52 ms 67 ms
(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) 46 ms 26 ms 34 ms
(reduce + 0 (range (* 2048 1024))) 67 ms 19 ms 40 ms
(reduce + 0 (rest (range (* 2048 1024)))) 67 ms 19 ms 57 ms
(doall (range 0 31)) 1.04 µs 0.75 µs 1.15 µs
(doall (range 0 32)) 1.08 µs 0.76 µs 1.18 µs
(doall (range 0 4096)) 135 µs 96 µs 145 µs
(into [] (map inc (range 31))) 1.81 µs 1.30 µs 1.88 µs
(into [] (map inc) (range 31)) 1.71 µs 0.75 µs 1.02 µs
(into [] (range 128)) 4.82 µs 2.20 µs 3.27 µs
(doall (range 1/2 1000 1/3)) 1.24 ms 1.24 ms 1.28 µs
(doall (range 0.5 1000 0.33)) 126 µs 147 µs 151 µs
(into [] (range 1/2 1000 1/3)) 1.26 ms 1.20 ms 1.26 ms
(into [] (range 0.5 1000 0.33)) 148 µs 91 µs 130 µs
(count (filter odd? (take (* 1024 1024) (range)))) 185 ms 167 ms 176 ms
(transduce (take (* 1024 1024)) + (range)) 67 ms 35 ms 68 ms

Performance notes:

  • 1515-14 (Java) - significant improvements in virtually all use cases, both seq and reduce. The final two cases with (range) leverage optimizations from CLJ-1603.
  • 1515-11 (split) - a much smaller patch that gives no seq benefits and smaller reduction benefits. Only the head of the range will receive the perf benefits (see (reduce + 0 (rest (range (* 2048 1024))))).

Questions
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.

  1. clj-1515.patch
    09/Dec/14 2:40 AM
    23 kB
    Alex Miller
  2. clj-1515-10.patch
    02/Jan/15 11:24 AM
    21 kB
    Alex Miller
  3. clj-1515-11.patch
    25/Mar/15 12:44 PM
    6 kB
    Alex Miller
  4. clj-1515-12.patch
    25/Mar/15 12:44 PM
    18 kB
    Alex Miller
  5. clj-1515-13.patch
    27/Mar/15 3:44 PM
    18 kB
    Alex Miller
  6. clj-1515-14.patch
    03/Apr/15 10:28 AM
    18 kB
    Alex Miller
  7. clj-1515-2.patch
    10/Dec/14 5:12 PM
    23 kB
    Alex Miller
  8. clj-1515-3.patch
    10/Dec/14 11:34 PM
    22 kB
    Alex Miller
  9. clj-1515-4.patch
    11/Dec/14 12:11 AM
    22 kB
    Alex Miller
  10. clj-1515-5.patch
    11/Dec/14 10:20 AM
    22 kB
    Alex Miller
  11. clj-1515-6.patch
    11/Dec/14 11:31 AM
    22 kB
    Alex Miller
  12. clj-1515-7.patch
    11/Dec/14 11:49 AM
    22 kB
    Alex Miller
  13. clj-1515-8.patch
    11/Dec/14 3:49 PM
    23 kB
    Alex Miller
  14. clj-1515-9.patch
    12/Dec/14 2:11 PM
    22 kB
    Alex Miller
  15. CLJ-1515-deftype.patch
    18/Jan/15 8:27 PM
    13 kB
    Ghadi Shayban
  16. CLJ-1515-deftype2.patch
    22/Jan/15 9:35 AM
    12 kB
    Ghadi Shayban
  17. CLJ-1515-deftype3.patch
    23/Jan/15 3:20 PM
    12 kB
    Ghadi Shayban
  18. CLJ-1515-deftype-nostructural-dup.patch
    18/Jan/15 11:05 PM
    12 kB
    Ghadi Shayban
  19. patch.diff
    29/Aug/14 3:34 PM
    9 kB
    Timothy Baldridge
  20. range-patch3.diff
    19/Sep/14 10:02 AM
    11 kB
    Timothy Baldridge
  21. reified-range4.diff
    31/Oct/14 11:30 AM
    16 kB
    Timothy Baldridge

Activity

Alex Miller made changes -
Field Original Value New Value
Fix Version/s Release 1.7 [ 10250 ]
Approval Vetted [ 10003 ]
Timothy Baldridge made changes -
Summary Range should return something that implements IReduce Reify the result of range
Attachment patch.diff [ 13279 ]
Description For better integration with transducers, it would be nice if range returned an instance of IReduce. Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range.

*Note:* this code keeps backwards compatibility with the existing range code. This means some parts of the class (mostly relating to a step size of 0) are a bit more complex than desired, but these bits were needed to get all the tests to pass.

*Note:* this code does not preserve the chunked-seq nature of the original range. The fact that range used to return chunked seqs was not published in the doc strings and so it was removed to allow for simpler code in Range.java.

*Patch:* patch.diff

 
Patch Code and Test [ 10002 ]
Hide
Alex Miller added a comment -

1) Not sure about losing chunked seqs - that would make older usage slower, which seems undesirable.
2) RangeIterator.next() needs to throw NoSuchElementException when walking off the end
3) I think Range should implement IReduce instead of relying on support for CollReduce via Iterable.
4) Should let _hash and _hasheq auto-initialize to 0 not set to -1. As is, I think _hasheq always would be -1?
5) _hash and _hasheq should be transient.
6) count could be cached (like hash and hasheq). Not sure if it's worth doing that but seems like a win any time it's called more than once.
7) Why the change in test/clojure/test_clojure/serialization.clj ?
8) Can you squash into a single commit?

Show
Alex Miller added a comment - 1) Not sure about losing chunked seqs - that would make older usage slower, which seems undesirable. 2) RangeIterator.next() needs to throw NoSuchElementException when walking off the end 3) I think Range should implement IReduce instead of relying on support for CollReduce via Iterable. 4) Should let _hash and _hasheq auto-initialize to 0 not set to -1. As is, I think _hasheq always would be -1? 5) _hash and _hasheq should be transient. 6) count could be cached (like hash and hasheq). Not sure if it's worth doing that but seems like a win any time it's called more than once. 7) Why the change in test/clojure/test_clojure/serialization.clj ? 8) Can you squash into a single commit?
Alex Miller made changes -
Approval Vetted [ 10003 ] Incomplete [ 10006 ]
Timothy Baldridge made changes -
Attachment patch.diff [ 13281 ]
Timothy Baldridge made changes -
Attachment patch.diff [ 13279 ]
Hide
Timothy Baldridge added a comment -

1) I agree, adding chunked seqs to this will dramatically increase complexity, are we sure we want this?
2) exception added
3) I can add IReduce, but it'll pretty much just duplicate the code in protocols.clj. If we're sure we want that I'll add it too.
4) fixed hash init values, defaults to -1 like ASeq
5) hash fields are now transient
6) at the cost of about 4 bytes we can cache the cost of a multiplication and an addition, doesn't seem worth it?
7) the tests in serialization.clj assert that the type of the collection roundtrips. This is no longer the case for range which starts as Range and ends as a list. The change I made converts range into a list so that it properly roundtrips. My assumption is that we shouldn't rely on all implementations of ISeq to properly roundtrip through EDN.
8) squashed.

Show
Timothy Baldridge added a comment - 1) I agree, adding chunked seqs to this will dramatically increase complexity, are we sure we want this? 2) exception added 3) I can add IReduce, but it'll pretty much just duplicate the code in protocols.clj. If we're sure we want that I'll add it too. 4) fixed hash init values, defaults to -1 like ASeq 5) hash fields are now transient 6) at the cost of about 4 bytes we can cache the cost of a multiplication and an addition, doesn't seem worth it? 7) the tests in serialization.clj assert that the type of the collection roundtrips. This is no longer the case for range which starts as Range and ends as a list. The change I made converts range into a list so that it properly roundtrips. My assumption is that we shouldn't rely on all implementations of ISeq to properly roundtrip through EDN. 8) squashed.
Hide
Alex Miller added a comment -

6) might be useful if you're walking through it with nth, which hits count everytime, but doubt that's common
7) yep, reasonable

Show
Alex Miller added a comment - 6) might be useful if you're walking through it with nth, which hits count everytime, but doubt that's common 7) yep, reasonable
Édipo L Féderle made changes -
Labels enhancement
Andy Fingerhut made changes -
Labels enhancement
Hide
Andy Fingerhut added a comment - - edited

I have already pointed out to Edipo in personal email the guidelines on what labels to use for Clojure JIRA tickets here: http://dev.clojure.org/display/community/Creating+Tickets

Show
Andy Fingerhut added a comment - - edited I have already pointed out to Edipo in personal email the guidelines on what labels to use for Clojure JIRA tickets here: http://dev.clojure.org/display/community/Creating+Tickets
Hide
Timothy Baldridge added a comment -

New patch with IReduce directly on Range instead of relying on iterators

Show
Timothy Baldridge added a comment - New patch with IReduce directly on Range instead of relying on iterators
Timothy Baldridge made changes -
Attachment range-patch3.diff [ 13350 ]
Description Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range.

*Note:* this code keeps backwards compatibility with the existing range code. This means some parts of the class (mostly relating to a step size of 0) are a bit more complex than desired, but these bits were needed to get all the tests to pass.

*Note:* this code does not preserve the chunked-seq nature of the original range. The fact that range used to return chunked seqs was not published in the doc strings and so it was removed to allow for simpler code in Range.java.

*Patch:* patch.diff

 
Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range.

*Note:* this code keeps backwards compatibility with the existing range code. This means some parts of the class (mostly relating to a step size of 0) are a bit more complex than desired, but these bits were needed to get all the tests to pass.

*Note:* this code does not preserve the chunked-seq nature of the original range. The fact that range used to return chunked seqs was not published in the doc strings and so it was removed to allow for simpler code in Range.java.

*Patch:* range-patch3.diff

 
Alex Miller made changes -
Approval Incomplete [ 10006 ] Vetted [ 10003 ]
Alex Miller made changes -
Priority Minor [ 4 ] Major [ 3 ]
Hide
Alex Miller added a comment - - edited

The new patch looks good. Could you do a test to determine the perf difference from walking the old chunked seq vs the new version? If the perf diff is negligible, I think we can leave as is.

Another idea: would it make sense to have a specialized RangeLong for the (very common) case where start, end, and step could all be primitive longs? Seems like this could help noticeably.

Show
Alex Miller added a comment - - edited The new patch looks good. Could you do a test to determine the perf difference from walking the old chunked seq vs the new version? If the perf diff is negligible, I think we can leave as is. Another idea: would it make sense to have a specialized RangeLong for the (very common) case where start, end, and step could all be primitive longs? Seems like this could help noticeably.
Alex Miller made changes -
Approval Vetted [ 10003 ] Incomplete [ 10006 ]
Hide
Timothy Baldridge added a comment -

Looks like chunked seqs do make lazy seq code about 5x faster in these tests.

Show
Timothy Baldridge added a comment - Looks like chunked seqs do make lazy seq code about 5x faster in these tests.
Timothy Baldridge made changes -
Description Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range.

*Note:* this code keeps backwards compatibility with the existing range code. This means some parts of the class (mostly relating to a step size of 0) are a bit more complex than desired, but these bits were needed to get all the tests to pass.

*Note:* this code does not preserve the chunked-seq nature of the original range. The fact that range used to return chunked seqs was not published in the doc strings and so it was removed to allow for simpler code in Range.java.

*Patch:* range-patch3.diff

 
Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range.

*Note:* this code keeps backwards compatibility with the existing range code. This means some parts of the class (mostly relating to a step size of 0) are a bit more complex than desired, but these bits were needed to get all the tests to pass.

*Note:* this code does not preserve the chunked-seq nature of the original range. The fact that range used to return chunked seqs was not published in the doc strings and so it was removed to allow for simpler code in Range.java.

*Performance:*
(timings done at the repl run via java -jar)

{noformat}

(dotimes [x 100] (time (dotimes [x 1] (count (range (* 1024 1024))))))
master => 80-110ms
patch => 0.014ms


(dotimes [x 100] (time (dotimes [x 1] (reduce + (map inc (range (* 1024 1024)))))))
master => 76-87ms
patch => 340-360ms


(dotimes [x 100] (time (dotimes [x 1] (reduce + (map inc (map inc (range (* 1024 1024))))))))
master => 97-123ms
patch=> 490-577ms



(dotimes [x 100] (time (dotimes [x 1] (count (filter odd? (range (* 1024 1024)))))))
master => 87-104ms
patch => 370-330ms


(dotimes [x 100] (time (dotimes [x 1] (transduce (comp (map inc) (map inc)) + (range (* 1024 1024))))))
master=>76-116ms
patch => 44ms-59ms
{noformat}

*Patch:* range-patch3.diff

 
Hide
Ghadi Shayban added a comment -

I think penalizing existing code possibly 5x is a hard cost to stomach. Is there another approach where a protocolized range can live outside of core? CLJ-993 has a patch that makes it a reducible source in clojure.core.reducers, but it's coll-reduce not IReduce, and doesn't contain an Iterator. Otherwise we might have to take the chunked seq challenge.

Alex: Re long/float. Old reified Ranged.java in clojure.lang blindly assumes ints, it would be nice to have a long vs. float version, though I believe the contract of reduce boxes numbers. (Unboxed math can be implemented very nicely as in Prismatic's Hiphip array manipulation library, which takes the long vs float specialization to the extreme with different namespaces)

Show
Ghadi Shayban added a comment - I think penalizing existing code possibly 5x is a hard cost to stomach. Is there another approach where a protocolized range can live outside of core? CLJ-993 has a patch that makes it a reducible source in clojure.core.reducers, but it's coll-reduce not IReduce, and doesn't contain an Iterator. Otherwise we might have to take the chunked seq challenge. Alex: Re long/float. Old reified Ranged.java in clojure.lang blindly assumes ints, it would be nice to have a long vs. float version, though I believe the contract of reduce boxes numbers. (Unboxed math can be implemented very nicely as in Prismatic's Hiphip array manipulation library, which takes the long vs float specialization to the extreme with different namespaces)
Hide
Timothy Baldridge added a comment -

I don't think anyone is suggesting we push unboxed math all the way down through transducers. Instead, this patch contains a lot of calls to Numbers.*, if we were to assume that the start end and step params of range are all Longs, then we could remove all of these calls and only box when returning an Object (in .first) or when calling IFn.invoke (inside .reduce)

Show
Timothy Baldridge added a comment - I don't think anyone is suggesting we push unboxed math all the way down through transducers. Instead, this patch contains a lot of calls to Numbers.*, if we were to assume that the start end and step params of range are all Longs, then we could remove all of these calls and only box when returning an Object (in .first) or when calling IFn.invoke (inside .reduce)
Hide
Alex Miller added a comment -

I agree that 5x slowdown is too much - I don't think we can give up chunked seqs if that's the penalty.

On the long case, I was suggesting what Tim is talking about, in the case of all longs, create a Range that stores long prims and does prim math, but still return boxed objects as necessary. I think the only case worth optimizing is all longs - the permutation of other options gets out of hand quickly.

Show
Alex Miller added a comment - I agree that 5x slowdown is too much - I don't think we can give up chunked seqs if that's the penalty. On the long case, I was suggesting what Tim is talking about, in the case of all longs, create a Range that stores long prims and does prim math, but still return boxed objects as necessary. I think the only case worth optimizing is all longs - the permutation of other options gets out of hand quickly.
Hide
Ghadi Shayban added a comment -

Tim, I'm not suggesting unboxed math, but the singular fast-path of all-Longs that you and Alex describe. I mistakenly lower-cased Long/Float.

Show
Ghadi Shayban added a comment - Tim, I'm not suggesting unboxed math, but the singular fast-path of all-Longs that you and Alex describe. I mistakenly lower-cased Long/Float.
Rich Hickey made changes -
Fix Version/s Release 1.8 [ 10254 ]
Fix Version/s Release 1.7 [ 10250 ]
Alex Miller made changes -
Fix Version/s Release 1.7 [ 10250 ]
Fix Version/s Release 1.8 [ 10254 ]
Hide
Timothy Baldridge added a comment -

Here's the latest work on this, a few tests fail. If someone wants to take a look at this patch feel free, otherwise I'll continue to work on it as I have time/energy.

Show
Timothy Baldridge added a comment - Here's the latest work on this, a few tests fail. If someone wants to take a look at this patch feel free, otherwise I'll continue to work on it as I have time/energy.
Timothy Baldridge made changes -
Attachment reified-range4.diff [ 13457 ]
Timothy Baldridge made changes -
Description Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range.

*Note:* this code keeps backwards compatibility with the existing range code. This means some parts of the class (mostly relating to a step size of 0) are a bit more complex than desired, but these bits were needed to get all the tests to pass.

*Note:* this code does not preserve the chunked-seq nature of the original range. The fact that range used to return chunked seqs was not published in the doc strings and so it was removed to allow for simpler code in Range.java.

*Performance:*
(timings done at the repl run via java -jar)

{noformat}

(dotimes [x 100] (time (dotimes [x 1] (count (range (* 1024 1024))))))
master => 80-110ms
patch => 0.014ms


(dotimes [x 100] (time (dotimes [x 1] (reduce + (map inc (range (* 1024 1024)))))))
master => 76-87ms
patch => 340-360ms


(dotimes [x 100] (time (dotimes [x 1] (reduce + (map inc (map inc (range (* 1024 1024))))))))
master => 97-123ms
patch=> 490-577ms



(dotimes [x 100] (time (dotimes [x 1] (count (filter odd? (range (* 1024 1024)))))))
master => 87-104ms
patch => 370-330ms


(dotimes [x 100] (time (dotimes [x 1] (transduce (comp (map inc) (map inc)) + (range (* 1024 1024))))))
master=>76-116ms
patch => 44ms-59ms
{noformat}

*Patch:* range-patch3.diff

 
Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range.

*Note:* this code keeps backwards compatibility with the existing range code. This means some parts of the class (mostly relating to a step size of 0) are a bit more complex than desired, but these bits were needed to get all the tests to pass.

*Note:* this code does not preserve the chunked-seq nature of the original range. The fact that range used to return chunked seqs was not published in the doc strings and so it was removed to allow for simpler code in Range.java.

*Performance:*
(timings done at the repl run via java -jar)

{noformat}

(dotimes [x 100] (time (dotimes [x 1] (count (range (* 1024 1024))))))
master => 80-110ms
patch => 0.014ms


(dotimes [x 100] (time (dotimes [x 1] (reduce + (map inc (range (* 1024 1024)))))))
master => 76-87ms
patch => 340-360ms


(dotimes [x 100] (time (dotimes [x 1] (reduce + (map inc (map inc (range (* 1024 1024))))))))
master => 97-123ms
patch=> 490-577ms



(dotimes [x 100] (time (dotimes [x 1] (count (filter odd? (range (* 1024 1024)))))))
master => 87-104ms
patch => 370-330ms


(dotimes [x 100] (time (dotimes [x 1] (transduce (comp (map inc) (map inc)) + (range (* 1024 1024))))))
master=>76-116ms
patch => 44ms-59ms
{noformat}

*Patch:* reified-range4.diff (some tests fail)
 
Hide
Nicola Mometto added a comment -

As discussed with Tim in #clojure, the current patch should not change ArrayChunk's reduce impl, that's an error.

Show
Nicola Mometto added a comment - As discussed with Tim in #clojure, the current patch should not change ArrayChunk's reduce impl, that's an error.
Hide
Alex Miller added a comment -

Still a work in progress...

Show
Alex Miller added a comment - Still a work in progress...
Alex Miller made changes -
Attachment clj-1515.patch [ 13590 ]
Description Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range.

*Note:* this code keeps backwards compatibility with the existing range code. This means some parts of the class (mostly relating to a step size of 0) are a bit more complex than desired, but these bits were needed to get all the tests to pass.

*Note:* this code does not preserve the chunked-seq nature of the original range. The fact that range used to return chunked seqs was not published in the doc strings and so it was removed to allow for simpler code in Range.java.

*Performance:*
(timings done at the repl run via java -jar)

{noformat}

(dotimes [x 100] (time (dotimes [x 1] (count (range (* 1024 1024))))))
master => 80-110ms
patch => 0.014ms


(dotimes [x 100] (time (dotimes [x 1] (reduce + (map inc (range (* 1024 1024)))))))
master => 76-87ms
patch => 340-360ms


(dotimes [x 100] (time (dotimes [x 1] (reduce + (map inc (map inc (range (* 1024 1024))))))))
master => 97-123ms
patch=> 490-577ms



(dotimes [x 100] (time (dotimes [x 1] (count (filter odd? (range (* 1024 1024)))))))
master => 87-104ms
patch => 370-330ms


(dotimes [x 100] (time (dotimes [x 1] (transduce (comp (map inc) (map inc)) + (range (* 1024 1024))))))
master=>76-116ms
patch => 44ms-59ms
{noformat}

*Patch:* reified-range4.diff (some tests fail)
 
Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range. LongRange implements the specific (extremely common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl run via java -jar)

{noformat}

(dotimes [x 50] (time (count (range (* 1024 1024)))))
master => 55 ms
patch => 0 ms

(dotimes [x 50] (time (reduce + (map inc (range (* 1024 1024))))))
master => 38 ms
patch => 34 ms

(dotimes [x 50] (time (reduce + (map inc (map inc (range (* 1024 1024))))) ))
master => 58 ms
patch=> 53 ms

(dotimes [x 50] (time (count (filter odd? (range (* 1024 1024))))))
master => 46 ms
patch => 26 ms

(dotimes [x 50] (time (transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))))
master=> 37 ms
patch=> 16 ms
{noformat}

*Patch:* clj-1515.patch (still a work in progress)
 
Alex Miller made changes -
Description Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range. LongRange implements the specific (extremely common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl run via java -jar)

{noformat}

(dotimes [x 50] (time (count (range (* 1024 1024)))))
master => 55 ms
patch => 0 ms

(dotimes [x 50] (time (reduce + (map inc (range (* 1024 1024))))))
master => 38 ms
patch => 34 ms

(dotimes [x 50] (time (reduce + (map inc (map inc (range (* 1024 1024))))) ))
master => 58 ms
patch=> 53 ms

(dotimes [x 50] (time (count (filter odd? (range (* 1024 1024))))))
master => 46 ms
patch => 26 ms

(dotimes [x 50] (time (transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))))
master=> 37 ms
patch=> 16 ms
{noformat}

*Patch:* clj-1515.patch (still a work in progress)
 
Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range. LongRange implements the specific (extremely common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time }}})

||expr||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|55 ms|0 ms||
|(reduce + (map inc (range (* 1024 1024))))|38 ms|34 ms|count is now O(1)|
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|53 ms||
|(count (filter odd? (range (* 1024 1024))))|46 ms| 26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|37 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.5 ms|1.7 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.5 ms|1.5 ms||
|(doall (range 0.5 1000 0.33))|0.2 ms|0.5 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.21 ms|0.16 ms||

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are slower (but a small percentage of cases). I may make those chunked as well.


*Patch:* clj-1515.patch (still a work in progress)
 
Alex Miller made changes -
Description Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range. LongRange implements the specific (extremely common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time }}})

||expr||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|55 ms|0 ms||
|(reduce + (map inc (range (* 1024 1024))))|38 ms|34 ms|count is now O(1)|
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|53 ms||
|(count (filter odd? (range (* 1024 1024))))|46 ms| 26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|37 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.5 ms|1.7 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.5 ms|1.5 ms||
|(doall (range 0.5 1000 0.33))|0.2 ms|0.5 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.21 ms|0.16 ms||

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are slower (but a small percentage of cases). I may make those chunked as well.


*Patch:* clj-1515.patch (still a work in progress)
 
Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range. LongRange implements the specific (extremely common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time }}})

||expr||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|38 ms|34 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|53 ms| |
|(count (filter odd? (range (* 1024 1024))))|46 ms| 26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|37 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.5 ms|1.7 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.5 ms|1.5 ms| |
|(doall (range 0.5 1000 0.33))|0.2 ms|0.5 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.21 ms|0.16 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are slower (but a small percentage of cases). I may make those chunked as well.


*Patch:* clj-1515.patch (still a work in progress)
 
Hide
Nicola Mometto added a comment - - edited

Alex, while this is still a work in progress, I see that the change on ArrayChunk#reduce from previous WIP patches not only has not been reverted but has been extended. I don't think the current approach makes sense as ArrayChunk#reduce is not part of the IReduce/IReduceInit contract but of the IChunk contract and changing the behaviour to be IReduce-like in its handling of reduced introduces the burden of having to use preserve-reduced on the reducing function to no apparent benefit.

Given that the preserve-reduced is done on the clojure side, it seems to me like directly invoking .reduce rather than routing through internal-reduce should be broken but I haven't tested it.

Show
Nicola Mometto added a comment - - edited Alex, while this is still a work in progress, I see that the change on ArrayChunk#reduce from previous WIP patches not only has not been reverted but has been extended. I don't think the current approach makes sense as ArrayChunk#reduce is not part of the IReduce/IReduceInit contract but of the IChunk contract and changing the behaviour to be IReduce-like in its handling of reduced introduces the burden of having to use preserve-reduced on the reducing function to no apparent benefit. Given that the preserve-reduced is done on the clojure side, it seems to me like directly invoking .reduce rather than routing through internal-reduce should be broken but I haven't tested it.
Hide
Alex Miller added a comment -

That's the work in progress part - I haven't looked at yet. I have not extended or done any work re ArrayChunk, just carried through what was on the prior patch. I'll be working on it again tomorrow.

Show
Alex Miller added a comment - That's the work in progress part - I haven't looked at yet. I have not extended or done any work re ArrayChunk, just carried through what was on the prior patch. I'll be working on it again tomorrow.
Alex Miller made changes -
Attachment clj-1515-2.patch [ 13595 ]
Description Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range. LongRange implements the specific (extremely common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time }}})

||expr||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|38 ms|34 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|53 ms| |
|(count (filter odd? (range (* 1024 1024))))|46 ms| 26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|37 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.5 ms|1.7 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.5 ms|1.5 ms| |
|(doall (range 0.5 1000 0.33))|0.2 ms|0.5 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.21 ms|0.16 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are slower (but a small percentage of cases). I may make those chunked as well.


*Patch:* clj-1515.patch (still a work in progress)
 
Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range. LongRange implements the specific (extremely common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time }}})

||expr||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|46 ms| 26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|37 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.5 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.4 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.20 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.21 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are slower (but a small percentage of cases). I may make those chunked as well.


*Patch:* clj-1515.patch (still a work in progress)
 
Alex Miller made changes -
Description Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range. LongRange implements the specific (extremely common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time }}})

||expr||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|46 ms| 26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|37 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.5 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.4 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.20 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.21 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are slower (but a small percentage of cases). I may make those chunked as well.


*Patch:* clj-1515.patch (still a work in progress)
 
Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range. LongRange implements the specific (extremely common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time expr}}})

||expr||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|46 ms| 26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|37 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.5 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.4 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.20 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.21 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are slower (but a small percentage of cases). I may make those chunked as well.


*Patch:* clj-1515.patch (still a work in progress)
 
Alex Miller made changes -
Description Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range. LongRange implements the specific (extremely common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time expr}}})

||expr||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|46 ms| 26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|37 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.5 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.4 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.20 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.21 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are slower (but a small percentage of cases). I may make those chunked as well.


*Patch:* clj-1515.patch (still a work in progress)
 
Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range. LongRange implements the specific (extremely common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time expr))}})

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|57 ms|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|40 ms|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|58 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|47 ms|46 ms|26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|37 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.7 ms|1.5 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.6 ms|1.4 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.14 ms|0.20 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.14 ms|0.21 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are slower (but a small percentage of cases). I may make those chunked as well.


*Patch:* clj-1515.patch (still a work in progress)
 
Alex Miller made changes -
Description Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range. LongRange implements the specific (extremely common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time expr))}})

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|57 ms|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|40 ms|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|58 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|47 ms|46 ms|26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|37 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.7 ms|1.5 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.6 ms|1.4 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.14 ms|0.20 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.14 ms|0.21 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are slower (but a small percentage of cases). I may make those chunked as well.


*Patch:* clj-1515.patch (still a work in progress)
 
Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range. LongRange implements the specific (extremely common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time expr))}})

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|57 ms|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|40 ms|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|55 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|47 ms|46 ms|26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|36 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.7 ms|1.7 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.6 ms|1.5 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.14 ms|0.14 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.14 ms|0.13 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are slower (but a small percentage of cases). I may make those chunked as well.


*Patch:* clj-1515.patch (still a work in progress)
 
Alex Miller made changes -
Description Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range. LongRange implements the specific (extremely common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time expr))}})

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|57 ms|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|40 ms|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|55 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|47 ms|46 ms|26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|36 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.7 ms|1.7 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.6 ms|1.5 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.14 ms|0.14 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.14 ms|0.13 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are slower (but a small percentage of cases). I may make those chunked as well.


*Patch:* clj-1515.patch (still a work in progress)
 
Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range. LongRange implements the specific (extremely common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time expr))}})

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|57 ms|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|40 ms|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|55 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|47 ms|46 ms|26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|36 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.7 ms|1.7 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.6 ms|1.5 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.14 ms|0.14 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.14 ms|0.13 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are slower (but a small percentage of cases). I may make those chunked as well.


*Patch:* clj-1515-2.patch (still a work in progress)
 
Hide
Ghadi Shayban added a comment -

I am impressed and have learned a ton through this exercise.

quick review of clj-1515-2
1) withMeta gives the newly formed object the wrong meta.
2) LongRange/create() is the new 0-arity constructor for range, which sets the 'end' to Double/POSITIVE_INFINITY cast as a long. Current core uses Double/POSITIVE_INFINITY directly. Not sure how many programs rely upon iterating that far, or how they would break.
3) Relatedly, depending on the previous point: Because only all-long arguments receive chunking, the very common case of (range) with no args would be unchunked. Doesn't seem like too much of a stretch to add chunking to the other impl.
4) Though the commented invariants say that Range is never empty, the implementation uses a magic value of _count == 0 to mean not cached, which is surprising to me. hashcodes have the magic value of -1
5) s/instanceof Reduced/RT.isReduced
6) is the overflow behavior of "int count()" correct?

Show
Ghadi Shayban added a comment - I am impressed and have learned a ton through this exercise. quick review of clj-1515-2 1) withMeta gives the newly formed object the wrong meta. 2) LongRange/create() is the new 0-arity constructor for range, which sets the 'end' to Double/POSITIVE_INFINITY cast as a long. Current core uses Double/POSITIVE_INFINITY directly. Not sure how many programs rely upon iterating that far, or how they would break. 3) Relatedly, depending on the previous point: Because only all-long arguments receive chunking, the very common case of (range) with no args would be unchunked. Doesn't seem like too much of a stretch to add chunking to the other impl. 4) Though the commented invariants say that Range is never empty, the implementation uses a magic value of _count == 0 to mean not cached, which is surprising to me. hashcodes have the magic value of -1 5) s/instanceof Reduced/RT.isReduced 6) is the overflow behavior of "int count()" correct?
Alex Miller made changes -
Attachment clj-1515-3.patch [ 13597 ]
Approval Incomplete [ 10006 ] Vetted [ 10003 ]
Description Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range. LongRange implements the specific (extremely common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time expr))}})

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|57 ms|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|40 ms|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|55 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|47 ms|46 ms|26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|36 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.7 ms|1.7 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.6 ms|1.5 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.14 ms|0.14 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.14 ms|0.13 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are slower (but a small percentage of cases). I may make those chunked as well.


*Patch:* clj-1515-2.patch (still a work in progress)
 
Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost due to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time expr))}})

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|57 ms|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|40 ms|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|55 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|47 ms|46 ms|26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|36 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.7 ms|1.7 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.6 ms|1.5 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.14 ms|0.14 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.14 ms|0.13 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-3.patch

*Screened by:*
Hide
Alex Miller added a comment -

1) agreed!
2) Good point. I am definitely changing behavior on this (max of 9223372036854775807). I will look at whether this can be handled without affecting perf. Really, handling an infinite end point is not compatible with several things in LongRange.
3) I actually did implement chunking for the general Range and found it was slower (the original Clojure chunking is faster). LongRange is making up for that difference with improved primitive numerics.
4) Since empty is invalid, 0 and -1 are equally invalid. But I agree -1 conveys the intent better.
5) agreed
6) probably not. ties into 2/3.

Thanks for this, will address.

Show
Alex Miller added a comment - 1) agreed! 2) Good point. I am definitely changing behavior on this (max of 9223372036854775807). I will look at whether this can be handled without affecting perf. Really, handling an infinite end point is not compatible with several things in LongRange. 3) I actually did implement chunking for the general Range and found it was slower (the original Clojure chunking is faster). LongRange is making up for that difference with improved primitive numerics. 4) Since empty is invalid, 0 and -1 are equally invalid. But I agree -1 conveys the intent better. 5) agreed 6) probably not. ties into 2/3. Thanks for this, will address.
Hide
Alex Miller added a comment -

Added -4 patch that addresses 1,4,5 but not the (range) stuff.

Show
Alex Miller added a comment - Added -4 patch that addresses 1,4,5 but not the (range) stuff.
Alex Miller made changes -
Attachment clj-1515-4.patch [ 13598 ]
Description Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost due to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time expr))}})

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|57 ms|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|40 ms|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|55 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|47 ms|46 ms|26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|36 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.7 ms|1.7 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.6 ms|1.5 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.14 ms|0.14 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.14 ms|0.13 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-3.patch

*Screened by:*
Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost due to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time expr))}})

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|57 ms|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|40 ms|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|55 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|47 ms|46 ms|26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|36 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.7 ms|1.7 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.6 ms|1.5 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.14 ms|0.14 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.14 ms|0.13 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-4.patch

*Screened by:*
Approval Vetted [ 10003 ] Incomplete [ 10006 ]
Alex Miller made changes -
Summary Reify the result of range Reify the result of range and add IReduceInit
Attachment clj-1515-5.patch [ 13599 ]
Description Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost due to the reduced number of allocations.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance.

*Note:* The most recent patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra drop in times.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time expr))}})

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (range (* 1024 1024)))|57 ms|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|40 ms|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|55 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|47 ms|46 ms|26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|36 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.7 ms|1.7 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.6 ms|1.5 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.14 ms|0.14 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.14 ms|0.13 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-4.patch

*Screened by:*
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc 0)}} (which is further optimized in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time expr))}})

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(doall (take (* 1024 1024) (range)))|250-700 ms|250-700 ms|94 ms|w/CLJ-1603: 55 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|47 ms|69 ms|w/CLJ-1603: 32 ms|
|(count (range (* 1024 1024)))|57 ms|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|40 ms|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|55 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|47 ms|46 ms|26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|36 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.7 ms|1.7 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.6 ms|1.5 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.14 ms|0.14 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.14 ms|0.13 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-4.patch

*Screened by:*
Approval Incomplete [ 10006 ] Vetted [ 10003 ]
Alex Miller made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc 0)}} (which is further optimized in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
(timings done at the repl via {{(dotimes [x 50] (time expr))}})

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(doall (take (* 1024 1024) (range)))|250-700 ms|250-700 ms|94 ms|w/CLJ-1603: 55 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|47 ms|69 ms|w/CLJ-1603: 32 ms|
|(count (range (* 1024 1024)))|57 ms|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|40 ms|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|55 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|47 ms|46 ms|26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|36 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.7 ms|1.7 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.6 ms|1.5 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.14 ms|0.14 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.14 ms|0.13 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-4.patch

*Screened by:*
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc 0)}} (which is further optimized in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done at the repl via {{(dotimes [x 50] (time expr))}}

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(doall (take (* 1024 1024) (range)))|250-700 ms|250-700 ms|94 ms|w/CLJ-1603: 55 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|47 ms|69 ms|w/CLJ-1603: 32 ms|
|(count (range (* 1024 1024)))|57 ms|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|40 ms|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|55 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|47 ms|46 ms|26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|36 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.7 ms|1.7 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.6 ms|1.5 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.14 ms|0.14 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.14 ms|0.13 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-4.patch

*Screened by:*
Alex Miller made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc 0)}} (which is further optimized in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done at the repl via {{(dotimes [x 50] (time expr))}}

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(doall (take (* 1024 1024) (range)))|250-700 ms|250-700 ms|94 ms|w/CLJ-1603: 55 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|47 ms|69 ms|w/CLJ-1603: 32 ms|
|(count (range (* 1024 1024)))|57 ms|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|40 ms|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|55 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|47 ms|46 ms|26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|36 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.7 ms|1.7 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.6 ms|1.5 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.14 ms|0.14 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.14 ms|0.13 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-4.patch

*Screened by:*
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc 0)}} (which is further optimized in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done at the repl via (dotimes [x 50] (time expr))

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(doall (take (* 1024 1024) (range)))|250-700 ms|250-700 ms|94 ms|w/CLJ-1603: 55 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|47 ms|69 ms|w/CLJ-1603: 32 ms|
|(count (range (* 1024 1024)))|57 ms|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|40 ms|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|55 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|47 ms|46 ms|26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|36 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.7 ms|1.7 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.6 ms|1.5 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.14 ms|0.14 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.14 ms|0.13 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-4.patch

*Screened by:*
Alex Miller made changes -
Attachment clj-1515-6.patch [ 13600 ]
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc 0)}} (which is further optimized in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done at the repl via (dotimes [x 50] (time expr))

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(doall (take (* 1024 1024) (range)))|250-700 ms|250-700 ms|94 ms|w/CLJ-1603: 55 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|47 ms|69 ms|w/CLJ-1603: 32 ms|
|(count (range (* 1024 1024)))|57 ms|55 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|40 ms|38 ms|17 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|58 ms|55 ms|35 ms| |
|(count (filter odd? (range (* 1024 1024))))|47 ms|46 ms|26 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|36 ms| 16 ms|reduce much faster|
|(doall (range 1/2 1000 1/3))|1.7 ms|1.7 ms|2.0 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.6 ms|1.5 ms|1.4 ms| |
|(doall (range 0.5 1000 0.33))|0.14 ms|0.14 ms|0.38 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.14 ms|0.13 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-4.patch

*Screened by:*
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc 0)}} (which is further optimized in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(doall (take (* 1024 1024) (range)))|633 ms|510 ms|406 ms|w/CLJ-1603: 55 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|82 ms|w/CLJ-1603: 32 ms|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|41 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|63 ms| |
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|40 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms||
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.81 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.83 ms| |
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.37 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-6.patch

*Screened by:*
Alex Miller made changes -
Attachment clj-1515-7.patch [ 13601 ]
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc 0)}} (which is further optimized in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(doall (take (* 1024 1024) (range)))|633 ms|510 ms|406 ms|w/CLJ-1603: 55 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|82 ms|w/CLJ-1603: 32 ms|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|41 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|63 ms| |
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|40 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms||
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.81 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.83 ms| |
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.37 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-6.patch

*Screened by:*
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(doall (take (* 1024 1024) (range)))|633 ms|510 ms|402 ms|w/CLJ-1603: 499 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|80 ms|w/CLJ-1603: 41 ms|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|41 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|63 ms| |
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|40 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms||
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.81 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.83 ms| |
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.37 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-7.patch

*Screened by:*
Alex Miller made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(doall (take (* 1024 1024) (range)))|633 ms|510 ms|402 ms|w/CLJ-1603: 499 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|80 ms|w/CLJ-1603: 41 ms|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|41 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|63 ms| |
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|40 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms||
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.81 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.83 ms| |
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.37 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-7.patch

*Screened by:*
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms|w/CLJ-1603: 182 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|80 ms|w/CLJ-1603: 41 ms|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|41 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|63 ms| |
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|40 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms||
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.81 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.83 ms| |
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.37 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-7.patch

*Screened by:*
Hide
Alex Miller added a comment -

Latest -7 patch addresses all feedback and perf #s updated.

Show
Alex Miller added a comment - Latest -7 patch addresses all feedback and perf #s updated.
Alex Miller made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms|w/CLJ-1603: 182 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|80 ms|w/CLJ-1603: 41 ms|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|41 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|63 ms| |
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|40 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms||
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.81 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.83 ms| |
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.37 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-7.patch

*Screened by:*
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms|w/CLJ-1603: 182 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|80 ms|w/CLJ-1603: 41 ms|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|41 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|63 ms| |
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|40 ms|Extra drop from filter enhancement| |
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms| |
|(doall (range 0 31))|1.41 µs|1.51 µs|3.06 µs| |
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.81 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.83 ms| |
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.37 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-7.patch

*Screened by:*
Alex Miller made changes -
Attachment clj-1515-8.patch [ 13604 ]
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms|w/CLJ-1603: 182 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|80 ms|w/CLJ-1603: 41 ms|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|41 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|63 ms| |
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|40 ms|Extra drop from filter enhancement| |
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms| |
|(doall (range 0 31))|1.41 µs|1.51 µs|3.06 µs| |
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.81 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.83 ms| |
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.37 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-7.patch

*Screened by:*
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms|w/CLJ-1603: 182 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|80 ms|w/CLJ-1603: 41 ms|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|41 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|63 ms| |
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|40 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms| |
|(doall (range 0 31))|1.41 µs|1.51 µs|3.06 µs| |
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.75 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.80 ms| |
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.37 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-7.patch

*Screened by:*
Alex Miller made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms|w/CLJ-1603: 182 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|80 ms|w/CLJ-1603: 41 ms|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|41 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|63 ms| |
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|40 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms| |
|(doall (range 0 31))|1.41 µs|1.51 µs|3.06 µs| |
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.75 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.80 ms| |
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.37 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-7.patch

*Screened by:*
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms|w/CLJ-1603: 182 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|80 ms|w/CLJ-1603: 41 ms|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|41 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|63 ms| |
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|40 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms| |
|(doall (range 0 31))|1.41 µs|1.51 µs|3.06 µs| |
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.75 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.80 ms| |
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.37 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-8.patch

*Screened by:*
Alex Miller made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms|w/CLJ-1603: 182 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|80 ms|w/CLJ-1603: 41 ms|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|41 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|63 ms| |
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|40 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms| |
|(doall (range 0 31))|1.41 µs|1.51 µs|3.06 µs| |
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.75 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.80 ms| |
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.37 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.11 ms| |

Generally, anything with long ranges or reduce is faster. Non-long range sequences are not currently chunked and are a bit slower

*Patch:* clj-1515-8.patch

*Screened by:*
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms|w/CLJ-1603: 182 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|80 ms|w/CLJ-1603: 41 ms|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|41 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|63 ms| |
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|40 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms| |
|(doall (range 0 31))|1.41 µs|1.51 µs|3.06 µs| |
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.75 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.80 ms| |
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.37 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.11 ms| |

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower.

*Patch:* clj-1515-8.patch

*Screened by:*
Alex Miller made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms|w/CLJ-1603: 182 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|80 ms|w/CLJ-1603: 41 ms|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|41 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|63 ms| |
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|40 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms| |
|(doall (range 0 31))|1.41 µs|1.51 µs|3.06 µs| |
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.75 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.80 ms| |
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.37 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.11 ms| |

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower.

*Patch:* clj-1515-8.patch

*Screened by:*
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms|w/CLJ-1603: 182 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|80 ms|w/CLJ-1603: 41 ms|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|41 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|63 ms| |
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|40 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms| |
|(doall (range 0 31))|1.41 µs|1.51 µs|3.06 µs| |
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.75 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.80 ms| |
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.37 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.11 ms| |

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower.

*Patch:* clj-1515-8.patch

*Screened by:*

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this impl, which seems to be implied by the doc string but was not actually implemented or tested afaict.



Alex Miller made changes -
Approval Vetted [ 10003 ] Incomplete [ 10006 ]
Alex Miller made changes -
Attachment clj-1515-9.patch [ 13616 ]
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||Note||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms|w/CLJ-1603: 182 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|80 ms|w/CLJ-1603: 41 ms|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|count is now O(1)|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|41 ms| |
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|63 ms| |
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|40 ms|Extra drop from filter enhancement|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms| |
|(doall (range 0 31))|1.41 µs|1.51 µs|3.06 µs| |
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.75 ms|non-long seq not chunked|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.80 ms| |
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.37 ms|non-long seq not chunked|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.11 ms| |

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower.

*Patch:* clj-1515-8.patch

*Screened by:*

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this impl, which seems to be implied by the doc string but was not actually implemented or tested afaict.



Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|81 ms (w/CLJ-1603: 41 ms)|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|46 ms|
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|69 ms|
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|43 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms|
|(doall (range 0 31))|1.41 µs|1.51 µs|3.02 µs|
|(into [] (map inc (range 31)))|1.76 µs|1.77 µs|1.43 µs|
|(into [] (map inc) (range 31))|n/a|1.60 µs|0.63 µs|
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.66 ms|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.38 ms|
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.35 ms|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.08 ms|

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower.

*Patch:* clj-1515-9.patch

*Screened by:*

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.



Alex Miller made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|81 ms (w/CLJ-1603: 41 ms)|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|46 ms|
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|69 ms|
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|43 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms|
|(doall (range 0 31))|1.41 µs|1.51 µs|3.02 µs|
|(into [] (map inc (range 31)))|1.76 µs|1.77 µs|1.43 µs|
|(into [] (map inc) (range 31))|n/a|1.60 µs|0.63 µs|
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.66 ms|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.38 ms|
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.35 ms|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.08 ms|

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower.

*Patch:* clj-1515-9.patch

*Screened by:*

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.



Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|170 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|81 ms (w/CLJ-1603: 41 ms)|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|46 ms|
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|69 ms|
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|43 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms|
|(doall (range 0 31))|1.41 µs|1.51 µs|3.02 µs|
|(into [] (map inc (range 31)))|1.76 µs|1.77 µs|1.43 µs|
|(into [] (map inc) (range 31))|n/a|1.60 µs|0.63 µs|
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.66 ms|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.38 ms|
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.35 ms|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.08 ms|

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower.

*Patch:* clj-1515-9.patch

*Screened by:*

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.



Alex Miller made changes -
Approval Incomplete [ 10006 ] Vetted [ 10003 ]
Alex Miller made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|170 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|81 ms (w/CLJ-1603: 41 ms)|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|46 ms|
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|69 ms|
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|43 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms|
|(doall (range 0 31))|1.41 µs|1.51 µs|3.02 µs|
|(into [] (map inc (range 31)))|1.76 µs|1.77 µs|1.43 µs|
|(into [] (map inc) (range 31))|n/a|1.60 µs|0.63 µs|
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.66 ms|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.38 ms|
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.35 ms|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.08 ms|

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower.

*Patch:* clj-1515-9.patch

*Screened by:*

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.



Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|170 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|81 ms (w/CLJ-1603: 41 ms)|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|46 ms|
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|69 ms|
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|43 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms|
|(doall (range 0 31))|1.41 µs|1.51 µs|3.02 µs|
|(into [] (map inc (range 31)))|1.76 µs|1.77 µs|1.43 µs|
|(into [] (map inc) (range 31))|n/a|1.60 µs|0.63 µs|
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.66 ms|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.38 ms|
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.35 ms|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.08 ms|

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower. In general the "doall" examples are slower as this is kind of the worst case wrt overhead and values are retrieved via seq/next looping (the slowest option). Stacked sequence ops happen via the the chunked seq impl (which is a little faster), and the transduce/into will use the reduce impl (which is much faster).

*Patch:* clj-1515-9.patch

*Screened by:*

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.



Hide
Ghadi Shayban added a comment -

See CLJ-1572, I believe CollReduce needs to be extended directly to clojure.lang.{LongRange,Range} inside protocols.clj

Show
Ghadi Shayban added a comment - See CLJ-1572, I believe CollReduce needs to be extended directly to clojure.lang.{LongRange,Range} inside protocols.clj
Hide
Ghadi Shayban added a comment -

Seems like a missing benchmark is (range 0 31) without the doall. Current patch (-9) allocates the chunk buffer at range's construction time. Maybe this can be delayed?

Show
Ghadi Shayban added a comment - Seems like a missing benchmark is (range 0 31) without the doall. Current patch (-9) allocates the chunk buffer at range's construction time. Maybe this can be delayed?
Alex Miller made changes -
Approval Vetted [ 10003 ] Incomplete [ 10006 ]
Alex Miller made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|170 ms|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|81 ms (w/CLJ-1603: 41 ms)|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|46 ms|
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|69 ms|
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|43 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|36 ms|
|(doall (range 0 31))|1.41 µs|1.51 µs|3.02 µs|
|(into [] (map inc (range 31)))|1.76 µs|1.77 µs|1.43 µs|
|(into [] (map inc) (range 31))|n/a|1.60 µs|0.63 µs|
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.66 ms|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.38 ms|
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.35 ms|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.08 ms|

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower. In general the "doall" examples are slower as this is kind of the worst case wrt overhead and values are retrieved via seq/next looping (the slowest option). Stacked sequence ops happen via the the chunked seq impl (which is a little faster), and the transduce/into will use the reduce impl (which is much faster).

*Patch:* clj-1515-9.patch

*Screened by:*

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.



Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||patch / 1.6.0||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms|1.0|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|81 ms (w/CLJ-1603: 41 ms)|1.2 (0.6)|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|0.0|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|43 ms|0.6|
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|65 ms|0.7|
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|43 ms|0.6|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|33 ms|0.5|
|(range 0 31)|9 ns| ? |17 ns|1.9|
|(doall (range 0 31))|1.41 µs|1.51 µs|3.15 µs|2.2|
|(into [] (map inc (range 31)))|1.76 µs|1.77 µs|1.24 µs|0.7|
|(into [] (map inc) (range 31))|n/a|1.60 µs|0.57 µs|0.4|
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.66 ms|1.1|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.66 ms|1.1|
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.35 ms|2.3|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.12 ms|1.0|

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower. In general the "doall" examples are slower as this is kind of the worst case wrt overhead and values are retrieved via seq/next looping (the slowest option). Stacked sequence ops happen via the the chunked seq impl (which is a little faster), and the transduce/into will use the reduce impl (which is much faster).

*Patch:* clj-1515-10.patch

*Screened by:*

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.



Alex Miller made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||patch / 1.6.0||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms|1.0|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|81 ms (w/CLJ-1603: 41 ms)|1.2 (0.6)|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|0.0|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|43 ms|0.6|
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|65 ms|0.7|
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|43 ms|0.6|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|33 ms|0.5|
|(range 0 31)|9 ns| ? |17 ns|1.9|
|(doall (range 0 31))|1.41 µs|1.51 µs|3.15 µs|2.2|
|(into [] (map inc (range 31)))|1.76 µs|1.77 µs|1.24 µs|0.7|
|(into [] (map inc) (range 31))|n/a|1.60 µs|0.57 µs|0.4|
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.66 ms|1.1|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.66 ms|1.1|
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.35 ms|2.3|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.12 ms|1.0|

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower. In general the "doall" examples are slower as this is kind of the worst case wrt overhead and values are retrieved via seq/next looping (the slowest option). Stacked sequence ops happen via the the chunked seq impl (which is a little faster), and the transduce/into will use the reduce impl (which is much faster).

*Patch:* clj-1515-10.patch

*Screened by:*

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.



Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||patch / 1.6.0||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms|1.0|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|81 ms (w/CLJ-1603: 41 ms)|1.2 (0.6)|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|0.0 ^4^|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|43 ms|0.6|
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|65 ms|0.7|
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|43 ms|0.6|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|33 ms|0.5|
|(range 0 31)|9 ns| 8 ns |13 ns|1.4 ^1^|
|(doall (range 0 31))|1.41 µs|1.51 µs|3.15 µs|2.2 ^2^|
|(into [] (map inc (range 31)))|1.76 µs|1.77 µs|1.24 µs|0.7|
|(into [] (map inc) (range 31))|n/a|1.60 µs|0.57 µs|0.4|
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.66 ms|1.1 ^2 3^|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.66 ms|1.1 ^3^|
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.35 ms|2.3 ^2 3^|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.12 ms|1.0 ^3^|

^1^ slightly more overhead due to checking more cases, but absolute value is small
^2^ "doall" will walk via seq/next looping (the slowest option). Stacked sequence ops happen via the the chunked seq impl (which is a little faster), and the transduce/into will use the reduce impl (which is much faster).
^3^ cases other than finite long range (pretty rare, not optimized)
^4^ count on finite long range is now O(1)

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower.

*Patch:* clj-1515-10.patch

*Screened by:*

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.



Hide
Alex Miller added a comment -

Ghadi - you are right. I reworked the patch (new -10 version) so that the chunk is only created on demand. Basically, the chunking is only used when traversing via chunked seqs and in normal seq iteration or reduce, no explicit chunks are created. That improved several timings. I also added the bench for (range 0 31) which was greatly improved by lazily creating the chunk, so good point on that.

Show
Alex Miller added a comment - Ghadi - you are right. I reworked the patch (new -10 version) so that the chunk is only created on demand. Basically, the chunking is only used when traversing via chunked seqs and in normal seq iteration or reduce, no explicit chunks are created. That improved several timings. I also added the bench for (range 0 31) which was greatly improved by lazily creating the chunk, so good point on that.
Alex Miller made changes -
Attachment clj-1515-10.patch [ 13704 ]
Alex Miller made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||patch / 1.6.0||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms|1.0|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|81 ms (w/CLJ-1603: 41 ms)|1.2 (0.6)|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|0.0 ^4^|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|43 ms|0.6|
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|65 ms|0.7|
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|43 ms|0.6|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|33 ms|0.5|
|(range 0 31)|9 ns| 8 ns |13 ns|1.4 ^1^|
|(doall (range 0 31))|1.41 µs|1.51 µs|3.15 µs|2.2 ^2^|
|(into [] (map inc (range 31)))|1.76 µs|1.77 µs|1.24 µs|0.7|
|(into [] (map inc) (range 31))|n/a|1.60 µs|0.57 µs|0.4|
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.66 ms|1.1 ^2 3^|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.66 ms|1.1 ^3^|
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.35 ms|2.3 ^2 3^|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.12 ms|1.0 ^3^|

^1^ slightly more overhead due to checking more cases, but absolute value is small
^2^ "doall" will walk via seq/next looping (the slowest option). Stacked sequence ops happen via the the chunked seq impl (which is a little faster), and the transduce/into will use the reduce impl (which is much faster).
^3^ cases other than finite long range (pretty rare, not optimized)
^4^ count on finite long range is now O(1)

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower.

*Patch:* clj-1515-10.patch

*Screened by:*

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.



Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||patch / 1.6.0||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms (w/CLJ-1603: 152 ms)|1.0 (0.8)|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|81 ms (w/CLJ-1603: 41 ms)|1.2 (0.6)|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|0.0 ^4^|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|43 ms|0.6|
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|65 ms|0.7|
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|43 ms|0.6|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|33 ms|0.5|
|(range 0 31)|9 ns| 8 ns |13 ns|1.4 ^1^|
|(doall (range 0 31))|1.41 µs|1.51 µs|3.15 µs|2.2 ^2^|
|(into [] (map inc (range 31)))|1.76 µs|1.77 µs|1.24 µs|0.7|
|(into [] (map inc) (range 31))|n/a|1.60 µs|0.57 µs|0.4|
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.66 ms|1.1 ^2 3^|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.66 ms|1.1 ^3^|
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.35 ms|2.3 ^2 3^|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.12 ms|1.0 ^3^|

^1^ slightly more overhead due to checking more cases, but absolute value is small
^2^ "doall" will walk via seq/next looping (the slowest option). Stacked sequence ops happen via the the chunked seq impl (which is a little faster), and the transduce/into will use the reduce impl (which is much faster).
^3^ cases other than finite long range (pretty rare, not optimized)
^4^ count on finite long range is now O(1)

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower.

*Patch:* clj-1515-10.patch

*Screened by:*

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.



Alex Miller made changes -
Approval Incomplete [ 10006 ] Vetted [ 10003 ]
Alex Miller made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||patch / 1.6.0||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms (w/CLJ-1603: 152 ms)|1.0 (0.8)|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|81 ms (w/CLJ-1603: 41 ms)|1.2 (0.6)|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|0.0 ^4^|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|43 ms|0.6|
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|65 ms|0.7|
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|43 ms|0.6|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|33 ms|0.5|
|(range 0 31)|9 ns| 8 ns |13 ns|1.4 ^1^|
|(doall (range 0 31))|1.41 µs|1.51 µs|3.15 µs|2.2 ^2^|
|(into [] (map inc (range 31)))|1.76 µs|1.77 µs|1.24 µs|0.7|
|(into [] (map inc) (range 31))|n/a|1.60 µs|0.57 µs|0.4|
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.66 ms|1.1 ^2 3^|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.66 ms|1.1 ^3^|
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.35 ms|2.3 ^2 3^|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.12 ms|1.0 ^3^|

^1^ slightly more overhead due to checking more cases, but absolute value is small
^2^ "doall" will walk via seq/next looping (the slowest option). Stacked sequence ops happen via the the chunked seq impl (which is a little faster), and the transduce/into will use the reduce impl (which is much faster).
^3^ cases other than finite long range (pretty rare, not optimized)
^4^ count on finite long range is now O(1)

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower.

*Patch:* clj-1515-10.patch

*Screened by:*

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.



Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||patch time / 1.6.0 time||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms (w/CLJ-1603: 152 ms)|1.0 (0.8)|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|81 ms (w/CLJ-1603: 41 ms)|1.2 (0.6)|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|0.0 ^4^|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|43 ms|0.6|
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|65 ms|0.7|
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|43 ms|0.6|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|33 ms|0.5|
|(range 0 31)|9 ns| 8 ns |13 ns|1.4 ^1^|
|(doall (range 0 31))|1.41 µs|1.51 µs|3.15 µs|2.2 ^2^|
|(into [] (map inc (range 31)))|1.76 µs|1.77 µs|1.24 µs|0.7|
|(into [] (map inc) (range 31))|n/a|1.60 µs|0.57 µs|0.4|
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.66 ms|1.1 ^2 3^|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.66 ms|1.1 ^3^|
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.35 ms|2.3 ^2 3^|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.12 ms|1.0 ^3^|

^1^ slightly more overhead due to checking more cases, but absolute value is small
^2^ "doall" will walk via seq/next looping (the slowest option). Stacked sequence ops happen via the the chunked seq impl (which is a little faster), and the transduce/into will use the reduce impl (which is much faster).
^3^ cases other than finite long range (pretty rare, not optimized)
^4^ count on finite long range is now O(1)

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower.

*Patch:* clj-1515-10.patch

*Screened by:*

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.



Hide
Michael Blume added a comment -

I'm looking at the implementation of equiv() and wondering if it's worth checking whether the other object is also a reified range and comparing the private parameters rather than iterating through the sequences.

Show
Michael Blume added a comment - I'm looking at the implementation of equiv() and wondering if it's worth checking whether the other object is also a reified range and comparing the private parameters rather than iterating through the sequences.
Alex Miller made changes -
Approval Vetted [ 10003 ] Incomplete [ 10006 ]
Alex Miller made changes -
Assignee Alex Miller [ alexmiller ]
Hide
Ghadi Shayban added a comment -

Attaching a simplified implementation as a deftype. I benchmarked a billion variants and dumped bytecode. I'm attaching the best performing combination, and it beats out the Java one in most cases.

Approach:
Two deftypes, one specialized to primitive longs, the other generic math.
Implement: Seqable/Counted,IReduce,IReduceInit,Iterable
Also, marker interfaces Sequential and Serialized.

The implementations are very straightforward, but Seqable deserves mention.
'seq' delegates its implementation to the existing lazy-seq based range, which has been stripped of the strange corner cases and now only handles strictly ascending/descending ranges. It has been renamed range*, rather than range1 because all the arguments to range are already numbers. core references to range have been accordingly renamed.

The new range constructor is loaded after reduce & protocols load. It checks types, and delegates to either long-range or generic-range, which handle the wacky argument cases that are not strictly ascending or descending. I'm annoyed with the structural duplication of those conditionals, not sure how to solve it.

Inside the LongRange implementation of IReduce/IReduceInit, boxing is carefully controlled, and was verified through bytecode dumping.

I elaborated the benchmarks for comparison, and also included benchmarking without type specialization.

You discover strange things working on this stuff. Turns out having the comparator close over the 'end' field is beneficial: #(< % end).

Alex, I figured out why the Java versions had a nearly exactly 2x regression on (doall (range 0 31)), and I attached a change to 'dorun'. Instead of calling (seq coll) then (next coll), it effectively calls (next (seq coll)). I think the implementation assumes that calling 'seq' is evaluating the thunk inside LazySeq. This should also help other Seqable impls like Cycle/Iterate/Repeat CLJ-1603.

Results (criterium full runs):

code 1.7.0-alpha5 clj-1515-10.patch (Java) deftype specialized (attached) deftype unspecialized
(count (filter odd? (take (* 1024 1024) (range)))) 187.30 ms 194.92 ms 182.53 ms 184.13 ms
(transduce (take (* 1024 1024)) + (range)) 58.27 ms 89.37 ms 84.69 ms 84.72 ms
(count (range (* 1024 1024))) 62.34 ms 27.11 ns not run not run
(reduce + (map inc (range (* 1024 1024)))) 57.63 ms 46.14 ms 46.17 ms 52.94 ms
(reduce + (map inc (map inc (range (* 1024 1024))))) 82.83 ms 68.66 ms 64.36 ms 71.76 ms
(count (keep odd? (range (* 1024 1024)))) 76.09 ms 57.59 ms not run not run
(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) 52.17 ms 39.71 ms 28.83 ms 42.23 ms
(reduce + 0 (range (* 2048 1024))) 85.93 ms 38.03 ms 26.49 ms 42.43 ms
(doall (range 0 31)) 1.33 µs 2.89 µs 1.03 µs 1.08 µs
(doall (range 0 32)) 1.35 µs 2.97 µs 1.07 µs 1.10 µs
(doall (range 0 128)) 5.27 µs 11.93 µs 4.19 µs 4.29 µs
(doall (range 0 512)) 21.66 µs 47.33 µs 16.95 µs 17.66 µs
(doall (range 0 4096)) 171.30 µs 378.52 µs 135.45 µs 140.27 µs
(into [] (map inc (range 31))) 1.97 µs 1.57 µs 1.87 µs 1.95 µs
(into [] (map inc) (range 31)) 1.66 µs 824.67 ns 891.90 ns 1.11 µs
(into [] (range 128)) 5.11 µs 2.21 µs 2.43 µs 3.36 µs
(doall (range 1/2 1000 1/3)) 1.53 ms 1.67 ms 1.59 ms 1.52 ms
(doall (range 0.5 1000 0.33)) 164.83 µs 382.38 µs 149.38 µs 141.21 µs
(into [] (range 1/2 1000 1/3)) 1.53 ms 1.40 ms 1.43 ms 1.50 ms
(into [] (range 0.5 1000 0.33)) 157.44 µs 108.27 µs 104.18 µs 127.20 µs

Open Questions for screeners of the deftype patch:

1) What to do about the autopromotion at the Long/MAX_VALUE boundary? I've preserved the current behavior of Clojure 1.6.
2) Alex, I did not pull forward the filter chunk tweak you discovered
3) Is the structural duplication of the conditionals in generic-range long-range awful?
4) Are there any other missing interfaces? IMeta comes to mind. Not sure about IHashEq either.

Show
Ghadi Shayban added a comment - Attaching a simplified implementation as a deftype. I benchmarked a billion variants and dumped bytecode. I'm attaching the best performing combination, and it beats out the Java one in most cases. Approach: Two deftypes, one specialized to primitive longs, the other generic math. Implement: Seqable/Counted,IReduce,IReduceInit,Iterable Also, marker interfaces Sequential and Serialized. The implementations are very straightforward, but Seqable deserves mention. 'seq' delegates its implementation to the existing lazy-seq based range, which has been stripped of the strange corner cases and now only handles strictly ascending/descending ranges. It has been renamed range*, rather than range1 because all the arguments to range are already numbers. core references to range have been accordingly renamed. The new range constructor is loaded after reduce & protocols load. It checks types, and delegates to either long-range or generic-range, which handle the wacky argument cases that are not strictly ascending or descending. I'm annoyed with the structural duplication of those conditionals, not sure how to solve it. Inside the LongRange implementation of IReduce/IReduceInit, boxing is carefully controlled, and was verified through bytecode dumping. I elaborated the benchmarks for comparison, and also included benchmarking without type specialization. You discover strange things working on this stuff. Turns out having the comparator close over the 'end' field is beneficial: #(< % end). Alex, I figured out why the Java versions had a nearly exactly 2x regression on (doall (range 0 31)), and I attached a change to 'dorun'. Instead of calling (seq coll) then (next coll), it effectively calls (next (seq coll)). I think the implementation assumes that calling 'seq' is evaluating the thunk inside LazySeq. This should also help other Seqable impls like Cycle/Iterate/Repeat CLJ-1603. Results (criterium full runs):
code 1.7.0-alpha5 clj-1515-10.patch (Java) deftype specialized (attached) deftype unspecialized
(count (filter odd? (take (* 1024 1024) (range)))) 187.30 ms 194.92 ms 182.53 ms 184.13 ms
(transduce (take (* 1024 1024)) + (range)) 58.27 ms 89.37 ms 84.69 ms 84.72 ms
(count (range (* 1024 1024))) 62.34 ms 27.11 ns not run not run
(reduce + (map inc (range (* 1024 1024)))) 57.63 ms 46.14 ms 46.17 ms 52.94 ms
(reduce + (map inc (map inc (range (* 1024 1024))))) 82.83 ms 68.66 ms 64.36 ms 71.76 ms
(count (keep odd? (range (* 1024 1024)))) 76.09 ms 57.59 ms not run not run
(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) 52.17 ms 39.71 ms 28.83 ms 42.23 ms
(reduce + 0 (range (* 2048 1024))) 85.93 ms 38.03 ms 26.49 ms 42.43 ms
(doall (range 0 31)) 1.33 µs 2.89 µs 1.03 µs 1.08 µs
(doall (range 0 32)) 1.35 µs 2.97 µs 1.07 µs 1.10 µs
(doall (range 0 128)) 5.27 µs 11.93 µs 4.19 µs 4.29 µs
(doall (range 0 512)) 21.66 µs 47.33 µs 16.95 µs 17.66 µs
(doall (range 0 4096)) 171.30 µs 378.52 µs 135.45 µs 140.27 µs
(into [] (map inc (range 31))) 1.97 µs 1.57 µs 1.87 µs 1.95 µs
(into [] (map inc) (range 31)) 1.66 µs 824.67 ns 891.90 ns 1.11 µs
(into [] (range 128)) 5.11 µs 2.21 µs 2.43 µs 3.36 µs
(doall (range 1/2 1000 1/3)) 1.53 ms 1.67 ms 1.59 ms 1.52 ms
(doall (range 0.5 1000 0.33)) 164.83 µs 382.38 µs 149.38 µs 141.21 µs
(into [] (range 1/2 1000 1/3)) 1.53 ms 1.40 ms 1.43 ms 1.50 ms
(into [] (range 0.5 1000 0.33)) 157.44 µs 108.27 µs 104.18 µs 127.20 µs
Open Questions for screeners of the deftype patch: 1) What to do about the autopromotion at the Long/MAX_VALUE boundary? I've preserved the current behavior of Clojure 1.6. 2) Alex, I did not pull forward the filter chunk tweak you discovered 3) Is the structural duplication of the conditionals in generic-range long-range awful? 4) Are there any other missing interfaces? IMeta comes to mind. Not sure about IHashEq either.
Ghadi Shayban made changes -
Attachment CLJ-1515-deftype.patch [ 13781 ]
Hide
Ghadi Shayban added a comment -

note with the deftype patch, the transduce over infinite (range) case is the only one where 'master' currently performs best. This is because the implementation was changed to (iterate inc' 0). When CLJ-1603 is applied that situation should improve better.

Show
Ghadi Shayban added a comment - note with the deftype patch, the transduce over infinite (range) case is the only one where 'master' currently performs best. This is because the implementation was changed to (iterate inc' 0). When CLJ-1603 is applied that situation should improve better.
Ghadi Shayban made changes -
Attachment CLJ-1515-deftype.patch [ 13781 ]
Hide
Ghadi Shayban added a comment -

fixup patch CLJ-1515-deftype

Show
Ghadi Shayban added a comment - fixup patch CLJ-1515-deftype
Ghadi Shayban made changes -
Attachment CLJ-1515-deftype.patch [ 13782 ]
Ghadi Shayban made changes -
Attachment CLJ-1515-deftype-nostructural-dup.patch [ 13783 ]
Hide
Ghadi Shayban added a comment - - edited

new patch CLJ-1515-deftype-nostructural-dup.patch with the silly conditional duplication removed. Updated benchmarks including 'count' impls. These benchmarks run on a different machine with the same hardness. Blue ribbon.

code 1.7.0-alpha5 1.7.0-rangejava 1.7.0-rangespecial rangespecial / alpah5
(count (filter odd? (take (* 1024 1024) (range)))) 297.14 ms 333.93 ms 328.62 ms 1.10
(transduce (take (* 1024 1024)) + (range)) 105.55 ms 145.44 ms 164.70 ms 1.56
(count (range (* 1024 1024))) 108.92 ms 61.09 ns 26.61 ns 0
(reduce + (map inc (range (* 1024 1024)))) 97.67 ms 95.41 ms 84.62 ms 0.86
(reduce + (map inc (map inc (range (* 1024 1024))))) 140.21 ms 135.59 ms 116.38 ms 0.83
(count (keep odd? (range (* 1024 1024)))) 121.18 ms 104.63 ms 111.46 ms 0.92
(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) 100.40 ms 86.28 ms 67.17 ms 0.67
(reduce + 0 (range (* 2048 1024))) 131.77 ms 80.43 ms 63.24 ms 0.48
(doall (range 0 31)) 2.53 µs 4.36 µs 2.24 µs 0.89
(doall (range 0 32)) 2.37 µs 4.00 µs 1.99 µs 0.84
(doall (range 0 128)) 9.20 µs 14.98 µs 8.01 µs 0.87
(doall (range 0 512)) 37.28 µs 59.13 µs 35.16 µs 0.94
(doall (range 0 4096)) 331.28 µs 471.57 µs 291.76 µs 0.88
(into [] (map inc (range 31))) 2.83 µs 2.79 µs 2.67 µs 0.94
(into [] (map inc) (range 31)) 2.21 µs 1.39 µs 1.26 µs 0.57
(into [] (range 128)) 6.72 µs 3.25 µs 3.09 µs 0.46
(doall (range 1/2 1000 1/3)) 3.41 ms 4.04 ms 3.14 ms 0.92
(doall (range 0.5 1000 0.33)) 281.04 µs 530.92 µs 244.14 µs 0.87
(into [] (range 1/2 1000 1/3)) 3.32 ms 3.71 ms 2.99 ms 0.90
(into [] (range 0.5 1000 0.33)) 215.53 µs 165.93 µs 138.86 µs 0.64
Show
Ghadi Shayban added a comment - - edited new patch CLJ-1515-deftype-nostructural-dup.patch with the silly conditional duplication removed. Updated benchmarks including 'count' impls. These benchmarks run on a different machine with the same hardness. Blue ribbon.
code 1.7.0-alpha5 1.7.0-rangejava 1.7.0-rangespecial rangespecial / alpah5
(count (filter odd? (take (* 1024 1024) (range)))) 297.14 ms 333.93 ms 328.62 ms 1.10
(transduce (take (* 1024 1024)) + (range)) 105.55 ms 145.44 ms 164.70 ms 1.56
(count (range (* 1024 1024))) 108.92 ms 61.09 ns 26.61 ns 0
(reduce + (map inc (range (* 1024 1024)))) 97.67 ms 95.41 ms 84.62 ms 0.86
(reduce + (map inc (map inc (range (* 1024 1024))))) 140.21 ms 135.59 ms 116.38 ms 0.83
(count (keep odd? (range (* 1024 1024)))) 121.18 ms 104.63 ms 111.46 ms 0.92
(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) 100.40 ms 86.28 ms 67.17 ms 0.67
(reduce + 0 (range (* 2048 1024))) 131.77 ms 80.43 ms 63.24 ms 0.48
(doall (range 0 31)) 2.53 µs 4.36 µs 2.24 µs 0.89
(doall (range 0 32)) 2.37 µs 4.00 µs 1.99 µs 0.84
(doall (range 0 128)) 9.20 µs 14.98 µs 8.01 µs 0.87
(doall (range 0 512)) 37.28 µs 59.13 µs 35.16 µs 0.94
(doall (range 0 4096)) 331.28 µs 471.57 µs 291.76 µs 0.88
(into [] (map inc (range 31))) 2.83 µs 2.79 µs 2.67 µs 0.94
(into [] (map inc) (range 31)) 2.21 µs 1.39 µs 1.26 µs 0.57
(into [] (range 128)) 6.72 µs 3.25 µs 3.09 µs 0.46
(doall (range 1/2 1000 1/3)) 3.41 ms 4.04 ms 3.14 ms 0.92
(doall (range 0.5 1000 0.33)) 281.04 µs 530.92 µs 244.14 µs 0.87
(into [] (range 1/2 1000 1/3)) 3.32 ms 3.71 ms 2.99 ms 0.90
(into [] (range 0.5 1000 0.33)) 215.53 µs 165.93 µs 138.86 µs 0.64
Ghadi Shayban made changes -
Ghadi Shayban made changes -
Comment [ new patch CLJ-1515-deftype-nostructural-dup.patch with the silly conditional duplication removed. Updated benchmarks including 'count' impls. These benchmarks run on a different machine with the same hardness. Blue ribbon.

||code||1.7.0-alpha5 || 1.7.0-rangejava || 1.7.0-rangespecial||
|(count (filter odd? (take (* 1024 1024) (range)))) | {color:green}297.14 ms{color} | 333.93 ms | 328.62 ms|
|(transduce (take (* 1024 1024)) + (range)) | {color:green}105.55 ms{color} | 145.44 ms | 164.70 ms|
|(count (range (* 1024 1024))) | 108.92 ms | 61.09 ns | {color:green}26.61 ns{color}|
|(reduce + (map inc (range (* 1024 1024)))) | 97.67 ms | 95.41 ms | {color:green}84.62 ms{color}|
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 140.21 ms | 135.59 ms | {color:green}116.38 ms{color}|
|(count (keep odd? (range (* 1024 1024)))) | 121.18 ms | {color:green}104.63 ms{color} | 111.46 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 100.40 ms | 86.28 ms | {color:green}67.17 ms{color}|
|(reduce + 0 (range (* 2048 1024))) | 131.77 ms | 80.43 ms | {color:green}63.24 ms{color}|
|(doall (range 0 31)) | 2.53 µs | 4.36 µs | {color:green}2.24 µs{color}|
|(doall (range 0 32)) | 2.37 µs | 4.00 µs | {color:green}1.99 µs{color}|
|(doall (range 0 128)) | 9.20 µs | 14.98 µs | {color:green}8.01 µs{color}|
|(doall (range 0 512)) | 37.28 µs | 59.13 µs | {color:green}35.16 µs{color}|
|(doall (range 0 4096)) | 331.28 µs | 471.57 µs | {color:green}291.76 µs{color}|
|(into [] (map inc (range 31))) | 2.83 µs | 2.79 µs | {color:green}2.67 µs{color}|
|(into [] (map inc) (range 31)) | 2.21 µs | 1.39 µs | {color:green}1.26 µs{color}|
|(into [] (range 128)) | 6.72 µs | 3.25 µs | {color:green}3.09 µs{color}|
|(doall (range 1/2 1000 1/3)) | 3.41 ms | 4.04 ms | {color:green}3.14 ms{color}|
|(doall (range 0.5 1000 0.33)) | 281.04 µs | 530.92 µs | {color:green}244.14 µs{color}|
|(into [] (range 1/2 1000 1/3)) | 3.32 ms | 3.71 ms | {color:green}2.99 ms{color}|
|(into [] (range 0.5 1000 0.33)) | 215.53 µs | 165.93 µs | {color:green}138.86 µs{color}| ]
Ghadi Shayban made changes -
Attachment CLJ-1515-deftype-nostructural-dup.patch [ 13783 ]
Hide
Alex Miller added a comment -

This is looking very good and I think we should move forward with it as the preferred approach. Feel free to update the description appropriately. I'll file a separate ticket with the filter tweak. Some comments on the patch:

1) GenericRange/count - this math is broken if you start to mix infinity in there. I think just (count (seq this)) is safer.
2) GenericRange/iterable - I think if we are IReduceInit and Seqable, we can omit this. I had it in the Java one because I inherited Iterable via ASeq but that's not an issue in this impl.
3) LongRange/reduce - why Long/valueOf? Isn't start a long field?
4) LongRange/iterable - ditto #2
5) print-method - anytime I see print-method, there should probably be print-dup too. For example, this is broken: (binding [*print-dup* true] (println (range 0 10)))
6) Is serialization broken by this patch? Can you justify the test changes?

Show
Alex Miller added a comment - This is looking very good and I think we should move forward with it as the preferred approach. Feel free to update the description appropriately. I'll file a separate ticket with the filter tweak. Some comments on the patch: 1) GenericRange/count - this math is broken if you start to mix infinity in there. I think just (count (seq this)) is safer. 2) GenericRange/iterable - I think if we are IReduceInit and Seqable, we can omit this. I had it in the Java one because I inherited Iterable via ASeq but that's not an issue in this impl. 3) LongRange/reduce - why Long/valueOf? Isn't start a long field? 4) LongRange/iterable - ditto #2 5) print-method - anytime I see print-method, there should probably be print-dup too. For example, this is broken: (binding [*print-dup* true] (println (range 0 10))) 6) Is serialization broken by this patch? Can you justify the test changes?
Hide
Ghadi Shayban added a comment -

regarding Serialization tests currently:

(defn roundtrip
  [v]
  (let [rt (-> v serialize deserialize)
        rt-seq (-> v seq serialize deserialize)]        ;; this
    (and (= v rt)            ;; this fails because the test ^ calls seq first
      (= (seq v) (seq rt))   ;; this passes
      (= (seq v) rt-seq))))  ;; this passes

These new types are merely seqable, so (not= (LongRange. 0 10 1) (seq (LongRange. 0 10 1)))

Not sure how to handle this 100%. Nothing precludes the LongRange itself from roundtripping, but just that calling seq on it returns a separate object.

Show
Ghadi Shayban added a comment - regarding Serialization tests currently:
(defn roundtrip
  [v]
  (let [rt (-> v serialize deserialize)
        rt-seq (-> v seq serialize deserialize)]        ;; this
    (and (= v rt)            ;; this fails because the test ^ calls seq first
      (= (seq v) (seq rt))   ;; this passes
      (= (seq v) rt-seq))))  ;; this passes
These new types are merely seqable, so (not= (LongRange. 0 10 1) (seq (LongRange. 0 10 1))) Not sure how to handle this 100%. Nothing precludes the LongRange itself from roundtripping, but just that calling seq on it returns a separate object.
Ghadi Shayban made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Note:* The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

*Performance:*
timings done via criterium quick-bench

||expr||1.6.0||1.7.0-alpha4||+patch||patch time / 1.6.0 time||
|(count (filter odd? (take (* 1024 1024) (range))))|183 ms|173 ms|188 ms (w/CLJ-1603: 152 ms)|1.0 (0.8)|
|(transduce (take (* 1024 1024)) + (range))|n/a|67 ms|81 ms (w/CLJ-1603: 41 ms)|1.2 (0.6)|
|(count (range (* 1024 1024)))|75 ms|69 ms|0 ms|0.0 ^4^|
|(reduce + (map inc (range (* 1024 1024))))|71 ms|68 ms|43 ms|0.6|
|(reduce + (map inc (map inc (range (* 1024 1024)))))|89 ms|91 ms|65 ms|0.7|
|(count (filter odd? (range (* 1024 1024))))|69 ms|65 ms|43 ms|0.6|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))|n/a|67 ms|33 ms|0.5|
|(range 0 31)|9 ns| 8 ns |13 ns|1.4 ^1^|
|(doall (range 0 31))|1.41 µs|1.51 µs|3.15 µs|2.2 ^2^|
|(into [] (map inc (range 31)))|1.76 µs|1.77 µs|1.24 µs|0.7|
|(into [] (map inc) (range 31))|n/a|1.60 µs|0.57 µs|0.4|
|(doall (range 1/2 1000 1/3))|1.58 ms|1.53 ms|1.66 ms|1.1 ^2 3^|
|(into [] (range 1/2 1000 1/3))|1.52 ms|1.51 ms|1.66 ms|1.1 ^3^|
|(doall (range 0.5 1000 0.33))|0.15 ms|0.14 ms|0.35 ms|2.3 ^2 3^|
|(into [] (range 0.5 1000 0.33))|0.13 ms|0.12 ms|0.12 ms|1.0 ^3^|

^1^ slightly more overhead due to checking more cases, but absolute value is small
^2^ "doall" will walk via seq/next looping (the slowest option). Stacked sequence ops happen via the the chunked seq impl (which is a little faster), and the transduce/into will use the reduce impl (which is much faster).
^3^ cases other than finite long range (pretty rare, not optimized)
^4^ count on finite long range is now O(1)

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower.

*Patch:* clj-1515-10.patch

*Screened by:*

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.



Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch implements range as a deftype. The range implementation is specialized for primitive longs (LongRange, a common case), as well one with generic math (GenericRange).

The types IReduce,IReduceInit,Counted,Seqable. Sequential and Serializable are marker interfaces.

The existing range lazy-seq was renamed to range* (instead of range1 b/c range's arguments are numbers.) The deftypes defined later delegate their seq implementation to it. Still chunked. To achieve the most performance out of this delegation, it required a tiny tweak to doall/dorun, see the note below. range* handles only ascending or descending sequences.

The implementation was carefully tuned using criterium for benchmarking. Instead of using < and > directly as comparators like the old seq did, it performed much better to close over the range upper bound, e.g. #(< % end). < and > have inlining macros. The boxing/unboxing is minimized within the LongRange implementation.

The special case of (range) is just handled with (iterate inc' 0) (which is further optimized for reduce in CLJ-1603). The original contract range being infinite was never handled correctly, as it used inc, not inc'. The GenericRange impl does not use autopromotion, and opts for preserving the implementation behavior. The only path for auto-promotion with this patch is through the 0-arity delegation to iterate.

The range constructor figures out type specialization, calls create-range, which filters out the wacky cases (empty seqs, the repeat case), and then creates the types if necessary. The deftypes handle only ascending or descending nonempty cases.

*Note:* The patch also includes a tweak to doall/dorun in order to handle types whose seq implementation is not inline, but in a separate object. As noted in the commit message, dorun currently calls (seq coll), then drops the result, then calls (next coll), resulting in a double allocation of the seq.

*Performance*

timings done via criterium quick-bench
||code||1.7.0-alpha5 || Java impl clj-1515-10|| deftype||
|(count (range (* 1024 1024))) | 61.30 ms | 26.22 ns | {color:green}12.67 ns{color}|
|(reduce + (map inc (range (* 1024 1024)))) | 56.21 ms | 47.80 ms | {color:green}46.87 ms{color}|
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 77.98 ms | 67.84 ms | {color:green}66.91 ms{color}|
|(count (keep odd? (range (* 1024 1024)))) | 73.68 ms | {color:green}57.33 ms{color} | 72.27 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 50.56 ms | 36.12 ms | {color:green}28.46 ms{color}|
|(reduce + 0 (range (* 2048 1024))) | 81.86 ms | 39.22 ms | {color:green}26.44 ms{color}|
|(doall (range 0 31)) | 1.32 µs | 2.89 µs | {color:green}1.08 µs{color}|
|(doall (range 0 32)) | 1.34 µs | 2.96 µs | {color:green}1.12 µs{color}|
|(doall (range 0 128)) | 5.20 µs | 11.92 µs | {color:green}4.31 µs{color}|
|(doall (range 0 512)) | 21.24 µs | 47.71 µs | {color:green}17.91 µs{color}|
|(doall (range 0 4096)) | 167.86 µs | 381.15 µs | {color:green}141.48 µs{color}|
|(into [] (map inc (range 31))) | 2.00 µs | {color:green}1.49 µs{color} | 1.87 µs|
|(into [] (map inc) (range 31)) | 1.65 µs | {color:green}813.78 ns{color} | 929.51 ns|
|(into [] (range 128)) | 5.20 µs | {color:green}2.27 µs{color} | 2.44 µs|
|(doall (range 1/2 1000 1/3)) | {color:green}1.50 ms{color} | 1.69 ms | 1.50 ms|
|(doall (range 0.5 1000 0.33)) | 172.27 µs | 364.35 µs | {color:green}149.51 µs{color}|
|(into [] (range 1/2 1000 1/3)) | 1.52 ms | {color:green}1.42 ms{color} | 1.43 ms|
|(into [] (range 0.5 1000 0.33)) | 163.06 µs | {color:green}99.44 µs{color} | 103.18 µs|
|(count (filter odd? (take (* 1024 1024) (range)))) | {color:green}187.79 ms{color} | 194.07 ms | 189.73 ms|
|(transduce (take (* 1024 1024)) + (range)) | {color:green}53.06 ms{color} | 94.19 ms | 89.34 ms|

*performance notes:*
The final two cases will get better pending the merge of the Iterate deftype (CLJ-1603) or pending validity of the inc' approach.

The Java implementation incorrectly uses unchecked math (Java's +) rather than clojure Numbers/add. This gives it a bit of an unfair disadvantage. The deftype (while well-tuned) has very competitive performance.

*Patch:* clj-1515-10.patch
*Screened by:*

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.


Serialization round-tripping, see the comment "regarding Serialization tests currently" for an explanation
Re-implement Iterable? (was implemented in CLJ-1515-deftype-nostructural-dup.patch)
Ghadi Shayban made changes -
Attachment CLJ-1515-deftype2.patch [ 13794 ]
Ghadi Shayban made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch implements range as a deftype. The range implementation is specialized for primitive longs (LongRange, a common case), as well one with generic math (GenericRange).

The types IReduce,IReduceInit,Counted,Seqable. Sequential and Serializable are marker interfaces.

The existing range lazy-seq was renamed to range* (instead of range1 b/c range's arguments are numbers.) The deftypes defined later delegate their seq implementation to it. Still chunked. To achieve the most performance out of this delegation, it required a tiny tweak to doall/dorun, see the note below. range* handles only ascending or descending sequences.

The implementation was carefully tuned using criterium for benchmarking. Instead of using < and > directly as comparators like the old seq did, it performed much better to close over the range upper bound, e.g. #(< % end). < and > have inlining macros. The boxing/unboxing is minimized within the LongRange implementation.

The special case of (range) is just handled with (iterate inc' 0) (which is further optimized for reduce in CLJ-1603). The original contract range being infinite was never handled correctly, as it used inc, not inc'. The GenericRange impl does not use autopromotion, and opts for preserving the implementation behavior. The only path for auto-promotion with this patch is through the 0-arity delegation to iterate.

The range constructor figures out type specialization, calls create-range, which filters out the wacky cases (empty seqs, the repeat case), and then creates the types if necessary. The deftypes handle only ascending or descending nonempty cases.

*Note:* The patch also includes a tweak to doall/dorun in order to handle types whose seq implementation is not inline, but in a separate object. As noted in the commit message, dorun currently calls (seq coll), then drops the result, then calls (next coll), resulting in a double allocation of the seq.

*Performance*

timings done via criterium quick-bench
||code||1.7.0-alpha5 || Java impl clj-1515-10|| deftype||
|(count (range (* 1024 1024))) | 61.30 ms | 26.22 ns | {color:green}12.67 ns{color}|
|(reduce + (map inc (range (* 1024 1024)))) | 56.21 ms | 47.80 ms | {color:green}46.87 ms{color}|
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 77.98 ms | 67.84 ms | {color:green}66.91 ms{color}|
|(count (keep odd? (range (* 1024 1024)))) | 73.68 ms | {color:green}57.33 ms{color} | 72.27 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 50.56 ms | 36.12 ms | {color:green}28.46 ms{color}|
|(reduce + 0 (range (* 2048 1024))) | 81.86 ms | 39.22 ms | {color:green}26.44 ms{color}|
|(doall (range 0 31)) | 1.32 µs | 2.89 µs | {color:green}1.08 µs{color}|
|(doall (range 0 32)) | 1.34 µs | 2.96 µs | {color:green}1.12 µs{color}|
|(doall (range 0 128)) | 5.20 µs | 11.92 µs | {color:green}4.31 µs{color}|
|(doall (range 0 512)) | 21.24 µs | 47.71 µs | {color:green}17.91 µs{color}|
|(doall (range 0 4096)) | 167.86 µs | 381.15 µs | {color:green}141.48 µs{color}|
|(into [] (map inc (range 31))) | 2.00 µs | {color:green}1.49 µs{color} | 1.87 µs|
|(into [] (map inc) (range 31)) | 1.65 µs | {color:green}813.78 ns{color} | 929.51 ns|
|(into [] (range 128)) | 5.20 µs | {color:green}2.27 µs{color} | 2.44 µs|
|(doall (range 1/2 1000 1/3)) | {color:green}1.50 ms{color} | 1.69 ms | 1.50 ms|
|(doall (range 0.5 1000 0.33)) | 172.27 µs | 364.35 µs | {color:green}149.51 µs{color}|
|(into [] (range 1/2 1000 1/3)) | 1.52 ms | {color:green}1.42 ms{color} | 1.43 ms|
|(into [] (range 0.5 1000 0.33)) | 163.06 µs | {color:green}99.44 µs{color} | 103.18 µs|
|(count (filter odd? (take (* 1024 1024) (range)))) | {color:green}187.79 ms{color} | 194.07 ms | 189.73 ms|
|(transduce (take (* 1024 1024)) + (range)) | {color:green}53.06 ms{color} | 94.19 ms | 89.34 ms|

*performance notes:*
The final two cases will get better pending the merge of the Iterate deftype (CLJ-1603) or pending validity of the inc' approach.

The Java implementation incorrectly uses unchecked math (Java's +) rather than clojure Numbers/add. This gives it a bit of an unfair disadvantage. The deftype (while well-tuned) has very competitive performance.

*Patch:* clj-1515-10.patch
*Screened by:*

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.


Serialization round-tripping, see the comment "regarding Serialization tests currently" for an explanation
Re-implement Iterable? (was implemented in CLJ-1515-deftype-nostructural-dup.patch)
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch implements range as a deftype. The range implementation is specialized for primitive longs (LongRange, a common case), as well one with generic math (GenericRange).

The types IReduce,IReduceInit,Counted,Seqable. Sequential and Serializable are marker interfaces.

The existing range lazy-seq was renamed to range* (instead of range1 b/c range's arguments are numbers.) The deftypes defined later delegate their seq implementation to it. Still chunked. To achieve the most performance out of this delegation, it required a tiny tweak to doall/dorun, see the note below. range* handles only ascending or descending sequences.

The implementation was carefully tuned using criterium for benchmarking. Instead of using < and > directly as comparators like the old seq did, it performed much better to close over the range upper bound, e.g. #(< % end). < and > have inlining macros. The boxing/unboxing is minimized within the LongRange implementation.

The special case of (range) is just handled with (iterate inc' 0) (which is further optimized for reduce in CLJ-1603). The original contract range being infinite was never handled correctly, as it used inc, not inc'. The GenericRange impl does not use autopromotion, and opts for preserving the implementation behavior. The only path for auto-promotion with this patch is through the 0-arity delegation to iterate.

The range constructor figures out type specialization, calls create-range, which filters out the wacky cases (empty seqs, the repeat case), and then creates the types if necessary. The deftypes handle only ascending or descending nonempty cases.

*Note:* The patch also includes a tweak to doall/dorun in order to handle types whose seq implementation is not inline, but in a separate object. As noted in the commit message, dorun currently calls (seq coll), then drops the result, then calls (next coll), resulting in a double allocation of the seq.

*Performance*

timings done via criterium quick-bench
||code||1.7.0-alpha5 || Java impl clj-1515-10|| deftype||
|(count (range (* 1024 1024))) | 61.30 ms | 26.22 ns | {color:green}12.67 ns{color}|
|(reduce + (map inc (range (* 1024 1024)))) | 56.21 ms | 47.80 ms | {color:green}46.87 ms{color}|
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 77.98 ms | 67.84 ms | {color:green}66.91 ms{color}|
|(count (keep odd? (range (* 1024 1024)))) | 73.68 ms | {color:green}57.33 ms{color} | 72.27 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 50.56 ms | 36.12 ms | {color:green}28.46 ms{color}|
|(reduce + 0 (range (* 2048 1024))) | 81.86 ms | 39.22 ms | {color:green}26.44 ms{color}|
|(doall (range 0 31)) | 1.32 µs | 2.89 µs | {color:green}1.08 µs{color}|
|(doall (range 0 32)) | 1.34 µs | 2.96 µs | {color:green}1.12 µs{color}|
|(doall (range 0 128)) | 5.20 µs | 11.92 µs | {color:green}4.31 µs{color}|
|(doall (range 0 512)) | 21.24 µs | 47.71 µs | {color:green}17.91 µs{color}|
|(doall (range 0 4096)) | 167.86 µs | 381.15 µs | {color:green}141.48 µs{color}|
|(into [] (map inc (range 31))) | 2.00 µs | {color:green}1.49 µs{color} | 1.87 µs|
|(into [] (map inc) (range 31)) | 1.65 µs | {color:green}813.78 ns{color} | 929.51 ns|
|(into [] (range 128)) | 5.20 µs | {color:green}2.27 µs{color} | 2.44 µs|
|(doall (range 1/2 1000 1/3)) | {color:green}1.50 ms{color} | 1.69 ms | 1.50 ms|
|(doall (range 0.5 1000 0.33)) | 172.27 µs | 364.35 µs | {color:green}149.51 µs{color}|
|(into [] (range 1/2 1000 1/3)) | 1.52 ms | {color:green}1.42 ms{color} | 1.43 ms|
|(into [] (range 0.5 1000 0.33)) | 163.06 µs | {color:green}99.44 µs{color} | 103.18 µs|
|(count (filter odd? (take (* 1024 1024) (range)))) | {color:green}187.79 ms{color} | 194.07 ms | 189.73 ms|
|(transduce (take (* 1024 1024)) + (range)) | {color:green}53.06 ms{color} | 94.19 ms | 89.34 ms|

*performance notes:*
The final two cases will get better pending the merge of the Iterate deftype (CLJ-1603) or pending validity of the inc' approach.

The Java implementation incorrectly uses unchecked math (Java's +) rather than clojure Numbers/add. This gives it a bit of an unfair disadvantage. The deftype (while well-tuned) has very competitive performance.

*Patch:* CLJ-1515-deftype2.patch
*Screened by:*

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.


Serialization round-tripping, see the comment "regarding Serialization tests currently" for an explanation
Re-implement Iterable? (was implemented in CLJ-1515-deftype-nostructural-dup.patch)
Ghadi Shayban made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch implements range as a deftype. The range implementation is specialized for primitive longs (LongRange, a common case), as well one with generic math (GenericRange).

The types IReduce,IReduceInit,Counted,Seqable. Sequential and Serializable are marker interfaces.

The existing range lazy-seq was renamed to range* (instead of range1 b/c range's arguments are numbers.) The deftypes defined later delegate their seq implementation to it. Still chunked. To achieve the most performance out of this delegation, it required a tiny tweak to doall/dorun, see the note below. range* handles only ascending or descending sequences.

The implementation was carefully tuned using criterium for benchmarking. Instead of using < and > directly as comparators like the old seq did, it performed much better to close over the range upper bound, e.g. #(< % end). < and > have inlining macros. The boxing/unboxing is minimized within the LongRange implementation.

The special case of (range) is just handled with (iterate inc' 0) (which is further optimized for reduce in CLJ-1603). The original contract range being infinite was never handled correctly, as it used inc, not inc'. The GenericRange impl does not use autopromotion, and opts for preserving the implementation behavior. The only path for auto-promotion with this patch is through the 0-arity delegation to iterate.

The range constructor figures out type specialization, calls create-range, which filters out the wacky cases (empty seqs, the repeat case), and then creates the types if necessary. The deftypes handle only ascending or descending nonempty cases.

*Note:* The patch also includes a tweak to doall/dorun in order to handle types whose seq implementation is not inline, but in a separate object. As noted in the commit message, dorun currently calls (seq coll), then drops the result, then calls (next coll), resulting in a double allocation of the seq.

*Performance*

timings done via criterium quick-bench
||code||1.7.0-alpha5 || Java impl clj-1515-10|| deftype||
|(count (range (* 1024 1024))) | 61.30 ms | 26.22 ns | {color:green}12.67 ns{color}|
|(reduce + (map inc (range (* 1024 1024)))) | 56.21 ms | 47.80 ms | {color:green}46.87 ms{color}|
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 77.98 ms | 67.84 ms | {color:green}66.91 ms{color}|
|(count (keep odd? (range (* 1024 1024)))) | 73.68 ms | {color:green}57.33 ms{color} | 72.27 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 50.56 ms | 36.12 ms | {color:green}28.46 ms{color}|
|(reduce + 0 (range (* 2048 1024))) | 81.86 ms | 39.22 ms | {color:green}26.44 ms{color}|
|(doall (range 0 31)) | 1.32 µs | 2.89 µs | {color:green}1.08 µs{color}|
|(doall (range 0 32)) | 1.34 µs | 2.96 µs | {color:green}1.12 µs{color}|
|(doall (range 0 128)) | 5.20 µs | 11.92 µs | {color:green}4.31 µs{color}|
|(doall (range 0 512)) | 21.24 µs | 47.71 µs | {color:green}17.91 µs{color}|
|(doall (range 0 4096)) | 167.86 µs | 381.15 µs | {color:green}141.48 µs{color}|
|(into [] (map inc (range 31))) | 2.00 µs | {color:green}1.49 µs{color} | 1.87 µs|
|(into [] (map inc) (range 31)) | 1.65 µs | {color:green}813.78 ns{color} | 929.51 ns|
|(into [] (range 128)) | 5.20 µs | {color:green}2.27 µs{color} | 2.44 µs|
|(doall (range 1/2 1000 1/3)) | {color:green}1.50 ms{color} | 1.69 ms | 1.50 ms|
|(doall (range 0.5 1000 0.33)) | 172.27 µs | 364.35 µs | {color:green}149.51 µs{color}|
|(into [] (range 1/2 1000 1/3)) | 1.52 ms | {color:green}1.42 ms{color} | 1.43 ms|
|(into [] (range 0.5 1000 0.33)) | 163.06 µs | {color:green}99.44 µs{color} | 103.18 µs|
|(count (filter odd? (take (* 1024 1024) (range)))) | {color:green}187.79 ms{color} | 194.07 ms | 189.73 ms|
|(transduce (take (* 1024 1024)) + (range)) | {color:green}53.06 ms{color} | 94.19 ms | 89.34 ms|

*performance notes:*
The final two cases will get better pending the merge of the Iterate deftype (CLJ-1603) or pending validity of the inc' approach.

The Java implementation incorrectly uses unchecked math (Java's +) rather than clojure Numbers/add. This gives it a bit of an unfair disadvantage. The deftype (while well-tuned) has very competitive performance.

*Patch:* CLJ-1515-deftype2.patch
*Screened by:*

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.


Serialization round-tripping, see the comment "regarding Serialization tests currently" for an explanation
Re-implement Iterable? (was implemented in CLJ-1515-deftype-nostructural-dup.patch)
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch implements range as a deftype. The range implementation is specialized for primitive longs (LongRange, a common case), as well one with generic math (GenericRange).

The types implement IReduce,IReduceInit,Counted,Seqable. Sequential and Serializable are marker interfaces.

The existing range lazy-seq was renamed to range* (instead of range1 b/c range's arguments are numbers.) The deftypes defined later delegate their seq implementation to it. Still chunked. To achieve the most performance out of this delegation, it required a tiny tweak to doall/dorun, see the note below. range* handles only ascending or descending sequences.

The implementation was carefully tuned using criterium for benchmarking. Rather than directly using < and > as comparators, it performed far better to close over the range upper bound, e.g. #(< % end). < and > are inlining macros. The boxing/unboxing is minimized within the LongRange implementation.

The range constructor figures out type specialization, calls create-range, which filters out the wacky cases (empty seqs, the repeat case), and then creates the types if necessary. The deftypes handle only ascending or descending nonempty cases.

The special case of (range) is just handled with (iterate inc' 0) (which is further optimized for reduce in CLJ-1603). The original contract range being infinite was never handled correctly, as it used inc, not inc'. The GenericRange impl does not use autopromotion, and opts for preserving the implementation behavior. The only path for auto-promotion with this patch is through the 0-arity delegation to iterate.

*Note* The patch also includes a tweak to doall/dorun in order to handle types whose seq implementation is not inline, but in a separate object. As noted in the commit message, dorun currently calls (seq coll), then drops the result, then calls (next coll), resulting in a double allocation of the seq when the collection doesn't cache its seq.

*Performance*

timings done via criterium quick-bench
||code||1.7.0-alpha5 || Java impl clj-1515-10|| deftype||
|(count (range (* 1024 1024))) | 61.30 ms | 26.22 ns | {color:green}12.67 ns{color}|
|(reduce + (map inc (range (* 1024 1024)))) | 56.21 ms | 47.80 ms | {color:green}46.87 ms{color}|
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 77.98 ms | 67.84 ms | {color:green}66.91 ms{color}|
|(count (keep odd? (range (* 1024 1024)))) | 73.68 ms | {color:green}57.33 ms{color} | 72.27 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 50.56 ms | 36.12 ms | {color:green}28.46 ms{color}|
|(reduce + 0 (range (* 2048 1024))) | 81.86 ms | 39.22 ms | {color:green}26.44 ms{color}|
|(doall (range 0 31)) | 1.32 µs | 2.89 µs | {color:green}1.08 µs{color}|
|(doall (range 0 32)) | 1.34 µs | 2.96 µs | {color:green}1.12 µs{color}|
|(doall (range 0 128)) | 5.20 µs | 11.92 µs | {color:green}4.31 µs{color}|
|(doall (range 0 512)) | 21.24 µs | 47.71 µs | {color:green}17.91 µs{color}|
|(doall (range 0 4096)) | 167.86 µs | 381.15 µs | {color:green}141.48 µs{color}|
|(into [] (map inc (range 31))) | 2.00 µs | {color:green}1.49 µs{color} | 1.87 µs|
|(into [] (map inc) (range 31)) | 1.65 µs | {color:green}813.78 ns{color} | 929.51 ns|
|(into [] (range 128)) | 5.20 µs | {color:green}2.27 µs{color} | 2.44 µs|
|(doall (range 1/2 1000 1/3)) | {color:green}1.50 ms{color} | 1.69 ms | 1.50 ms|
|(doall (range 0.5 1000 0.33)) | 172.27 µs | 364.35 µs | {color:green}149.51 µs{color}|
|(into [] (range 1/2 1000 1/3)) | 1.52 ms | {color:green}1.42 ms{color} | 1.43 ms|
|(into [] (range 0.5 1000 0.33)) | 163.06 µs | {color:green}99.44 µs{color} | 103.18 µs|
|(count (filter odd? (take (* 1024 1024) (range)))) | {color:green}187.79 ms{color} | 194.07 ms | 189.73 ms|
|(transduce (take (* 1024 1024)) + (range)) | {color:green}53.06 ms{color} | 94.19 ms | 89.34 ms|

*Performance notes:*

doall cases see a substantial benefit.

The deftype has very competitive performance considering it is competing against the Java impl, which uses unchecked math incorrectly.

The final two cases of 0-arity (range) will get better pending the merge of the Iterate deftype (CLJ-1603) or pending validity of the inc' approach.

*Patch:* CLJ-1515-deftype2.patch
*Screened by:*

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.


Serialization round-tripping, see the comment "regarding Serialization tests currently" for an explanation
Re-implement Iterable? (was implemented in CLJ-1515-deftype-nostructural-dup.patch)
Ghadi Shayban made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch implements range as a deftype. The range implementation is specialized for primitive longs (LongRange, a common case), as well one with generic math (GenericRange).

The types implement IReduce,IReduceInit,Counted,Seqable. Sequential and Serializable are marker interfaces.

The existing range lazy-seq was renamed to range* (instead of range1 b/c range's arguments are numbers.) The deftypes defined later delegate their seq implementation to it. Still chunked. To achieve the most performance out of this delegation, it required a tiny tweak to doall/dorun, see the note below. range* handles only ascending or descending sequences.

The implementation was carefully tuned using criterium for benchmarking. Rather than directly using < and > as comparators, it performed far better to close over the range upper bound, e.g. #(< % end). < and > are inlining macros. The boxing/unboxing is minimized within the LongRange implementation.

The range constructor figures out type specialization, calls create-range, which filters out the wacky cases (empty seqs, the repeat case), and then creates the types if necessary. The deftypes handle only ascending or descending nonempty cases.

The special case of (range) is just handled with (iterate inc' 0) (which is further optimized for reduce in CLJ-1603). The original contract range being infinite was never handled correctly, as it used inc, not inc'. The GenericRange impl does not use autopromotion, and opts for preserving the implementation behavior. The only path for auto-promotion with this patch is through the 0-arity delegation to iterate.

*Note* The patch also includes a tweak to doall/dorun in order to handle types whose seq implementation is not inline, but in a separate object. As noted in the commit message, dorun currently calls (seq coll), then drops the result, then calls (next coll), resulting in a double allocation of the seq when the collection doesn't cache its seq.

*Performance*

timings done via criterium quick-bench
||code||1.7.0-alpha5 || Java impl clj-1515-10|| deftype||
|(count (range (* 1024 1024))) | 61.30 ms | 26.22 ns | {color:green}12.67 ns{color}|
|(reduce + (map inc (range (* 1024 1024)))) | 56.21 ms | 47.80 ms | {color:green}46.87 ms{color}|
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 77.98 ms | 67.84 ms | {color:green}66.91 ms{color}|
|(count (keep odd? (range (* 1024 1024)))) | 73.68 ms | {color:green}57.33 ms{color} | 72.27 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 50.56 ms | 36.12 ms | {color:green}28.46 ms{color}|
|(reduce + 0 (range (* 2048 1024))) | 81.86 ms | 39.22 ms | {color:green}26.44 ms{color}|
|(doall (range 0 31)) | 1.32 µs | 2.89 µs | {color:green}1.08 µs{color}|
|(doall (range 0 32)) | 1.34 µs | 2.96 µs | {color:green}1.12 µs{color}|
|(doall (range 0 128)) | 5.20 µs | 11.92 µs | {color:green}4.31 µs{color}|
|(doall (range 0 512)) | 21.24 µs | 47.71 µs | {color:green}17.91 µs{color}|
|(doall (range 0 4096)) | 167.86 µs | 381.15 µs | {color:green}141.48 µs{color}|
|(into [] (map inc (range 31))) | 2.00 µs | {color:green}1.49 µs{color} | 1.87 µs|
|(into [] (map inc) (range 31)) | 1.65 µs | {color:green}813.78 ns{color} | 929.51 ns|
|(into [] (range 128)) | 5.20 µs | {color:green}2.27 µs{color} | 2.44 µs|
|(doall (range 1/2 1000 1/3)) | {color:green}1.50 ms{color} | 1.69 ms | 1.50 ms|
|(doall (range 0.5 1000 0.33)) | 172.27 µs | 364.35 µs | {color:green}149.51 µs{color}|
|(into [] (range 1/2 1000 1/3)) | 1.52 ms | {color:green}1.42 ms{color} | 1.43 ms|
|(into [] (range 0.5 1000 0.33)) | 163.06 µs | {color:green}99.44 µs{color} | 103.18 µs|
|(count (filter odd? (take (* 1024 1024) (range)))) | {color:green}187.79 ms{color} | 194.07 ms | 189.73 ms|
|(transduce (take (* 1024 1024)) + (range)) | {color:green}53.06 ms{color} | 94.19 ms | 89.34 ms|

*Performance notes:*

doall cases see a substantial benefit.

The deftype has very competitive performance considering it is competing against the Java impl, which uses unchecked math incorrectly.

The final two cases of 0-arity (range) will get better pending the merge of the Iterate deftype (CLJ-1603) or pending validity of the inc' approach.

*Patch:* CLJ-1515-deftype2.patch
*Screened by:*

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.


Serialization round-tripping, see the comment "regarding Serialization tests currently" for an explanation
Re-implement Iterable? (was implemented in CLJ-1515-deftype-nostructural-dup.patch)
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch implements range as a deftype. The range implementation is specialized for primitive longs (LongRange, a common case), as well one with generic math (GenericRange).

The types implement IReduce,IReduceInit,Counted,Seqable. Sequential and Serializable are marker interfaces.

The existing range lazy-seq was renamed to range* (instead of range1 b/c range's arguments are numbers.) The deftypes defined later delegate their seq implementation to it. Still chunked. To achieve the most performance out of this delegation, it required a tiny tweak to doall/dorun, see the note below. range* handles only ascending or descending sequences.

The implementation was carefully tuned using criterium for benchmarking. Rather than directly using < and > as comparators, it performed far better to close over the range upper bound, e.g. #(< % end). < and > are inlining macros. The boxing/unboxing is minimized within the LongRange implementation.

The range constructor figures out type specialization, calls create-range, which filters out the wacky cases (empty seqs, the repeat case), and then creates the types if necessary. The deftypes handle only ascending or descending nonempty cases.

The special case of (range) is just handled with (iterate inc' 0) (which is further optimized for reduce in CLJ-1603). The original contract range being infinite was never handled correctly, as it used inc, not inc'. The GenericRange impl does not use autopromotion, and opts for preserving the implementation behavior. The only path for auto-promotion with this patch is through the 0-arity delegation to iterate.

*Note* The patch also includes a tweak to doall/dorun in order to handle types whose seq implementation is not inline, but in a separate object. As noted in the commit message, dorun currently calls (seq coll), then drops the result, then calls (next coll), resulting in a double allocation of the seq when the collection doesn't cache its seq.

*Performance*

timings done via criterium full benchmark on Oracle JDK

||code||1.7.0-alpha5 || Java impl clj-1515-10|| deftype||
|(count (range (* 1024 1024))) | 61.30 ms | 26.22 ns | {color:green}12.67 ns{color}|
|(reduce + (map inc (range (* 1024 1024)))) | 56.21 ms | 47.80 ms | {color:green}46.87 ms{color}|
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 77.98 ms | 67.84 ms | {color:green}66.91 ms{color}|
|(count (keep odd? (range (* 1024 1024)))) | 73.68 ms | {color:green}57.33 ms{color} | 72.27 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 50.56 ms | 36.12 ms | {color:green}28.46 ms{color}|
|(reduce + 0 (range (* 2048 1024))) | 81.86 ms | 39.22 ms | {color:green}26.44 ms{color}|
|(doall (range 0 31)) | 1.32 µs | 2.89 µs | {color:green}1.08 µs{color}|
|(doall (range 0 32)) | 1.34 µs | 2.96 µs | {color:green}1.12 µs{color}|
|(doall (range 0 128)) | 5.20 µs | 11.92 µs | {color:green}4.31 µs{color}|
|(doall (range 0 512)) | 21.24 µs | 47.71 µs | {color:green}17.91 µs{color}|
|(doall (range 0 4096)) | 167.86 µs | 381.15 µs | {color:green}141.48 µs{color}|
|(into [] (map inc (range 31))) | 2.00 µs | {color:green}1.49 µs{color} | 1.87 µs|
|(into [] (map inc) (range 31)) | 1.65 µs | {color:green}813.78 ns{color} | 929.51 ns|
|(into [] (range 128)) | 5.20 µs | {color:green}2.27 µs{color} | 2.44 µs|
|(doall (range 1/2 1000 1/3)) | {color:green}1.50 ms{color} | 1.69 ms | 1.50 ms|
|(doall (range 0.5 1000 0.33)) | 172.27 µs | 364.35 µs | {color:green}149.51 µs{color}|
|(into [] (range 1/2 1000 1/3)) | 1.52 ms | {color:green}1.42 ms{color} | 1.43 ms|
|(into [] (range 0.5 1000 0.33)) | 163.06 µs | {color:green}99.44 µs{color} | 103.18 µs|
|(count (filter odd? (take (* 1024 1024) (range)))) | {color:green}187.79 ms{color} | 194.07 ms | 189.73 ms|
|(transduce (take (* 1024 1024)) + (range)) | {color:green}53.06 ms{color} | 94.19 ms | 89.34 ms|

*Performance notes:*

doall cases see a substantial benefit.

The deftype has very competitive performance considering it is competing against the Java impl, which uses unchecked math incorrectly.

The final two cases of 0-arity (range) will get better pending the merge of the Iterate deftype (CLJ-1603) or pending validity of the inc' approach.

*Patch:* CLJ-1515-deftype2.patch
*Screened by:*

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.


Serialization round-tripping, see the comment "regarding Serialization tests currently" for an explanation
Re-implement Iterable? (was implemented in CLJ-1515-deftype-nostructural-dup.patch)
Alex Miller made changes -
Approval Incomplete [ 10006 ] Vetted [ 10003 ]
Hide
Alex Miller added a comment -

More comments...

1) Instead of extending IReduce and IReduceInit, just extend IReduce and implement both arities (IReduce extends IReduceInit).
2) I'm slightly troubled by the .invokePrim now. Did you look at a macro that does prim type-hinting maybe?
3) On the serialization thing, is the problem really with serialization or with equality? The test that's failing is (= v rt). Because these are not IPersistentCollections, pcequiv won't be used and it's just calling LongRange.equals(), which is not implemented and falls back to identity, right? Probably need to implement the hashCode and equals stuff. Might need IHashEq too.

user=> (= (range 5) (range 5))
false
Show
Alex Miller added a comment - More comments... 1) Instead of extending IReduce and IReduceInit, just extend IReduce and implement both arities (IReduce extends IReduceInit). 2) I'm slightly troubled by the .invokePrim now. Did you look at a macro that does prim type-hinting maybe? 3) On the serialization thing, is the problem really with serialization or with equality? The test that's failing is (= v rt). Because these are not IPersistentCollections, pcequiv won't be used and it's just calling LongRange.equals(), which is not implemented and falls back to identity, right? Probably need to implement the hashCode and equals stuff. Might need IHashEq too.
user=> (= (range 5) (range 5))
false
Alex Miller made changes -
Approval Vetted [ 10003 ] Incomplete [ 10006 ]
Hide
Ghadi Shayban added a comment -

Took care of 1) and 3). Punting on hiding the invokePrim behind a macro. It may be shameful but it works really fast.

Another case found:
(assoc {'(0 1 2 3 4) :foo} (range 5) :bar) needs to properly overwrite keys in the map.

Cause: Might be irrelevant in the face of CLJ-1375. Util/equiv for maps doesn't use IPC/equiv unless the collection is also java.util.Collection or Map. If it is then it checks for IPersistentCollection. I added a check for IPersistentCollection first.

https://github.com/ghadishayban/clojure/commits/for-screening

To get hasheq working, I added back the iterators for use by Murmur3/hashOrdered.

Show
Ghadi Shayban added a comment - Took care of 1) and 3). Punting on hiding the invokePrim behind a macro. It may be shameful but it works really fast. Another case found: (assoc {'(0 1 2 3 4) :foo} (range 5) :bar) needs to properly overwrite keys in the map. Cause: Might be irrelevant in the face of CLJ-1375. Util/equiv for maps doesn't use IPC/equiv unless the collection is also java.util.Collection or Map. If it is then it checks for IPersistentCollection. I added a check for IPersistentCollection first. https://github.com/ghadishayban/clojure/commits/for-screening To get hasheq working, I added back the iterators for use by Murmur3/hashOrdered.
Hide
Ghadi Shayban added a comment -

Handle hash and equality not through IPersistentCollection. w/ test-cases too.

Show
Ghadi Shayban added a comment - Handle hash and equality not through IPersistentCollection. w/ test-cases too.
Ghadi Shayban made changes -
Attachment CLJ-1515-deftype3.patch [ 13802 ]
Ghadi Shayban made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch implements range as a deftype. The range implementation is specialized for primitive longs (LongRange, a common case), as well one with generic math (GenericRange).

The types implement IReduce,IReduceInit,Counted,Seqable. Sequential and Serializable are marker interfaces.

The existing range lazy-seq was renamed to range* (instead of range1 b/c range's arguments are numbers.) The deftypes defined later delegate their seq implementation to it. Still chunked. To achieve the most performance out of this delegation, it required a tiny tweak to doall/dorun, see the note below. range* handles only ascending or descending sequences.

The implementation was carefully tuned using criterium for benchmarking. Rather than directly using < and > as comparators, it performed far better to close over the range upper bound, e.g. #(< % end). < and > are inlining macros. The boxing/unboxing is minimized within the LongRange implementation.

The range constructor figures out type specialization, calls create-range, which filters out the wacky cases (empty seqs, the repeat case), and then creates the types if necessary. The deftypes handle only ascending or descending nonempty cases.

The special case of (range) is just handled with (iterate inc' 0) (which is further optimized for reduce in CLJ-1603). The original contract range being infinite was never handled correctly, as it used inc, not inc'. The GenericRange impl does not use autopromotion, and opts for preserving the implementation behavior. The only path for auto-promotion with this patch is through the 0-arity delegation to iterate.

*Note* The patch also includes a tweak to doall/dorun in order to handle types whose seq implementation is not inline, but in a separate object. As noted in the commit message, dorun currently calls (seq coll), then drops the result, then calls (next coll), resulting in a double allocation of the seq when the collection doesn't cache its seq.

*Performance*

timings done via criterium full benchmark on Oracle JDK

||code||1.7.0-alpha5 || Java impl clj-1515-10|| deftype||
|(count (range (* 1024 1024))) | 61.30 ms | 26.22 ns | {color:green}12.67 ns{color}|
|(reduce + (map inc (range (* 1024 1024)))) | 56.21 ms | 47.80 ms | {color:green}46.87 ms{color}|
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 77.98 ms | 67.84 ms | {color:green}66.91 ms{color}|
|(count (keep odd? (range (* 1024 1024)))) | 73.68 ms | {color:green}57.33 ms{color} | 72.27 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 50.56 ms | 36.12 ms | {color:green}28.46 ms{color}|
|(reduce + 0 (range (* 2048 1024))) | 81.86 ms | 39.22 ms | {color:green}26.44 ms{color}|
|(doall (range 0 31)) | 1.32 µs | 2.89 µs | {color:green}1.08 µs{color}|
|(doall (range 0 32)) | 1.34 µs | 2.96 µs | {color:green}1.12 µs{color}|
|(doall (range 0 128)) | 5.20 µs | 11.92 µs | {color:green}4.31 µs{color}|
|(doall (range 0 512)) | 21.24 µs | 47.71 µs | {color:green}17.91 µs{color}|
|(doall (range 0 4096)) | 167.86 µs | 381.15 µs | {color:green}141.48 µs{color}|
|(into [] (map inc (range 31))) | 2.00 µs | {color:green}1.49 µs{color} | 1.87 µs|
|(into [] (map inc) (range 31)) | 1.65 µs | {color:green}813.78 ns{color} | 929.51 ns|
|(into [] (range 128)) | 5.20 µs | {color:green}2.27 µs{color} | 2.44 µs|
|(doall (range 1/2 1000 1/3)) | {color:green}1.50 ms{color} | 1.69 ms | 1.50 ms|
|(doall (range 0.5 1000 0.33)) | 172.27 µs | 364.35 µs | {color:green}149.51 µs{color}|
|(into [] (range 1/2 1000 1/3)) | 1.52 ms | {color:green}1.42 ms{color} | 1.43 ms|
|(into [] (range 0.5 1000 0.33)) | 163.06 µs | {color:green}99.44 µs{color} | 103.18 µs|
|(count (filter odd? (take (* 1024 1024) (range)))) | {color:green}187.79 ms{color} | 194.07 ms | 189.73 ms|
|(transduce (take (* 1024 1024)) + (range)) | {color:green}53.06 ms{color} | 94.19 ms | 89.34 ms|

*Performance notes:*

doall cases see a substantial benefit.

The deftype has very competitive performance considering it is competing against the Java impl, which uses unchecked math incorrectly.

The final two cases of 0-arity (range) will get better pending the merge of the Iterate deftype (CLJ-1603) or pending validity of the inc' approach.

*Patch:* CLJ-1515-deftype2.patch
*Screened by:*

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.


Serialization round-tripping, see the comment "regarding Serialization tests currently" for an explanation
Re-implement Iterable? (was implemented in CLJ-1515-deftype-nostructural-dup.patch)
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch implements range as a deftype. The range implementation is specialized for primitive longs (LongRange, a common case), as well one with generic math (GenericRange).

The types implement IReduce,IReduceInit,Counted,Seqable,IHashEq. Sequential and Serializable are marker interfaces.

The existing range lazy-seq was renamed to range* (instead of range1 b/c range's arguments are numbers.) The deftypes defined later delegate their seq implementation to it. Still chunked. To achieve the most performance out of this delegation, it required a tiny tweak to doall/dorun, see the note below. range* handles only ascending or descending sequences.

The implementation was carefully tuned using criterium for benchmarking. Rather than directly using < and > as comparators, it performed far better to close over the range upper bound, e.g. #(< % end). < and > are inlining macros. The boxing/unboxing is minimized within the LongRange implementation.

The range constructor figures out type specialization, calls create-range, which filters out the wacky cases (empty seqs, the repeat case), and then creates the types if necessary. The deftypes handle only ascending or descending nonempty cases.

The special case of (range) is just handled with (iterate inc' 0) (which is further optimized for reduce in CLJ-1603). The original contract range being infinite was never handled correctly, as it used inc, not inc'. The GenericRange impl does not use autopromotion, and opts for preserving the implementation behavior. The only path for auto-promotion with this patch is through the 0-arity delegation to iterate.

*Note* The patch also includes a tweak to doall/dorun in order to handle types whose seq implementation is not inline, but in a separate object. As noted in the commit message, dorun currently calls (seq coll), then drops the result, then calls (next coll), resulting in a double allocation of the seq when the collection doesn't cache its seq.

*Performance*

timings done via criterium full benchmark on Oracle JDK

||code||1.7.0-alpha5 || Java impl clj-1515-10|| CLJ-1515-deftype2 ||
|(count (range (* 1024 1024))) | 61.30 ms | 26.22 ns | {color:green}12.67 ns{color}|
|(reduce + (map inc (range (* 1024 1024)))) | 56.21 ms | 47.80 ms | {color:green}46.87 ms{color}|
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 77.98 ms | 67.84 ms | {color:green}66.91 ms{color}|
|(count (keep odd? (range (* 1024 1024)))) | 73.68 ms | {color:green}57.33 ms{color} | 72.27 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 50.56 ms | 36.12 ms | {color:green}28.46 ms{color}|
|(reduce + 0 (range (* 2048 1024))) | 81.86 ms | 39.22 ms | {color:green}26.44 ms{color}|
|(doall (range 0 31)) | 1.32 µs | 2.89 µs | {color:green}1.08 µs{color}|
|(doall (range 0 32)) | 1.34 µs | 2.96 µs | {color:green}1.12 µs{color}|
|(doall (range 0 128)) | 5.20 µs | 11.92 µs | {color:green}4.31 µs{color}|
|(doall (range 0 512)) | 21.24 µs | 47.71 µs | {color:green}17.91 µs{color}|
|(doall (range 0 4096)) | 167.86 µs | 381.15 µs | {color:green}141.48 µs{color}|
|(into [] (map inc (range 31))) | 2.00 µs | {color:green}1.49 µs{color} | 1.87 µs|
|(into [] (map inc) (range 31)) | 1.65 µs | {color:green}813.78 ns{color} | 929.51 ns|
|(into [] (range 128)) | 5.20 µs | {color:green}2.27 µs{color} | 2.44 µs|
|(doall (range 1/2 1000 1/3)) | {color:green}1.50 ms{color} | 1.69 ms | 1.50 ms|
|(doall (range 0.5 1000 0.33)) | 172.27 µs | 364.35 µs | {color:green}149.51 µs{color}|
|(into [] (range 1/2 1000 1/3)) | 1.52 ms | {color:green}1.42 ms{color} | 1.43 ms|
|(into [] (range 0.5 1000 0.33)) | 163.06 µs | {color:green}99.44 µs{color} | 103.18 µs|
|(count (filter odd? (take (* 1024 1024) (range)))) | {color:green}187.79 ms{color} | 194.07 ms | 189.73 ms|
|(transduce (take (* 1024 1024)) + (range)) | {color:green}53.06 ms{color} | 94.19 ms | 89.34 ms|

*Performance notes:*

doall cases see a substantial benefit.

The deftype has very competitive performance considering it is competing against the Java impl, which uses unchecked math incorrectly.

The final two cases of 0-arity (range) will get better pending the merge of the Iterate deftype (CLJ-1603) or pending validity of the inc' approach.

*Patch:* CLJ-1515-deftype3.patch
*Screened by:*

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.
Alex Miller made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch implements range as a deftype. The range implementation is specialized for primitive longs (LongRange, a common case), as well one with generic math (GenericRange).

The types implement IReduce,IReduceInit,Counted,Seqable,IHashEq. Sequential and Serializable are marker interfaces.

The existing range lazy-seq was renamed to range* (instead of range1 b/c range's arguments are numbers.) The deftypes defined later delegate their seq implementation to it. Still chunked. To achieve the most performance out of this delegation, it required a tiny tweak to doall/dorun, see the note below. range* handles only ascending or descending sequences.

The implementation was carefully tuned using criterium for benchmarking. Rather than directly using < and > as comparators, it performed far better to close over the range upper bound, e.g. #(< % end). < and > are inlining macros. The boxing/unboxing is minimized within the LongRange implementation.

The range constructor figures out type specialization, calls create-range, which filters out the wacky cases (empty seqs, the repeat case), and then creates the types if necessary. The deftypes handle only ascending or descending nonempty cases.

The special case of (range) is just handled with (iterate inc' 0) (which is further optimized for reduce in CLJ-1603). The original contract range being infinite was never handled correctly, as it used inc, not inc'. The GenericRange impl does not use autopromotion, and opts for preserving the implementation behavior. The only path for auto-promotion with this patch is through the 0-arity delegation to iterate.

*Note* The patch also includes a tweak to doall/dorun in order to handle types whose seq implementation is not inline, but in a separate object. As noted in the commit message, dorun currently calls (seq coll), then drops the result, then calls (next coll), resulting in a double allocation of the seq when the collection doesn't cache its seq.

*Performance*

timings done via criterium full benchmark on Oracle JDK

||code||1.7.0-alpha5 || Java impl clj-1515-10|| CLJ-1515-deftype2 ||
|(count (range (* 1024 1024))) | 61.30 ms | 26.22 ns | {color:green}12.67 ns{color}|
|(reduce + (map inc (range (* 1024 1024)))) | 56.21 ms | 47.80 ms | {color:green}46.87 ms{color}|
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 77.98 ms | 67.84 ms | {color:green}66.91 ms{color}|
|(count (keep odd? (range (* 1024 1024)))) | 73.68 ms | {color:green}57.33 ms{color} | 72.27 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 50.56 ms | 36.12 ms | {color:green}28.46 ms{color}|
|(reduce + 0 (range (* 2048 1024))) | 81.86 ms | 39.22 ms | {color:green}26.44 ms{color}|
|(doall (range 0 31)) | 1.32 µs | 2.89 µs | {color:green}1.08 µs{color}|
|(doall (range 0 32)) | 1.34 µs | 2.96 µs | {color:green}1.12 µs{color}|
|(doall (range 0 128)) | 5.20 µs | 11.92 µs | {color:green}4.31 µs{color}|
|(doall (range 0 512)) | 21.24 µs | 47.71 µs | {color:green}17.91 µs{color}|
|(doall (range 0 4096)) | 167.86 µs | 381.15 µs | {color:green}141.48 µs{color}|
|(into [] (map inc (range 31))) | 2.00 µs | {color:green}1.49 µs{color} | 1.87 µs|
|(into [] (map inc) (range 31)) | 1.65 µs | {color:green}813.78 ns{color} | 929.51 ns|
|(into [] (range 128)) | 5.20 µs | {color:green}2.27 µs{color} | 2.44 µs|
|(doall (range 1/2 1000 1/3)) | {color:green}1.50 ms{color} | 1.69 ms | 1.50 ms|
|(doall (range 0.5 1000 0.33)) | 172.27 µs | 364.35 µs | {color:green}149.51 µs{color}|
|(into [] (range 1/2 1000 1/3)) | 1.52 ms | {color:green}1.42 ms{color} | 1.43 ms|
|(into [] (range 0.5 1000 0.33)) | 163.06 µs | {color:green}99.44 µs{color} | 103.18 µs|
|(count (filter odd? (take (* 1024 1024) (range)))) | {color:green}187.79 ms{color} | 194.07 ms | 189.73 ms|
|(transduce (take (* 1024 1024)) + (range)) | {color:green}53.06 ms{color} | 94.19 ms | 89.34 ms|

*Performance notes:*

doall cases see a substantial benefit.

The deftype has very competitive performance considering it is competing against the Java impl, which uses unchecked math incorrectly.

The final two cases of 0-arity (range) will get better pending the merge of the Iterate deftype (CLJ-1603) or pending validity of the inc' approach.

*Patch:* CLJ-1515-deftype3.patch
*Screened by:*

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch implements range as a deftype. The range implementation is specialized for primitive longs (LongRange, a common case), as well one with generic math (GenericRange).

The types implement IReduce,IReduceInit,Counted,Seqable,IHashEq. Sequential and Serializable are marker interfaces.

The existing range lazy-seq was renamed to range* (instead of range1 b/c range's arguments are numbers.) The deftypes defined later delegate their seq implementation to it. Still chunked. To achieve the most performance out of this delegation, it required a tiny tweak to doall/dorun, see the note below. range* handles only ascending or descending sequences.

The implementation was carefully tuned using criterium for benchmarking. Rather than directly using < and > as comparators, it performed far better to close over the range upper bound, e.g. #(< % end). < and > are inlining macros. The boxing/unboxing is minimized within the LongRange implementation.

The range constructor figures out type specialization, calls create-range, which filters out the wacky cases (empty seqs, the repeat case), and then creates the types if necessary. The deftypes handle only ascending or descending nonempty cases.

The special case of (range) is just handled with (iterate inc' 0) (which is further optimized for reduce in CLJ-1603). The original contract range being infinite was never handled correctly, as it used inc, not inc'. The GenericRange impl does not use autopromotion, and opts for preserving the implementation behavior. The only path for auto-promotion with this patch is through the 0-arity delegation to iterate.

*Note* The patch also includes a tweak to doall/dorun in order to handle types whose seq implementation is not inline, but in a separate object. As noted in the commit message, dorun currently calls (seq coll), then drops the result, then calls (next coll), resulting in a double allocation of the seq when the collection doesn't cache its seq.

*Performance*

timings done via criterium full benchmark on Oracle JDK

||code||1.7.0-alpha5 || Java impl clj-1515-10|| CLJ-1515-deftype2 ||
|(count (range (* 1024 1024))) | 61.30 ms | 26.22 ns | {color:green}12.67 ns{color}|
|(reduce + (map inc (range (* 1024 1024)))) | 56.21 ms | 47.80 ms | {color:green}46.87 ms{color}|
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 77.98 ms | 67.84 ms | {color:green}66.91 ms{color}|
|(count (keep odd? (range (* 1024 1024)))) | 73.68 ms | {color:green}57.33 ms{color} | 72.27 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 50.56 ms | 36.12 ms | {color:green}28.46 ms{color}|
|(reduce + 0 (range (* 2048 1024))) | 81.86 ms | 39.22 ms | {color:green}26.44 ms{color}|
|(doall (range 0 31)) | 1.32 µs | 2.89 µs | {color:green}1.08 µs{color}|
|(doall (range 0 32)) | 1.34 µs | 2.96 µs | {color:green}1.12 µs{color}|
|(doall (range 0 128)) | 5.20 µs | 11.92 µs | {color:green}4.31 µs{color}|
|(doall (range 0 512)) | 21.24 µs | 47.71 µs | {color:green}17.91 µs{color}|
|(doall (range 0 4096)) | 167.86 µs | 381.15 µs | {color:green}141.48 µs{color}|
|(into [] (map inc (range 31))) | 2.00 µs | {color:green}1.49 µs{color} | 1.87 µs|
|(into [] (map inc) (range 31)) | 1.65 µs | {color:green}813.78 ns{color} | 929.51 ns|
|(into [] (range 128)) | 5.20 µs | {color:green}2.27 µs{color} | 2.44 µs|
|(doall (range 1/2 1000 1/3)) | {color:green}1.50 ms{color} | 1.69 ms | 1.50 ms|
|(doall (range 0.5 1000 0.33)) | 172.27 µs | 364.35 µs | {color:green}149.51 µs{color}|
|(into [] (range 1/2 1000 1/3)) | 1.52 ms | {color:green}1.42 ms{color} | 1.43 ms|
|(into [] (range 0.5 1000 0.33)) | 163.06 µs | {color:green}99.44 µs{color} | 103.18 µs|
|(count (filter odd? (take (* 1024 1024) (range)))) | {color:green}187.79 ms{color} | 194.07 ms | 189.73 ms|
|(transduce (take (* 1024 1024)) + (range)) | {color:green}53.06 ms{color} | 94.19 ms | 89.34 ms|

*Performance notes:*

doall cases see a substantial benefit.

The deftype has very competitive performance considering it is competing against the Java impl, which uses unchecked math incorrectly.

The final two cases of 0-arity (range) will get better pending the merge of the Iterate deftype (CLJ-1603) or pending validity of the inc' approach.

*Patch:* CLJ-1515-deftype3.patch
*Screened by:* Alex Miller

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.
Approval Incomplete [ 10006 ] Screened [ 10004 ]
Rich Hickey made changes -
Approval Screened [ 10004 ] Ok [ 10007 ]
Hide
Alex Miller added a comment -

Ghadi, we need to have range implement IObj too so with-meta works.

Show
Alex Miller added a comment - Ghadi, we need to have range implement IObj too so with-meta works.
Alex Miller made changes -
Approval Ok [ 10007 ] Incomplete [ 10006 ]
Hide
Ghadi Shayban added a comment -

Will add pronto but maybe someone can clarify something: Adding a simple clojure.lang.IObj/withMeta to the deftype results in an AbstractMethodError when trying to print a range. This is because the impl of vary-meta used in the default printer assumes that if all IObjs are also IMetas. Seems like a problem with either vary-meta's impl or the interface separation.

I can fix by:
1) adding a _meta field to Ranges and handling IMeta as well as IObj.

;; vary-meta:
(with-meta obj (apply f (meta obj) args)))

;; default printer
(defmethod print-method :default [o, ^Writer w]
  (if (instance? clojure.lang.IObj o)
    (print-method (vary-meta o #(dissoc % :type)) w)
    (print-simple o w)))
Show
Ghadi Shayban added a comment - Will add pronto but maybe someone can clarify something: Adding a simple clojure.lang.IObj/withMeta to the deftype results in an AbstractMethodError when trying to print a range. This is because the impl of vary-meta used in the default printer assumes that if all IObjs are also IMetas. Seems like a problem with either vary-meta's impl or the interface separation. I can fix by: 1) adding a _meta field to Ranges and handling IMeta as well as IObj.
;; vary-meta:
(with-meta obj (apply f (meta obj) args)))

;; default printer
(defmethod print-method :default [o, ^Writer w]
  (if (instance? clojure.lang.IObj o)
    (print-method (vary-meta o #(dissoc % :type)) w)
    (print-simple o w)))
Hide
Ghadi Shayban added a comment -

Ugh never mind that last bit I fell out of the hammock prematurely. IMeta << IObj.

Alex, CLJ-1603 needs IMeta/meta too.

Show
Ghadi Shayban added a comment - Ugh never mind that last bit I fell out of the hammock prematurely. IMeta << IObj. Alex, CLJ-1603 needs IMeta/meta too.
Hide
Alex Miller added a comment -

current direction is pending results of where CLJ-1603 goes

Show
Alex Miller added a comment - current direction is pending results of where CLJ-1603 goes
Alex Miller made changes -
Attachment clj-1515-12.patch [ 13993 ]
Attachment clj-1515-11.patch [ 13994 ]
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Approach:* this patch implements range as a deftype. The range implementation is specialized for primitive longs (LongRange, a common case), as well one with generic math (GenericRange).

The types implement IReduce,IReduceInit,Counted,Seqable,IHashEq. Sequential and Serializable are marker interfaces.

The existing range lazy-seq was renamed to range* (instead of range1 b/c range's arguments are numbers.) The deftypes defined later delegate their seq implementation to it. Still chunked. To achieve the most performance out of this delegation, it required a tiny tweak to doall/dorun, see the note below. range* handles only ascending or descending sequences.

The implementation was carefully tuned using criterium for benchmarking. Rather than directly using < and > as comparators, it performed far better to close over the range upper bound, e.g. #(< % end). < and > are inlining macros. The boxing/unboxing is minimized within the LongRange implementation.

The range constructor figures out type specialization, calls create-range, which filters out the wacky cases (empty seqs, the repeat case), and then creates the types if necessary. The deftypes handle only ascending or descending nonempty cases.

The special case of (range) is just handled with (iterate inc' 0) (which is further optimized for reduce in CLJ-1603). The original contract range being infinite was never handled correctly, as it used inc, not inc'. The GenericRange impl does not use autopromotion, and opts for preserving the implementation behavior. The only path for auto-promotion with this patch is through the 0-arity delegation to iterate.

*Note* The patch also includes a tweak to doall/dorun in order to handle types whose seq implementation is not inline, but in a separate object. As noted in the commit message, dorun currently calls (seq coll), then drops the result, then calls (next coll), resulting in a double allocation of the seq when the collection doesn't cache its seq.

*Performance*

timings done via criterium full benchmark on Oracle JDK

||code||1.7.0-alpha5 || Java impl clj-1515-10|| CLJ-1515-deftype2 ||
|(count (range (* 1024 1024))) | 61.30 ms | 26.22 ns | {color:green}12.67 ns{color}|
|(reduce + (map inc (range (* 1024 1024)))) | 56.21 ms | 47.80 ms | {color:green}46.87 ms{color}|
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 77.98 ms | 67.84 ms | {color:green}66.91 ms{color}|
|(count (keep odd? (range (* 1024 1024)))) | 73.68 ms | {color:green}57.33 ms{color} | 72.27 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 50.56 ms | 36.12 ms | {color:green}28.46 ms{color}|
|(reduce + 0 (range (* 2048 1024))) | 81.86 ms | 39.22 ms | {color:green}26.44 ms{color}|
|(doall (range 0 31)) | 1.32 µs | 2.89 µs | {color:green}1.08 µs{color}|
|(doall (range 0 32)) | 1.34 µs | 2.96 µs | {color:green}1.12 µs{color}|
|(doall (range 0 128)) | 5.20 µs | 11.92 µs | {color:green}4.31 µs{color}|
|(doall (range 0 512)) | 21.24 µs | 47.71 µs | {color:green}17.91 µs{color}|
|(doall (range 0 4096)) | 167.86 µs | 381.15 µs | {color:green}141.48 µs{color}|
|(into [] (map inc (range 31))) | 2.00 µs | {color:green}1.49 µs{color} | 1.87 µs|
|(into [] (map inc) (range 31)) | 1.65 µs | {color:green}813.78 ns{color} | 929.51 ns|
|(into [] (range 128)) | 5.20 µs | {color:green}2.27 µs{color} | 2.44 µs|
|(doall (range 1/2 1000 1/3)) | {color:green}1.50 ms{color} | 1.69 ms | 1.50 ms|
|(doall (range 0.5 1000 0.33)) | 172.27 µs | 364.35 µs | {color:green}149.51 µs{color}|
|(into [] (range 1/2 1000 1/3)) | 1.52 ms | {color:green}1.42 ms{color} | 1.43 ms|
|(into [] (range 0.5 1000 0.33)) | 163.06 µs | {color:green}99.44 µs{color} | 103.18 µs|
|(count (filter odd? (take (* 1024 1024) (range)))) | {color:green}187.79 ms{color} | 194.07 ms | 189.73 ms|
|(transduce (take (* 1024 1024)) + (range)) | {color:green}53.06 ms{color} | 94.19 ms | 89.34 ms|

*Performance notes:*

doall cases see a substantial benefit.

The deftype has very competitive performance considering it is competing against the Java impl, which uses unchecked math incorrectly.

The final two cases of 0-arity (range) will get better pending the merge of the Iterate deftype (CLJ-1603) or pending validity of the inc' approach.

*Patch:* CLJ-1515-deftype3.patch
*Screened by:* Alex Miller

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Patch:* clj-1515-12.patch - Java range with long optimizations

*Alternatives:*

* CLJ-1515-deftype3.patch took the approach of using deftype to create a seqable, reducible entity that was not actually a seq. Based on work for CLJ-1603, it was established that due to historical requirements, this is not viable and the entity returned for range should implement the ISeq and Collection interfaces directly.

* clj-1515-11.patch uses the approach of a "split" implementation - a LazySeq that has a fast reduce path. This is a minimal patch that provides no change for sequences but a moderate improvement for reduce paths. One important difference vs clj-1515-12 is that the fast reduce path is only obtained on the initial range. Once you walk off the head, the performance will be the same as prior (seq perf).

*Approach:* The clj-1515-12 patch revives the unused (but previously existing) clojure.lang.Range class. This class is similar in implementation to ChunkedCons. It differs by lazily constructing the first chunk (this was done in the prior impl by using a LazySeq wrapper) and by caching the next reference like other seq impls. The latter is done because range is frequently used in both chunked and unchunked traversal.

Additionally, 1515-12 contains an optimized version of range (LongRange) for the extremely common case where start, end, and step are all longs. This version uses primitive longs and primitive math and a customized long[] version of ArrayChunk for greater performance.

The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Performance*

criterium quick-bench with java 1.8.0-b132

||code||1.7.0-master || Java clj-1515-12 (w/CLJ-1603) || split clj-1515-11 ||
|(count (range (* 1024 1024))) | 62 ms | {color:green}0 ms{color}| 58 ms |
|(reduce + (map inc (range (* 1024 1024)))) | 49 ms | {color:green}38 ms{color}| 50 ms |
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 66 ms | {color:green}60 ms{color}| 67 ms |
|(count (keep odd? (range (* 1024 1024)))) | 76 ms | {color:green}49 ms{color} | {color:green}67 ms{color} |
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 46 ms | {color:green}29 ms{color} | {color:green}34 ms{color} |
|(reduce + 0 (range (* 2048 1024))) | 64 ms | {color:green}26 ms{color} | {color:green}40 ms{color} |
|(reduce + 0 (rest (range (* 2048 1024)))) | 60 ms | {color:green}17 ms{color} | 57 ms |
|(doall (range 0 31)) | 1.20 µs | {color:green}0.72 µs{color} | 1.15 µs |
|(doall (range 0 32)) | 1.22 µs | {color:green}0.82 µs{color} | 1.18 µs |
|(doall (range 0 4096)) | 147 µs | {color:green}105 µs{color} | 145 µs |
|(into [] (map inc (range 31))) | 1.80 µs | {color:green}1.32 µs{color} | 1.88 µs |
|(into [] (map inc) (range 31)) | 1.55 µs | {color:green}0.72 µs{color} | {color:green}1.02 µs{color} |
|(into [] (range 128)) | 4.90 µs | {color:green}2.14 µs{color} | {color:green}3.27 µs{color} |
|(doall (range 1/2 1000 1/3)) | 1.29 ms | {color:green}1.24 ms{color} | 1.28 µs |
|(doall (range 0.5 1000 0.33)) | 136 µs | {color:orange}151 µs{color} | 151 µs |
|(into [] (range 1/2 1000 1/3)) | 1.31 ms | {color:green}1.18 ms{color} | 1.26 ms |
|(into [] (range 0.5 1000 0.33)) | 151 µs | {color:green}91 µs{color} | {color:green}130 µs{color} |
|(count (filter odd? (take (* 1024 1024) (range)))) | 194 ms | {color:green}163 ms{color} | {color:green}176 ms{color} |
|(transduce (take (* 1024 1024)) + (range)) | 67 ms | {color:green}34 ms{color} | 68 ms |

*Performance notes:*

* 1515-12 (Java) - significant improvements in virtually all use cases, both seq and reduce. The final two cases with {{(range)}} leverage optimizations from CLJ-1603, which was included for those timings.

* 1515-11 (split) - a much smaller patch that gives no seq benefits and smaller reduction benefits. Only the head of the range will receive the perf benefits (see {{(reduce + 0 (rest (range (* 2048 1024))))}}).

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.
Approval Incomplete [ 10006 ] Vetted [ 10003 ]
Alex Miller made changes -
Summary Reify the result of range and add IReduceInit Reify the result of range and add IReduce
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Patch:* clj-1515-12.patch - Java range with long optimizations

*Alternatives:*

* CLJ-1515-deftype3.patch took the approach of using deftype to create a seqable, reducible entity that was not actually a seq. Based on work for CLJ-1603, it was established that due to historical requirements, this is not viable and the entity returned for range should implement the ISeq and Collection interfaces directly.

* clj-1515-11.patch uses the approach of a "split" implementation - a LazySeq that has a fast reduce path. This is a minimal patch that provides no change for sequences but a moderate improvement for reduce paths. One important difference vs clj-1515-12 is that the fast reduce path is only obtained on the initial range. Once you walk off the head, the performance will be the same as prior (seq perf).

*Approach:* The clj-1515-12 patch revives the unused (but previously existing) clojure.lang.Range class. This class is similar in implementation to ChunkedCons. It differs by lazily constructing the first chunk (this was done in the prior impl by using a LazySeq wrapper) and by caching the next reference like other seq impls. The latter is done because range is frequently used in both chunked and unchunked traversal.

Additionally, 1515-12 contains an optimized version of range (LongRange) for the extremely common case where start, end, and step are all longs. This version uses primitive longs and primitive math and a customized long[] version of ArrayChunk for greater performance.

The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Performance*

criterium quick-bench with java 1.8.0-b132

||code||1.7.0-master || Java clj-1515-12 (w/CLJ-1603) || split clj-1515-11 ||
|(count (range (* 1024 1024))) | 62 ms | {color:green}0 ms{color}| 58 ms |
|(reduce + (map inc (range (* 1024 1024)))) | 49 ms | {color:green}38 ms{color}| 50 ms |
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 66 ms | {color:green}60 ms{color}| 67 ms |
|(count (keep odd? (range (* 1024 1024)))) | 76 ms | {color:green}49 ms{color} | {color:green}67 ms{color} |
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 46 ms | {color:green}29 ms{color} | {color:green}34 ms{color} |
|(reduce + 0 (range (* 2048 1024))) | 64 ms | {color:green}26 ms{color} | {color:green}40 ms{color} |
|(reduce + 0 (rest (range (* 2048 1024)))) | 60 ms | {color:green}17 ms{color} | 57 ms |
|(doall (range 0 31)) | 1.20 µs | {color:green}0.72 µs{color} | 1.15 µs |
|(doall (range 0 32)) | 1.22 µs | {color:green}0.82 µs{color} | 1.18 µs |
|(doall (range 0 4096)) | 147 µs | {color:green}105 µs{color} | 145 µs |
|(into [] (map inc (range 31))) | 1.80 µs | {color:green}1.32 µs{color} | 1.88 µs |
|(into [] (map inc) (range 31)) | 1.55 µs | {color:green}0.72 µs{color} | {color:green}1.02 µs{color} |
|(into [] (range 128)) | 4.90 µs | {color:green}2.14 µs{color} | {color:green}3.27 µs{color} |
|(doall (range 1/2 1000 1/3)) | 1.29 ms | {color:green}1.24 ms{color} | 1.28 µs |
|(doall (range 0.5 1000 0.33)) | 136 µs | {color:orange}151 µs{color} | 151 µs |
|(into [] (range 1/2 1000 1/3)) | 1.31 ms | {color:green}1.18 ms{color} | 1.26 ms |
|(into [] (range 0.5 1000 0.33)) | 151 µs | {color:green}91 µs{color} | {color:green}130 µs{color} |
|(count (filter odd? (take (* 1024 1024) (range)))) | 194 ms | {color:green}163 ms{color} | {color:green}176 ms{color} |
|(transduce (take (* 1024 1024)) + (range)) | 67 ms | {color:green}34 ms{color} | 68 ms |

*Performance notes:*

* 1515-12 (Java) - significant improvements in virtually all use cases, both seq and reduce. The final two cases with {{(range)}} leverage optimizations from CLJ-1603, which was included for those timings.

* 1515-11 (split) - a much smaller patch that gives no seq benefits and smaller reduction benefits. Only the head of the range will receive the perf benefits (see {{(reduce + 0 (rest (range (* 2048 1024))))}}).

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Patch:* clj-1515-12.patch - Java range with long optimizations

*Approach:* The clj-1515-12 patch revives the unused (but previously existing) clojure.lang.Range class. This class is similar in implementation to ChunkedCons. It differs by lazily constructing the first chunk (this was done in the prior impl by using a LazySeq wrapper) and by caching the next reference like other seq impls. The latter is done because range is frequently used in both chunked and unchunked traversal.

Additionally, 1515-12 contains an optimized version of range (LongRange) for the extremely common case where start, end, and step are all longs. This version uses primitive longs and primitive math and a customized long[] version of ArrayChunk for greater performance.

The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Alternatives:*

* CLJ-1515-deftype3.patch took the approach of using deftype to create a seqable, reducible entity that was not actually a seq. Based on work for CLJ-1603, it was established that due to historical requirements, this is not viable and the entity returned for range should implement the ISeq and Collection interfaces directly.

* clj-1515-11.patch uses the approach of a "split" implementation - a LazySeq that has a fast reduce path. This is a minimal patch that provides no change for sequences but a moderate improvement for reduce paths. One important difference vs clj-1515-12 is that the fast reduce path is only obtained on the initial range. Once you walk off the head, the performance will be the same as prior (seq perf).

*Performance*

criterium quick-bench with java 1.8.0-b132

||code||1.7.0-master || Java clj-1515-12 (w/CLJ-1603) || split clj-1515-11 ||
|(count (range (* 1024 1024))) | 62 ms | {color:green}0 ms{color}| 58 ms |
|(reduce + (map inc (range (* 1024 1024)))) | 49 ms | {color:green}38 ms{color}| 50 ms |
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 66 ms | {color:green}60 ms{color}| 67 ms |
|(count (keep odd? (range (* 1024 1024)))) | 76 ms | {color:green}49 ms{color} | {color:green}67 ms{color} |
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 46 ms | {color:green}29 ms{color} | {color:green}34 ms{color} |
|(reduce + 0 (range (* 2048 1024))) | 64 ms | {color:green}26 ms{color} | {color:green}40 ms{color} |
|(reduce + 0 (rest (range (* 2048 1024)))) | 60 ms | {color:green}17 ms{color} | 57 ms |
|(doall (range 0 31)) | 1.20 µs | {color:green}0.72 µs{color} | 1.15 µs |
|(doall (range 0 32)) | 1.22 µs | {color:green}0.82 µs{color} | 1.18 µs |
|(doall (range 0 4096)) | 147 µs | {color:green}105 µs{color} | 145 µs |
|(into [] (map inc (range 31))) | 1.80 µs | {color:green}1.32 µs{color} | 1.88 µs |
|(into [] (map inc) (range 31)) | 1.55 µs | {color:green}0.72 µs{color} | {color:green}1.02 µs{color} |
|(into [] (range 128)) | 4.90 µs | {color:green}2.14 µs{color} | {color:green}3.27 µs{color} |
|(doall (range 1/2 1000 1/3)) | 1.29 ms | {color:green}1.24 ms{color} | 1.28 µs |
|(doall (range 0.5 1000 0.33)) | 136 µs | {color:orange}151 µs{color} | 151 µs |
|(into [] (range 1/2 1000 1/3)) | 1.31 ms | {color:green}1.18 ms{color} | 1.26 ms |
|(into [] (range 0.5 1000 0.33)) | 151 µs | {color:green}91 µs{color} | {color:green}130 µs{color} |
|(count (filter odd? (take (* 1024 1024) (range)))) | 194 ms | {color:green}163 ms{color} | {color:green}176 ms{color} |
|(transduce (take (* 1024 1024)) + (range)) | 67 ms | {color:green}34 ms{color} | 68 ms |

*Performance notes:*

* 1515-12 (Java) - significant improvements in virtually all use cases, both seq and reduce. The final two cases with {{(range)}} leverage optimizations from CLJ-1603, which was included for those timings.

* 1515-11 (split) - a much smaller patch that gives no seq benefits and smaller reduction benefits. Only the head of the range will receive the perf benefits (see {{(reduce + 0 (rest (range (* 2048 1024))))}}).

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.
Alex Miller made changes -
Summary Reify the result of range and add IReduce range should return something that implements IReduce
Alex Miller made changes -
Description Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

*Patch:* clj-1515-12.patch - Java range with long optimizations

*Approach:* The clj-1515-12 patch revives the unused (but previously existing) clojure.lang.Range class. This class is similar in implementation to ChunkedCons. It differs by lazily constructing the first chunk (this was done in the prior impl by using a LazySeq wrapper) and by caching the next reference like other seq impls. The latter is done because range is frequently used in both chunked and unchunked traversal.

Additionally, 1515-12 contains an optimized version of range (LongRange) for the extremely common case where start, end, and step are all longs. This version uses primitive longs and primitive math and a customized long[] version of ArrayChunk for greater performance.

The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Alternatives:*

* CLJ-1515-deftype3.patch took the approach of using deftype to create a seqable, reducible entity that was not actually a seq. Based on work for CLJ-1603, it was established that due to historical requirements, this is not viable and the entity returned for range should implement the ISeq and Collection interfaces directly.

* clj-1515-11.patch uses the approach of a "split" implementation - a LazySeq that has a fast reduce path. This is a minimal patch that provides no change for sequences but a moderate improvement for reduce paths. One important difference vs clj-1515-12 is that the fast reduce path is only obtained on the initial range. Once you walk off the head, the performance will be the same as prior (seq perf).

*Performance*

criterium quick-bench with java 1.8.0-b132

||code||1.7.0-master || Java clj-1515-12 (w/CLJ-1603) || split clj-1515-11 ||
|(count (range (* 1024 1024))) | 62 ms | {color:green}0 ms{color}| 58 ms |
|(reduce + (map inc (range (* 1024 1024)))) | 49 ms | {color:green}38 ms{color}| 50 ms |
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 66 ms | {color:green}60 ms{color}| 67 ms |
|(count (keep odd? (range (* 1024 1024)))) | 76 ms | {color:green}49 ms{color} | {color:green}67 ms{color} |
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 46 ms | {color:green}29 ms{color} | {color:green}34 ms{color} |
|(reduce + 0 (range (* 2048 1024))) | 64 ms | {color:green}26 ms{color} | {color:green}40 ms{color} |
|(reduce + 0 (rest (range (* 2048 1024)))) | 60 ms | {color:green}17 ms{color} | 57 ms |
|(doall (range 0 31)) | 1.20 µs | {color:green}0.72 µs{color} | 1.15 µs |
|(doall (range 0 32)) | 1.22 µs | {color:green}0.82 µs{color} | 1.18 µs |
|(doall (range 0 4096)) | 147 µs | {color:green}105 µs{color} | 145 µs |
|(into [] (map inc (range 31))) | 1.80 µs | {color:green}1.32 µs{color} | 1.88 µs |
|(into [] (map inc) (range 31)) | 1.55 µs | {color:green}0.72 µs{color} | {color:green}1.02 µs{color} |
|(into [] (range 128)) | 4.90 µs | {color:green}2.14 µs{color} | {color:green}3.27 µs{color} |
|(doall (range 1/2 1000 1/3)) | 1.29 ms | {color:green}1.24 ms{color} | 1.28 µs |
|(doall (range 0.5 1000 0.33)) | 136 µs | {color:orange}151 µs{color} | 151 µs |
|(into [] (range 1/2 1000 1/3)) | 1.31 ms | {color:green}1.18 ms{color} | 1.26 ms |
|(into [] (range 0.5 1000 0.33)) | 151 µs | {color:green}91 µs{color} | {color:green}130 µs{color} |
|(count (filter odd? (take (* 1024 1024) (range)))) | 194 ms | {color:green}163 ms{color} | {color:green}176 ms{color} |
|(transduce (take (* 1024 1024)) + (range)) | 67 ms | {color:green}34 ms{color} | 68 ms |

*Performance notes:*

* 1515-12 (Java) - significant improvements in virtually all use cases, both seq and reduce. The final two cases with {{(range)}} leverage optimizations from CLJ-1603, which was included for those timings.

* 1515-11 (split) - a much smaller patch that gives no seq benefits and smaller reduction benefits. Only the head of the range will receive the perf benefits (see {{(reduce + 0 (rest (range (* 2048 1024))))}}).

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.
range should implement IReduce for fast reduction.

*Patch:* clj-1515-12.patch - Java range with long optimizations

*Approach:* The clj-1515-12 patch revives the unused clojure.lang.Range class. This class is similar in implementation to ChunkedCons. It differs by lazily constructing the first chunk (this is done in the existing impl by using a LazySeq wrapper) and by caching the next reference like other seq impls. The latter is done because range is frequently used in both chunked and unchunked traversal.

Additionally, 1515-12 contains an optimized version of range (LongRange) for the extremely common case where start, end, and step are all longs. This version uses primitive longs and primitive math and a customized long[] version of ArrayChunk for greater performance.

The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Alternatives:*

* CLJ-1515-deftype3.patch took the approach of using deftype to create a seqable, reducible entity that was not actually a seq. Based on work for CLJ-1603, it was established that due to historical requirements, this is not viable and the entity returned for range should implement the ISeq and Collection interfaces directly.

* clj-1515-11.patch uses the approach of a "split" implementation - a LazySeq that has a fast reduce path. This is a minimal patch that provides no perf difference for sequence usage but a moderate improvement for reduce paths. One important difference vs clj-1515-12 is that the fast reduce path is only obtained on the initial range. Once you walk off the head, the performance will be the same as prior (seq perf).

*Performance*

criterium quick-bench with java 1.8.0-b132

||code||1.7.0-master || Java clj-1515-12 (w/CLJ-1603) || split clj-1515-11 ||
|(count (range (* 1024 1024))) | 62 ms | {color:green}0 ms{color}| 58 ms |
|(reduce + (map inc (range (* 1024 1024)))) | 49 ms | {color:green}38 ms{color}| 50 ms |
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 66 ms | {color:green}60 ms{color}| 67 ms |
|(count (keep odd? (range (* 1024 1024)))) | 76 ms | {color:green}49 ms{color} | {color:green}67 ms{color} |
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 46 ms | {color:green}29 ms{color} | {color:green}34 ms{color} |
|(reduce + 0 (range (* 2048 1024))) | 64 ms | {color:green}26 ms{color} | {color:green}40 ms{color} |
|(reduce + 0 (rest (range (* 2048 1024)))) | 60 ms | {color:green}17 ms{color} | 57 ms |
|(doall (range 0 31)) | 1.20 µs | {color:green}0.72 µs{color} | 1.15 µs |
|(doall (range 0 32)) | 1.22 µs | {color:green}0.82 µs{color} | 1.18 µs |
|(doall (range 0 4096)) | 147 µs | {color:green}105 µs{color} | 145 µs |
|(into [] (map inc (range 31))) | 1.80 µs | {color:green}1.32 µs{color} | 1.88 µs |
|(into [] (map inc) (range 31)) | 1.55 µs | {color:green}0.72 µs{color} | {color:green}1.02 µs{color} |
|(into [] (range 128)) | 4.90 µs | {color:green}2.14 µs{color} | {color:green}3.27 µs{color} |
|(doall (range 1/2 1000 1/3)) | 1.29 ms | {color:green}1.24 ms{color} | 1.28 µs |
|(doall (range 0.5 1000 0.33)) | 136 µs | {color:orange}151 µs{color} | 151 µs |
|(into [] (range 1/2 1000 1/3)) | 1.31 ms | {color:green}1.18 ms{color} | 1.26 ms |
|(into [] (range 0.5 1000 0.33)) | 151 µs | {color:green}91 µs{color} | {color:green}130 µs{color} |
|(count (filter odd? (take (* 1024 1024) (range)))) | 194 ms | {color:green}163 ms{color} | {color:green}176 ms{color} |
|(transduce (take (* 1024 1024)) + (range)) | 67 ms | {color:green}34 ms{color} | 68 ms |

*Performance notes:*

* 1515-12 (Java) - significant improvements in virtually all use cases, both seq and reduce. The final two cases with {{(range)}} leverage optimizations from CLJ-1603, which was included for those timings.

* 1515-11 (split) - a much smaller patch that gives no seq benefits and smaller reduction benefits. Only the head of the range will receive the perf benefits (see {{(reduce + 0 (rest (range (* 2048 1024))))}}).

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.
Fogus made changes -
Assignee Alex Miller [ alexmiller ] Fogus [ fogus ]
Hide
Fogus added a comment -

Screened. Though the amount of duplicated code saddens me, I don't hold it against the implementer. It's a fairly straight-forward counting Impl with a chunk cache.

Show
Fogus added a comment - Screened. Though the amount of duplicated code saddens me, I don't hold it against the implementer. It's a fairly straight-forward counting Impl with a chunk cache.
Fogus made changes -
Approval Vetted [ 10003 ] Screened [ 10004 ]
Hide
Alex Miller added a comment -

New patch -13 removes the array allocation and performs better, still need to update timings and consider the non-long case.

Show
Alex Miller added a comment - New patch -13 removes the array allocation and performs better, still need to update timings and consider the non-long case.
Alex Miller made changes -
Attachment clj-1515-13.patch [ 14003 ]
Alex Miller made changes -
Description range should implement IReduce for fast reduction.

*Patch:* clj-1515-12.patch - Java range with long optimizations

*Approach:* The clj-1515-12 patch revives the unused clojure.lang.Range class. This class is similar in implementation to ChunkedCons. It differs by lazily constructing the first chunk (this is done in the existing impl by using a LazySeq wrapper) and by caching the next reference like other seq impls. The latter is done because range is frequently used in both chunked and unchunked traversal.

Additionally, 1515-12 contains an optimized version of range (LongRange) for the extremely common case where start, end, and step are all longs. This version uses primitive longs and primitive math and a customized long[] version of ArrayChunk for greater performance.

The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which is further optimized for reduce in CLJ-1603).

*Alternatives:*

* CLJ-1515-deftype3.patch took the approach of using deftype to create a seqable, reducible entity that was not actually a seq. Based on work for CLJ-1603, it was established that due to historical requirements, this is not viable and the entity returned for range should implement the ISeq and Collection interfaces directly.

* clj-1515-11.patch uses the approach of a "split" implementation - a LazySeq that has a fast reduce path. This is a minimal patch that provides no perf difference for sequence usage but a moderate improvement for reduce paths. One important difference vs clj-1515-12 is that the fast reduce path is only obtained on the initial range. Once you walk off the head, the performance will be the same as prior (seq perf).

*Performance*

criterium quick-bench with java 1.8.0-b132

||code||1.7.0-master || Java clj-1515-12 (w/CLJ-1603) || split clj-1515-11 ||
|(count (range (* 1024 1024))) | 62 ms | {color:green}0 ms{color}| 58 ms |
|(reduce + (map inc (range (* 1024 1024)))) | 49 ms | {color:green}38 ms{color}| 50 ms |
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 66 ms | {color:green}60 ms{color}| 67 ms |
|(count (keep odd? (range (* 1024 1024)))) | 76 ms | {color:green}49 ms{color} | {color:green}67 ms{color} |
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 46 ms | {color:green}29 ms{color} | {color:green}34 ms{color} |
|(reduce + 0 (range (* 2048 1024))) | 64 ms | {color:green}26 ms{color} | {color:green}40 ms{color} |
|(reduce + 0 (rest (range (* 2048 1024)))) | 60 ms | {color:green}17 ms{color} | 57 ms |
|(doall (range 0 31)) | 1.20 µs | {color:green}0.72 µs{color} | 1.15 µs |
|(doall (range 0 32)) | 1.22 µs | {color:green}0.82 µs{color} | 1.18 µs |
|(doall (range 0 4096)) | 147 µs | {color:green}105 µs{color} | 145 µs |
|(into [] (map inc (range 31))) | 1.80 µs | {color:green}1.32 µs{color} | 1.88 µs |
|(into [] (map inc) (range 31)) | 1.55 µs | {color:green}0.72 µs{color} | {color:green}1.02 µs{color} |
|(into [] (range 128)) | 4.90 µs | {color:green}2.14 µs{color} | {color:green}3.27 µs{color} |
|(doall (range 1/2 1000 1/3)) | 1.29 ms | {color:green}1.24 ms{color} | 1.28 µs |
|(doall (range 0.5 1000 0.33)) | 136 µs | {color:orange}151 µs{color} | 151 µs |
|(into [] (range 1/2 1000 1/3)) | 1.31 ms | {color:green}1.18 ms{color} | 1.26 ms |
|(into [] (range 0.5 1000 0.33)) | 151 µs | {color:green}91 µs{color} | {color:green}130 µs{color} |
|(count (filter odd? (take (* 1024 1024) (range)))) | 194 ms | {color:green}163 ms{color} | {color:green}176 ms{color} |
|(transduce (take (* 1024 1024)) + (range)) | 67 ms | {color:green}34 ms{color} | 68 ms |

*Performance notes:*

* 1515-12 (Java) - significant improvements in virtually all use cases, both seq and reduce. The final two cases with {{(range)}} leverage optimizations from CLJ-1603, which was included for those timings.

* 1515-11 (split) - a much smaller patch that gives no seq benefits and smaller reduction benefits. Only the head of the range will receive the perf benefits (see {{(reduce + 0 (rest (range (* 2048 1024))))}}).

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.
range should implement IReduce for fast reduction.

*Patch:* clj-1515-12.patch - Java range with long optimizations

*Approach:* The clj-1515-12 patch revives the unused clojure.lang.Range class. This class is similar in implementation to ChunkedCons. It differs by lazily constructing the first chunk (this is done in the existing impl by using a LazySeq wrapper) and by caching the next reference like other seq impls. The latter is done because range is frequently used in both chunked and unchunked traversal.

Additionally, 1515-13 contains an optimized version of range (LongRange) for the extremely common case where start, end, and step are all longs. This version uses primitive longs and primitive math and a customized version of ArrayChunk for greater performance.

The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which was further optimized for reduce in CLJ-1603).

*Alternatives:*

* CLJ-1515-deftype3.patch took the approach of using deftype to create a seqable, reducible entity that was not actually a seq. Based on work for CLJ-1603, it was established that due to historical requirements, this is not viable and the entity returned for range should implement the ISeq and Collection interfaces directly.

* clj-1515-11.patch uses the approach of a "split" implementation - a LazySeq that has a fast reduce path. This is a minimal patch that provides no perf difference for sequence usage but a moderate improvement for reduce paths. One important difference vs clj-1515-12 is that the fast reduce path is only obtained on the initial range. Once you walk off the head, the performance will be the same as prior (seq perf).

*Performance*

criterium quick-bench with java 1.8.0-b132

||code||1.7.0-master || Java clj-1515-13 || split clj-1515-11 ||
|(count (range (* 1024 1024))) | 63 ms | {color:green}0 ms{color}| 58 ms |
|(reduce + (map inc (range (* 1024 1024)))) | 50 ms | {color:green}33 ms{color}| 50 ms |
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 68 ms | {color:green}52 ms{color}| 67 ms |
|(count (keep odd? (range (* 1024 1024)))) | 73 ms | {color:green}52 ms{color} | {color:green}67 ms{color} |
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 46 ms | {color:green}26 ms{color} | {color:green}34 ms{color} |
|(reduce + 0 (range (* 2048 1024))) | 67 ms | {color:green}19 ms{color} | {color:green}40 ms{color} |
|(reduce + 0 (rest (range (* 2048 1024)))) | 67 ms | {color:green}19 ms{color} | 57 ms |
|(doall (range 0 31)) | 1.04 µs | {color:green}0.75 µs{color} | 1.15 µs |
|(doall (range 0 32)) | 1.08 µs | {color:green}0.76 µs{color} | 1.18 µs |
|(doall (range 0 4096)) | 135 µs | {color:green}96 µs{color} | 145 µs |
|(into [] (map inc (range 31))) | 1.81 µs | {color:green}1.30 µs{color} | 1.88 µs |
|(into [] (map inc) (range 31)) | 1.71 µs | {color:green}0.75 µs{color} | {color:green}1.02 µs{color} |
|(into [] (range 128)) | 4.82 µs | {color:green}2.20 µs{color} | {color:green}3.27 µs{color} |
|(doall (range 1/2 1000 1/3)) | 1.24 ms | 1.24 ms | 1.28 µs |
|(doall (range 0.5 1000 0.33)) | 126 µs | {color:orange}147 µs{color} | 151 µs |
|(into [] (range 1/2 1000 1/3)) | 1.26 ms | {color:green}1.20 ms{color} | 1.26 ms |
|(into [] (range 0.5 1000 0.33)) | 148 µs | {color:green}91 µs{color} | {color:green}130 µs{color} |
|(count (filter odd? (take (* 1024 1024) (range)))) | 185 ms | {color:green}167 ms{color} | {color:green}176 ms{color} |
|(transduce (take (* 1024 1024)) + (range)) | 67 ms | {color:green}35 ms{color} | 68 ms |

*Performance notes:*

* 1515-13 (Java) - significant improvements in virtually all use cases, both seq and reduce. The final two cases with {{(range)}} leverage optimizations from CLJ-1603.

* 1515-11 (split) - a much smaller patch that gives no seq benefits and smaller reduction benefits. Only the head of the range will receive the perf benefits (see {{(reduce + 0 (rest (range (* 2048 1024))))}}).

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.
Alex Miller made changes -
Approval Screened [ 10004 ] Vetted [ 10003 ]
Fogus made changes -
Assignee Fogus [ fogus ]
Alex Miller made changes -
Attachment clj-1515-14.patch [ 14034 ]
Description range should implement IReduce for fast reduction.

*Patch:* clj-1515-12.patch - Java range with long optimizations

*Approach:* The clj-1515-12 patch revives the unused clojure.lang.Range class. This class is similar in implementation to ChunkedCons. It differs by lazily constructing the first chunk (this is done in the existing impl by using a LazySeq wrapper) and by caching the next reference like other seq impls. The latter is done because range is frequently used in both chunked and unchunked traversal.

Additionally, 1515-13 contains an optimized version of range (LongRange) for the extremely common case where start, end, and step are all longs. This version uses primitive longs and primitive math and a customized version of ArrayChunk for greater performance.

The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which was further optimized for reduce in CLJ-1603).

*Alternatives:*

* CLJ-1515-deftype3.patch took the approach of using deftype to create a seqable, reducible entity that was not actually a seq. Based on work for CLJ-1603, it was established that due to historical requirements, this is not viable and the entity returned for range should implement the ISeq and Collection interfaces directly.

* clj-1515-11.patch uses the approach of a "split" implementation - a LazySeq that has a fast reduce path. This is a minimal patch that provides no perf difference for sequence usage but a moderate improvement for reduce paths. One important difference vs clj-1515-12 is that the fast reduce path is only obtained on the initial range. Once you walk off the head, the performance will be the same as prior (seq perf).

*Performance*

criterium quick-bench with java 1.8.0-b132

||code||1.7.0-master || Java clj-1515-13 || split clj-1515-11 ||
|(count (range (* 1024 1024))) | 63 ms | {color:green}0 ms{color}| 58 ms |
|(reduce + (map inc (range (* 1024 1024)))) | 50 ms | {color:green}33 ms{color}| 50 ms |
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 68 ms | {color:green}52 ms{color}| 67 ms |
|(count (keep odd? (range (* 1024 1024)))) | 73 ms | {color:green}52 ms{color} | {color:green}67 ms{color} |
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 46 ms | {color:green}26 ms{color} | {color:green}34 ms{color} |
|(reduce + 0 (range (* 2048 1024))) | 67 ms | {color:green}19 ms{color} | {color:green}40 ms{color} |
|(reduce + 0 (rest (range (* 2048 1024)))) | 67 ms | {color:green}19 ms{color} | 57 ms |
|(doall (range 0 31)) | 1.04 µs | {color:green}0.75 µs{color} | 1.15 µs |
|(doall (range 0 32)) | 1.08 µs | {color:green}0.76 µs{color} | 1.18 µs |
|(doall (range 0 4096)) | 135 µs | {color:green}96 µs{color} | 145 µs |
|(into [] (map inc (range 31))) | 1.81 µs | {color:green}1.30 µs{color} | 1.88 µs |
|(into [] (map inc) (range 31)) | 1.71 µs | {color:green}0.75 µs{color} | {color:green}1.02 µs{color} |
|(into [] (range 128)) | 4.82 µs | {color:green}2.20 µs{color} | {color:green}3.27 µs{color} |
|(doall (range 1/2 1000 1/3)) | 1.24 ms | 1.24 ms | 1.28 µs |
|(doall (range 0.5 1000 0.33)) | 126 µs | {color:orange}147 µs{color} | 151 µs |
|(into [] (range 1/2 1000 1/3)) | 1.26 ms | {color:green}1.20 ms{color} | 1.26 ms |
|(into [] (range 0.5 1000 0.33)) | 148 µs | {color:green}91 µs{color} | {color:green}130 µs{color} |
|(count (filter odd? (take (* 1024 1024) (range)))) | 185 ms | {color:green}167 ms{color} | {color:green}176 ms{color} |
|(transduce (take (* 1024 1024)) + (range)) | 67 ms | {color:green}35 ms{color} | 68 ms |

*Performance notes:*

* 1515-13 (Java) - significant improvements in virtually all use cases, both seq and reduce. The final two cases with {{(range)}} leverage optimizations from CLJ-1603.

* 1515-11 (split) - a much smaller patch that gives no seq benefits and smaller reduction benefits. Only the head of the range will receive the perf benefits (see {{(reduce + 0 (rest (range (* 2048 1024))))}}).

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.
range should implement IReduce for fast reduction.

*Patch:* clj-1515-14.patch - Java range with long optimizations

*Approach:* The clj-1515-14 patch revives the unused clojure.lang.Range class. This class is similar in implementation to ChunkedCons. It differs by lazily constructing the first chunk (this is done in the existing impl by using a LazySeq wrapper) and by caching the next reference like other seq impls. The latter is done because range is frequently used in both chunked and unchunked traversal.

Additionally, 1515-14 contains an optimized version of range (LongRange) for the extremely common case where start, end, and step are all longs. This version uses primitive longs and primitive math and a customized version of ArrayChunk for greater performance.

The special case of {{(range)}} is just handled with {{(iterate inc' 0)}} (which was further optimized for reduce in CLJ-1603).

*Alternatives:*

* CLJ-1515-deftype3.patch took the approach of using deftype to create a seqable, reducible entity that was not actually a seq. Based on work for CLJ-1603, it was established that due to historical requirements, this is not viable and the entity returned for range should implement the ISeq and Collection interfaces directly.

* clj-1515-11.patch uses the approach of a "split" implementation - a LazySeq that has a fast reduce path. This is a minimal patch that provides no perf difference for sequence usage but a moderate improvement for reduce paths. One important difference vs clj-1515-12 is that the fast reduce path is only obtained on the initial range. Once you walk off the head, the performance will be the same as prior (seq perf).

*Performance*

criterium quick-bench with java 1.8.0-b132

||code||1.7.0-master || Java clj-1515-14 || split clj-1515-11 ||
|(count (range (* 1024 1024))) | 63 ms | {color:green}0 ms{color}| 58 ms |
|(reduce + (map inc (range (* 1024 1024)))) | 50 ms | {color:green}33 ms{color}| 50 ms |
|(reduce + (map inc (map inc (range (* 1024 1024))))) | 68 ms | {color:green}52 ms{color}| 67 ms |
|(count (keep odd? (range (* 1024 1024)))) | 73 ms | {color:green}52 ms{color} | {color:green}67 ms{color} |
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) | 46 ms | {color:green}26 ms{color} | {color:green}34 ms{color} |
|(reduce + 0 (range (* 2048 1024))) | 67 ms | {color:green}19 ms{color} | {color:green}40 ms{color} |
|(reduce + 0 (rest (range (* 2048 1024)))) | 67 ms | {color:green}19 ms{color} | 57 ms |
|(doall (range 0 31)) | 1.04 µs | {color:green}0.75 µs{color} | 1.15 µs |
|(doall (range 0 32)) | 1.08 µs | {color:green}0.76 µs{color} | 1.18 µs |
|(doall (range 0 4096)) | 135 µs | {color:green}96 µs{color} | 145 µs |
|(into [] (map inc (range 31))) | 1.81 µs | {color:green}1.30 µs{color} | 1.88 µs |
|(into [] (map inc) (range 31)) | 1.71 µs | {color:green}0.75 µs{color} | {color:green}1.02 µs{color} |
|(into [] (range 128)) | 4.82 µs | {color:green}2.20 µs{color} | {color:green}3.27 µs{color} |
|(doall (range 1/2 1000 1/3)) | 1.24 ms | 1.24 ms | 1.28 µs |
|(doall (range 0.5 1000 0.33)) | 126 µs | {color:orange}147 µs{color} | 151 µs |
|(into [] (range 1/2 1000 1/3)) | 1.26 ms | {color:green}1.20 ms{color} | 1.26 ms |
|(into [] (range 0.5 1000 0.33)) | 148 µs | {color:green}91 µs{color} | {color:green}130 µs{color} |
|(count (filter odd? (take (* 1024 1024) (range)))) | 185 ms | {color:green}167 ms{color} | {color:green}176 ms{color} |
|(transduce (take (* 1024 1024)) + (range)) | 67 ms | {color:green}35 ms{color} | 68 ms |

*Performance notes:*

* 1515-14 (Java) - significant improvements in virtually all use cases, both seq and reduce. The final two cases with {{(range)}} leverage optimizations from CLJ-1603.

* 1515-11 (split) - a much smaller patch that gives no seq benefits and smaller reduction benefits. Only the head of the range will receive the perf benefits (see {{(reduce + 0 (rest (range (* 2048 1024))))}}).

*Questions*
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.
Hide
Alex Miller added a comment -

The -14 patch just switches to leverage Repeat now that 1603 has been applied.

Show
Alex Miller added a comment - The -14 patch just switches to leverage Repeat now that 1603 has been applied.
Hide
Fogus added a comment -

The latest patch (-14) handles the extraneous allocations from -12. All tests run and timings verified.

Show
Fogus added a comment - The latest patch (-14) handles the extraneous allocations from -12. All tests run and timings verified.
Fogus made changes -
Approval Vetted [ 10003 ] Screened [ 10004 ]
Rich Hickey made changes -
Approval Screened [ 10004 ] Ok [ 10007 ]
Stuart Halloway made changes -
Status Open [ 1 ] Closed [ 6 ]
Resolution Completed [ 1 ]

People

Vote (2)
Watch (5)

Dates

  • Created:
    Updated:
    Resolved: