Clojure

Reify the result of range and add IReduceInit

Details

  • Type: Enhancement Enhancement
  • Status: Open Open
  • Priority: Major Major
  • Resolution: Unresolved
  • Affects Version/s: Release 1.7
  • Fix Version/s: Release 1.7
  • Component/s: None
  • Labels:
    None
  • Patch:
    Code and Test
  • Approval:
    Vetted

Description

Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

Approach: this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of (range) is just handled with (iterate inc' 0) (which is further optimized for reduce in CLJ-1603).

Note: The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

Performance:
timings done via criterium quick-bench

expr 1.6.0 1.7.0-alpha4 +patch
(count (filter odd? (take (* 1024 1024) (range)))) 183 ms 173 ms 170 ms
(transduce (take (* 1024 1024)) + (range)) n/a 67 ms 81 ms (w/CLJ-1603: 41 ms)
(count (range (* 1024 1024))) 75 ms 69 ms 0 ms
(reduce + (map inc (range (* 1024 1024)))) 71 ms 68 ms 46 ms
(reduce + (map inc (map inc (range (* 1024 1024))))) 89 ms 91 ms 69 ms
(count (filter odd? (range (* 1024 1024)))) 69 ms 65 ms 43 ms
(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) n/a 67 ms 36 ms
(doall (range 0 31)) 1.41 µs 1.51 µs 3.02 µs
(into [] (map inc (range 31))) 1.76 µs 1.77 µs 1.43 µs
(into [] (map inc) (range 31)) n/a 1.60 µs 0.63 µs
(doall (range 1/2 1000 1/3)) 1.58 ms 1.53 ms 1.66 ms
(into [] (range 1/2 1000 1/3)) 1.52 ms 1.51 ms 1.38 ms
(doall (range 0.5 1000 0.33)) 0.15 ms 0.14 ms 0.35 ms
(into [] (range 0.5 1000 0.33)) 0.13 ms 0.12 ms 0.08 ms

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower. In general the "doall" examples are slower as this is kind of the worst case wrt overhead and values are retrieved via seq/next looping (the slowest option). Stacked sequence ops happen via the the chunked seq impl (which is a little faster), and the transduce/into will use the reduce impl (which is much faster).

Patch: clj-1515-9.patch

Screened by:

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.

  1. clj-1515.patch
    09/Dec/14 2:40 AM
    23 kB
    Alex Miller
  2. clj-1515-2.patch
    10/Dec/14 5:12 PM
    23 kB
    Alex Miller
  3. clj-1515-3.patch
    10/Dec/14 11:34 PM
    22 kB
    Alex Miller
  4. clj-1515-4.patch
    11/Dec/14 12:11 AM
    22 kB
    Alex Miller
  5. clj-1515-5.patch
    11/Dec/14 10:20 AM
    22 kB
    Alex Miller
  6. clj-1515-6.patch
    11/Dec/14 11:31 AM
    22 kB
    Alex Miller
  7. clj-1515-7.patch
    11/Dec/14 11:49 AM
    22 kB
    Alex Miller
  8. clj-1515-8.patch
    11/Dec/14 3:49 PM
    23 kB
    Alex Miller
  9. clj-1515-9.patch
    12/Dec/14 2:11 PM
    22 kB
    Alex Miller
  10. patch.diff
    29/Aug/14 3:34 PM
    9 kB
    Timothy Baldridge
  11. range-patch3.diff
    19/Sep/14 10:02 AM
    11 kB
    Timothy Baldridge
  12. reified-range4.diff
    31/Oct/14 11:30 AM
    16 kB
    Timothy Baldridge

Activity

Hide
Alex Miller added a comment -

1) Not sure about losing chunked seqs - that would make older usage slower, which seems undesirable.
2) RangeIterator.next() needs to throw NoSuchElementException when walking off the end
3) I think Range should implement IReduce instead of relying on support for CollReduce via Iterable.
4) Should let _hash and _hasheq auto-initialize to 0 not set to -1. As is, I think _hasheq always would be -1?
5) _hash and _hasheq should be transient.
6) count could be cached (like hash and hasheq). Not sure if it's worth doing that but seems like a win any time it's called more than once.
7) Why the change in test/clojure/test_clojure/serialization.clj ?
8) Can you squash into a single commit?

Show
Alex Miller added a comment - 1) Not sure about losing chunked seqs - that would make older usage slower, which seems undesirable. 2) RangeIterator.next() needs to throw NoSuchElementException when walking off the end 3) I think Range should implement IReduce instead of relying on support for CollReduce via Iterable. 4) Should let _hash and _hasheq auto-initialize to 0 not set to -1. As is, I think _hasheq always would be -1? 5) _hash and _hasheq should be transient. 6) count could be cached (like hash and hasheq). Not sure if it's worth doing that but seems like a win any time it's called more than once. 7) Why the change in test/clojure/test_clojure/serialization.clj ? 8) Can you squash into a single commit?
Hide
Timothy Baldridge added a comment -

1) I agree, adding chunked seqs to this will dramatically increase complexity, are we sure we want this?
2) exception added
3) I can add IReduce, but it'll pretty much just duplicate the code in protocols.clj. If we're sure we want that I'll add it too.
4) fixed hash init values, defaults to -1 like ASeq
5) hash fields are now transient
6) at the cost of about 4 bytes we can cache the cost of a multiplication and an addition, doesn't seem worth it?
7) the tests in serialization.clj assert that the type of the collection roundtrips. This is no longer the case for range which starts as Range and ends as a list. The change I made converts range into a list so that it properly roundtrips. My assumption is that we shouldn't rely on all implementations of ISeq to properly roundtrip through EDN.
8) squashed.

Show
Timothy Baldridge added a comment - 1) I agree, adding chunked seqs to this will dramatically increase complexity, are we sure we want this? 2) exception added 3) I can add IReduce, but it'll pretty much just duplicate the code in protocols.clj. If we're sure we want that I'll add it too. 4) fixed hash init values, defaults to -1 like ASeq 5) hash fields are now transient 6) at the cost of about 4 bytes we can cache the cost of a multiplication and an addition, doesn't seem worth it? 7) the tests in serialization.clj assert that the type of the collection roundtrips. This is no longer the case for range which starts as Range and ends as a list. The change I made converts range into a list so that it properly roundtrips. My assumption is that we shouldn't rely on all implementations of ISeq to properly roundtrip through EDN. 8) squashed.
Hide
Alex Miller added a comment -

6) might be useful if you're walking through it with nth, which hits count everytime, but doubt that's common
7) yep, reasonable

Show
Alex Miller added a comment - 6) might be useful if you're walking through it with nth, which hits count everytime, but doubt that's common 7) yep, reasonable
Hide
Andy Fingerhut added a comment - - edited

I have already pointed out to Edipo in personal email the guidelines on what labels to use for Clojure JIRA tickets here: http://dev.clojure.org/display/community/Creating+Tickets

Show
Andy Fingerhut added a comment - - edited I have already pointed out to Edipo in personal email the guidelines on what labels to use for Clojure JIRA tickets here: http://dev.clojure.org/display/community/Creating+Tickets
Hide
Timothy Baldridge added a comment -

New patch with IReduce directly on Range instead of relying on iterators

Show
Timothy Baldridge added a comment - New patch with IReduce directly on Range instead of relying on iterators
Hide
Alex Miller added a comment - - edited

The new patch looks good. Could you do a test to determine the perf difference from walking the old chunked seq vs the new version? If the perf diff is negligible, I think we can leave as is.

Another idea: would it make sense to have a specialized RangeLong for the (very common) case where start, end, and step could all be primitive longs? Seems like this could help noticeably.

Show
Alex Miller added a comment - - edited The new patch looks good. Could you do a test to determine the perf difference from walking the old chunked seq vs the new version? If the perf diff is negligible, I think we can leave as is. Another idea: would it make sense to have a specialized RangeLong for the (very common) case where start, end, and step could all be primitive longs? Seems like this could help noticeably.
Hide
Timothy Baldridge added a comment -

Looks like chunked seqs do make lazy seq code about 5x faster in these tests.

Show
Timothy Baldridge added a comment - Looks like chunked seqs do make lazy seq code about 5x faster in these tests.
Hide
Ghadi Shayban added a comment -

I think penalizing existing code possibly 5x is a hard cost to stomach. Is there another approach where a protocolized range can live outside of core? CLJ-993 has a patch that makes it a reducible source in clojure.core.reducers, but it's coll-reduce not IReduce, and doesn't contain an Iterator. Otherwise we might have to take the chunked seq challenge.

Alex: Re long/float. Old reified Ranged.java in clojure.lang blindly assumes ints, it would be nice to have a long vs. float version, though I believe the contract of reduce boxes numbers. (Unboxed math can be implemented very nicely as in Prismatic's Hiphip array manipulation library, which takes the long vs float specialization to the extreme with different namespaces)

Show
Ghadi Shayban added a comment - I think penalizing existing code possibly 5x is a hard cost to stomach. Is there another approach where a protocolized range can live outside of core? CLJ-993 has a patch that makes it a reducible source in clojure.core.reducers, but it's coll-reduce not IReduce, and doesn't contain an Iterator. Otherwise we might have to take the chunked seq challenge. Alex: Re long/float. Old reified Ranged.java in clojure.lang blindly assumes ints, it would be nice to have a long vs. float version, though I believe the contract of reduce boxes numbers. (Unboxed math can be implemented very nicely as in Prismatic's Hiphip array manipulation library, which takes the long vs float specialization to the extreme with different namespaces)
Hide
Timothy Baldridge added a comment -

I don't think anyone is suggesting we push unboxed math all the way down through transducers. Instead, this patch contains a lot of calls to Numbers.*, if we were to assume that the start end and step params of range are all Longs, then we could remove all of these calls and only box when returning an Object (in .first) or when calling IFn.invoke (inside .reduce)

Show
Timothy Baldridge added a comment - I don't think anyone is suggesting we push unboxed math all the way down through transducers. Instead, this patch contains a lot of calls to Numbers.*, if we were to assume that the start end and step params of range are all Longs, then we could remove all of these calls and only box when returning an Object (in .first) or when calling IFn.invoke (inside .reduce)
Hide
Alex Miller added a comment -

I agree that 5x slowdown is too much - I don't think we can give up chunked seqs if that's the penalty.

On the long case, I was suggesting what Tim is talking about, in the case of all longs, create a Range that stores long prims and does prim math, but still return boxed objects as necessary. I think the only case worth optimizing is all longs - the permutation of other options gets out of hand quickly.

Show
Alex Miller added a comment - I agree that 5x slowdown is too much - I don't think we can give up chunked seqs if that's the penalty. On the long case, I was suggesting what Tim is talking about, in the case of all longs, create a Range that stores long prims and does prim math, but still return boxed objects as necessary. I think the only case worth optimizing is all longs - the permutation of other options gets out of hand quickly.
Hide
Ghadi Shayban added a comment -

Tim, I'm not suggesting unboxed math, but the singular fast-path of all-Longs that you and Alex describe. I mistakenly lower-cased Long/Float.

Show
Ghadi Shayban added a comment - Tim, I'm not suggesting unboxed math, but the singular fast-path of all-Longs that you and Alex describe. I mistakenly lower-cased Long/Float.
Hide
Timothy Baldridge added a comment -

Here's the latest work on this, a few tests fail. If someone wants to take a look at this patch feel free, otherwise I'll continue to work on it as I have time/energy.

Show
Timothy Baldridge added a comment - Here's the latest work on this, a few tests fail. If someone wants to take a look at this patch feel free, otherwise I'll continue to work on it as I have time/energy.
Hide
Nicola Mometto added a comment -

As discussed with Tim in #clojure, the current patch should not change ArrayChunk's reduce impl, that's an error.

Show
Nicola Mometto added a comment - As discussed with Tim in #clojure, the current patch should not change ArrayChunk's reduce impl, that's an error.
Hide
Alex Miller added a comment -

Still a work in progress...

Show
Alex Miller added a comment - Still a work in progress...
Hide
Nicola Mometto added a comment - - edited

Alex, while this is still a work in progress, I see that the change on ArrayChunk#reduce from previous WIP patches not only has not been reverted but has been extended. I don't think the current approach makes sense as ArrayChunk#reduce is not part of the IReduce/IReduceInit contract but of the IChunk contract and changing the behaviour to be IReduce-like in its handling of reduced introduces the burden of having to use preserve-reduced on the reducing function to no apparent benefit.

Given that the preserve-reduced is done on the clojure side, it seems to me like directly invoking .reduce rather than routing through internal-reduce should be broken but I haven't tested it.

Show
Nicola Mometto added a comment - - edited Alex, while this is still a work in progress, I see that the change on ArrayChunk#reduce from previous WIP patches not only has not been reverted but has been extended. I don't think the current approach makes sense as ArrayChunk#reduce is not part of the IReduce/IReduceInit contract but of the IChunk contract and changing the behaviour to be IReduce-like in its handling of reduced introduces the burden of having to use preserve-reduced on the reducing function to no apparent benefit. Given that the preserve-reduced is done on the clojure side, it seems to me like directly invoking .reduce rather than routing through internal-reduce should be broken but I haven't tested it.
Hide
Alex Miller added a comment -

That's the work in progress part - I haven't looked at yet. I have not extended or done any work re ArrayChunk, just carried through what was on the prior patch. I'll be working on it again tomorrow.

Show
Alex Miller added a comment - That's the work in progress part - I haven't looked at yet. I have not extended or done any work re ArrayChunk, just carried through what was on the prior patch. I'll be working on it again tomorrow.
Hide
Ghadi Shayban added a comment -

I am impressed and have learned a ton through this exercise.

quick review of clj-1515-2
1) withMeta gives the newly formed object the wrong meta.
2) LongRange/create() is the new 0-arity constructor for range, which sets the 'end' to Double/POSITIVE_INFINITY cast as a long. Current core uses Double/POSITIVE_INFINITY directly. Not sure how many programs rely upon iterating that far, or how they would break.
3) Relatedly, depending on the previous point: Because only all-long arguments receive chunking, the very common case of (range) with no args would be unchunked. Doesn't seem like too much of a stretch to add chunking to the other impl.
4) Though the commented invariants say that Range is never empty, the implementation uses a magic value of _count == 0 to mean not cached, which is surprising to me. hashcodes have the magic value of -1
5) s/instanceof Reduced/RT.isReduced
6) is the overflow behavior of "int count()" correct?

Show
Ghadi Shayban added a comment - I am impressed and have learned a ton through this exercise. quick review of clj-1515-2 1) withMeta gives the newly formed object the wrong meta. 2) LongRange/create() is the new 0-arity constructor for range, which sets the 'end' to Double/POSITIVE_INFINITY cast as a long. Current core uses Double/POSITIVE_INFINITY directly. Not sure how many programs rely upon iterating that far, or how they would break. 3) Relatedly, depending on the previous point: Because only all-long arguments receive chunking, the very common case of (range) with no args would be unchunked. Doesn't seem like too much of a stretch to add chunking to the other impl. 4) Though the commented invariants say that Range is never empty, the implementation uses a magic value of _count == 0 to mean not cached, which is surprising to me. hashcodes have the magic value of -1 5) s/instanceof Reduced/RT.isReduced 6) is the overflow behavior of "int count()" correct?
Hide
Alex Miller added a comment -

1) agreed!
2) Good point. I am definitely changing behavior on this (max of 9223372036854775807). I will look at whether this can be handled without affecting perf. Really, handling an infinite end point is not compatible with several things in LongRange.
3) I actually did implement chunking for the general Range and found it was slower (the original Clojure chunking is faster). LongRange is making up for that difference with improved primitive numerics.
4) Since empty is invalid, 0 and -1 are equally invalid. But I agree -1 conveys the intent better.
5) agreed
6) probably not. ties into 2/3.

Thanks for this, will address.

Show
Alex Miller added a comment - 1) agreed! 2) Good point. I am definitely changing behavior on this (max of 9223372036854775807). I will look at whether this can be handled without affecting perf. Really, handling an infinite end point is not compatible with several things in LongRange. 3) I actually did implement chunking for the general Range and found it was slower (the original Clojure chunking is faster). LongRange is making up for that difference with improved primitive numerics. 4) Since empty is invalid, 0 and -1 are equally invalid. But I agree -1 conveys the intent better. 5) agreed 6) probably not. ties into 2/3. Thanks for this, will address.
Hide
Alex Miller added a comment -

Added -4 patch that addresses 1,4,5 but not the (range) stuff.

Show
Alex Miller added a comment - Added -4 patch that addresses 1,4,5 but not the (range) stuff.
Hide
Alex Miller added a comment -

Latest -7 patch addresses all feedback and perf #s updated.

Show
Alex Miller added a comment - Latest -7 patch addresses all feedback and perf #s updated.

People

Vote (1)
Watch (5)

Dates

  • Created:
    Updated: