Clojure

Reify the result of range

Details

  • Type: Enhancement Enhancement
  • Status: Open Open
  • Priority: Minor Minor
  • Resolution: Unresolved
  • Affects Version/s: Release 1.7
  • Fix Version/s: Release 1.7
  • Component/s: None
  • Labels:
    None
  • Patch:
    Code and Test
  • Approval:
    Incomplete

Description

Currently range simply returns a lazy seq. If the return value of range were reified into a type (as it is in ClojureScript) we could optimize many functions on that resulting type. Some operations such as count and nth become O(1) in this case, while others such as reduce could receive a performance boost do to the reduced number of allocations.

Approach: this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch uses Numbers.* methods for all math due to the input types to range being unknown. The class also supplies a .iterator() method which allows for allocation free reducing over range.

Note: this code keeps backwards compatibility with the existing range code. This means some parts of the class (mostly relating to a step size of 0) are a bit more complex than desired, but these bits were needed to get all the tests to pass.

Note: this code does not preserve the chunked-seq nature of the original range. The fact that range used to return chunked seqs was not published in the doc strings and so it was removed to allow for simpler code in Range.java.

Patch: range-patch3.diff

  1. patch.diff
    29/Aug/14 3:34 PM
    9 kB
    Timothy Baldridge
  2. range-patch3.diff
    19/Sep/14 10:02 AM
    11 kB
    Timothy Baldridge

Activity

Hide
Alex Miller added a comment -

1) Not sure about losing chunked seqs - that would make older usage slower, which seems undesirable.
2) RangeIterator.next() needs to throw NoSuchElementException when walking off the end
3) I think Range should implement IReduce instead of relying on support for CollReduce via Iterable.
4) Should let _hash and _hasheq auto-initialize to 0 not set to -1. As is, I think _hasheq always would be -1?
5) _hash and _hasheq should be transient.
6) count could be cached (like hash and hasheq). Not sure if it's worth doing that but seems like a win any time it's called more than once.
7) Why the change in test/clojure/test_clojure/serialization.clj ?
8) Can you squash into a single commit?

Show
Alex Miller added a comment - 1) Not sure about losing chunked seqs - that would make older usage slower, which seems undesirable. 2) RangeIterator.next() needs to throw NoSuchElementException when walking off the end 3) I think Range should implement IReduce instead of relying on support for CollReduce via Iterable. 4) Should let _hash and _hasheq auto-initialize to 0 not set to -1. As is, I think _hasheq always would be -1? 5) _hash and _hasheq should be transient. 6) count could be cached (like hash and hasheq). Not sure if it's worth doing that but seems like a win any time it's called more than once. 7) Why the change in test/clojure/test_clojure/serialization.clj ? 8) Can you squash into a single commit?
Hide
Timothy Baldridge added a comment -

1) I agree, adding chunked seqs to this will dramatically increase complexity, are we sure we want this?
2) exception added
3) I can add IReduce, but it'll pretty much just duplicate the code in protocols.clj. If we're sure we want that I'll add it too.
4) fixed hash init values, defaults to -1 like ASeq
5) hash fields are now transient
6) at the cost of about 4 bytes we can cache the cost of a multiplication and an addition, doesn't seem worth it?
7) the tests in serialization.clj assert that the type of the collection roundtrips. This is no longer the case for range which starts as Range and ends as a list. The change I made converts range into a list so that it properly roundtrips. My assumption is that we shouldn't rely on all implementations of ISeq to properly roundtrip through EDN.
8) squashed.

Show
Timothy Baldridge added a comment - 1) I agree, adding chunked seqs to this will dramatically increase complexity, are we sure we want this? 2) exception added 3) I can add IReduce, but it'll pretty much just duplicate the code in protocols.clj. If we're sure we want that I'll add it too. 4) fixed hash init values, defaults to -1 like ASeq 5) hash fields are now transient 6) at the cost of about 4 bytes we can cache the cost of a multiplication and an addition, doesn't seem worth it? 7) the tests in serialization.clj assert that the type of the collection roundtrips. This is no longer the case for range which starts as Range and ends as a list. The change I made converts range into a list so that it properly roundtrips. My assumption is that we shouldn't rely on all implementations of ISeq to properly roundtrip through EDN. 8) squashed.
Hide
Alex Miller added a comment -

6) might be useful if you're walking through it with nth, which hits count everytime, but doubt that's common
7) yep, reasonable

Show
Alex Miller added a comment - 6) might be useful if you're walking through it with nth, which hits count everytime, but doubt that's common 7) yep, reasonable
Hide
Andy Fingerhut added a comment - - edited

I have already pointed out to Edipo in personal email the guidelines on what labels to use for Clojure JIRA tickets here: http://dev.clojure.org/display/community/Creating+Tickets

Show
Andy Fingerhut added a comment - - edited I have already pointed out to Edipo in personal email the guidelines on what labels to use for Clojure JIRA tickets here: http://dev.clojure.org/display/community/Creating+Tickets
Hide
Timothy Baldridge added a comment -

New patch with IReduce directly on Range instead of relying on iterators

Show
Timothy Baldridge added a comment - New patch with IReduce directly on Range instead of relying on iterators

People

Vote (0)
Watch (2)

Dates

  • Created:
    Updated: