Affects Version/s: Release 1.7
Fix Version/s: Release 1.7
Patch:Code and Test
Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.
Approach: this patch implements range as a deftype. The range implementation is specialized for primitive longs (LongRange, a common case), as well one with generic math (GenericRange).
The types implement IReduce,IReduceInit,Counted,Seqable,IHashEq. Sequential and Serializable are marker interfaces.
The existing range lazy-seq was renamed to range* (instead of range1 b/c range's arguments are numbers.) The deftypes defined later delegate their seq implementation to it. Still chunked. To achieve the most performance out of this delegation, it required a tiny tweak to doall/dorun, see the note below. range* handles only ascending or descending sequences.
The implementation was carefully tuned using criterium for benchmarking. Rather than directly using < and > as comparators, it performed far better to close over the range upper bound, e.g. #(< % end). < and > are inlining macros. The boxing/unboxing is minimized within the LongRange implementation.
The range constructor figures out type specialization, calls create-range, which filters out the wacky cases (empty seqs, the repeat case), and then creates the types if necessary. The deftypes handle only ascending or descending nonempty cases.
The special case of (range) is just handled with (iterate inc' 0) (which is further optimized for reduce in CLJ-1603). The original contract range being infinite was never handled correctly, as it used inc, not inc'. The GenericRange impl does not use autopromotion, and opts for preserving the implementation behavior. The only path for auto-promotion with this patch is through the 0-arity delegation to iterate.
Note The patch also includes a tweak to doall/dorun in order to handle types whose seq implementation is not inline, but in a separate object. As noted in the commit message, dorun currently calls (seq coll), then drops the result, then calls (next coll), resulting in a double allocation of the seq when the collection doesn't cache its seq.
timings done via criterium full benchmark on Oracle JDK
|code||1.7.0-alpha5||Java impl clj-1515-10||CLJ-1515-deftype2|
|(count (range (* 1024 1024)))||61.30 ms||26.22 ns||12.67 ns|
|(reduce + (map inc (range (* 1024 1024))))||56.21 ms||47.80 ms||46.87 ms|
|(reduce + (map inc (map inc (range (* 1024 1024)))))||77.98 ms||67.84 ms||66.91 ms|
|(count (keep odd? (range (* 1024 1024))))||73.68 ms||57.33 ms||72.27 ms|
|(transduce (comp (map inc) (map inc)) + (range (* 1024 1024)))||50.56 ms||36.12 ms||28.46 ms|
|(reduce + 0 (range (* 2048 1024)))||81.86 ms||39.22 ms||26.44 ms|
|(doall (range 0 31))||1.32 µs||2.89 µs||1.08 µs|
|(doall (range 0 32))||1.34 µs||2.96 µs||1.12 µs|
|(doall (range 0 128))||5.20 µs||11.92 µs||4.31 µs|
|(doall (range 0 512))||21.24 µs||47.71 µs||17.91 µs|
|(doall (range 0 4096))||167.86 µs||381.15 µs||141.48 µs|
|(into  (map inc (range 31)))||2.00 µs||1.49 µs||1.87 µs|
|(into  (map inc) (range 31))||1.65 µs||813.78 ns||929.51 ns|
|(into  (range 128))||5.20 µs||2.27 µs||2.44 µs|
|(doall (range 1/2 1000 1/3))||1.50 ms||1.69 ms||1.50 ms|
|(doall (range 0.5 1000 0.33))||172.27 µs||364.35 µs||149.51 µs|
|(into  (range 1/2 1000 1/3))||1.52 ms||1.42 ms||1.43 ms|
|(into  (range 0.5 1000 0.33))||163.06 µs||99.44 µs||103.18 µs|
|(count (filter odd? (take (* 1024 1024) (range))))||187.79 ms||194.07 ms||189.73 ms|
|(transduce (take (* 1024 1024)) + (range))||53.06 ms||94.19 ms||89.34 ms|
doall cases see a substantial benefit.
The deftype has very competitive performance considering it is competing against the Java impl, which uses unchecked math incorrectly.
The final two cases of 0-arity (range) will get better pending the merge of the Iterate deftype (CLJ-1603) or pending validity of the inc' approach.
Screened by: Alex Miller
(range) and supports auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.