take-nth transducer could be faster without rem

Description

The take-nth transducer is calling rem on each index, which is relatively expensive compared to a zero? test. It could just count down from N instead as the step size is fixed.

Environment

Mac OS X 10.10.2, JDK 1.8.0_31

Attachments

1
  • 20 Feb 2015, 06:32 PM

Activity

Show:

Renzo Borgatti March 27, 2018 at 5:07 PM

GIGO case, but rem is also responsible for:

user=> (take-nth 2.5 (range 10)) (0 3 6 9) user=> (sequence (take-nth 2.5) (range 10)) (0 5)

The patch by Steve (CLJ-1665-faster-take-nth-transducer-without-rem.patch) is just missing a cast to int to solve the above:

(defn take-nth [n] (fn [rf] (let [n (int n) iv (volatile! 1)] (fn ([] (rf)) ([result] (rf result)) ([result input] (let [i (vswap! iv dec)] (if (zero? i) (do (vreset! iv n) (rf result input)) result)))))))

Michael Blume February 20, 2015 at 6:55 PM

Nice =)

I would say that the transducer version really ought to match the collection version as closely as possible, but I don't think there's actually a way to write a transducer that transforms a finite sequence into an infinite sequence, so no luck there.

Maybe while we're at it we should change both the transducer and the collection arities to throw on zero?

Steve Miner February 20, 2015 at 6:41 PM

I didn't worry about (take-nth 0) case, but my patch does give a different result. The current implementation gets a divide by zero error (from rem). My patched version returns just the first element once. The regular collection version returns an infinite sequence of the first element. I doubt anyone expects a sensible answer from the 0 case so I didn't try to do anything special with it.

Steve Miner February 20, 2015 at 6:34 PM

Patch attached. It's about 25% faster on a simple test like:

(time (transduce (take-nth 13) + (range 1e7)))

Details

Assignee

Reporter

Approval

Triaged

Patch

Code

Priority

Affects versions

Created February 20, 2015 at 6:31 PM
Updated March 27, 2018 at 5:18 PM