reduce gives a SO for pathological seqs

Description

reduce gives a StackOverflowError on long sequences that contain many transitions between chunked and unchunked:

Such a sequence is well behaved under most other sequence operations, and its underlying structure can even be masked such that reduce succeeds:

I don't think Clojure developers normally worry about mixing chunked and unchunked seqs, so the existence of such a sequence is not at all unreasonable (and indeed this happened to me at work and was very difficult to debug).

Cause: The implementation of reduce can bounce back and forth between CollReduce and InternalReduce, consuming a stack frame in every transition. Given enough transitions, the stack will overflow.

Approach: Degrade to naive impl after first chunk

In the IChunkedSeq implementation, instead of calling internal-reduce when the
sequence stops being chunked, it could have an (inlined?) unoptimized implementation,
ensuring that no further stack space is taken up. This retains the behavior that a
generic seq with a chunked tail will still run in an optimized fashion, but a seq with
two chunked portions would only be optimized the first time.

Rejected alternatives:

1) Use clojure.core/trampoline - This would presumably work, but requires wrapping the normal return values from all implementations of internal-reduce.

2) Create a special type (Unreduced) that signals
an implementation change. The two implementation-change points in internal-reduce
(in the IChunkedSeq impl and the Object impl) are converted to return an instance
of Unreduced instead of a direct call to internal-reduce. Then seq-reduce is converted to check for instances of Unreduced before returning,
and recurs if it finds one.

Patch: CLJ-1237f.patch

Screened by: Alex Miller

Environment

1.5.1

Attachments

5

Activity

Show:

Alex Miller June 5, 2015 at 3:14 PM

Removing incomplete for re-screening

gfredericks June 2, 2015 at 1:27 AM

Uploaded CLJ-1237f.patch containing the latest tactic that Alex Miller cooked up.

gfredericks June 1, 2015 at 2:00 AM

Okay so in particular I should not do the simplification of the Object impl (where it's only naive), and still have it do its class equality checks, but then add in this IReduceInit check at the end once the class equality check fails.

Sounds reasonable to me at first glance.

Alex Miller May 31, 2015 at 7:15 PM

I had another thought about this based on some other variants of the reduce stuff I've played with - what if in the cases where we currently were calling back into coll-reduce, we made an explicit check for IReduceInit and in that case invoked .reduce directly (similar to what we do in transduce), otherwise degrade to naive. In that case, we'd get fast reduce on a persistent collection tail in the quite possibly real case of something like (cons 3 (vec (range 10000))).

gfredericks May 29, 2015 at 12:52 PM

Attaching the call diagram I made while analyzing the code, in case it's helpful to anybody.

Completed

Details

Assignee

Reporter

Approval

Ok

Patch

Code and Test

Priority

Affects versions

Fix versions

Created July 27, 2013 at 11:31 PM
Updated June 17, 2015 at 5:18 PM
Resolved June 17, 2015 at 5:18 PM