Clojure

reduce gives a SO for pathological seqs

Details

  • Type: Defect Defect
  • Status: Open Open
  • Priority: Major Major
  • Resolution: Unresolved
  • Affects Version/s: Release 1.5
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None
  • Environment:
    1.5.1
  • Patch:
    Code and Test

Description

reduce gives a StackOverflowError on long sequences that contain many transitions between chunked and unchunked:

(->> (repeat 50000 (cons :x [:y]))
     (apply concat)
     (reduce (constantly nil)))
;; throws StackOverflowError

Such a sequence is well behaved under most other sequence operations, and its underlying structure can even be masked such that reduce succeeds:

(->> (repeat 50000 (cons :x [:y]))
     (apply concat)
     (take 10000000)
     (reduce (constantly nil)))
;; => nil

I don't think Clojure developers normally worry about mixing chunked and unchunked seqs, so the existence of such a sequence is not at all unreasonable (and indeed this happened to me at work and was very difficult to debug).

It seems obvious what causes this given the implementation of reduce – it bounces back and forth between the chunked impl and the unchunked impl, consuming more and more stack as it goes. Without proper tail call optimization, it's not obvious to me what a good fix would be.

Presumed bad solutions

Degrade to naive impl after first chunk

In the IChunkedSeq implementation, instead of calling internal-reduce when the
sequence stops being chunked, it could have an (inlined?) unoptimized implementation,
ensuring that no further stack space is taken up. This retains the behavior that a
generic seq with a chunked tail will still run in an optimized fashion, but a seq with
two chunked portions would only be optimized the first time.

Use clojure.core/trampoline

This would presumably work, but requires wrapping the normal return values from all
implementations of internal-reduce.

Proposed Solution

(attached as CLJ-1237c.patch)

Similar to using trampoline, but create a special type (Unreduced) that signals
an implementation change. The two implementation-change points in internal-reduce
(in the IChunkedSeq impl and the Object impl) are converted to return an instance
of Unreduced instead of a direct call to internal-reduce.

Then seq-reduce is converted to check for instances of Unreduced before returning,
and recurs if it finds one.

Pros

  • Only requires one additional check in most cases
  • Reduces stack usage for existing heterogeneous reductions that weren't extreme enough to crash
  • Should be compatible with 3rd-party implementations of internal-reduce, which can still use the old style (direct recursive calls to internal-reduce) or the optimized style if desired.

Cons

  • internal-reduce is slightly more complicated
  • There's an extra check at the end of seq-reduce – is that a performance worry?

Activity

Hide
Gary Fredericks added a comment -

Added patch.

Show
Gary Fredericks added a comment - Added patch.

People

Vote (1)
Watch (4)

Dates

  • Created:
    Updated: