Clojure

reduce gives a SO for pathological seqs

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Completed
  • Affects Version/s: Release 1.5
  • Fix Version/s: Release 1.7
  • Component/s: None
  • Labels:
    None
  • Environment:
    1.5.1
  • Patch:
    Code and Test
  • Approval:
    Ok

Description

reduce gives a StackOverflowError on long sequences that contain many transitions between chunked and unchunked:

(->> (repeat 50000 (cons :x [:y]))
     (apply concat)
     (reduce (constantly nil)))
;; throws StackOverflowError

Such a sequence is well behaved under most other sequence operations, and its underlying structure can even be masked such that reduce succeeds:

(->> (repeat 50000 (cons :x [:y]))
     (apply concat)
     (take 10000000)
     (reduce (constantly nil)))
;; => nil

I don't think Clojure developers normally worry about mixing chunked and unchunked seqs, so the existence of such a sequence is not at all unreasonable (and indeed this happened to me at work and was very difficult to debug).

Cause: The implementation of reduce can bounce back and forth between CollReduce and InternalReduce, consuming a stack frame in every transition. Given enough transitions, the stack will overflow.

Approach: Degrade to naive impl after first chunk

In the IChunkedSeq implementation, instead of calling internal-reduce when the
sequence stops being chunked, it could have an (inlined?) unoptimized implementation,
ensuring that no further stack space is taken up. This retains the behavior that a
generic seq with a chunked tail will still run in an optimized fashion, but a seq with
two chunked portions would only be optimized the first time.

Rejected alternatives:

1) Use clojure.core/trampoline - This would presumably work, but requires wrapping the normal return values from all implementations of internal-reduce.

2) Create a special type (Unreduced) that signals
an implementation change. The two implementation-change points in internal-reduce
(in the IChunkedSeq impl and the Object impl) are converted to return an instance
of Unreduced instead of a direct call to internal-reduce. Then seq-reduce is converted to check for instances of Unreduced before returning,
and recurs if it finds one.

Patch: CLJ-1237f.patch

Screened by: Alex Miller

  1. CLJ-1237c.patch
    25/Aug/13 4:13 PM
    3 kB
    Gary Fredericks
  2. CLJ-1237d.patch
    27/May/15 10:27 PM
    3 kB
    Gary Fredericks
  3. CLJ-1237e.patch
    28/May/15 10:23 PM
    3 kB
    Gary Fredericks
  4. CLJ-1237f.patch
    01/Jun/15 7:27 PM
    3 kB
    Gary Fredericks
  5. reduce.svg
    29/May/15 6:52 AM
    17 kB
    Gary Fredericks

Activity

People

Vote (4)
Watch (5)

Dates

  • Created:
    Updated:
    Resolved: