Clojure

Reducer (and folder) instances hold onto the head of seqs

Details

  • Type: Defect Defect
  • Status: Open Open
  • Priority: Major Major
  • Resolution: Unresolved
  • Affects Version/s: Release 1.5
  • Fix Version/s: Release 1.8
  • Component/s: None
  • Labels:
  • Patch:
    Code and Test
  • Approval:
    Vetted

Description

Problem Statement
A shared function #'clojure.core.reducers/reducer holds on to the head of a reducible collection, causing it to blow up when the collection is a lazy sequence.

Reproduction steps:
Compare the following calls:

(time (reduce + 0 (map identity (range 1e8))))
(time (reduce + 0 (r/map identity (range 1e8))))

The second call should fail on a normal or small heap.

(If reducers are faster than seqs, increase the range.)

Cause: #'reducer closes over a collection when in order to reify CollReduce, and the closed-over is never cleared. When code attempts to reduce over this anonymous transformed collection, it will realize the tail while the head is stored in the closed-over.

Patch
CLJ-1250-08-29.patch

Approach:

Clear the reference to 'this' on the stack just before a tail call occurs

Removes calls to emitClearLocals(), which is a no-op.

When the context is RETURN (indicating a tail call), and the operation
is an InvokeExpr, StaticMethodExpr, or InstanceMethodExpr, clear the
reference to 'this' which is in slot 0 of the locals.

Edge-case: Inside the body of a try block, we cannot clear 'this' at the tail
position as we might need to keep refs around for use in the catch or finally
clauses. Introduces another truthy dynamic binding var to track position being
inside a try block.

Adds two helpers to emitClearThis and inTailCall.

Advantages: Fixes this case with no user code changes. Enables GC to do reclaim closed-overs references earlier.
Disadvantages: A compiler change.

Alternate Approaches:

1) Reimplement the #'reducer (and #'folder) transformation fns similar to the manner that Christophe proposes here:

(defrecord Reducer [coll xf])

(extend-protocol 
  clojure.core.protocols/CollReduce
  Reducer
      (coll-reduce [r f1]
                   (clojure.core.protocols/coll-reduce r f1 (f1)))
      (coll-reduce [r f1 init]
                   (clojure.core.protocols/coll-reduce (:coll r) ((:xf r) f1) init)))

(def rreducer ->Reducer) 

(defn rmap [f coll]
  (rreducer coll (fn [g] 
                   (fn [acc x]
                     (g acc (f x))))))

Advantages: Relatively non-invasive change.
Disadvantages: Not evident code. Additional protocol dispatch, though only incurred once

2) Alternate approach

from Christophe Grand:
Another way would be to enhance the local clearing mechanism to also clear "this" but it's complex since it may be needed to keep a reference to "this" for caches long after obvious references to "this" are needed.

Advantages: Fine-grained
Disadvantages: Complex, invasive, and the compiler is hard to hack on.

Mitigations
Avoid reducing on lazy-seqs and instead operate on vectors / maps, or custom reifiers of CollReduce or CollFold. This could be easier with some implementations of common collection functions being available (like iterate and partition).

See https://groups.google.com/d/msg/clojure-dev/t6NhGnYNH1A/2lXghJS5HywJ for previous discussion.

  1. after-change.txt
    11/Dec/13 11:17 PM
    4 kB
    Ghadi Shayban
  2. before-change.txt
    11/Dec/13 11:17 PM
    4 kB
    Ghadi Shayban
  3. CLJ-1250-08-29.patch
    29/Aug/14 3:06 PM
    15 kB
    Ghadi Shayban
  4. CLJ-1250-08-29-ws.patch
    26/Sep/14 12:51 PM
    14 kB
    Ghadi Shayban
  5. CLJ-1250-20131211.patch
    11/Dec/13 11:08 PM
    4 kB
    Ghadi Shayban
  6. CLJ-1250-AllInvokeSites-20140204.patch
    04/Feb/14 3:50 PM
    14 kB
    Ghadi Shayban
  7. CLJ-1250-AllInvokeSites-20140320.patch
    20/Mar/14 4:39 PM
    15 kB
    Ghadi Shayban

Activity

Alex Miller made changes -
Field Original Value New Value
Labels memory
Gary Fredericks made changes -
Description To reproduce the problem, compare:
{code}
 (time (reduce + 0 (map identity (range 1e8))))
 (time (reduce + 0 (r/map identity (range 1e8))))
{code}

(If reducers are faster than seqs, increase the range.)

The sequence can't be garbage collected because the reducer can't be garbage collected because the first local is never cleared.
Clearing the first local is tricky because the "this" object bears the closed-over locals and misc caches. So its lifetime is longer than expected and harder to compute.

See https://groups.google.com/d/msg/clojure-dev/t6NhGnYNH1A/2lXghJS5HywJ for more details.

A proposed solution is to clear "this" before tail calls.
To reproduce the problem, compare:
{code}
(time (reduce + 0 (map identity (range 1e8))))
(time (reduce + 0 (r/map identity (range 1e8))))
{code}

(If reducers are faster than seqs, increase the range.)

The sequence can't be garbage collected because the reducer can't be garbage collected because the first local is never cleared.
Clearing the first local is tricky because the "this" object bears the closed-over locals and misc caches. So its lifetime is longer than expected and harder to compute.

See https://groups.google.com/d/msg/clojure-dev/t6NhGnYNH1A/2lXghJS5HywJ for more details.

A proposed solution is to clear "this" before tail calls.
Alex Miller made changes -
Labels memory memory reducers
Alex Miller made changes -
Approval Triaged [ 10120 ]
Priority Minor [ 4 ] Major [ 3 ]
Alex Miller made changes -
Priority Major [ 3 ] Critical [ 2 ]
Ghadi Shayban made changes -
Attachment CLJ-1250-20131211.patch [ 12531 ]
Ghadi Shayban made changes -
Attachment after-change.txt [ 12533 ]
Attachment before-change.txt [ 12532 ]
Ghadi Shayban made changes -
Patch Code [ 10001 ]
Ghadi Shayban made changes -
Attachment CLJ-1250-AllInvokeSites-20131214.patch [ 12540 ]
Ghadi Shayban made changes -
Description To reproduce the problem, compare:
{code}
(time (reduce + 0 (map identity (range 1e8))))
(time (reduce + 0 (r/map identity (range 1e8))))
{code}

(If reducers are faster than seqs, increase the range.)

The sequence can't be garbage collected because the reducer can't be garbage collected because the first local is never cleared.
Clearing the first local is tricky because the "this" object bears the closed-over locals and misc caches. So its lifetime is longer than expected and harder to compute.

See https://groups.google.com/d/msg/clojure-dev/t6NhGnYNH1A/2lXghJS5HywJ for more details.

A proposed solution is to clear "this" before tail calls.
*Problem Statement*
A shared function #'clojure.core.reducers/reducer holds on to the head of a reducible collection, causing it to blow up when the collection is a lazy sequence.

*Cause:* #'reducer closes over a collection when in order to reify CollReduce, and the closed-over is never cleared. When code attempts to reduce over this anonymous transformed collection, it will realize the tail while the head is stored in the closed-over.

*Reproduction steps:*
Compare the following calls:
{code}
(time (reduce + 0 (map identity (range 1e8))))
(time (reduce + 0 (r/map identity (range 1e8))))
{code}
The second call should fail on a normal or small heap.

(If reducers are faster than seqs, increase the range.)

*Approaches:*

*1) Reimplement the #'reducer* (and #'folder) transformation fns similar to the manner that Christophe proposes here:
{code}
(defrecord Reducer [coll xf])

(extend-protocol
  clojure.core.protocols/CollReduce
  Reducer
      (coll-reduce [r f1]
                   (clojure.core.protocols/coll-reduce r f1 (f1)))
      (coll-reduce [r f1 init]
                   (clojure.core.protocols/coll-reduce (:coll r) ((:xf r) f1) init)))

(def rreducer ->Reducer)

(defn rmap [f coll]
  (rreducer coll (fn [g]
                   (fn [acc x]
                     (g acc (f x))))))
{code}

*Advantages*: Relatively non-invasive change.
*Disadvantages*: Not evident code. Additional protocol dispatch, though only incurred once

*2) Clear the reference to 'this' on the stack just before a tail call occurs*

When a callee is in return position, clear the local variable reference to 'this' in the caller's stack frame, which will make the caller and all its closed-overs eligible for reclamation.

Patch 1211 takes this approach for InvokeExpr call sites in return position. Patch 1214 takes the same approach for InvokeExprs and also static and instance interop calls.

Here is the code that performs the clearing excerpted from the 1214 patch:
{code}
void emitClearThis(GeneratorAdapter gen) {
gen.visitInsn(Opcodes.ACONST_NULL);
gen.visitVarInsn(Opcodes.ASTORE, 0);
}
{code}

Tail calls wrapped inside a try/catch/finally clause cannot have 'this' cleared, because closed-overs/locals may need to be emitted for exception handling blocks. Both patches consider and handle this edge case.

*Advantages:* Fixes this case with no user code changes. Enables GC to do reclaim closed-overs references earlier.
*Disadvantages:* A compiler change.

*3) Alternate approach*

{quote}
from Christophe Grand:
Another way would be to enhance the local clearing mechanism to also clear "this" but it's complex since it may be needed to keep a reference to "this" for caches long after obvious references to "this" are needed.
{quote}
*Advantages*: Fine-grained
*Disadvantages*: Complex, invasive, and the compiler is hard to hack on.

*Mitigations*
  Avoid reducing on lazy-seqs and instead operate on vectors / maps, or custom reifiers of CollReduce or CollFold. This could be easier with some implementations of common collection functions being available (like iterate and partition).

See https://groups.google.com/d/msg/clojure-dev/t6NhGnYNH1A/2lXghJS5HywJ for previous discussion.
Ghadi Shayban made changes -
Attachment CLJ-1250-AllInvokeSites-20140113.patch [ 12702 ]
Ghadi Shayban made changes -
Ghadi Shayban made changes -
Attachment CLJ-1250-AllInvokeSites-20131214.patch [ 12540 ]
Ghadi Shayban made changes -
Attachment CLJ-1250-AllInvokeSites-20140113.patch [ 12702 ]
Ghadi Shayban made changes -
Rich Hickey made changes -
Fix Version/s Release 1.7 [ 10250 ]
Approval Triaged [ 10120 ] Vetted [ 10003 ]
Ghadi Shayban made changes -
Description *Problem Statement*
A shared function #'clojure.core.reducers/reducer holds on to the head of a reducible collection, causing it to blow up when the collection is a lazy sequence.

*Cause:* #'reducer closes over a collection when in order to reify CollReduce, and the closed-over is never cleared. When code attempts to reduce over this anonymous transformed collection, it will realize the tail while the head is stored in the closed-over.

*Reproduction steps:*
Compare the following calls:
{code}
(time (reduce + 0 (map identity (range 1e8))))
(time (reduce + 0 (r/map identity (range 1e8))))
{code}
The second call should fail on a normal or small heap.

(If reducers are faster than seqs, increase the range.)

*Approaches:*

*1) Reimplement the #'reducer* (and #'folder) transformation fns similar to the manner that Christophe proposes here:
{code}
(defrecord Reducer [coll xf])

(extend-protocol
  clojure.core.protocols/CollReduce
  Reducer
      (coll-reduce [r f1]
                   (clojure.core.protocols/coll-reduce r f1 (f1)))
      (coll-reduce [r f1 init]
                   (clojure.core.protocols/coll-reduce (:coll r) ((:xf r) f1) init)))

(def rreducer ->Reducer)

(defn rmap [f coll]
  (rreducer coll (fn [g]
                   (fn [acc x]
                     (g acc (f x))))))
{code}

*Advantages*: Relatively non-invasive change.
*Disadvantages*: Not evident code. Additional protocol dispatch, though only incurred once

*2) Clear the reference to 'this' on the stack just before a tail call occurs*

When a callee is in return position, clear the local variable reference to 'this' in the caller's stack frame, which will make the caller and all its closed-overs eligible for reclamation.

Patch 1211 takes this approach for InvokeExpr call sites in return position. Patch 1214 takes the same approach for InvokeExprs and also static and instance interop calls.

Here is the code that performs the clearing excerpted from the 1214 patch:
{code}
void emitClearThis(GeneratorAdapter gen) {
gen.visitInsn(Opcodes.ACONST_NULL);
gen.visitVarInsn(Opcodes.ASTORE, 0);
}
{code}

Tail calls wrapped inside a try/catch/finally clause cannot have 'this' cleared, because closed-overs/locals may need to be emitted for exception handling blocks. Both patches consider and handle this edge case.

*Advantages:* Fixes this case with no user code changes. Enables GC to do reclaim closed-overs references earlier.
*Disadvantages:* A compiler change.

*3) Alternate approach*

{quote}
from Christophe Grand:
Another way would be to enhance the local clearing mechanism to also clear "this" but it's complex since it may be needed to keep a reference to "this" for caches long after obvious references to "this" are needed.
{quote}
*Advantages*: Fine-grained
*Disadvantages*: Complex, invasive, and the compiler is hard to hack on.

*Mitigations*
  Avoid reducing on lazy-seqs and instead operate on vectors / maps, or custom reifiers of CollReduce or CollFold. This could be easier with some implementations of common collection functions being available (like iterate and partition).

See https://groups.google.com/d/msg/clojure-dev/t6NhGnYNH1A/2lXghJS5HywJ for previous discussion.
*Problem Statement*
A shared function #'clojure.core.reducers/reducer holds on to the head of a reducible collection, causing it to blow up when the collection is a lazy sequence.

*Cause:* #'reducer closes over a collection when in order to reify CollReduce, and the closed-over is never cleared. When code attempts to reduce over this anonymous transformed collection, it will realize the tail while the head is stored in the closed-over.

*Reproduction steps:*
Compare the following calls:
{code}
(time (reduce + 0 (map identity (range 1e8))))
(time (reduce + 0 (r/map identity (range 1e8))))
{code}
The second call should fail on a normal or small heap.

(If reducers are faster than seqs, increase the range.)


*Patch*
CLJ-1250-AllInvokeSites-20140320.patch takes Approach #2

*Approaches:*

*1) Reimplement the #'reducer* (and #'folder) transformation fns similar to the manner that Christophe proposes here:
{code}
(defrecord Reducer [coll xf])

(extend-protocol
  clojure.core.protocols/CollReduce
  Reducer
      (coll-reduce [r f1]
                   (clojure.core.protocols/coll-reduce r f1 (f1)))
      (coll-reduce [r f1 init]
                   (clojure.core.protocols/coll-reduce (:coll r) ((:xf r) f1) init)))

(def rreducer ->Reducer)

(defn rmap [f coll]
  (rreducer coll (fn [g]
                   (fn [acc x]
                     (g acc (f x))))))
{code}

*Advantages*: Relatively non-invasive change.
*Disadvantages*: Not evident code. Additional protocol dispatch, though only incurred once

*2) Clear the reference to 'this' on the stack just before a tail call occurs*

When a callee is in return position, clear the local variable reference to 'this' in the caller's stack frame, which will make the caller and all its closed-overs eligible for reclamation.

Patch 1211 takes this approach for InvokeExpr call sites in return position. Patch 1214 takes the same approach for InvokeExprs and also static and instance interop calls.

Here is the code that performs the clearing excerpted from the 1214 patch:
{code}
void emitClearThis(GeneratorAdapter gen) {
gen.visitInsn(Opcodes.ACONST_NULL);
gen.visitVarInsn(Opcodes.ASTORE, 0);
}
{code}

Tail calls wrapped inside a try/catch/finally clause cannot have 'this' cleared, because closed-overs/locals may need to be emitted for exception handling blocks. Both patches consider and handle this edge case.

*Advantages:* Fixes this case with no user code changes. Enables GC to do reclaim closed-overs references earlier.
*Disadvantages:* A compiler change.

*3) Alternate approach*

{quote}
from Christophe Grand:
Another way would be to enhance the local clearing mechanism to also clear "this" but it's complex since it may be needed to keep a reference to "this" for caches long after obvious references to "this" are needed.
{quote}
*Advantages*: Fine-grained
*Disadvantages*: Complex, invasive, and the compiler is hard to hack on.

*Mitigations*
  Avoid reducing on lazy-seqs and instead operate on vectors / maps, or custom reifiers of CollReduce or CollFold. This could be easier with some implementations of common collection functions being available (like iterate and partition).

See https://groups.google.com/d/msg/clojure-dev/t6NhGnYNH1A/2lXghJS5HywJ for previous discussion.
Alex Miller made changes -
Approval Vetted [ 10003 ] Incomplete [ 10006 ]
Ghadi Shayban made changes -
Attachment CLJ-1250-08-22.patch [ 13250 ]
Alex Miller made changes -
Patch Code [ 10001 ] Code and Test [ 10002 ]
Approval Incomplete [ 10006 ] Vetted [ 10003 ]
Ghadi Shayban made changes -
Description *Problem Statement*
A shared function #'clojure.core.reducers/reducer holds on to the head of a reducible collection, causing it to blow up when the collection is a lazy sequence.

*Cause:* #'reducer closes over a collection when in order to reify CollReduce, and the closed-over is never cleared. When code attempts to reduce over this anonymous transformed collection, it will realize the tail while the head is stored in the closed-over.

*Reproduction steps:*
Compare the following calls:
{code}
(time (reduce + 0 (map identity (range 1e8))))
(time (reduce + 0 (r/map identity (range 1e8))))
{code}
The second call should fail on a normal or small heap.

(If reducers are faster than seqs, increase the range.)


*Patch*
CLJ-1250-AllInvokeSites-20140320.patch takes Approach #2

*Approaches:*

*1) Reimplement the #'reducer* (and #'folder) transformation fns similar to the manner that Christophe proposes here:
{code}
(defrecord Reducer [coll xf])

(extend-protocol
  clojure.core.protocols/CollReduce
  Reducer
      (coll-reduce [r f1]
                   (clojure.core.protocols/coll-reduce r f1 (f1)))
      (coll-reduce [r f1 init]
                   (clojure.core.protocols/coll-reduce (:coll r) ((:xf r) f1) init)))

(def rreducer ->Reducer)

(defn rmap [f coll]
  (rreducer coll (fn [g]
                   (fn [acc x]
                     (g acc (f x))))))
{code}

*Advantages*: Relatively non-invasive change.
*Disadvantages*: Not evident code. Additional protocol dispatch, though only incurred once

*2) Clear the reference to 'this' on the stack just before a tail call occurs*

When a callee is in return position, clear the local variable reference to 'this' in the caller's stack frame, which will make the caller and all its closed-overs eligible for reclamation.

Patch 1211 takes this approach for InvokeExpr call sites in return position. Patch 1214 takes the same approach for InvokeExprs and also static and instance interop calls.

Here is the code that performs the clearing excerpted from the 1214 patch:
{code}
void emitClearThis(GeneratorAdapter gen) {
gen.visitInsn(Opcodes.ACONST_NULL);
gen.visitVarInsn(Opcodes.ASTORE, 0);
}
{code}

Tail calls wrapped inside a try/catch/finally clause cannot have 'this' cleared, because closed-overs/locals may need to be emitted for exception handling blocks. Both patches consider and handle this edge case.

*Advantages:* Fixes this case with no user code changes. Enables GC to do reclaim closed-overs references earlier.
*Disadvantages:* A compiler change.

*3) Alternate approach*

{quote}
from Christophe Grand:
Another way would be to enhance the local clearing mechanism to also clear "this" but it's complex since it may be needed to keep a reference to "this" for caches long after obvious references to "this" are needed.
{quote}
*Advantages*: Fine-grained
*Disadvantages*: Complex, invasive, and the compiler is hard to hack on.

*Mitigations*
  Avoid reducing on lazy-seqs and instead operate on vectors / maps, or custom reifiers of CollReduce or CollFold. This could be easier with some implementations of common collection functions being available (like iterate and partition).

See https://groups.google.com/d/msg/clojure-dev/t6NhGnYNH1A/2lXghJS5HywJ for previous discussion.
*Problem Statement*
A shared function #'clojure.core.reducers/reducer holds on to the head of a reducible collection, causing it to blow up when the collection is a lazy sequence.

*Cause:* #'reducer closes over a collection when in order to reify CollReduce, and the closed-over is never cleared. When code attempts to reduce over this anonymous transformed collection, it will realize the tail while the head is stored in the closed-over.

*Reproduction steps:*
Compare the following calls:
{code}
(time (reduce + 0 (map identity (range 1e8))))
(time (reduce + 0 (r/map identity (range 1e8))))
{code}
The second call should fail on a normal or small heap.

(If reducers are faster than seqs, increase the range.)


*Current Patch*
CLJ-1250-08-22.patch

*Chosen Approach*
Approach #2, clearing the 'this' reference at all tail calls.

*Approaches:*

*1) Reimplement the #'reducer* (and #'folder) transformation fns similar to the manner that Christophe proposes here:
{code}
(defrecord Reducer [coll xf])

(extend-protocol
  clojure.core.protocols/CollReduce
  Reducer
      (coll-reduce [r f1]
                   (clojure.core.protocols/coll-reduce r f1 (f1)))
      (coll-reduce [r f1 init]
                   (clojure.core.protocols/coll-reduce (:coll r) ((:xf r) f1) init)))

(def rreducer ->Reducer)

(defn rmap [f coll]
  (rreducer coll (fn [g]
                   (fn [acc x]
                     (g acc (f x))))))
{code}

*Advantages*: Relatively non-invasive change.
*Disadvantages*: Not evident code. Additional protocol dispatch, though only incurred once

*2) Clear the reference to 'this' on the stack just before a tail call occurs*

When a callee is in return position, clear the local variable reference to 'this' in the caller's stack frame, which will make the caller and all its closed-overs eligible for reclamation.

Patch 1211 takes this approach for InvokeExpr call sites in return position. Patch 1214 takes the same approach for InvokeExprs and also static and instance interop calls.

Here is the code that performs the clearing excerpted from the 1214 patch:
{code}
void emitClearThis(GeneratorAdapter gen) {
gen.visitInsn(Opcodes.ACONST_NULL);
gen.visitVarInsn(Opcodes.ASTORE, 0);
}
{code}

Tail calls wrapped inside a try/catch/finally clause cannot have 'this' cleared, because closed-overs/locals may need to be emitted for exception handling blocks. Both patches consider and handle this edge case.

*Advantages:* Fixes this case with no user code changes. Enables GC to do reclaim closed-overs references earlier.
*Disadvantages:* A compiler change.

*3) Alternate approach*

{quote}
from Christophe Grand:
Another way would be to enhance the local clearing mechanism to also clear "this" but it's complex since it may be needed to keep a reference to "this" for caches long after obvious references to "this" are needed.
{quote}
*Advantages*: Fine-grained
*Disadvantages*: Complex, invasive, and the compiler is hard to hack on.

*Mitigations*
  Avoid reducing on lazy-seqs and instead operate on vectors / maps, or custom reifiers of CollReduce or CollFold. This could be easier with some implementations of common collection functions being available (like iterate and partition).

See https://groups.google.com/d/msg/clojure-dev/t6NhGnYNH1A/2lXghJS5HywJ for previous discussion.
Ghadi Shayban made changes -
Attachment CLJ-1250-08-29.patch [ 13278 ]
Ghadi Shayban made changes -
Description *Problem Statement*
A shared function #'clojure.core.reducers/reducer holds on to the head of a reducible collection, causing it to blow up when the collection is a lazy sequence.

*Cause:* #'reducer closes over a collection when in order to reify CollReduce, and the closed-over is never cleared. When code attempts to reduce over this anonymous transformed collection, it will realize the tail while the head is stored in the closed-over.

*Reproduction steps:*
Compare the following calls:
{code}
(time (reduce + 0 (map identity (range 1e8))))
(time (reduce + 0 (r/map identity (range 1e8))))
{code}
The second call should fail on a normal or small heap.

(If reducers are faster than seqs, increase the range.)


*Current Patch*
CLJ-1250-08-22.patch

*Chosen Approach*
Approach #2, clearing the 'this' reference at all tail calls.

*Approaches:*

*1) Reimplement the #'reducer* (and #'folder) transformation fns similar to the manner that Christophe proposes here:
{code}
(defrecord Reducer [coll xf])

(extend-protocol
  clojure.core.protocols/CollReduce
  Reducer
      (coll-reduce [r f1]
                   (clojure.core.protocols/coll-reduce r f1 (f1)))
      (coll-reduce [r f1 init]
                   (clojure.core.protocols/coll-reduce (:coll r) ((:xf r) f1) init)))

(def rreducer ->Reducer)

(defn rmap [f coll]
  (rreducer coll (fn [g]
                   (fn [acc x]
                     (g acc (f x))))))
{code}

*Advantages*: Relatively non-invasive change.
*Disadvantages*: Not evident code. Additional protocol dispatch, though only incurred once

*2) Clear the reference to 'this' on the stack just before a tail call occurs*

When a callee is in return position, clear the local variable reference to 'this' in the caller's stack frame, which will make the caller and all its closed-overs eligible for reclamation.

Patch 1211 takes this approach for InvokeExpr call sites in return position. Patch 1214 takes the same approach for InvokeExprs and also static and instance interop calls.

Here is the code that performs the clearing excerpted from the 1214 patch:
{code}
void emitClearThis(GeneratorAdapter gen) {
gen.visitInsn(Opcodes.ACONST_NULL);
gen.visitVarInsn(Opcodes.ASTORE, 0);
}
{code}

Tail calls wrapped inside a try/catch/finally clause cannot have 'this' cleared, because closed-overs/locals may need to be emitted for exception handling blocks. Both patches consider and handle this edge case.

*Advantages:* Fixes this case with no user code changes. Enables GC to do reclaim closed-overs references earlier.
*Disadvantages:* A compiler change.

*3) Alternate approach*

{quote}
from Christophe Grand:
Another way would be to enhance the local clearing mechanism to also clear "this" but it's complex since it may be needed to keep a reference to "this" for caches long after obvious references to "this" are needed.
{quote}
*Advantages*: Fine-grained
*Disadvantages*: Complex, invasive, and the compiler is hard to hack on.

*Mitigations*
  Avoid reducing on lazy-seqs and instead operate on vectors / maps, or custom reifiers of CollReduce or CollFold. This could be easier with some implementations of common collection functions being available (like iterate and partition).

See https://groups.google.com/d/msg/clojure-dev/t6NhGnYNH1A/2lXghJS5HywJ for previous discussion.
*Problem Statement*
A shared function #'clojure.core.reducers/reducer holds on to the head of a reducible collection, causing it to blow up when the collection is a lazy sequence.

*Cause:* #'reducer closes over a collection when in order to reify CollReduce, and the closed-over is never cleared. When code attempts to reduce over this anonymous transformed collection, it will realize the tail while the head is stored in the closed-over.

*Reproduction steps:*
Compare the following calls:
{code}
(time (reduce + 0 (map identity (range 1e8))))
(time (reduce + 0 (r/map identity (range 1e8))))
{code}
The second call should fail on a normal or small heap.

(If reducers are faster than seqs, increase the range.)


*Current Patch*
CLJ-1250-08-29.patch

*Chosen Approach*
Approach #2, clearing the 'this' reference at all tail calls.

*Approaches:*

*1) Reimplement the #'reducer* (and #'folder) transformation fns similar to the manner that Christophe proposes here:
{code}
(defrecord Reducer [coll xf])

(extend-protocol
  clojure.core.protocols/CollReduce
  Reducer
      (coll-reduce [r f1]
                   (clojure.core.protocols/coll-reduce r f1 (f1)))
      (coll-reduce [r f1 init]
                   (clojure.core.protocols/coll-reduce (:coll r) ((:xf r) f1) init)))

(def rreducer ->Reducer)

(defn rmap [f coll]
  (rreducer coll (fn [g]
                   (fn [acc x]
                     (g acc (f x))))))
{code}

*Advantages*: Relatively non-invasive change.
*Disadvantages*: Not evident code. Additional protocol dispatch, though only incurred once

*2) Clear the reference to 'this' on the stack just before a tail call occurs*

When a callee is in return position, clear the local variable reference to 'this' in the caller's stack frame, which will make the caller and all its closed-overs eligible for reclamation.

Patch 1211 takes this approach for InvokeExpr call sites in return position. Patch 1214 takes the same approach for InvokeExprs and also static and instance interop calls.

Here is the code that performs the clearing excerpted from the 1214 patch:
{code}
void emitClearThis(GeneratorAdapter gen) {
gen.visitInsn(Opcodes.ACONST_NULL);
gen.visitVarInsn(Opcodes.ASTORE, 0);
}
{code}

Tail calls wrapped inside a try/catch/finally clause cannot have 'this' cleared, because closed-overs/locals may need to be emitted for exception handling blocks. Both patches consider and handle this edge case.

*Advantages:* Fixes this case with no user code changes. Enables GC to do reclaim closed-overs references earlier.
*Disadvantages:* A compiler change.

*3) Alternate approach*

{quote}
from Christophe Grand:
Another way would be to enhance the local clearing mechanism to also clear "this" but it's complex since it may be needed to keep a reference to "this" for caches long after obvious references to "this" are needed.
{quote}
*Advantages*: Fine-grained
*Disadvantages*: Complex, invasive, and the compiler is hard to hack on.

*Mitigations*
  Avoid reducing on lazy-seqs and instead operate on vectors / maps, or custom reifiers of CollReduce or CollFold. This could be easier with some implementations of common collection functions being available (like iterate and partition).

See https://groups.google.com/d/msg/clojure-dev/t6NhGnYNH1A/2lXghJS5HywJ for previous discussion.
Ghadi Shayban made changes -
Attachment CLJ-1250-08-29.patch [ 13280 ]
Ghadi Shayban made changes -
Attachment CLJ-1250-08-29.patch [ 13278 ]
Ghadi Shayban made changes -
Attachment CLJ-1250-08-22.patch [ 13250 ]
Alex Miller made changes -
Description *Problem Statement*
A shared function #'clojure.core.reducers/reducer holds on to the head of a reducible collection, causing it to blow up when the collection is a lazy sequence.

*Cause:* #'reducer closes over a collection when in order to reify CollReduce, and the closed-over is never cleared. When code attempts to reduce over this anonymous transformed collection, it will realize the tail while the head is stored in the closed-over.

*Reproduction steps:*
Compare the following calls:
{code}
(time (reduce + 0 (map identity (range 1e8))))
(time (reduce + 0 (r/map identity (range 1e8))))
{code}
The second call should fail on a normal or small heap.

(If reducers are faster than seqs, increase the range.)


*Current Patch*
CLJ-1250-08-29.patch

*Chosen Approach*
Approach #2, clearing the 'this' reference at all tail calls.

*Approaches:*

*1) Reimplement the #'reducer* (and #'folder) transformation fns similar to the manner that Christophe proposes here:
{code}
(defrecord Reducer [coll xf])

(extend-protocol
  clojure.core.protocols/CollReduce
  Reducer
      (coll-reduce [r f1]
                   (clojure.core.protocols/coll-reduce r f1 (f1)))
      (coll-reduce [r f1 init]
                   (clojure.core.protocols/coll-reduce (:coll r) ((:xf r) f1) init)))

(def rreducer ->Reducer)

(defn rmap [f coll]
  (rreducer coll (fn [g]
                   (fn [acc x]
                     (g acc (f x))))))
{code}

*Advantages*: Relatively non-invasive change.
*Disadvantages*: Not evident code. Additional protocol dispatch, though only incurred once

*2) Clear the reference to 'this' on the stack just before a tail call occurs*

When a callee is in return position, clear the local variable reference to 'this' in the caller's stack frame, which will make the caller and all its closed-overs eligible for reclamation.

Patch 1211 takes this approach for InvokeExpr call sites in return position. Patch 1214 takes the same approach for InvokeExprs and also static and instance interop calls.

Here is the code that performs the clearing excerpted from the 1214 patch:
{code}
void emitClearThis(GeneratorAdapter gen) {
gen.visitInsn(Opcodes.ACONST_NULL);
gen.visitVarInsn(Opcodes.ASTORE, 0);
}
{code}

Tail calls wrapped inside a try/catch/finally clause cannot have 'this' cleared, because closed-overs/locals may need to be emitted for exception handling blocks. Both patches consider and handle this edge case.

*Advantages:* Fixes this case with no user code changes. Enables GC to do reclaim closed-overs references earlier.
*Disadvantages:* A compiler change.

*3) Alternate approach*

{quote}
from Christophe Grand:
Another way would be to enhance the local clearing mechanism to also clear "this" but it's complex since it may be needed to keep a reference to "this" for caches long after obvious references to "this" are needed.
{quote}
*Advantages*: Fine-grained
*Disadvantages*: Complex, invasive, and the compiler is hard to hack on.

*Mitigations*
  Avoid reducing on lazy-seqs and instead operate on vectors / maps, or custom reifiers of CollReduce or CollFold. This could be easier with some implementations of common collection functions being available (like iterate and partition).

See https://groups.google.com/d/msg/clojure-dev/t6NhGnYNH1A/2lXghJS5HywJ for previous discussion.
*Problem Statement*
A shared function #'clojure.core.reducers/reducer holds on to the head of a reducible collection, causing it to blow up when the collection is a lazy sequence.

*Reproduction steps:*
Compare the following calls:
{code}
(time (reduce + 0 (map identity (range 1e8))))
(time (reduce + 0 (r/map identity (range 1e8))))
{code}
The second call should fail on a normal or small heap.

(If reducers are faster than seqs, increase the range.)

*Cause:* #'reducer closes over a collection when in order to reify CollReduce, and the closed-over is never cleared. When code attempts to reduce over this anonymous transformed collection, it will realize the tail while the head is stored in the closed-over.

*Patch*
CLJ-1250-08-29.patch

*Approach:*

*Clear the reference to 'this' on the stack just before a tail call occurs*

When a callee is in return position, clear the local variable reference to 'this' in the caller's stack frame, which will make the caller and all its closed-overs eligible for reclamation.

Patch 1211 takes this approach for InvokeExpr call sites in return position. Patch 1214 takes the same approach for InvokeExprs and also static and instance interop calls.

Here is the code that performs the clearing excerpted from the 1214 patch:
{code}
void emitClearThis(GeneratorAdapter gen) {
gen.visitInsn(Opcodes.ACONST_NULL);
gen.visitVarInsn(Opcodes.ASTORE, 0);
}
{code}

Tail calls wrapped inside a try/catch/finally clause cannot have 'this' cleared, because closed-overs/locals may need to be emitted for exception handling blocks. Both patches consider and handle this edge case.

Advantages: Fixes this case with no user code changes. Enables GC to do reclaim closed-overs references earlier.
Disadvantages: A compiler change.

*Alternate Approaches:*

*1) Reimplement the #'reducer* (and #'folder) transformation fns similar to the manner that Christophe proposes here:
{code}
(defrecord Reducer [coll xf])

(extend-protocol
  clojure.core.protocols/CollReduce
  Reducer
      (coll-reduce [r f1]
                   (clojure.core.protocols/coll-reduce r f1 (f1)))
      (coll-reduce [r f1 init]
                   (clojure.core.protocols/coll-reduce (:coll r) ((:xf r) f1) init)))

(def rreducer ->Reducer)

(defn rmap [f coll]
  (rreducer coll (fn [g]
                   (fn [acc x]
                     (g acc (f x))))))
{code}

Advantages: Relatively non-invasive change.
Disadvantages: Not evident code. Additional protocol dispatch, though only incurred once

*2) Alternate approach*

{quote}
from Christophe Grand:
Another way would be to enhance the local clearing mechanism to also clear "this" but it's complex since it may be needed to keep a reference to "this" for caches long after obvious references to "this" are needed.
{quote}
Advantages: Fine-grained
Disadvantages: Complex, invasive, and the compiler is hard to hack on.

*Mitigations*
  Avoid reducing on lazy-seqs and instead operate on vectors / maps, or custom reifiers of CollReduce or CollFold. This could be easier with some implementations of common collection functions being available (like iterate and partition).

See https://groups.google.com/d/msg/clojure-dev/t6NhGnYNH1A/2lXghJS5HywJ for previous discussion.
Priority Critical [ 2 ] Major [ 3 ]
Alex Miller made changes -
Description *Problem Statement*
A shared function #'clojure.core.reducers/reducer holds on to the head of a reducible collection, causing it to blow up when the collection is a lazy sequence.

*Reproduction steps:*
Compare the following calls:
{code}
(time (reduce + 0 (map identity (range 1e8))))
(time (reduce + 0 (r/map identity (range 1e8))))
{code}
The second call should fail on a normal or small heap.

(If reducers are faster than seqs, increase the range.)

*Cause:* #'reducer closes over a collection when in order to reify CollReduce, and the closed-over is never cleared. When code attempts to reduce over this anonymous transformed collection, it will realize the tail while the head is stored in the closed-over.

*Patch*
CLJ-1250-08-29.patch

*Approach:*

*Clear the reference to 'this' on the stack just before a tail call occurs*

When a callee is in return position, clear the local variable reference to 'this' in the caller's stack frame, which will make the caller and all its closed-overs eligible for reclamation.

Patch 1211 takes this approach for InvokeExpr call sites in return position. Patch 1214 takes the same approach for InvokeExprs and also static and instance interop calls.

Here is the code that performs the clearing excerpted from the 1214 patch:
{code}
void emitClearThis(GeneratorAdapter gen) {
gen.visitInsn(Opcodes.ACONST_NULL);
gen.visitVarInsn(Opcodes.ASTORE, 0);
}
{code}

Tail calls wrapped inside a try/catch/finally clause cannot have 'this' cleared, because closed-overs/locals may need to be emitted for exception handling blocks. Both patches consider and handle this edge case.

Advantages: Fixes this case with no user code changes. Enables GC to do reclaim closed-overs references earlier.
Disadvantages: A compiler change.

*Alternate Approaches:*

*1) Reimplement the #'reducer* (and #'folder) transformation fns similar to the manner that Christophe proposes here:
{code}
(defrecord Reducer [coll xf])

(extend-protocol
  clojure.core.protocols/CollReduce
  Reducer
      (coll-reduce [r f1]
                   (clojure.core.protocols/coll-reduce r f1 (f1)))
      (coll-reduce [r f1 init]
                   (clojure.core.protocols/coll-reduce (:coll r) ((:xf r) f1) init)))

(def rreducer ->Reducer)

(defn rmap [f coll]
  (rreducer coll (fn [g]
                   (fn [acc x]
                     (g acc (f x))))))
{code}

Advantages: Relatively non-invasive change.
Disadvantages: Not evident code. Additional protocol dispatch, though only incurred once

*2) Alternate approach*

{quote}
from Christophe Grand:
Another way would be to enhance the local clearing mechanism to also clear "this" but it's complex since it may be needed to keep a reference to "this" for caches long after obvious references to "this" are needed.
{quote}
Advantages: Fine-grained
Disadvantages: Complex, invasive, and the compiler is hard to hack on.

*Mitigations*
  Avoid reducing on lazy-seqs and instead operate on vectors / maps, or custom reifiers of CollReduce or CollFold. This could be easier with some implementations of common collection functions being available (like iterate and partition).

See https://groups.google.com/d/msg/clojure-dev/t6NhGnYNH1A/2lXghJS5HywJ for previous discussion.
*Problem Statement*
A shared function #'clojure.core.reducers/reducer holds on to the head of a reducible collection, causing it to blow up when the collection is a lazy sequence.

*Reproduction steps:*
Compare the following calls:
{code}
(time (reduce + 0 (map identity (range 1e8))))
(time (reduce + 0 (r/map identity (range 1e8))))
{code}
The second call should fail on a normal or small heap.

(If reducers are faster than seqs, increase the range.)

*Cause:* #'reducer closes over a collection when in order to reify CollReduce, and the closed-over is never cleared. When code attempts to reduce over this anonymous transformed collection, it will realize the tail while the head is stored in the closed-over.

*Patch*
CLJ-1250-08-29.patch

*Approach:*

*Clear the reference to 'this' on the stack just before a tail call occurs*

Removes calls to emitClearLocals(), which is a no-op.

When the context is RETURN (indicating a tail call), and the operation
is an InvokeExpr, StaticMethodExpr, or InstanceMethodExpr, clear the
reference to 'this' which is in slot 0 of the locals.

Edge-case: Inside the body of a try block, we cannot clear 'this' at the tail
position as we might need to keep refs around for use in the catch or finally
clauses. Introduces another truthy dynamic binding var to track position being
inside a try block.

Adds two helpers to emitClearThis and inTailCall.

Advantages: Fixes this case with no user code changes. Enables GC to do reclaim closed-overs references earlier.
Disadvantages: A compiler change.

*Alternate Approaches:*

*1) Reimplement the #'reducer* (and #'folder) transformation fns similar to the manner that Christophe proposes here:
{code}
(defrecord Reducer [coll xf])

(extend-protocol
  clojure.core.protocols/CollReduce
  Reducer
      (coll-reduce [r f1]
                   (clojure.core.protocols/coll-reduce r f1 (f1)))
      (coll-reduce [r f1 init]
                   (clojure.core.protocols/coll-reduce (:coll r) ((:xf r) f1) init)))

(def rreducer ->Reducer)

(defn rmap [f coll]
  (rreducer coll (fn [g]
                   (fn [acc x]
                     (g acc (f x))))))
{code}

Advantages: Relatively non-invasive change.
Disadvantages: Not evident code. Additional protocol dispatch, though only incurred once

*2) Alternate approach*

{quote}
from Christophe Grand:
Another way would be to enhance the local clearing mechanism to also clear "this" but it's complex since it may be needed to keep a reference to "this" for caches long after obvious references to "this" are needed.
{quote}
Advantages: Fine-grained
Disadvantages: Complex, invasive, and the compiler is hard to hack on.

*Mitigations*
  Avoid reducing on lazy-seqs and instead operate on vectors / maps, or custom reifiers of CollReduce or CollFold. This could be easier with some implementations of common collection functions being available (like iterate and partition).

See https://groups.google.com/d/msg/clojure-dev/t6NhGnYNH1A/2lXghJS5HywJ for previous discussion.
Alex Miller made changes -
Approval Vetted [ 10003 ] Incomplete [ 10006 ]
Alex Miller made changes -
Approval Incomplete [ 10006 ] Vetted [ 10003 ]
Alex Miller made changes -
Approval Vetted [ 10003 ] Incomplete [ 10006 ]
Ghadi Shayban made changes -
Attachment CLJ-1250-08-29-ws.patch [ 13368 ]
Alex Miller made changes -
Approval Incomplete [ 10006 ] Vetted [ 10003 ]
Alex Miller made changes -
Fix Version/s Release 1.7 [ 10250 ]
Fix Version/s Release 1.8 [ 10254 ]

People

Vote (9)
Watch (13)

Dates

  • Created:
    Updated: