<< Back to previous view

[CLJ-1250] Reducer (and folder) instances hold onto the head of seqs Created: 03/Sep/13  Updated: 06/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: Release 1.8

Type: Defect Priority: Major
Reporter: Christophe Grand Assignee: Unassigned
Resolution: Unresolved Votes: 9
Labels: memory, reducers

Attachments: Text File after-change.txt     Text File before-change.txt     Text File CLJ-1250-08-29.patch     Text File CLJ-1250-08-29-ws.patch     Text File CLJ-1250-20131211.patch     Text File CLJ-1250-AllInvokeSites-20140204.patch     Text File CLJ-1250-AllInvokeSites-20140320.patch    
Patch: Code and Test
Approval: Vetted

 Description   

Problem Statement
A shared function #'clojure.core.reducers/reducer holds on to the head of a reducible collection, causing it to blow up when the collection is a lazy sequence.

Reproduction steps:
Compare the following calls:

(time (reduce + 0 (map identity (range 1e8))))
(time (reduce + 0 (r/map identity (range 1e8))))

The second call should fail on a normal or small heap.

(If reducers are faster than seqs, increase the range.)

Cause: #'reducer closes over a collection when in order to reify CollReduce, and the closed-over is never cleared. When code attempts to reduce over this anonymous transformed collection, it will realize the tail while the head is stored in the closed-over.

Patch
CLJ-1250-08-29.patch

Approach:

Clear the reference to 'this' on the stack just before a tail call occurs

Removes calls to emitClearLocals(), which is a no-op.

When the context is RETURN (indicating a tail call), and the operation
is an InvokeExpr, StaticMethodExpr, or InstanceMethodExpr, clear the
reference to 'this' which is in slot 0 of the locals.

Edge-case: Inside the body of a try block, we cannot clear 'this' at the tail
position as we might need to keep refs around for use in the catch or finally
clauses. Introduces another truthy dynamic binding var to track position being
inside a try block.

Adds two helpers to emitClearThis and inTailCall.

Advantages: Fixes this case with no user code changes. Enables GC to do reclaim closed-overs references earlier.
Disadvantages: A compiler change.

Alternate Approaches:

1) Reimplement the #'reducer (and #'folder) transformation fns similar to the manner that Christophe proposes here:

(defrecord Reducer [coll xf])

(extend-protocol 
  clojure.core.protocols/CollReduce
  Reducer
      (coll-reduce [r f1]
                   (clojure.core.protocols/coll-reduce r f1 (f1)))
      (coll-reduce [r f1 init]
                   (clojure.core.protocols/coll-reduce (:coll r) ((:xf r) f1) init)))

(def rreducer ->Reducer) 

(defn rmap [f coll]
  (rreducer coll (fn [g] 
                   (fn [acc x]
                     (g acc (f x))))))

Advantages: Relatively non-invasive change.
Disadvantages: Not evident code. Additional protocol dispatch, though only incurred once

2) Alternate approach

from Christophe Grand:
Another way would be to enhance the local clearing mechanism to also clear "this" but it's complex since it may be needed to keep a reference to "this" for caches long after obvious references to "this" are needed.

Advantages: Fine-grained
Disadvantages: Complex, invasive, and the compiler is hard to hack on.

Mitigations
Avoid reducing on lazy-seqs and instead operate on vectors / maps, or custom reifiers of CollReduce or CollFold. This could be easier with some implementations of common collection functions being available (like iterate and partition).

See https://groups.google.com/d/msg/clojure-dev/t6NhGnYNH1A/2lXghJS5HywJ for previous discussion.



 Comments   
Comment by Gary Fredericks [ 03/Sep/13 8:53 AM ]

Fixed indentation in description.

Comment by Ghadi Shayban [ 11/Dec/13 11:08 PM ]

Adding a patch that clears "this" before tail calls. Verified that Christophe's repro case is fixed.

Will upload a diff of the bytecode soon.

Any reason this juicy bug was taken off 1.6?

Comment by Ghadi Shayban [ 11/Dec/13 11:17 PM ]

Here's the bytecode for the clojure.core.reducers/reducer reify before and after the change... Of course a straight diff isn't useful because all the line numbers changed. Kudos to Gary Trakhman for the no.disassemble lein plugin.

Comment by Christophe Grand [ 12/Dec/13 6:58 AM ]

Ghadi, I'm a bit surprised by this part of the patch: was the local clearing always a no-op here?

-		if(context == C.RETURN)
+		if(shouldClear)
 			{
-			ObjMethod method = (ObjMethod) METHOD.deref();
-			method.emitClearLocals(gen);
+                            gen.visitInsn(Opcodes.ACONST_NULL);
+                            gen.visitVarInsn(Opcodes.ASTORE, 0);
 			}

The problem with this approach (clear this on tail call) is that it adds yet another special case. To me the complexity stem from having to keep this around even if the user code doesn't refer to it.

Comment by Ghadi Shayban [ 12/Dec/13 7:19 AM ]

Thank you - I failed to mention this in the commit message: it appears that emitClearLocals() belonging to both ObjMethod and FnMethod (its child) are empty no-ops. I believe the actual local clearing is on line 4855.

I agree re: another special case in the compiler.

Comment by Alex Miller [ 12/Dec/13 8:56 AM ]

Ghadi re 1.6 - this ticket was never in the 1.6 list, it has not yet been vetted by Rich but is ready to do so when we open up again after 1.6.

Comment by Ghadi Shayban [ 12/Dec/13 8:59 AM ]

Sorry I confused the critical list with the Rel1.6 list.

Comment by Ghadi Shayban [ 14/Dec/13 11:16 AM ]

New patch 20131214 that handles all tail invoke sites (InvokeExpr + StaticMethodExpr + InstanceMethodExpr). 'StaticInvokeExpr' seems like an old remnant that had no active code path, so that was left as-is.

The approach taken is still the same as the original small patch that addressed only InvokeExpr, except that it is now using a couple small helpers. The commit message has more details.

Also a 'try' block with no catch or finally clause now becomes a BodyExpr. Arguably a user error, historically accepted, and still accepted, but now they are a regular BodyExpr, instead of being wrapped by a the no-op try/catch mechanism. This second commit can be optionally discarded.

With this patch on my machine (4/8 core/thread Ivy Bridge) running on bare clojure.main:
Christophe's test cases both run i 3060ms on a artificially constrained 100M max heap, indicating a dominant GC overhead. (But they now both work!)

When max heap is at a comfortable 2G the reducers version outpaces the lazyseq at 2100ms vs 2600ms!

Comment by Ghadi Shayban [ 13/Jan/14 10:48 AM ]

Updating stale patch after latest changes to master. Latest is CLJ-1250-AllInvokeSites-20140113

Comment by Ghadi Shayban [ 04/Feb/14 3:50 PM ]

Updating patch after murmur changes

Comment by Tassilo Horn [ 13/Feb/14 4:52 AM ]

Ghadi, I suffer from the problem of this issue. Therefore, I've applied your patch CLJ-1250-AllInvokeSites-20140204.patch to the current git master. However, then I get lots of "java.lang.NoSuchFieldError: array" errors when the clojure tests are run:

     [java] clojure.test-clojure.clojure-set
     [java] 
     [java] java.lang.NoSuchFieldError: array
     [java] 	at clojure.core.protocols$fn__6026.invoke(protocols.clj:123)
     [java] 	at clojure.core.protocols$fn__5994$G__5989__6003.invoke(protocols.clj:19)
     [java] 	at clojure.core.protocols$fn__6023.invoke(protocols.clj:147)
     [java] 	at clojure.core.protocols$fn__5994$G__5989__6003.invoke(protocols.clj:19)
     [java] 	at clojure.core.protocols$seq_reduce.invoke(protocols.clj:31)
     [java] 	at clojure.core.protocols$fn__6017.invoke(protocols.clj:48)
     [java] 	at clojure.core.protocols$fn__5968$G__5963__5981.invoke(protocols.clj:13)
     [java] 	at clojure.core$reduce.invoke(core.clj:6213)
     [java] 	at clojure.set$difference.doInvoke(set.clj:61)
     [java] 	at clojure.lang.RestFn.invoke(RestFn.java:442)
     [java] 	at clojure.test_clojure.clojure_set$fn__1050$fn__1083.invoke(clojure_set.clj:109)
     [java] 	at clojure.test_clojure.clojure_set$fn__1050.invoke(clojure_set.clj:109)
     [java] 	at clojure.test$test_var$fn__7123.invoke(test.clj:704)
     [java] 	at clojure.test$test_var.invoke(test.clj:704)
     [java] 	at clojure.test$test_vars$fn__7145$fn__7150.invoke(test.clj:721)
     [java] 	at clojure.test$default_fixture.invoke(test.clj:674)
     [java] 	at clojure.test$test_vars$fn__7145.invoke(test.clj:721)
     [java] 	at clojure.test$default_fixture.invoke(test.clj:674)
     [java] 	at clojure.test$test_vars.invoke(test.clj:718)
     [java] 	at clojure.test$test_all_vars.invoke(test.clj:727)
     [java] 	at clojure.test$test_ns.invoke(test.clj:746)
     [java] 	at clojure.core$map$fn__2665.invoke(core.clj:2515)
     [java] 	at clojure.lang.LazySeq.sval(LazySeq.java:40)
     [java] 	at clojure.lang.LazySeq.seq(LazySeq.java:49)
     [java] 	at clojure.lang.Cons.next(Cons.java:39)
     [java] 	at clojure.lang.RT.boundedLength(RT.java:1655)
     [java] 	at clojure.lang.RestFn.applyTo(RestFn.java:130)
     [java] 	at clojure.core$apply.invoke(core.clj:619)
     [java] 	at clojure.test$run_tests.doInvoke(test.clj:761)
     [java] 	at clojure.lang.RestFn.applyTo(RestFn.java:137)
     [java] 	at clojure.core$apply.invoke(core.clj:617)
     [java] 	at clojure.test.generative.runner$run_all_tests$fn__527.invoke(runner.clj:255)
     [java] 	at clojure.test.generative.runner$run_all_tests$run_with_counts__519$fn__523.invoke(runner.clj:251)
     [java] 	at clojure.test.generative.runner$run_all_tests$run_with_counts__519.invoke(runner.clj:251)
     [java] 	at clojure.test.generative.runner$run_all_tests.invoke(runner.clj:253)
     [java] 	at clojure.test.generative.runner$test_dirs.doInvoke(runner.clj:304)
     [java] 	at clojure.lang.RestFn.applyTo(RestFn.java:137)
     [java] 	at clojure.core$apply.invoke(core.clj:617)
     [java] 	at clojure.test.generative.runner$_main.doInvoke(runner.clj:312)
     [java] 	at clojure.lang.RestFn.invoke(RestFn.java:408)
     [java] 	at user$eval564.invoke(run_tests.clj:3)
     [java] 	at clojure.lang.Compiler.eval(Compiler.java:6657)
     [java] 	at clojure.lang.Compiler.load(Compiler.java:7084)
     [java] 	at clojure.lang.Compiler.loadFile(Compiler.java:7040)
     [java] 	at clojure.main$load_script.invoke(main.clj:274)
     [java] 	at clojure.main$script_opt.invoke(main.clj:336)
     [java] 	at clojure.main$main.doInvoke(main.clj:420)
     [java] 	at clojure.lang.RestFn.invoke(RestFn.java:408)
     [java] 	at clojure.lang.Var.invoke(Var.java:379)
     [java] 	at clojure.lang.AFn.applyToHelper(AFn.java:154)
     [java] 	at clojure.lang.Var.applyTo(Var.java:700)
     [java] 	at clojure.main.main(main.java:37)
Comment by Ghadi Shayban [ 13/Feb/14 8:23 AM ]

Can you give some details about your JVM/environment that can help reproduce? I'm not encountering this error.

Comment by Tassilo Horn [ 13/Feb/14 9:41 AM ]

Sure. It's a 64bit ThinkPad running GNU/Linux.

% java -version
java version "1.7.0_51"
OpenJDK Runtime Environment (IcedTea 2.4.5) (ArchLinux build 7.u51_2.4.5-1-x86_64)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
Comment by Ghadi Shayban [ 13/Feb/14 10:19 AM ]

Strange, that is exactly my mail env, OpenJDK7 on Arch, 64-bit. I have also tested on JDK 6/7/8 on OSX mavericks. Are you certain that the git tree is clean besides the patch? (Arch users unite!)

Comment by Tassilo Horn [ 14/Feb/14 1:13 AM ]

Yes, the tree is clean. But now I see that I get the same error also after resetting to origin/master, so it's not caused by your patch at all. Oh, now the error vanished after doing a `mvn clean`! So problem solved.

Comment by Nicola Mometto [ 19/Feb/14 12:32 PM ]

Ghandi, FnExpr.parse should bind IN_TRY_BLOCK to false before analyzing the fn body, consider the case

(try (do something (fn a [] (heap-consuming-op a))) (catch Exception e ..))

Here in the a function the this local will never be cleared even though it's perfectly safe to.
Admittedly this is an edge case but we should cover this possibility too.

Comment by Ghadi Shayban [ 19/Feb/14 2:06 PM ]

You may have auto-corrected my name to Ghandi instead of Ghadi. I wish I were that wise =)

I will update the patch for FnExpr (that seems reasonable), but maybe after 1.6 winds down and the next batch of tickets get scrutiny. It would be nice to get input on a preferred approach from Rich or core after it gets vetted – or quite possibly not vetted.

Comment by Nicola Mometto [ 19/Feb/14 6:11 PM ]

hah, sorry for the typo on the name

Seems reasonable to me, in the meantime I just pushed to tools.analyzer/tools.emitter complete support for "this" clearing, I'll test this a bit in the next few days to make sure it doesn't cause unexpected problems.

Comment by Andy Fingerhut [ 24/Feb/14 12:13 PM ]

Patch CLJ-1250-AllInvokeSites-20140204.patch no longer applies cleanly to latest master as of Feb 23, 2014. It did on Feb 14, 2014. Most likely some of its context lines are changed by the commit to Clojure master on Feb 23, 2014 – I haven't checked in detail.

Comment by Ghadi Shayban [ 20/Mar/14 4:39 PM ]

Added a patch that 1) applies cleanly, 2) binds the IN_TRY_EXPR to false initially when analyzing FnExpr and 3) uses RT.booleanCast

Comment by Alex Miller [ 22/Aug/14 9:31 AM ]

Can you squash the patch and add tests to cover all this stuff?

Comment by Ghadi Shayban [ 22/Aug/14 9:47 AM ]

Sure. Have any ideas for how to test proper behavior of reference clearing? Know of some prior art in the test suite?

Comment by Alex Miller [ 22/Aug/14 10:25 AM ]

Something like the test in the summary would be a place to start. I don't know of any test that actually inspects bytecode or anything but that's probably not wise anyways. Need to make that kind of a test but get coverage on the different kinds of scenarios you're covering - try/catch, etc.

Comment by Ghadi Shayban [ 22/Aug/14 12:13 PM ]

Attached new squashed patch with a couple of tests.

Removed (innocuous but out-of-scope) second commit that analyzed try blocks missing a catch or finally clause as BodyExprs

Comment by Ghadi Shayban [ 29/Aug/14 11:43 AM ]

Rebased to latest master. Current patch CLJ-1250-08-29

Comment by Jozef Wagner [ 29/Aug/14 2:40 PM ]

CLJ-1250-08-29.patch is fishy, 87k size and it includes many unrelated commits

Comment by Alex Miller [ 29/Aug/14 2:44 PM ]

Agreed, Ghadi that last rebase looks wrong.

Comment by Ghadi Shayban [ 29/Aug/14 3:06 PM ]

Oops. Used format-patch against the wrong base. Updated.

Apologies that ticket is longer than War & Peace

Comment by Alex Miller [ 08/Sep/14 7:02 PM ]

I have not had enough time to examine all the bytecode diffs that I want to on this yet but preliminary feedback:

Compiler.java

  • need to use tabs instead of spaces to blend into the existing code better
  • why do StaticFieldExpr and InstanceFieldExpr not need this same logic?

compilation.clj

  • has some whitespace diffs that you could get rid of
  • there seem to be more cases in the code than are covered in the tests here?
Comment by Ghadi Shayban [ 08/Sep/14 11:19 PM ]

The germ of the issue is to clear the reference to 'this' (arg 0) when transferring control to another activation frame. StaticFieldExpr and InstanceFieldExpr do not transfer control to another frame. (StaticMethod and InstanceMethod do transfer control, and are covered by the patch)

Comment by Alex Miller [ 25/Sep/14 9:03 AM ]

Makes sense - can you address the tabs and whitespace issues?

Comment by Ghadi Shayban [ 26/Sep/14 12:51 PM ]

Latest patch CLJ-1250-08-29-ws.patch with whitespace issues fixed.

Generated at Sat Nov 22 21:42:37 CST 2014 using JIRA 4.4#649-r158309.