<< Back to previous view

[ASYNC-169] Try/catch block loops eternally inside go block Created: 16/May/16  Updated: 20/Feb/17

Status: Reopened
Project: core.async
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Critical
Reporter: Frank Burnham Assignee: Kevin Downey
Resolution: Unresolved Votes: 9
Labels: try-catch

Clojure 1.7.0; core.async 0.2.374; java 1.8

Attachments: Text File 0001-overhaul-exceptions-in-clojure-go-macro-ASYNC-169.patch     Text File 0005-Turn-tag-class-in-to-symbol-ASYNC-169.patch     Text File 0006-Turn-tag-class-in-to-string-ASYNC-169.patch     Text File 0010-overhaul-exceptions-in-clojure-go-macro-ASYNC-169.patch     Text File ASYNC-169-4.patch    
Patch: Code and Test
Approval: Ok

    (a/go-loop []
        (a/<! (throw (Exception. "Ex")))
        (catch clojure.lang.ExceptionInfo ei

The above example appears to loop forever when I expected it to return nil.

    (a/go-loop []
        (throw (Exception. "Ex"))
        (catch clojure.lang.ExceptionInfo ei

This example returns nil.

Related tickets:

Patch: 0010-overhaul-exceptions-in-clojure-go-macro-ASYNC-169.patch

Comment by Kevin Downey [ 16/May/16 3:50 PM ]

it looks like maybe "process-exception" in clojure.core.async.impl.ioc-macros is missing an :else (throw exception) clause at the end.

Comment by Frank Burnham [ 17/May/16 7:12 AM ]
user> (a/<!!
       (a/go-loop []
                     (a/<! (throw (Exception. "Ex")))
                   (catch clojure.lang.ExceptionInfo ei
                   (catch Throwable t :drop))))

That example is also odd to me. I expected to see :drop returned.

Comment by Kevin Downey [ 17/May/16 12:07 PM ]

looking at the macro expansion for the two catches, the emitted code is only setting up an exception frame for the ExceptionInfo case, and not for the Throwable case.

I suspect the exception handling code is all sawdust and broken promises.

Comment by Kevin Downey [ 17/May/16 12:12 PM ]

this looks like it is ignoring everything except the first catch https://github.com/clojure/core.async/blob/master/src/main/clojure/clojure/core/async/impl/ioc_macros.clj#L709

Comment by Kevin Downey [ 17/May/16 7:33 PM ]

this patch seems to make everything work, needs more tests of course

Comment by Kevin Downey [ 17/May/16 7:55 PM ]

looks like this might be the same issue as http://dev.clojure.org/jira/browse/ASYNC-100, which also has a patch attached

Comment by Alexey Aristov [ 10/Jul/16 11:36 AM ]

are there any plans/estimations regarding this bug? we were recently hit by it, and was a very special pleasure to analyze the issue.

Comment by Kevin Downey [ 20/Sep/16 11:38 AM ]

new patch, includes tests

Comment by Roman Pearah [ 09/Nov/16 9:33 PM ]

Wow, this is huge. My try block was just replaying over and over for no discernible reason, all without anything new coming off the channel (or anything prior to the try block executing). Stumped me for a while until I found this. Switching to catching Throwable instead of Exception worked as a quick fix.

Comment by Alex Miller [ 13/Feb/17 2:17 PM ]

Kevin Downey Hey, I've been looking at this, had a couple questions.

1) Does something similar need to be done for cljs?

2) There are a couple minor whitespace issues in the tests - you'll see the warning if you 'git apply' the patch - just need to remove trailing whitespace in a couple places.

3) I see these new reflection warnings from the new tests. Can these be avoided by typehinting to the type of the exception being caught inside the ?

Reflection warning, clojure/core/async/ioc_macros_test.clj:71:18 - reference to field getMessage can't be resolved.
Reflection warning, clojure/core/async/ioc_macros_test.clj:69:51 - reference to field getMessage can't be resolved.
Reflection warning, clojure/core/async/ioc_macros_test.clj:67:53 - reference to field getMessage can't be resolved.
Comment by Alex Miller [ 13/Feb/17 2:34 PM ]

The example in ASYNC-173 is another cases that causes a reflection warning on .printStackTrace.

Comment by Kevin Downey [ 13/Feb/17 3:54 PM ]

Latest patch fixes reflection on caught exceptions

Comment by Alex Miller [ 13/Feb/17 3:56 PM ]

thanks - for future reference, please add a -2 or something on the end so the patches aren't the same name (too easy to apply the wrong one). No need to do anything here though.

Comment by Kevin Downey [ 13/Feb/17 3:57 PM ]

Latest patch fixes trailing whitespace errors

Comment by Kevin Downey [ 13/Feb/17 4:00 PM ]

As for cljs, I am not sure, a quick glance has not filled me with confidence. I suppose it should be put through its paces with a similar set of tests. Off the bat it looks like the code only looks for one catch clause, which could be a problem, but maybe something up the line is turning multi-clause catches in to single clause catches for it.

Comment by Alex Miller [ 13/Feb/17 4:51 PM ]

For the examples up in the description:

  • the first example invokes <! with no channel, so seems broken to start (with patch applied, it throws)
  • the second example after patch is applied, throws as well.

I did not expect either example to throw.

Comment by Kevin Downey [ 13/Feb/17 5:26 PM ]

I don't follow

When I run the example I get a nil result (taking from a closed channel with nothing on it because the go-loop threw an exception). I get no new exception bound to *e, because the exception happened in the go loop thread, not the repl thread. And I get a print out of the exception thrown on the core.async thread because of core.async's default exception handling.

The exception bubbles out of the go block because the catch is for ExceptionInfo, but Exception is being thrown.

Is some part of that incorrect, or does it not match what you get when you run the example?

user=> (require '[clojure.core.async :as a])
user=> (a/<!!
  #_=>     (a/go-loop []
  #_=>       (try
  #_=>         (a/<! (throw (Exception. "Ex")))
  #_=>         (catch clojure.lang.ExceptionInfo ei
  #_=>           :retry))))
Exception in thread "async-dispatch-1" java.lang.Error: java.lang.Exception: Ex
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1148)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Exception: Ex
        at user$eval8675$fn__8693$state_machine__6725__auto____8694$fn__8697.invoke(form-init491495474467240539.clj:2)
        at user$eval8675$fn__8693$state_machine__6725__auto____8694.invoke(form-init491495474467240539.clj:2)
        at clojure.core.async.impl.ioc_macros$run_state_machine.invoke(ioc_macros.clj:972)
        at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invoke(ioc_macros.clj:976)
        at user$eval8675$fn__8693.invoke(form-init491495474467240539.clj:2)
        at clojure.lang.AFn.run(AFn.java:22)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        ... 2 more
user=> *1
user=> *e
Comment by Alex Miller [ 14/Feb/17 10:01 AM ]

Got it. This example from ASYNC-180 doesn't seem to be doing the right thing now:

(def t3
        (<! (timeout 100))

(println (<!! t3))
;; expect: 1
;; got: IllegalArgumentException No implementation of method: :take! of protocol: #'clojure.core.async.impl.protocols/ReadPort found for class: clojure.lang.Var$Unbound
Comment by Alex Miller [ 14/Feb/17 10:08 AM ]

Never mind, I must have had something weird referred in the midst of my example set.

Comment by Alex Miller [ 14/Feb/17 10:11 AM ]


Comment by Kevin Downey [ 14/Feb/17 1:41 PM ]

great news, thanks!

Comment by Kevin Downey [ 16/Feb/17 2:12 PM ]

I re-opened this issue due to a bug discussed on slack

    (catch Exception e
      (async/<! 1)

doesn't compile

Comment by Kevin Downey [ 16/Feb/17 2:14 PM ]

The latest patch (0003-Don-t-clobber-existing-metadata-ASYNC-169.patch) fixes the issue in my previous comment, and should apply cleanly to master.

Comment by Alex Miller [ 17/Feb/17 9:04 AM ]

When I apply the patch, tests don't run:

Exception in thread "main" clojure.lang.ExceptionInfo: Could not resolve var: ne {:var ne, :file "clojure/core/async/ioc_macros_test.clj", :column 9, :line 94}, compiling:(clojure/core/async/ioc_macros_test.clj:91:6)
Comment by Alex Miller [ 17/Feb/17 9:18 AM ]

Updated last patch to fix typo

Comment by Alex Miller [ 17/Feb/17 9:24 AM ]

I'm seeing a reflection warning here that does not seem expected (example from ASYNC-173):

(require '[clojure.core.async :as a :refer [chan go go-loop <! >! <!! >!! take! put! timeout]])
(defn endless-fun [f]
  (let [in-ch (a/to-chan (range 1 10))]
         (loop []
           (when-let [value (a/<! in-ch)]
             (f value)
         (catch Exception e
           (.printStackTrace e)
           (throw e))))

Reflection warning, /private/var/folders/7r/_1fj0f517rgcxwx79mn79mfc0000gn/T/form-init4526179122097363504.clj:11:12 - reference to field printStackTrace can't be resolved.
Comment by Kevin Downey [ 17/Feb/17 10:42 AM ]

0005-Turn-tag-class-in-to-symbol-ASYNC-169.patch applies cleanly to master(v0.2.395-19-g6e36258), and the tests pass so no typos. As I mentioned in slack, in the previous patch I changed how the tag metadata is attached, and ended up attaching a class instead of a symbol

Comment by Kevin Downey [ 17/Feb/17 11:29 AM ]

0006-Turn-tag-class-in-to-string-ASYNC-169.patch replaces the 0005 patch with improvements suggested by Bronsa in slack

Comment by Kevin Downey [ 17/Feb/17 12:39 PM ]

as discussed in slack, 0010-overhaul-exceptions-in-clojure-go-macro-ASYNC-169.patch combines all previous changes and applies cleanly against master (v0.2.395-22-g3e37e0b, all 169 changes reverted)

[ASYNC-185] `thread` prevents clearing of body locals Created: 17/Feb/17  Updated: 17/Feb/17

Status: Open
Project: core.async
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Nicola Mometto Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: memory

Attachments: Text File 0001-ASYNC-185-use-fn-once-for-thread-thunk.patch    
Patch: Code


`thread` wraps its body in a `(fn [] ~@body)`, it should wrap it in `(^:once fn* [] ~@body)` instead to allow clearing of closed over locals

Patch: 0001-ASYNC-185-use-fn-once-for-thread-thunk.patch

[ASYNC-138] Go blocks leak memory Created: 10/Aug/15  Updated: 17/Feb/17

Status: Reopened
Project: core.async
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Critical
Reporter: Brian Lubeski Assignee: Unassigned
Resolution: Unresolved Votes: 14
Labels: go, memory

clojure 1.7.0
core.async 0.1.346.0-17112a-alpha
Java HotSpot(TM) Client VM 1.8.0_31-b13

Attachments: Text File 0001-ASYNC-138-allow-for-clearing-of-closed-over-locals.patch     Text File 0001-ASYNC-138-allow-for-clearing-of-closed-over-locals-v2-nested.patch     Text File 0001-ASYNC-138-allow-for-clearing-of-closed-over-locals-v2.patch     Text File 0001-ASYNC-138-allow-for-clearing-of-closed-over-locals-v3.patch     Text File 0001-ASYNC-138-allow-for-clearing-of-closed-over-locals-v4.patch     Text File 0001-ASYNC-138-allow-for-clearing-of-closed-over-locals-v5.patch     Text File 0001-ASYNC-138-allow-for-clearing-of-closed-over-locals-v6.patch    
Patch: Code and Test
Approval: Vetted


The following example, after running for a few minutes, generates an OutOfMemoryError.

(let [c (chan)]
  (go (while (<! c)))
  (let [vs (range)]
      (doseq [v vs]
        (>! c v)))))

By contrast, the following example will run indefinitely without causing an OutOfMemoryError.

(let [c (chan)]
  (go (while (<! c)))
    (let [vs (range)] 
      (doseq [v vs]
        (>! c v)))))

The only significant difference I see between the two examples is that the (range) is created outside the go block in the first example but is created inside the go block in the second example. It appears that the go block in the first example is referencing vs in such a way as to prevent if from being garbage-collected.

This behavior is also be the cause of ASYNC-32.

The attached patch applies a transformation from

(let [a large-seq] (loop [..] .. a ..))
where the loop exemplifies the state machine loop created by `go`, into
(let [a large-seq a' (^:once fn* [] a)] (loop [..] .. (let [a (a')] ..) ..))

which allows the closed over locals to cross into the state machine loop without getting held. While this might look unsafe, because of how the `go` loop state machine works, we're guaranteed that `(a')` will only be invoked on the first iteration of the loop.
The patch looks way more complex than it actually is, because it needs to deal with: renaming locals so that shadowing of special symbols works properly, preserving locals type hints and handle nested go blocks, where the value of `&env` will be provided by tools.analyzer rather than from Compiler.java

There are two different type propagations that we need to do, one in case of non primitive objects and one in the case of primitive values:

  • if we're propagating type info for a non primitive object, the transformation is as follow:
    (let [a ^Klass my-obj] (loop [..] .. a ..))
    (let [a ^Klass my-obj a' (^:once fn* [] a)] (loop [..] .. (let [^Klass a (a')] ..) ..))
  • if we're propagating type info for a primitive local, a type hint won't do – we actually need to unbox the value out of the thunk, so the transformation will be:
(let [a (int x)] (loop [..] .. a ..))
(let [a (int x) a' (^:once fn* [] a)] (loop [..] .. (let [a (clojure.core/int (a'))] ..) ..))

Patch: 0001-ASYNC-138-allow-for-clearing-of-closed-over-locals-v6.patch

Comment by Nicola Mometto [ 15/Dec/15 9:34 AM ]

attached is a WIP patch to fix this issue, would appreciate some testing on this

Comment by Nicola Mometto [ 15/Dec/15 5:52 PM ]

CLJ-1744 contributes to this bug

Comment by Nicola Mometto [ 15/Dec/15 6:23 PM ]

Updated patch to preserve type info

Comment by Nicola Mometto [ 17/Dec/15 10:16 AM ]

Note that the current patch breaks nested `go` blocks at compile time since the nested `go` block will be macroexpanded by tools.analyzer.jvm rather than Compiler.java and this &env will be different.

Is this something we want to support? There seem to be a lot of other cases that break when nesting `go` blocks.

If this is something that we do want to support 0001-ASYNC-138-allow-for-clearing-of-closed-over-locals-v2-nested.patch fixes this

Comment by Jan Rychter [ 21/Dec/15 6:18 AM ]

This just bit me badly when doing data processing work — I never expected that (async/to-chan (line-seq rdr)) can be the culprit of my heap getting exhausted in 30 seconds. Took quite some time to narrow it down. Thanks to Nicola for letting me know about this bug on Slack.

Comment by Alex Miller [ 13/Feb/17 3:31 PM ]

Nicola, not sure how to read your comments above re the patches. Most of the other cases of nested go blocks failing that I see in the tracker are related to the exception handling bugs in ASYNC-169. Do you have any hesitancy over your patch for the nested stuff?

Comment by Nicola Mometto [ 13/Feb/17 3:40 PM ]

Alex, I think the only reason why I proposed the -nested patch as an alternative rather than as a main patch is that it adds complexity to the patch to handle an unlikely scenario. I think we should go with the -nested patch, but if understanding the patch for screeners is more important than handling weird edge cases, the other patch is easier

Comment by Alex Miller [ 13/Feb/17 3:55 PM ]

Tests don't pass with -nested patch.

Caused by: java.lang.ClassCastException: clojure.lang.Symbol cannot be cast to java.util.Map$Entry
	at clojure.lang.APersistentMap$ValSeq.first(APersistentMap.java:232)
	at clojure.lang.RT.first(RT.java:653)
	at clojure.core$first__4110.invoke(core.clj:55)
	at clojure.core.async.impl.ioc_macros$state_machine.invoke(ioc_macros.clj:1107)
	at clojure.core.async.ioc_macros_test$runner.doInvoke(ioc_macros_test.clj:24)
Comment by Nicola Mometto [ 13/Feb/17 5:04 PM ]

updated patch making all the tests run

Comment by Nicola Mometto [ 14/Feb/17 4:49 AM ]

updated patch with an additional testcase making sure nested go blocks work properly after the transformation, also bumping tools.analyzer.jvm to the last version as that includes a fix for nested t.a calls

Comment by Alex Miller [ 14/Feb/17 10:40 AM ]


Comment by Alex Miller [ 15/Feb/17 3:36 PM ]

I am reopening this and reverting the change for the moment. This example fails when using Clojure 1.9.0-alpha12+ (anything after the commit for CLJ-1224).

(require '[clojure.core.async :refer (go)])
(defprotocol P
  (x [p]))

(defrecord R [z]
  (x [_]
      (loop []

CompilerException java.lang.UnsupportedOperationException: Can't type hint a primitive local

I believe this is due to the code in the patch related to applying meta to the local binding, specifically this comes into interaction with the code now generated in hasheq after CLJ-1224.

Comment by Nicola Mometto [ 15/Feb/17 4:58 PM ]

Updated the patch to handle with primitive type hints.
In case of primitive type hints, rather than emitting

(let [^String a (a')] ..)
we emit
(let [a (clojure.core/int (a'))] ..)
as hinting is not valid and proper unboxing is required

Comment by Alex Miller [ 15/Feb/17 8:12 PM ]

Did you mean to have ^String and clojure.core/int in that example?

Comment by Nicola Mometto [ 16/Feb/17 3:12 AM ]

yes, to showcase the two different transformations we do in case of primitive or non primitive type propagation. I'll expand on that in the ticket description so it's clearer what i meant

Comment by Alex Miller [ 17/Feb/17 9:10 AM ]

Applied v5 version.

Generated at Tue Feb 21 19:58:33 CST 2017 using JIRA 4.4#649-r158309.