<< Back to previous view

[CLJ-1058] StackOverflowError on exception in reducef for PersistentHashMap fold Created: 02/Sep/12  Updated: 07/Feb/14  Resolved: 07/Feb/14

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: Release 1.6

Type: Defect Priority: Major
Reporter: Tom Jack Assignee: Bruce Adams
Resolution: Completed Votes: 1
Labels: reducers
Environment:

Clojure 1.5.0-alpha4, Sun Java 1.6.0_35, with [org.codehaus.jsr166-mirror/jsr166y "1.7.0"]


Attachments: File sneakyThrow-clj-1058.diff    
Patch: Code and Test
Approval: Ok

 Description   

If reducef throws an exception, the exception is swallowed sometimes (that is: not propagated up to the caller of r/fold). This can lead to infinite recursion, causing a stack overflow which masks the problem.

user> (require '[clojure.core.reducers :as r])
nil
user> (r/fold (fn ([]) ([ret k v] (+ 3 "foo") ret)) (into {} (map (juxt identity identity) (range 10000))))
StackOverflowError   clojure.lang.Numbers.add (Numbers.java:126)

This results in a stack like: https://raw.github.com/gist/3bab917287a7fd635a84/f38bfe3e270556e467f3fc02062af7ea10781390/gistfile1.txt

Cause: The problem area in the code catches Exception and swallows it - this is commented "aargh" in PersistentHashMap.java line 444 (as of 412a51d).
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentHashMap.java#L444

Approach: Throw the Exception and allow it to propagate instead of swallowing it.

Patch: sneakyThrow-clj-1058.diff

Screened by: Alex Miller



 Comments   
Comment by Timothy Baldridge [ 30/Nov/12 3:40 PM ]

Verified as a bug.

Comment by Dave Sann [ 30/Jul/13 3:41 AM ]

Some comments here:

https://groups.google.com/d/msg/clojure-dev/kwJEw9YjeGc/VKxlixGVnMIJ

in short
//aargh

can be replaced by

throw new RuntimeException(e);

or maybe: Util.sneakyThrow(e);

which will surface the exception rather than swallow it.

I have questions about clean-up - in the thread - I am not very familiar with the fork join framework.

Comment by Bruce Adams [ 17/Sep/13 7:12 AM ]

Don't ignore exceptions thrown by tasks

Comment by Bruce Adams [ 18/Sep/13 11:24 AM ]

This change removes the problematic
try ... catch
and enhances "foldTasks" and "fold" method declarations by adding "throws Exception".

Comment by Andy Fingerhut [ 26/Sep/13 9:28 PM ]

Bruce, others can comment better than I can on this comment, but a large number of places in the Clojure implementation avoid the need for declaring checked exceptions by using Util.sneakyThrow. Search for that in the Java code for examples, and how it is implemented if you are curious. It might be that this is another place this should be done.

Comment by Bruce Adams [ 28/Sep/13 8:38 PM ]

I don't yet know the Clojure code base very well. Doing a quick count, there are slightly more methods with a declared "throws" clause then there are "sneakyThrows" calls. I am assuming (based on other Java experience) that sneakyThrows is only used when declaring a throws clause is problematic. In my proposed fix, throws declarations are needed only on four methods, all in the same module. The actual exceptions appear to be dealt with cleanly.

What I haven't yet figured out is how to write relevant tests for this. The clearly broken thing about the existing code is falling out of the if count==1 block into code that assumes count>1. Even just returning a null at least avoids the infinite recursion problem (and also swallows the exception, which seems wrong). So a test for once-and-only-once seems relevant. Also, a test for exception behavior seems called for.

(Also, I don't know protocol for assigning bugs. I'm tempted to hit the "Assign To Me" button in JIRA, but I'm not sure what it means in the larger scheme of things.)

Comment by Alex Miller [ 11/Oct/13 4:21 PM ]

"Assign to Me" means that you are currently working on something for that ticket and others should probably check with you before investing time. From my perspective, anyone is welcome to assign to themselves with the caveat that someone else may still create a patch and/or interpret non-progress as an invitation to work on it.

Is this ticket in a state where it should be screened or do you feel it is incomplete? I am trying to judge whether I should invest time in screening it.

Comment by Bruce Adams [ 11/Oct/13 4:42 PM ]

I believe that declare-throws-clj-1058.diff correctly deals with the problem. It certainly behaves better than code it replaces. I can imagine you screening, and then Rich accepting, it as is.

However, I think we will all be happier with tests demonstrating behavior. I've been struggling with writing tests, especially tests that cleanly fail on the old code, instead of hard crashing the test suite with a stack overflow.

Comment by Bruce Adams [ 17/Oct/13 1:21 PM ]

The code fix in "patch-with-tests-clj-1058.diff" is identical to my earlier "declare-throws-clj-1058.diff" This new patch includes an additional commit adding a unit test.

This is ready for screening.

Comment by Alex Miller [ 17/Dec/13 11:05 PM ]

I don't think we want to throw Exception out of (at least) fold() and possibly the others as well. Can you rework with sneakyThrow?

Comment by Bruce Adams [ 18/Dec/13 9:21 PM ]

I can use sneakyThrow. The patch I uploaded in September 17 did exactly that. I have since deleted that patch.

I've now had two people I respect, you and Andy Fingerhut, suggest a need for using sneakyThrow in this code. Each of you know the Clojure code far better than I do. I'm very inclined to trust your intuition that sneakyThrow is the right thing here, but I would really like to understand why. In my testing, checked exceptions (non-RuntimeExceptions) out of this code already get wrapped in a RuntimeException. I assume this wrapping happens higher up the Java stack and is implemented via sneakyThrow. Why do we want to add yet another layer of exception wrapping? It adds complexity to this Java code.

I am happy to add tests demonstrating the existing checked exception wrapping behavior.

Can you help me understand why we want sneakyThrow here? I would really appreciate it.

Comment by Alex Miller [ 18/Dec/13 10:01 PM ]

Most of the internal Clojure code eschews checked exceptions (they are an artifact of Java, not a JVM thing) and I think it's unlikely that Rich would accept a patch that explicitly declared it in this case. My expectation is that wrapping the exception at the beginning via sneakyThrow will prevent it from being wrapped again farther up, so I don't know that this change will nest things any more deeply.

Comment by Bruce Adams [ 11/Jan/14 10:57 AM ]

My earlier patch had an extra, unneeded "throws Exception" declaration on a publicly visible "fold" method. This new patch, declarative-clj-1058.diff, only adds "throws Exception" to methods used within src/jvm/clojure/lang/PersistentHashMap.java.

To make this locality clear, I've attached another patch, private-interface-INode.diff, which declares the internal INode interface as "private". I don't think the "private" declaration needs to go into the Clojure code.

These patches do not taking your advice to use sneakyThrow here. Using sneakyThrow still feels wrong to me. I will (attempt to) start a conversation on the Clojure-dev mailing asking people for thoughts about the two approaches.

Comment by Bruce Adams [ 13/Jan/14 5:16 AM ]

I forgot to thank Alex for expressing concerns with my earlier patchset. That got me to think harder about what exactly is happening here and to learn more about the Java code implementing Clojure. I enjoy thinking and learning.

Thank you, Alex!

Comment by Bruce Adams [ 14/Jan/14 5:49 PM ]

Thanks to Kevin Downey on the Clojure-dev mailing list, I now understand sneakyThrow better.

This patch uses sneakyThrow to surface exceptions.

Comment by Alex Miller [ 14/Jan/14 10:08 PM ]

Switching to Vetted so this is available to be screened.

Comment by Alex Miller [ 16/Jan/14 4:16 PM ]

Bruce, what's with the k-fail and def stuff in the tests?

Comment by Alex Miller [ 17/Jan/14 10:16 AM ]

Moving back to Incomplete - I think the tests should be simplified - I don't see why all the internal def stuff is necessary to demonstrate this.

Comment by Bruce Adams [ 17/Jan/14 11:09 AM ]

Yes, I agree. I'll update the patch with simpler tests. It'll take a few days.

Comment by Bruce Adams [ 05/Feb/14 8:10 PM ]

Same fix as before, using sneakyThrow, but with a single, much simpler test.

Generated at Sun Nov 23 08:40:48 CST 2014 using JIRA 4.4#649-r158309.