[CLJ-1322] doseq with several bindings causes "ClassFormatError: Invalid Method Code length" Created: 10/Jan/14 Updated: 28/Jun/14
|Affects Version/s:||Release 1.5|
|Fix Version/s:||Release 1.7|
Clojure 1.5.1, java 1.7.0_25, OpenJDK Runtime Environment (IcedTea 2.3.10) (7u25-2.3.10-1ubuntu0.12.04.2)
|Attachments:||doseq-bench.txt doseq.patch script.clj|
Important Perf Note the new impl is faster for collections that are custom-reducible but not chunked, and is also faster for large numbers of bindings. The original implementation is hand tuned for chunked collections, and wins for larger chunked coll/smaller binding count scenarios, presumably due to the fn call/return tracking overhead of reduce. Details are in the comments.
While this example is silly, it's a problem we've hit a couple of times. It's pretty surprising when you have just a couple of lines of code and suddenly you get the code length error.
|Comment by Kevin Downey [ 18/Apr/14 12:20 AM ]|
reproduces with jdk 1.8.0 and clojure 1.6
|Comment by Nicola Mometto [ 22/Apr/14 5:35 PM ]|
A potential fix for this is to make doseq generate intermediate fns like `for` does instead of expanding all the code directly.
|Comment by Ghadi Shayban [ 25/Jun/14 8:39 PM ]|
Existing doseq handles chunked-traversal internally, deciding the
This approach redefs doseq later in core.clj, after protocol-based
It supports the existing :let, :while, and :when modifiers.
New is a stronger assertion that modifiers cannot come before binding
valid: [x coll :when (foo x)]
This implementation does not suffer from the code explosion problem.
Implementing this without destructuring was not a party, luckily reduce
|Comment by Andy Fingerhut [ 26/Jun/14 12:25 AM ]|
For anyone reviewing this patch, note that there are already many tests for correct functionality of doseq in file test/clojure/test_clojure/for.clj. It may not be immediately obvious, but every test for 'for' defined with deftest-both is a test for 'for' and also for 'doseq'.
Regarding the current implementation of doseq: it in't simply that it is too many bytes per binding, it is that the code size doubles with each additional binding. See these results, which measures size of the macroexpanded form rather than byte code size, but those two things should be fairly linearly related to each other here:
Here are results for the same expressions after Ghadi's patch doseq.patch dated June 25 2014:
It would be good to see some performance results with and without this patch, too.
|Comment by Stuart Halloway [ 28/Jun/14 2:21 PM ]|
In the tests below, the new impl is called "doseq2", vs. the original impl "doseq"
|Comment by Ghadi Shayban [ 28/Jun/14 6:23 PM ]|
Hmm, I cannot reproduce your results.
I'm not sure whether you are testing with lein, on what platform, what jvm opts.
Can we test using this little harness instead directly against clojure.jar? I've attached a the harness and two runs of results (one w/ default heap, the other 3GB w/ G1GC)
I added a medium and small (range) too.
Anecdotally, I see doseq2 outperform in all cases except the small range. Using criterium shows a wider performance gap favoring doseq2.
I pasted the results side by side for easier viewing.
|Comment by Ghadi Shayban [ 28/Jun/14 6:47 PM ]|
It's good that the benchmarks contain empty doseq bodies in order to isolate the overhead of traversal. However, that represents 0% of actual real-world code.
At least for the first benchmark (large chunked seq), adding in some tiny amount of work did not change results signifantly. Neither for (map str [a])
nor for intrisics like (inc a)
I still see reduce-based doseq ahead of the original, except for small seqs