[CLJ-1263] Allow static compilation of function invocations Created: 14/Sep/13 Updated: 07/Nov/13
This proposal is to allow metadata on functions that prevents a fully dynamic var deref from being used whenever the function is called.
When the function is invoked, the JVM "invokevirtual" instruction will be used, which is faster than the current implementation (var deref + IFn cast + invokeinterface) and has fewer restrictions (no need to predefine interfaces to match the function parameters). The JVM is generally able to compile such invokevirtual instructions into extremely efficient code - effectively as fast as pure Java.
This is intended to pave the way to better support for statically compiled, high performance code. In particular, it allows:
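For readers less familiar with the current call path, here is a minimal sketch of what "var deref + IFn cast + invoke" means in practice. The names `foo` and `call-foo` are illustrative only, and the sketch assumes default compiler settings, where a call through a var picks up redefinitions.

```clojure
;; `foo` and `call-foo` are hypothetical names used only for illustration.
(defn foo [x] (inc x))

;; The call to foo inside call-foo goes through the var #'foo:
;; the var is dereferenced, the value is cast to IFn, then invoked.
(defn call-foo [x] (foo x))

(call-foo 1)          ; => 2
((deref #'foo) 1)     ; => 2, the same path spelled out by hand

;; Because callers deref the var on every call, replacing the root
;; binding changes what already-compiled callers see.
(alter-var-root #'foo (constantly (fn [x] (* 10 x))))

(call-foo 1)          ; => 10
```

This late binding is the behaviour the proposal would opt out of for functions marked with the metadata.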
(defn ^:static foo ^int [^String a ^String b]
Existing code / semantics should not be affected
|Comment by Alex Fowler [ 18/Sep/13 5:08 AM ]|
Very nice! That is what would really improve the experience with certain tasks. I think it will also make it possible to work with primitive arrays without the conversions?
|Comment by Mike Anderson [ 19/Sep/13 5:44 PM ]|
Hi Alex - which aspect of "work with primitive arrays" are you referring to? This feature would certainly help with passing primitive arguments to/from functions that use primitive arrays. It would also potentially help avoid some casts of primitive array arguments themselves. I don't think it helps in any other way - perhaps a separate issue would be appropriate if there is another thing you are trying to do?
|Comment by Kevin Downey [ 29/Oct/13 11:50 AM ]|
this issue is confusing, because there was/is a :static feature in clojure (which seems to be disabled in the compiler at the moment) and this proposal doesn't mention the existing work at all.
I also think this proposal is begging the question, there is no discussion of other possible solutions to the performance problem (whatever that is) that this is trying to solve.
the ((IFn) var.deref()).invoke(...) is pretty fundamental to the feel of clojure, in fact the existing :static keyword seems to be disabled in the compiler exactly because it complicates those semantics. so we should have a very clear proposal (not a wishlist) if we want to change that with some very clear wins.
maybe an optimizing clojure compiler would be a better approach.
|Comment by Mike Anderson [ 30/Oct/13 11:01 PM ]|
This is partly in response to this discussion on Clojure-Dev, where we discussed there are quite a lot of performance issues around the way that Clojure passes arguments currently:
Also I believe it reinstates the original intention of "^:static": I can't find where this is/was officially documented, but Arthur's answer in this SO question suggests that this was the case:
I think the proposal is relatively clear: it's probably the minimal change required to get static/direct (i.e. not via an indirect var reference / IFn) function invocations without affecting any of the semantics of current code.
This is sufficiently important for me that it's preventing me from shifting some performance-critical code to Clojure (even with primitive type hints). e.g. here's a simple case with a small primitive function:
(defn ^long foo [^long x]
(c/quick-bench (dotimes [i 100] (foo i))) ;; c = criterium
(c/quick-bench (dotimes [i 100] (inc i)))
i.e. the indirect function invocation is costing us nearly 170% overhead. In Java the equivalent functions perform identically: the overhead is zero because with static function invocation the JVM JIT is able to eliminate all the function call overhead.
In the long term, I agree that a proper optimising compiler would be the best way forward (perhaps Clojure 2.0/CinC can give us this?) but in the meantime I think this is a pragmatic way to improve performance with minimal impact on existing code. Even with an optimising compiler, I think we would need some way to specify the "optimised" semantics rather than the indirect var deref behaviour, and "^:static" seems like a reasonable way to do so (unless anyone has a better idea?)
|Comment by Kevin Downey [ 04/Nov/13 3:58 PM ]|
have you looked at the definition of int and how it uses :inline/definline to avoid the call overhead?
|Comment by Mike Anderson [ 05/Nov/13 4:27 AM ]|
Good point Kevin - :inline and definline seem like a good approach in many cases (although it's marked as "experimental" - does that mean we can't rely on it to work in future releases?).
This proposal is still somewhat different: the inline solution and its variants are effectively doing macro expansion to generate code without a function call on the Clojure side. The approach in this proposal would still emit a function call in bytecode, but do so in a way that the JVM can subsequently inline and optimise much more efficiently. Both have their uses, I think?
Comment edited Nov 7 2013 by Andy Fingerhut: Regarding definline marked as experimental, it has been so marked since Clojure 1.0's release, and the plan is to keep it marked that way in the pending Clojure 1.6 release. See discussion thread on
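As a rough illustration of the inlining approach discussed in this comment, here is a small definline sketch. The name `add-two` is hypothetical and not taken from the thread; the body returns a syntax-quoted form that the compiler can splice into direct call sites.

```clojure
;; `add-two` is a hypothetical name used only for illustration.
;; definline defines an ordinary function and also attaches :inline
;; metadata that the compiler can use at direct call sites.
(definline add-two [x]
  `(+ 2 ~x))

;; A direct call can be compiled as the expanded form (+ 2 40),
;; with no function call to add-two on the Clojure side.
(add-two 40)

;; The var still holds a normal function, so it works as a value too.
(map add-two [1 2 3])
```

The difference from the proposal is visible here: the call is removed at compile time by expansion, rather than emitted as a call that the JVM is left to inline.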
|Comment by Kevin Downey [ 06/Nov/13 2:06 PM ]|
my point is your benchmark above is not a comparison of clojure's current deref + cast + invoke vs. invokevirtual, inc is being inlined into a static method call there
|Comment by Kevin Downey [ 06/Nov/13 2:32 PM ]|
I've been noodling around this, and it is entirely possible to generate and invoke code in clojure right now without paying the extra deref() cost:
can be written as
now the recursive calls are invokeinterfaces, and the resulting function seems to have mean execution time about 5 times smaller using criterium to benchmark
it is entirely possible to write a macro that translates one into the other, and the weird names in the above are because I have a little proof of concept that does that.
the body of the bytecode for the regular fib function first shown looks something like:
the body of the "optimized" version looks like:
so the calls are not invokevirtual (due to the way clojure compiles stuff, you cannot type anything inside a record as being that record's type), but the interface is unique and only has one instance, so I think the jvm's class hierarchy analysis makes short work of that.
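The code from this comment did not survive in this export. The following is only a sketch of the general shape described (recursive calls dispatched through a single-implementation interface rather than through a var); it is not the author's code. `IFib` and `fib-impl` are hypothetical names, and reify is used here for brevity where the comment mentions a record.

```clojure
;; Hypothetical names; an assumption-laden sketch, not the original code.
;; A dedicated interface with a primitive signature.
(definterface IFib
  (^long fib [^long n]))

;; The only implementation of IFib. The recursive calls are hinted
;; with the interface type, so they compile to invokeinterface on IFib
;; instead of a var deref followed by IFn.invoke.
(def fib-impl
  (reify IFib
    (fib [this n]
      (if (< n 2)
        n
        (+ (.fib ^IFib this (- n 1))
           (.fib ^IFib this (- n 2)))))))

(.fib ^IFib fib-impl 20)   ; => 6765
```

Only the outermost call still goes through a var (`fib-impl`); the recursion itself stays inside the interface method.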
if I have time I may try and complete my macro and release it as a library, but given tools.analyzer.jvm someone should be able to do better than my little proof of concept very quickly.
|Comment by Andy Fingerhut [ 07/Nov/13 12:48 PM ]|
I don't know whether people watching this ticket are notified of my edit to Mike Anderson's Nov 5 2013 comment, so I am adding a new comment so that those interested in definline's experimental status know to go back and re-read it.