<< Back to previous view

[CLJ-1263] Allow static compilation of function invocations Created: 14/Sep/13  Updated: 12/Jan/16  Resolved: 11/Jan/16

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.8

Type: Enhancement Priority: Major
Reporter: Mike Anderson Assignee: Unassigned
Resolution: Completed Votes: 4
Labels: compiler


This proposal is to allow metadata on functions to prevent a fully dynamic var deref to be used whenever the function is called.

When the function is invoked, JVM "invokevirtual" instruction will be used, which is faster than the current implementation (var deref + IFn cast + invokinterface) and has less restrictions (no need to predefine interfaces to match the function parameters). The JVM is generally able to compile such invokevirtual instructions into extremely efficient code - effectively as fast as pure Java.

This is intended to pave the way to better support for statically compiled, high performance code. In particular, it allow:

  • Supporting arbitrary JVM primitives (float, int, byte, char etc.) as well as just double/long.
  • Supporting typed return values e.g. "String". This could eliminate many casts and type checks.
  • Supporting typed reference arguments (e.g. String).

Suggested usage:

(defn ^:static foo ^int [^String a ^String b]
(+ (count a) (Integer/parseInt b)))

Existing code / semantics should not be affected

Comment by Alex Fowler [ 18/Sep/13 5:08 AM ]

Very nice! That is what would really improve experience with certain tasks. I think it will also make possible to work with primitive arrays without the conversions?

Comment by Mike Anderson [ 19/Sep/13 5:44 PM ]

Hi Alex - which aspect of "work with primitive arrays" are you referring to? This feature would certainly help with passing primitive arguments to/from functions that use primitive arrays. It would also potentially help avoid some casts of primitive array arguments themselves. I don't think it helps in any other way - perhaps a separate issue would be appropriate if there is another thing you are trying to do?

Comment by Kevin Downey [ 29/Oct/13 11:50 AM ]

this issue is confusing, because there was/is a :static feature in clojure(which seems to be disabled in the compiler at the moment) and this proposal doesn't mention the existing work at all.

I also think this proposal is begging the question, there is no discussion of other possible solutions to the performance problem (whatever that is) that this is trying to solve.

the (var.deref()(IFn)).invoke(...) is pretty fundamental to the feel of clojure, in fact the existing :static keyword seems to be disabled in the compiler exactly because it complicates those semantics. so we should have a very clear proposal (not a wishlist) if we want to change that with some very clear wins.

maybe an optimizing clojure compiler would be a better approach.

Comment by Mike Anderson [ 30/Oct/13 11:01 PM ]

Hi Kevin,

This is partly in response to this discussion on Clojure-Dev, where we discussed there are quite a lot of performance issues around the way that Clojure passes arguments currently:

Also I believe it reinstates the original intention of "^:static": I can't find where this is/was officially documented, but Arthur's answer in this SO question suggests that this was the case:

I think the proposal is relatively clear: it's probably the minimal change required to get static/direct (i.e. not via an indirect var reference / IFn) function invocations without affecting any of the semantics of current code.

This is sufficiently important for me that it's preventing me from shifting some performance-critical code to Clojure (even with primitive type hints). e.g. here's a simple case with a small primitive function:

(defn ^long foo [^long x]
(inc x))

(c/quick-bench (dotimes [i 100] (foo i))) ;; c = criterium
=> Execution time mean : 194.334293 ns

(c/quick-bench (dotimes [i 100] (inc i)))
=> Execution time mean : 71.539048 ns

i.e. the indirect function invocation is costing us nearly 170% overhead. In Java the equivalent functions perform identically: the overhead is zero because with static function invocation the JVM JIT is able to eliminate all the function call overhead.

In the long term, I agree that a proper optimising compiler would be the best way forward (perhaps Clojure 2.0/CinC can give us this?) but in the meantime I think this is a pragmatic way to improve performance with minimal impact on existing code. Even with an optimising compiler, I think we' would need some way to specifiy the "optimised" semantics rather than the indirect var deref behaviour, and "^:static" seems like a reasonable way to do so (unless anyone has a better idea?)

Comment by Kevin Downey [ 04/Nov/13 3:58 PM ]

have you looked at the definition of int and how it uses :inline/definline to avoid the call overhead?

Comment by Mike Anderson [ 05/Nov/13 4:27 AM ]

Good point Kevin - :inline and definline seem like a good approach in many cases (although it's marked as "experimental" - does that mean we can't rely on it to work in future releases?).

This proposal is still somewhat different: the inline solutions and its variants are effectively doing macro expansion to generate code without a function call on the Clojure side. The approach in this proposal would still emit a function call in bytecode, but do so in a way that the JVM can subsequently inline and optimise much more efficiently. Both have their uses, I think?

Commented edited Nov 7 2013 by Andy Fingerhut: Regarding definline marked as experimental, it has been so marked since Clojure 1.0's release, and the plan is to keep it marked that way in the pending Clojure 1.6 release. See discussion thread on CLJ-1281. No plans to remove it that I am aware of.

Comment by Kevin Downey [ 06/Nov/13 2:06 PM ]

my point is your benchmark above is not a comparison of clojure's current deref + cast + invoke vs. invokevirtual, inc is being inlined in to a static method call there

Comment by Kevin Downey [ 06/Nov/13 2:32 PM ]

I've been noodling around this, and it is entirely possible to generate and invoke code in clojure right now without paying the extra deref() cost:

 (defn ^long fib [^long n]
   (case n
     0 0
     1 1
     (+ (fib (dec n))
        (fib  (dec (dec n))))))

can be written as

(declare TheR1798)

(definterface I1797
  (^long fib_Invoke_1 [^long n]))

(deftype R1798 []
  (^long fib_Invoke_1
    [this1799 ^long n]
    (case n
      0 0
      1 1
      (+ (.fib_Invoke_1 this1799 (dec n))
         (.fib_Invoke_1 this1799 (dec (dec n)))))))

(def TheR1798 (new R1798))

(defn ^long fib [^long n]
  (.fib_Invoke_1 TheR1798  n)))

now the recursive calls are invokeinterfaces, and the resulting function seems to have mean execution time about 5 times smaller using criterium to bench mark

it is entirely possible to write a macro that translates one in to other, and the weird names in the above are because I have a little proof of concept that does that.

the body of the bytecode for the regular fib function first shown looks something like:

  public final java.lang.Object invokePrim(long);
       0: lload_1       
       1: lstore_3      
       2: lload_3       
       3: l2i           
       4: tableswitch   { // 0 to 1
                     0: 28
                     1: 40
               default: 52
      28: lconst_0      
      29: lload_3       
      30: lcmp          
      31: ifne          52
      34: getstatic     #33                 // Field const__1:Ljava/lang/Object;
      37: goto          94
      40: lconst_1      
      41: lload_3       
      42: lcmp          
      43: ifne          52
      46: getstatic     #37                 // Field const__3:Ljava/lang/Object;
      49: goto          94
      52: getstatic     #57                 // Field const__5:Lclojure/lang/Var;
      55: invokevirtual #70                 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
      58: checkcast     #6                  // class clojure/lang/IFn$LO
      61: lload_1       
      62: invokestatic  #75                 // Method clojure/lang/Numbers.dec:(J)J
      65: invokeinterface #77,  3           // InterfaceMethod clojure/lang/IFn$LO.invokePrim:(J)Ljava/lang/Object;
      70: getstatic     #57                 // Field const__5:Lclojure/lang/Var;
      73: invokevirtual #70                 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
      76: checkcast     #6                  // class clojure/lang/IFn$LO
      79: lload_1       
      80: invokestatic  #75                 // Method clojure/lang/Numbers.dec:(J)J
      83: invokestatic  #75                 // Method clojure/lang/Numbers.dec:(J)J
      86: invokeinterface #77,  3           // InterfaceMethod clojure/lang/IFn$LO.invokePrim:(J)Ljava/lang/Object;
      91: invokestatic  #81                 // Method clojure/lang/Numbers.add:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Number;
      94: areturn       
      line 243: 0
      line 244: 2
      line 247: 52
      line 247: 52
      line 247: 61
      line 248: 70
      line 248: 79
      line 248: 79
      Start  Length  Slot  Name   Signature
             2      92     3 G__301   J
             0      94     0  this   Ljava/lang/Object;
             0      94     1     n   J

  public java.lang.Object invoke(java.lang.Object);
       0: aload_0       
       1: aload_1       
       2: checkcast     #89                 // class java/lang/Number
       5: invokestatic  #93                 // Method clojure/lang/RT.longCast:(Ljava/lang/Object;)J
       8: invokeinterface #77,  3           // InterfaceMethod clojure/lang/IFn$LO.invokePrim:(J)Ljava/lang/Object;
      13: areturn       

the body of the "optimized" version looks like:

  public long fib_Invoke_1(long);
       0: lload_1       
       1: lstore_3      
       2: lload_3       
       3: l2i           
       4: tableswitch   { // 0 to 1
                     0: 28
                     1: 38
               default: 48
      28: lconst_0      
      29: lload_3       
      30: lcmp          
      31: ifne          48
      34: lconst_0      
      35: goto          80
      38: lconst_1      
      39: lload_3       
      40: lcmp          
      41: ifne          48
      44: lconst_1      
      45: goto          80
      48: aload_0       
      49: checkcast     #6                  // class com/thelastcitadel/kernel/I2364
      52: lload_1       
      53: invokestatic  #53                 // Method clojure/lang/Numbers.dec:(J)J
      56: invokeinterface #55,  3           // InterfaceMethod com/thelastcitadel/kernel/I2364.fib_Invoke_1:(J)J
      61: aload_0       
      62: checkcast     #6                  // class com/thelastcitadel/kernel/I2364
      65: lload_1       
      66: invokestatic  #53                 // Method clojure/lang/Numbers.dec:(J)J
      69: invokestatic  #53                 // Method clojure/lang/Numbers.dec:(J)J
      72: invokeinterface #55,  3           // InterfaceMethod com/thelastcitadel/kernel/I2364.fib_Invoke_1:(J)J
      77: invokestatic  #59                 // Method clojure/lang/Numbers.add:(JJ)J
      80: lreturn       
      line 251: 0
      line 251: 2
      line 251: 48
      line 251: 48
      line 251: 52
      line 251: 61
      line 251: 65
      line 251: 65
      Start  Length  Slot  Name   Signature
             2      78     3 G__2363   J
             0      80     0  this   Lcom/thelastcitadel/kernel/R2365;
             0      80     1     n   J

so the calls are not invokevirtual (due to the way clojure compiles stuff, you cannot type anything inside a record as being that record's type), but the interface is unique and only has one instance, so I think the jvm's class hierarchy analysis makes short work of that.

if I have time I may try and complete my macro and release it as a library, but given tools.analyzer.jvm someone should be able to do better than my little proof of concept very quickly.

Comment by Andy Fingerhut [ 07/Nov/13 12:48 PM ]

I don't know if my editing of Mike Anderson's Nov 5 2013 comment is notified to people watching this ticket, so adding a new comment so those interested in definline's experimental status can know to go back and re-read it.

Comment by Ghadi Shayban [ 09/Jan/16 5:28 PM ]

Scope of this ticket is all over the place, but at least the static dispatch aspect is covered well by direct linking in 1.8.

Comment by Alex Miller [ 11/Jan/16 8:22 AM ]

1.8 includes direct linking, which allows var function invocations to be replaced with a static method call in the bytecode, which seems to approximate the idea of this ticket.

With respect to the specific proposal, I do not expect that ^:static will come back in its prior form and that work along these lines will follow the direct linking path instead. Whatever future work is done, I expect it will not happen via this jira ticket, so closing.

Comment by Mike Anderson [ 11/Jan/16 11:26 PM ]

Hi Alex, no big deal about closing this ticket, but I'm not 100% sure if direct linking addressed this specific case (i.e. static linking for a specific function and/or call site, rather than a general compiler setting). Is there a description / documentation of direct linking somewhere that I can check against?

This would be very useful for situations where you want to optimise a hotspot in the code but don't want or need full static linking of everything else.

Comment by Alex Miller [ 12/Jan/16 5:15 AM ]

Based on everything I've talked about with Rich, we don't expect to re-implement ^:static as described in this ticket and whatever we do will be in the context of direct linking. There are no docs yet but those will be getting updated with the release.

Generated at Sat Feb 25 16:57:14 CST 2017 using JIRA 4.4#649-r158309.