Clojure

aset-* and aget perform poorly on multi-dimensional arrays even with type hints.

Details

  • Type: Enhancement Enhancement
  • Status: Open Open
  • Priority: Critical Critical
  • Resolution: Unresolved
  • Affects Version/s: Release 1.5
  • Fix Version/s: None
  • Component/s: None
  • Environment:
    Clojure 1.5.1.

    Dependencies: criterium
  • Patch:
    Code
  • Approval:
    Triaged

Description

Here's a transcript of the behavior. I don't know for sure that reflection is being done, but the performance penalty (about 1300x) suggests it.

user=> (use 'criterium.core)
nil
user=> (def b (make-array Double/TYPE 1000 1000))
#'user/b
user=> (quick-bench (aget ^"[[D" b 304 175))
WARNING: Final GC required 3.5198021166354323 % of runtime
WARNING: Final GC required 29.172288684474303 % of runtime
Evaluation count : 63558 in 6 samples of 10593 calls.
             Execution time mean : 9.457308 µs
    Execution time std-deviation : 126.220954 ns
   Execution time lower quantile : 9.344450 µs ( 2.5%)
   Execution time upper quantile : 9.629202 µs (97.5%)
                   Overhead used : 2.477107 ns

One workaround is to use multiple agets.

user=> (quick-bench (aget ^"[D" (aget ^"[[D" b 304) 175))
WARNING: Final GC required 40.59820310542545 % of runtime
Evaluation count : 62135436 in 6 samples of 10355906 calls.
             Execution time mean : 6.999273 ns
    Execution time std-deviation : 0.112703 ns
   Execution time lower quantile : 6.817782 ns ( 2.5%)
   Execution time upper quantile : 7.113845 ns (97.5%)
                   Overhead used : 2.477107 ns

Cause: The inlined version only applies to arity 2, and otherwise it reflects.

  1. CLJ-1289-p1.patch
    13/Feb/14 8:52 PM
    1 kB
    Gary Fredericks
  2. CLJ-1289-p2.patch
    02/Sep/18 11:31 AM
    3 kB
    Alexander Yakushev
  3. CLJ-1289-p2-repl.txt
    02/Sep/18 11:26 AM
    3 kB
    Alexander Yakushev

Activity

Hide
Gary Fredericks added a comment -

A glance at the source makes it obvious that the hypothesis is correct – the inlined version only applies to arity 2, and otherwise it reflects.

I thought this would be as simple as converting the inline function to be variadic (using reduce), but after trying it I realized this is tricky as you have to generate the correct type hints for each step. E.g., given ^"[[D" the inline function needs to type-hint the intermediate result with ^"[D". This isn't difficult if we're just dealing with strings that begin with square brackets, but I don't know for sure that those are the only possibilities.

Show
Gary Fredericks added a comment - A glance at the source makes it obvious that the hypothesis is correct – the inlined version only applies to arity 2, and otherwise it reflects. I thought this would be as simple as converting the inline function to be variadic (using reduce), but after trying it I realized this is tricky as you have to generate the correct type hints for each step. E.g., given ^"[[D" the inline function needs to type-hint the intermediate result with ^"[D". This isn't difficult if we're just dealing with strings that begin with square brackets, but I don't know for sure that those are the only possibilities.
Hide
Yaron Peleg added a comment - - edited

Bump. I just got bitten bad by this.

There are two seperate issues here:
1) (aget 2d-array-doubles 0 0 ) doesn't emit a reflection warning.
2) It seems like the compiler has enough information to avoid the reflective call here.

Note this gets exp. worse as number of dimensions grows, i.e (get doubles3d 0 0 0)
will be 1M slower, etc' Not true, unless you iterate over all elements. it's
simply n_dims*1000x per lookup.

Nasty surprise, especially considering you often go to primitive arrays for speed,
and a common use case is an inner loop(s) that iterate over arrays.

Show
Yaron Peleg added a comment - - edited Bump. I just got bitten bad by this. There are two seperate issues here: 1) (aget 2d-array-doubles 0 0 ) doesn't emit a reflection warning. 2) It seems like the compiler has enough information to avoid the reflective call here. Note this gets exp. worse as number of dimensions grows, i.e (get doubles3d 0 0 0) will be 1M slower, etc' Not true, unless you iterate over all elements. it's simply n_dims*1000x per lookup. Nasty surprise, especially considering you often go to primitive arrays for speed, and a common use case is an inner loop(s) that iterate over arrays.
Hide
Gary Fredericks added a comment -

I can probably take a stab at this.

Show
Gary Fredericks added a comment - I can probably take a stab at this.
Hide
Gary Fredericks added a comment -

I think the reflection warning problem is pretty much impossible to solve without changing code elsewhere in the compiler, because the reflection done in aget is a different kind than normal clojure reflection – it's explicitly in the function body rather than emitted by the compiler. Since the compiler isn't emitting it, it doesn't reasonably know it's even there. So even if aget is fixed for other arities, you still won't get the warning when it's not inlined.

I can imagine some sort of metadata that you could put on a function telling the compiler that it will reflect if not inlined. Or maybe a more generic not-inlined warning?

The global scope of adding another compiler flag seems about balanced by the seriousness of array functions not being able to warn on reflection.

Show
Gary Fredericks added a comment - I think the reflection warning problem is pretty much impossible to solve without changing code elsewhere in the compiler, because the reflection done in aget is a different kind than normal clojure reflection – it's explicitly in the function body rather than emitted by the compiler. Since the compiler isn't emitting it, it doesn't reasonably know it's even there. So even if aget is fixed for other arities, you still won't get the warning when it's not inlined. I can imagine some sort of metadata that you could put on a function telling the compiler that it will reflect if not inlined. Or maybe a more generic not-inlined warning? The global scope of adding another compiler flag seems about balanced by the seriousness of array functions not being able to warn on reflection.
Hide
Gary Fredericks added a comment - - edited

Attached CLJ-1289-p1.patch which simply inlines variadic calls to aget. It assumes that if it sees a :tag on the array arg that is a string beginning with [, it can assume that the return value from one call to aget can be tagged with the same string with the leading [ stripped off.

I'm not a jvm expert, but having read through the spec a little bit I think this is a reasonable assumption.

Show
Gary Fredericks added a comment - - edited Attached CLJ-1289-p1.patch which simply inlines variadic calls to aget. It assumes that if it sees a :tag on the array arg that is a string beginning with [, it can assume that the return value from one call to aget can be tagged with the same string with the leading [ stripped off. I'm not a jvm expert, but having read through the spec a little bit I think this is a reasonable assumption.
Hide
Alex Miller added a comment -

I think this probably is actually true, but a more official way to ask that question would be to get the array class and ask for Class.getComponentType() (and less janky than string munging).

Show
Alex Miller added a comment - I think this probably is actually true, but a more official way to ask that question would be to get the array class and ask for Class.getComponentType() (and less janky than string munging).
Hide
Gary Fredericks added a comment -

How would you get the array class based on the :tag type hint?

Show
Gary Fredericks added a comment - How would you get the array class based on the :tag type hint?
Hide
Gary Fredericks added a comment -

I see (-> s (Class/forName) (.getComponentType) (.getName)) does the same thing – is that route preferred, or is there another one?

Show
Gary Fredericks added a comment - I see (-> s (Class/forName) (.getComponentType) (.getName)) does the same thing – is that route preferred, or is there another one?
Hide
Alexander Yakushev added a comment - - edited

I've taken a stab at the implementation suggested by Alex Miller. The attached patch uses getComponentType instead of string magic, and introduces the following improvements compared to the first patch:

  • aset is now supported as well. All indices but last are unrolled into RT.aget calls and the outermost call is expanded to RT.aset.
  • If the inlined version can't resolve the type of the array at any point, then, instead of unrolling into N RT.aget calls (which will cause N compiler reflection sites) it will generate an invocation of the non-inlined aget call that will use Array.get (which is faster).

On the downside, multi-arity calls to aset now trigger compiler reflection where it used to be Array.set. I wonder if it makes sense to smoothen out those corners too, or whether it's actually better that reflection in such cases is more visible (shows in compiler warnings).

I also attached the REPL log of validation examples (uses clj-java-decompiler). It would be nice to convert it into tests, but I don't yet know how.

On a side note, I had to use a hack ((var aget) ...) to force the non-inlined version to be called. Can't say I like it much, but I haven't come up with anything better. Another way would be to separate non-inlined version into a separate private function, but I don't want to add even more Vars to core namespace.

Show
Alexander Yakushev added a comment - - edited I've taken a stab at the implementation suggested by Alex Miller. The attached patch uses getComponentType instead of string magic, and introduces the following improvements compared to the first patch:
  • aset is now supported as well. All indices but last are unrolled into RT.aget calls and the outermost call is expanded to RT.aset.
  • If the inlined version can't resolve the type of the array at any point, then, instead of unrolling into N RT.aget calls (which will cause N compiler reflection sites) it will generate an invocation of the non-inlined aget call that will use Array.get (which is faster).
On the downside, multi-arity calls to aset now trigger compiler reflection where it used to be Array.set. I wonder if it makes sense to smoothen out those corners too, or whether it's actually better that reflection in such cases is more visible (shows in compiler warnings). I also attached the REPL log of validation examples (uses clj-java-decompiler). It would be nice to convert it into tests, but I don't yet know how. On a side note, I had to use a hack ((var aget) ...) to force the non-inlined version to be called. Can't say I like it much, but I haven't come up with anything better. Another way would be to separate non-inlined version into a separate private function, but I don't want to add even more Vars to core namespace.
Hide
Alexander Yakushev added a comment -

Patch reuploaded: minor fix.

Show
Alexander Yakushev added a comment - Patch reuploaded: minor fix.
Hide
Alexander Yakushev added a comment -

BTW, why does the docstring for aset says it works only on arrays of reference types? Is this comment outdated? We can fix it it along the way.

Show
Alexander Yakushev added a comment - BTW, why does the docstring for aset says it works only on arrays of reference types? Is this comment outdated? We can fix it it along the way.
Hide
Alexander Yakushev added a comment -

Is there a chance to review this for 1.10? Perhaps, the implementation is too complicated and needs more time to consider? I might try to reduce the scope then.

Show
Alexander Yakushev added a comment - Is there a chance to review this for 1.10? Perhaps, the implementation is too complicated and needs more time to consider? I might try to reduce the scope then.
Hide
Alex Miller added a comment -

It's complicated, not sure if its worth this or not. Probably won't look at it for 1.10.

Show
Alex Miller added a comment - It's complicated, not sure if its worth this or not. Probably won't look at it for 1.10.
Hide
Alexander Yakushev added a comment -

Thanks for the answer. In terms of complexity, I can remove the part that "optimizes" reflective calls, which both simplifies the algorithm and makes the reflection sites visible to the compiler (and thus to the user).

Show
Alexander Yakushev added a comment - Thanks for the answer. In terms of complexity, I can remove the part that "optimizes" reflective calls, which both simplifies the algorithm and makes the reflection sites visible to the compiler (and thus to the user).

People

Vote (2)
Watch (3)

Dates

  • Created:
    Updated: