<< Back to previous view

[CLJ-1644] into-array fails for sequences starting with nil Created: 15/Jan/15  Updated: 11/Apr/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Michael Blume Assignee: Michael Blume
Resolution: Unresolved Votes: 0
Labels: arrays, ft

Attachments: Text File CLJ-1644-array-first-nil-v1.patch     Text File CLJ-1644-array-first-nil-v2.patch    
Patch: Code and Test
Approval: Triaged

 Description   

The into-array doc string implies (although you could read "if pressent" in different ways) that into-array will fall back to an Object array.

user=> (doc into-array)
-------------------------
clojure.core/into-array
([aseq] [type aseq])
  Returns an array with components set to the values in aseq. The array's
  component type is type if provided, or the type of the first value in
  aseq if present, or Object. All values in aseq must be compatible with
  the component type. Class objects for the primitive types can be obtained
  using, e.g., Integer/TYPE.
user=> (into-array [nil 1 2])
NullPointerException   clojure.lang.RT.seqToTypedArray (RT.java:1691)

Approach: Check for nil and use Object as the array type.

Patch: CLJ-1644-array-first-nil-v2.patch

Screened by:



 Comments   
Comment by Michael Blume [ 15/Jan/15 1:45 PM ]

uploading patch v1 (adding nil as a cons-able element in CLJ-1643 will also bring this out, but I don't want to make the one patch depend on the other)

Comment by Phill Wolf [ 12/Mar/15 7:06 PM ]

Searching through the sequence for a non-null item is not consistent with into-array's docstring. The docstring says, "The array's component type is type if provided, or the type of the first value in aseq if present, or Object." In keeping with the docstring, wouldn't a null first item suggest an array of Object?

Working harder than that (by searching the sequence) only delays the inevitable: a whole sequence of nulls producing an Object array, instead of an array of the type the programmer expected, and triggering a run-time crash.

In summary: this patch goes farther than necessary, but even so, it does not cure the risk of unexpected results from nulls. A simpler remedy – returning an array of Object if the first item is null – would be consistent with the docstring and avoid raising unfounded expectations. Adding a statement that null is of type Object to the docstring could help programmers avoid falling into the trap.

Comment by Alex Miller [ 12/Mar/15 8:29 PM ]

No search through the sequence will pass screening, please just add a nil check.

Comment by Michael Blume [ 13/Mar/15 4:39 PM ]

done





[CLJ-1289] aset-* and aget perform poorly on multi-dimensional arrays even with type hints. Created: 01/Nov/13  Updated: 14/Feb/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Michael O. Church Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: arrays, performance
Environment:

Clojure 1.5.1.

Dependencies: criterium


Attachments: Text File CLJ-1289-p1.patch    
Patch: Code
Approval: Triaged

 Description   

Here's a transcript of the behavior. I don't know for sure that reflection is being done, but the performance penalty (about 1300x) suggests it.

user=> (use 'criterium.core)
nil
user=> (def b (make-array Double/TYPE 1000 1000))
#'user/b
user=> (quick-bench (aget ^"[[D" b 304 175))
WARNING: Final GC required 3.5198021166354323 % of runtime
WARNING: Final GC required 29.172288684474303 % of runtime
Evaluation count : 63558 in 6 samples of 10593 calls.
             Execution time mean : 9.457308 µs
    Execution time std-deviation : 126.220954 ns
   Execution time lower quantile : 9.344450 µs ( 2.5%)
   Execution time upper quantile : 9.629202 µs (97.5%)
                   Overhead used : 2.477107 ns

One workaround is to use multiple agets.

user=> (quick-bench (aget ^"[D" (aget ^"[[D" b 304) 175))
WARNING: Final GC required 40.59820310542545 % of runtime
Evaluation count : 62135436 in 6 samples of 10355906 calls.
             Execution time mean : 6.999273 ns
    Execution time std-deviation : 0.112703 ns
   Execution time lower quantile : 6.817782 ns ( 2.5%)
   Execution time upper quantile : 7.113845 ns (97.5%)
                   Overhead used : 2.477107 ns

Cause: The inlined version only applies to arity 2, and otherwise it reflects.



 Comments   
Comment by Gary Fredericks [ 08/Dec/13 9:28 PM ]

A glance at the source makes it obvious that the hypothesis is correct – the inlined version only applies to arity 2, and otherwise it reflects.

I thought this would be as simple as converting the inline function to be variadic (using reduce), but after trying it I realized this is tricky as you have to generate the correct type hints for each step. E.g., given ^"[[D" the inline function needs to type-hint the intermediate result with ^"[D". This isn't difficult if we're just dealing with strings that begin with square brackets, but I don't know for sure that those are the only possibilities.

Comment by Yaron Peleg [ 13/Feb/14 4:44 AM ]

Bump. I just got bitten bad by this.

There are two seperate issues here:
1) (aget 2d-array-doubles 0 0 ) doesn't emit a reflection warning.
2) It seems like the compiler has enough information to avoid the reflective call here.

Note this gets exp. worse as number of dimensions grows, i.e (get doubles3d 0 0 0)
will be 1M slower, etc' Not true, unless you iterate over all elements. it's
simply n_dims*1000x per lookup.

Nasty surprise, especially considering you often go to primitive arrays for speed,
and a common use case is an inner loop(s) that iterate over arrays.

Comment by Gary Fredericks [ 13/Feb/14 7:08 AM ]

I can probably take a stab at this.

Comment by Gary Fredericks [ 13/Feb/14 8:34 PM ]

I think the reflection warning problem is pretty much impossible to solve without changing code elsewhere in the compiler, because the reflection done in aget is a different kind than normal clojure reflection – it's explicitly in the function body rather than emitted by the compiler. Since the compiler isn't emitting it, it doesn't reasonably know it's even there. So even if aget is fixed for other arities, you still won't get the warning when it's not inlined.

I can imagine some sort of metadata that you could put on a function telling the compiler that it will reflect if not inlined. Or maybe a more generic not-inlined warning?

The global scope of adding another compiler flag seems about balanced by the seriousness of array functions not being able to warn on reflection.

Comment by Gary Fredericks [ 13/Feb/14 8:52 PM ]

Attached CLJ-1289-p1.patch which simply inlines variadic calls to aget. It assumes that if it sees a :tag on the array arg that is a string beginning with [, it can assume that the return value from one call to aget can be tagged with the same string with the leading [ stripped off.

I'm not a jvm expert, but having read through the spec a little bit I think this is a reasonable assumption.

Comment by Alex Miller [ 14/Feb/14 3:34 PM ]

I think this probably is actually true, but a more official way to ask that question would be to get the array class and ask for Class.getComponentType() (and less janky than string munging).

Comment by Gary Fredericks [ 14/Feb/14 3:40 PM ]

How would you get the array class based on the :tag type hint?

Comment by Gary Fredericks [ 14/Feb/14 7:05 PM ]

I see (-> s (Class/forName) (.getComponentType) (.getName)) does the same thing – is that route preferred, or is there another one?





[CLJ-1288] aset-* and aget: on multi-dimensional arrays (e.g. double[][]) these fn reflect (and, thus, perf. poorly) even with type hints. Created: 01/Nov/13  Updated: 01/Nov/13  Resolved: 01/Nov/13

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Michael O. Church Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: arrays, performance
Environment:

Clojure 1.5.1.

Dependencies: criterium



 Description   

Here's a transcript of the behavior. I don't know for sure that reflection is being done, but the performance penalty (about 1300x) suggests it.

user=> (use 'criterium.core)
nil
user=> (def b (make-array Double/TYPE 1000 1000))
#'user/b
user=> (quick-bench (aget ^"[[D" b 304 175))
WARNING: Final GC required 3.5198021166354323 % of runtime
WARNING: Final GC required 29.172288684474303 % of runtime
Evaluation count : 63558 in 6 samples of 10593 calls.
Execution time mean : 9.457308 µs
Execution time std-deviation : 126.220954 ns
Execution time lower quantile : 9.344450 µs ( 2.5%)
Execution time upper quantile : 9.629202 µs (97.5%)
Overhead used : 2.477107 ns
nil

A (n ugly) workaround is to use multiple agets.

user=> (quick-bench (aget ^"[D" (aget ^"[[D" b 304) 175))
WARNING: Final GC required 40.59820310542545 % of runtime
Evaluation count : 62135436 in 6 samples of 10355906 calls.
Execution time mean : 6.999273 ns
Execution time std-deviation : 0.112703 ns
Execution time lower quantile : 6.817782 ns ( 2.5%)
Execution time upper quantile : 7.113845 ns (97.5%)
Overhead used : 2.477107 ns
nil



 Comments   
Comment by Alex Miller [ 01/Nov/13 1:09 PM ]

Dupe of CLJ-1289.





Generated at Sun Apr 19 12:52:35 CDT 2015 using JIRA 4.4#649-r158309.