Completed
Details
Assignee
UnassignedUnassignedReporter
Alex MillerAlex MillerApproval
OkPatch
Code and TestPriority
CriticalAffects versions
Fix versions
Details
Details
Assignee
Unassigned
UnassignedReporter
Alex Miller
Alex MillerApproval
Ok
Patch
Code and Test
Priority
Affects versions
Fix versions
Created October 2, 2014 at 3:34 PM
Updated January 11, 2015 at 2:12 PM
Resolved January 11, 2015 at 2:12 PM
These examples should work but do not:
Something Iterable but not IReduce:
user> (def i (eduction (map inc) (range 100))) #'user/i user> (instance? java.util.Collection i) false user> (instance? Iterable i) true user> (vec i) RuntimeException Unable to convert: class clojure.core.Iteration to Object[]
Something IReduceInit but not Iterable:
user=> (vec (reify clojure.lang.IReduceInit (reduce [_ f start] (reduce f start (range 10))))) RuntimeException Unable to convert: class user$reify__15 to Object[]
Proposal: Add PersistentVector.create(Iterable) and PersistentVector.create(IReduceInit) to efficiently create PVs from those. See also the blog post http://insideclojure.org/2015/01/07/vec-perf/.
For performance, vec has several cases:
1) (vec) if vector?: return new vector w/o meta - this matches prior behavior but has a constant cost of a few ns, rather than linear cost. If not a vector, spill to LazilyPersistentVector.create(Object).
2) (LPV) instanceof IReduceInit: Anything reducible can reduce itself fastest. Right now this has a big benefit for PersistentList. on 1.7.0-alpha4 with list of size 1024, into=28 seconds, vec=18 seconds. After patch, vec=7 seconds. As more things become IReduce, they'll take this path as well. This is also the branch that handles the new Eduction and IReduceInit cases.
3) (LPV) instanceof ISeq: If the coll is a sequence already, best to just walk it rather than build an iterator or array from it. This calls into PersistentVector.create(ISeq). That implementation now contains an optimization to build into an array and construct the PersistentVector directly from the array for sequences <= 32 elements (which is common). Once that threshold is reached, it switches to building with transients. The benchmark shows that the patch makes vec substantially faster for all seqs and even faster than into in some cases.
4) (LPV) instanceof Iterable: For all non-Clojure collections (ArrayList) and current non-IReduce Clojure collections (PHM, PHS), this is fastest path. Iterators are preferred to seqs as they do not cache or hold onto the values as they go by. The PV.create() for Iterable uses transients. Due to slightly more overhead, small maps and sets are slightly slower but this is largely fixed by CLJ-1499 which adds direct iterators. ArrayLists with <= 32 are special-cased - we can toArray() and construct the PV with a seeded node in this case. This type and size is particularly common in real code. Even so, very small ArrayLists are a bit slower than they were due to increased number of conditional checks I think.
5) (LPV) otherwise RT.toArray(): catches Map, String, Object[], primitive array, etc. The important ones here are the arrays - they are slightly slower on small arrays due to overhead of checking more cases above, but big arrays are significantly faster than they were.
In addition, there was one hard-coded path in the Compiler into PersistentVector.create() and I re-routed that through LazilyPersistentVector instead as that code is now the place to choose the fastest path logic.
Patch: clj-1546-6.patch, see numbers.png for perf comparison
Screened by: Stu