Widen vec to take Iterable/IReduce

Description

These examples should work but do not:

Something Iterable but not IReduce:

user> (def i (eduction (map inc) (range 100))) #'user/i user> (instance? java.util.Collection i) false user> (instance? Iterable i) true user> (vec i) RuntimeException Unable to convert: class clojure.core.Iteration to Object[]

Something IReduceInit but not Iterable:

user=> (vec (reify clojure.lang.IReduceInit (reduce [_ f start] (reduce f start (range 10))))) RuntimeException Unable to convert: class user$reify__15 to Object[]

Proposal: Add PersistentVector.create(Iterable) and PersistentVector.create(IReduceInit) to efficiently create PVs from those. See also the blog post http://insideclojure.org/2015/01/07/vec-perf/.

For performance, vec has several cases:
1) (vec) if vector?: return new vector w/o meta - this matches prior behavior but has a constant cost of a few ns, rather than linear cost. If not a vector, spill to LazilyPersistentVector.create(Object).

2) (LPV) instanceof IReduceInit: Anything reducible can reduce itself fastest. Right now this has a big benefit for PersistentList. on 1.7.0-alpha4 with list of size 1024, into=28 seconds, vec=18 seconds. After patch, vec=7 seconds. As more things become IReduce, they'll take this path as well. This is also the branch that handles the new Eduction and IReduceInit cases.

3) (LPV) instanceof ISeq: If the coll is a sequence already, best to just walk it rather than build an iterator or array from it. This calls into PersistentVector.create(ISeq). That implementation now contains an optimization to build into an array and construct the PersistentVector directly from the array for sequences <= 32 elements (which is common). Once that threshold is reached, it switches to building with transients. The benchmark shows that the patch makes vec substantially faster for all seqs and even faster than into in some cases.

4) (LPV) instanceof Iterable: For all non-Clojure collections (ArrayList) and current non-IReduce Clojure collections (PHM, PHS), this is fastest path. Iterators are preferred to seqs as they do not cache or hold onto the values as they go by. The PV.create() for Iterable uses transients. Due to slightly more overhead, small maps and sets are slightly slower but this is largely fixed by CLJ-1499 which adds direct iterators. ArrayLists with <= 32 are special-cased - we can toArray() and construct the PV with a seeded node in this case. This type and size is particularly common in real code. Even so, very small ArrayLists are a bit slower than they were due to increased number of conditional checks I think.

5) (LPV) otherwise RT.toArray(): catches Map, String, Object[], primitive array, etc. The important ones here are the arrays - they are slightly slower on small arrays due to overhead of checking more cases above, but big arrays are significantly faster than they were.

In addition, there was one hard-coded path in the Compiler into PersistentVector.create() and I re-routed that through LazilyPersistentVector instead as that code is now the place to choose the fastest path logic.

Patch: clj-1546-6.patch, see numbers.png for perf comparison

Screened by: Stu

Environment

None

Attachments

7
  • 07 Jan 2015, 06:24 AM
  • 07 Jan 2015, 06:24 AM
  • 25 Nov 2014, 04:07 PM
  • 13 Nov 2014, 10:14 PM
  • 24 Oct 2014, 03:52 PM
  • 14 Oct 2014, 03:56 PM
  • 02 Oct 2014, 03:36 PM

Activity

Show:

Alex Miller January 11, 2015 at 2:12 PM

Thanks for the report! I will get that fixed in the next alpha.

Jonas De Vuyst January 11, 2015 at 12:47 PM

This patch breaks (vec (first {1 2}))
; ClassCastException clojure.lang.MapEntry cannot be cast to clojure.lang.IObj

Alex Miller November 25, 2014 at 3:58 PM

Added benchmark.png showing times (in ns), tested with criterium, for into and vec on different types and sizes on 1.7.0-alpha4 and then vec again after the patch.

Michael Blume November 14, 2014 at 1:31 AM

Apologies, maven just wasn't doing a good job of tracking changes, running mvn clean fixes the build.

Alex Miller November 14, 2014 at 1:28 AM

Did you clean first? I replaced that static method call there with a wider version but if you are cleaning fresh it should be fine.

Completed

Details

Assignee

Reporter

Approval

Ok

Patch

Code and Test

Priority

Affects versions

Fix versions

Created October 2, 2014 at 3:34 PM
Updated January 11, 2015 at 2:12 PM
Resolved January 11, 2015 at 2:12 PM