Clojure

vec function is substantially slower than into function

Details

  • Type: Defect Defect
  • Status: Open Open
  • Priority: Major Major
  • Resolution: Unresolved
  • Affects Version/s: Release 1.5
  • Fix Version/s: Release 1.7
  • Component/s: None
  • Labels:
  • Approval:
    Vetted

Description

(vec coll) and (into [] coll) do exactly the same thing. However, due to into using transients, it is substantially faster. On my machine:

(time (dotimes [_ 100] (vec (range 100000))))
"Elapsed time: 732.56 msecs"

(time (dotimes [_ 100] (into [] (range 100000))))
"Elapsed time: 491.411 msecs"

This is consistently repeatable.

Since vec's sole purpose is to transform collections into vectors, it should do so at the maximum speed available.

Activity

Hide
Ghadi Shayban added a comment -

While I agree that there are improvements and possibly low-hanging fruit, FWIW https://github.com/clojure/tools.analyzer/commit/cf7dda81a22f4c9c1fe64c699ca17e7deed61db4#commitcomment-5989545

showed a 5% slowdown from a few callsites in tools.analyzer.

This ticket's benchmark is incomplete in that it covers a single type of argument (chunked range), and flawed as it timing the expense of realizing the range. (That could be a legit benchmark case, but it shouldn't be the only one).

Sorry to rain on a parade. I promise like speed too!

Show
Ghadi Shayban added a comment - While I agree that there are improvements and possibly low-hanging fruit, FWIW https://github.com/clojure/tools.analyzer/commit/cf7dda81a22f4c9c1fe64c699ca17e7deed61db4#commitcomment-5989545 showed a 5% slowdown from a few callsites in tools.analyzer. This ticket's benchmark is incomplete in that it covers a single type of argument (chunked range), and flawed as it timing the expense of realizing the range. (That could be a legit benchmark case, but it shouldn't be the only one). Sorry to rain on a parade. I promise like speed too!
Hide
Gary Fredericks added a comment -

Is there any downside to (defn vec [coll] (into [] coll)) (or the inlined equivalent)?

Show
Gary Fredericks added a comment - Is there any downside to (defn vec [coll] (into [] coll)) (or the inlined equivalent)?
Hide
Alan Malloy added a comment -

I'm pretty sure the difference is that into uses reduce: since reducers were added in 1.5, chunked sequences know how to reduce themselves without creating unnecessary cons cells. PersistentVector/create doesn't use reduce, so it has to allocate a cons cell for each item in the sequence.

Show
Alan Malloy added a comment - I'm pretty sure the difference is that into uses reduce: since reducers were added in 1.5, chunked sequences know how to reduce themselves without creating unnecessary cons cells. PersistentVector/create doesn't use reduce, so it has to allocate a cons cell for each item in the sequence.
Hide
Andy Fingerhut added a comment -

I am pretty sure that Clojure 1.5.1 also uses transient vectors for (vec (range n)) (probably also some earlier versions of Clojure, too).

Look at vec in core.clj. It checks whether its arg is a java.util.Collection, which lazy seqs are, so calls (clojure.lang.LazilyPersistentVector/create coll).

LazilyPersistentVector's create method checks whether its argument is an ISeq, which lazy seqs are, so it calls PersistentVector.create(RT.seq(coll)).

All 3 of PersistentVector's create() methods use transient vectors to build up the result.

I suspect the difference in run times are not because of transients or not, but because of the way into uses reduce, and perhaps may also have something to do with the perhaps-unnecessary call to RT.seq in LazilyPersistentVector's create method (in this case, at least – it is likely needed for other types of arguments).

Show
Andy Fingerhut added a comment - I am pretty sure that Clojure 1.5.1 also uses transient vectors for (vec (range n)) (probably also some earlier versions of Clojure, too). Look at vec in core.clj. It checks whether its arg is a java.util.Collection, which lazy seqs are, so calls (clojure.lang.LazilyPersistentVector/create coll). LazilyPersistentVector's create method checks whether its argument is an ISeq, which lazy seqs are, so it calls PersistentVector.create(RT.seq(coll)). All 3 of PersistentVector's create() methods use transient vectors to build up the result. I suspect the difference in run times are not because of transients or not, but because of the way into uses reduce, and perhaps may also have something to do with the perhaps-unnecessary call to RT.seq in LazilyPersistentVector's create method (in this case, at least – it is likely needed for other types of arguments).

People

Vote (0)
Watch (1)

Dates

  • Created:
    Updated: