<< Back to previous view

[CLJS-736] Functions folder and reducer broken for types nil and array + fix for typo Created: 29/Dec/13  Updated: 18/Jan/14

Status: Open
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Jonas De Vuyst Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File CLJS-736-alt.patch     Text File CLJS-736.patch     Text File CLJS-736-patch-1-redux.patch     Text File CLJS-alt-satisfies.patch    
Patch: Code and Test

 Description   

1. This currently doesn't work:

(->> nil
(r/map identity)
(r/reduce + 0))
; org.mozilla.javascript.JavaScriptException: Error: No protocol method IReduce.-reduce defined for type null

The reason for this is that reducers created by r/reducer or r/folder, invoke -reduce (of IReduce) directly. They thereby bypass the special case for nil in the function r/reduce.

2. An entirely analogous problem exists for collections of type array.

3. The patch CLJS-700 mistakenly defined coll-fold for the type cljs.core/IPersistentVector. This should have been cljs.core/PersistentVector. (There exists no protocol IPersistentVector in ClojureScript.)

I will shortly attach a patch that addresses all of the above problems by implementing IReduce for nil and array. The patch also includes unit tests.



 Comments   
Comment by Jonas De Vuyst [ 29/Dec/13 2:22 PM ]

Alternative patch in which r/reduce and r/fold treat arrays and nil as special cases – as opposed to having arrays and nil implement IReduce and CollFold.

The functions r/reducer, r/folder, and the protocol methods of r/Cat now call r/reduce and r/fold instead of calling -reduce and coll-fold directly.

This patch also fixes a bug in the coll-fold implementation for Cat, which previously used (reducef) as the initial value rather than (combinef). The new code is copied and pasted from the Clojure implementation and uses the fork-join stubs.

Comment by David Nolen [ 30/Dec/13 8:23 AM ]

The implements? should probably be a satisfies? in the second patch. Have you run any benchmarks of before/after the patch?

Comment by Jonas De Vuyst [ 30/Dec/13 11:24 AM ]

If I understand correctly then (satisfies? x y) is roughly equivalent to (or (implements? x y) (natively-satisfies? x y)).

If native types (nil, array, object currently) are treated as special cases then implements? seems more appropriate.

satisfies? works also, however, so I have attached a new 'alt' patch.

Comment by Jonas De Vuyst [ 30/Dec/13 11:26 AM ]

The first patch is in fact faster when running the following code:

(time (->> (repeat 1000 (vec (range 1000)))
vec
(r/mapcat identity)
(r/map inc)
(r/filter even?)
(r/fold +)))

This takes about 700 msecs. Using the first patch this terminates 100-300 msecs faster. This is after repeated (but informal) testing.

I guess the worry is that the first patch would slow down other random code since it involves extending the types nil, array, and object. I'm not sure what exactly I should test for though.

(Note that the 2nd and 3rd patch also contain a fix for Cat and include more unit tests. The first patch should preferably not be applied as-is.)

Comment by David Nolen [ 30/Dec/13 11:35 AM ]

Yeah you're timing too many things, including vec, range, lazy sequences. Also testing a small N. Take a look at the reducers example on the Mori README - https://github.com/swannodette/mori. Thanks.

Comment by Jonas De Vuyst [ 30/Dec/13 12:52 PM ]

I tried running the following code:

(let [coll (vec (repeat 1000 (vec (range 10))))]
  (time (doseq [n (range 1000)]
               (->> coll
                    (r/mapcat identity)
                    (r/map inc)
                    (r/filter even?)
                    (r/fold +)))))

Some of the last results I got were:

1st patch: 75680 msecs
2nd patch: 76585 msecs

Truth be told, although the first patch seemed to win most of the times, sometimes the second patch was faster.

One other thing I tried was removing the implements?/satisfies? check from the second patch and overriding the protocol method coll-fold for the type object instead (as in the first patch). This 'hybrid' approach generally (but not always) seemed to result in a slowdown.

I'm not sure how I should proceed. Should I perhaps just run both patches simultaneously for several minutes?

Comment by David Nolen [ 30/Dec/13 1:21 PM ]

This is still a bad way to do timing, you're recording the cost of range and seq'ing. Use dotimes.

Comment by Jonas De Vuyst [ 30/Dec/13 4:33 PM ]

Hm. I guess the lazy sequence does lead to a lot of allocations.

Alright, I rewrote my test and ran it a few more times. I now also tested on both vectors and arrays.

Patch 1 needed a slight tweak. When coll-fold is invoked, patch 1 only specifies a fallback for type object (i.e. r/reduce is called). I had to add the same fallback for type array. (This is weird!)

So here are the results.

For vectors:

(let [coll (vec (repeat 100 (vec (range 100))))]
  (time (dotimes [n 3000]
          (->> coll
              (r/mapcat identity)
              (r/map inc)
              (r/filter even?)
              (r/fold +)))))

Patch 1: 205872 msecs
Patch 2: 210756 msecs

For arrays:

(let [coll (into-array (repeat 100 (into-array (range 100))))]
  (time (dotimes [n 3000]
          (->> coll
              (r/mapcat identity)
              (r/map inc)
              (r/filter even?)
              (r/fold +)))))

Patch 1: 123567 msecs
Patch 2: 119704 msecs

I ran my tests a few times and the results were pretty consistent. Patch 1 is faster for vectors and patch 2 is faster for arrays.

This makes sense.

In patch 1 reducer will call -reduce directly. In patch 2, reducer first calls r/reduce, which calls -reduce if the collection is a vector and array-reduce if it's an array. Hence patch 2 contains an extra function call in the case of vectors, but avoids invoking a protocol method on a native type in the case of arrays.

Using macros (or copy and paste) the extra function call can be avoided. Would that be worth trying or is it more important to keep the code clean?

I just realized that patch 2 is semantically slightly different from what Clojure does, although perhaps this is a bug in Clojure: <https://groups.google.com/forum/#!searchin/clojure-dev/kv-reduce/clojure-dev/bEqECvbExGo/iW4B2vEUh8sJ>. My suggestion to use a macro (or copy and paste) to avoid the extra function call in patch 2, could also fix this discrepancy.

Comment by David Nolen [ 30/Dec/13 4:42 PM ]

How are you benchmarking this? With V8? JavaScriptCore? SpiderMonkey? In the browser? What optimization settings, etc.

Comment by Jonas De Vuyst [ 30/Dec/13 4:48 PM ]

I used repljs (Rhino?). I'll test again in a more realistic setting tomorrow.

Comment by David Nolen [ 30/Dec/13 4:54 PM ]

Yeah, benchmarking with Rhino isn't informative.

Comment by Jonas De Vuyst [ 31/Dec/13 1:40 AM ]

I compiled the same code (with n=3000) using cljs with "{:optimizations :advanced}".

I then tested it in the latest stable releases of Firefox, Chrome, and Safari. I closed all my browsers. For each browser I then followed the following procedure:

  • Open the browser
  • Open the developer console
  • Run the benchmark for patch 1
  • Run the benchmark for patch 2
  • Run the benchmark for patch 1 and write down the result
  • Run the benchmark for patch 2 and write down the result
  • Close the browser

Firefox:

  • Patch 1. Vectors: 26057 msecs
  • Patch 1. Arrays: 25026 msecs
  • Patch 2. Vectors: 26258 msecs
  • Patch 2. Arrays: 36653 msecs
  • Summary: Patch 1 is faster for vectors and arrays

Chrome:

  • Patch 1. Vectors: 7804 msecs
  • Patch 1. Arrays: 7092 msecs
  • Patch 2. Vectors: 7754 msecs
  • Patch 2. Arrays: 6768 msecs
  • Summary: Patch 2 is faster for vectors and arrays

Safari:

  • Patch 1. Vectors: 167230 msecs
  • Patch 1. Arrays: 108780 msecs
  • Patch 2. Vectors: 173940 msecs
  • Patch 2. Arrays: 110012 msecs
  • Summary: Patch 1 is faster for vectors and arrays

I'm not sure what to make of this.

Comment by Jonas De Vuyst [ 31/Dec/13 2:47 AM ]

I have attached a new version of the first patch.

This patch fixes an issue with r/Cat. (This issue was also addressed in the second and third patch. A unit test is included.).

This patch also fixes r/fold for arrays.

To summarize, a choice needs to be made between the following patches.

  • CLJS-736-patch-1-redux.patch
  • CLJS-736-alt.patch (uses implements?) / CLJS-alt-satisfies.patch (uses satisfies?)

The implementation details are patch-1-redux is more similar in spirit to the Clojure source code. The alt patches are more similar in spirit to the ClojureScript source code.

As explained above, the alt patches are semantically a bit different from the original Clojure source—but it's not clear which behavior is 'right'.

Comment by David Nolen [ 16/Jan/14 5:27 PM ]

The benchmarks would be more informative if they explained the performance before and after that patch.

Comment by Jonas De Vuyst [ 18/Jan/14 11:55 AM ]

r/reduce previously didn't work for nil or JavaScript arrays.

One reason why I have trouble recommending a patch is that I don't know what use case you would like to optimize for.

Comment by David Nolen [ 18/Jan/14 12:30 PM ]

Yes but now that we have new logic we can at least test the lack of regression on the other types.

Comment by David Nolen [ 18/Jan/14 12:40 PM ]

Ok I tried to apply this patch and run ./script/benchmarks in the repo but the patch will no longer apply. Can we rebase the patch on master. Thanks. If you also want to give the benchmarks a shot follow these instructions to install the JS engines - http://github.com/clojure/clojurescript/wiki/Running-the-tests. Then you can also run the benchmarks at the command line. I see there aren't any reducers benchmarks, I will add some.

Generated at Thu Oct 23 14:10:57 CDT 2014 using JIRA 4.4#649-r158309.