<< Back to previous view

[CLJ-894] Better reduce performance Created: 09/Dec/11  Updated: 13/Apr/12  Resolved: 13/Apr/12

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Backlog
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Mike Anderson Assignee: Unassigned
Resolution: Declined Votes: 2
Labels: None

Attachments: Text File clj-894-better-reduce-performance-patch2.txt    
Patch: Code and Test

 Description   

Currently reduce is implemented in Clojure by calling seq on a collection. Construction of an entire new seq is an unnecessary and expensive operation, particularly for the key persistent collections (map, set and vector suffer - list is not a problem because it implements ISeq directly for free).

This patch proposes to improve this by doing the following:
1. Make use of a Java interface for reducible collections (IReduce)
2. Make persistent collections support implementations of reduce
directly by implementing IReduce
3. Modify the internal-reduce protocol to operate on concrete
collections (not just seqs) and make use of IReduce implementations
where available
4. Change reduce itself to call internal-reduce directly rather than
calling seq first



 Comments   
Comment by Mike Anderson [ 09/Dec/11 3:58 AM ]

Link to initial discussion on Clojure Dev Google Group

http://groups.google.com/group/clojure-dev/browse_frm/thread/bdab332c4e47721f?hl=en

Comment by Mike Anderson [ 09/Dec/11 3:59 AM ]

Link to experimental implementation on GitHub:

https://github.com/mikera/clojure/tree/better-reduce

Comment by Mike Anderson [ 09/Dec/11 4:00 AM ]

Performance testing shows approx 2x speedup for relevant operations:

Clojure 1.3.0
(def ms (zipmap (range 100) (range 100)))
#'user/ms
user=> (dotimes [i 10] (time (dotimes [i 100000] (reduce (fn [acc [k
v]] (+ acc v)) 0 ms))))
"Elapsed time: 646.512414 msecs"
"Elapsed time: 553.264593 msecs"
"Elapsed time: 544.999672 msecs"
"Elapsed time: 510.606868 msecs"
"Elapsed time: 513.192451 msecs"
"Elapsed time: 537.665664 msecs"
"Elapsed time: 524.166514 msecs"
"Elapsed time: 512.966964 msecs"
"Elapsed time: 501.677219 msecs"
"Elapsed time: 496.379194 msecs"

Clojure 1.4.0-alpha2
user=> (def ms (zipmap (range 100) (range 100)))
#'user/ms
user=> (dotimes [i 10] (time (dotimes [i 100000] (reduce (fn [acc [k
v]] (+ acc v)) 0 ms))))
"Elapsed time: 548.822683 msecs"
"Elapsed time: 469.275299 msecs"
"Elapsed time: 464.742096 msecs"
"Elapsed time: 443.500129 msecs"
"Elapsed time: 431.272138 msecs"
"Elapsed time: 430.514649 msecs"
"Elapsed time: 432.753752 msecs"
"Elapsed time: 431.195876 msecs"
"Elapsed time: 435.254274 msecs"
"Elapsed time: 433.516375 msecs"

Clojure 1.4.0 snapshot (with better reduce modifications)
(def ms (zipmap (range 100) (range 100)))
#'user/ms
user=> (dotimes [i 10] (time (dotimes [i 100000] (reduce (fn [acc [k
v]] (+ acc v)) 0 ms))))
"Elapsed time: 202.29696 msecs"
"Elapsed time: 186.589505 msecs"
"Elapsed time: 179.691805 msecs"
"Elapsed time: 184.191644 msecs"
"Elapsed time: 183.265131 msecs"
"Elapsed time: 180.162578 msecs"
"Elapsed time: 182.323219 msecs"
"Elapsed time: 181.810649 msecs"
"Elapsed time: 182.896285 msecs"
"Elapsed time: 187.30153 msecs"

Comment by Mike Anderson [ 19/Feb/12 4:03 AM ]

better_reduce branch now updated to merge in the latest changes from the clojure/clojure master branch.

Comment by Brian Taylor [ 19/Feb/12 1:29 PM ]

Reviewers:

To make this change easier to review (and for a cleaner history), I mashed all of the commits that make up this change together and rebased that onto the latest from clojure/clojure master.

https://github.com/netguy204/clojure/tree/better-reduce
https://github.com/netguy204/clojure/commit/b262a69c32d26bb6f3c5ba711310361d77609a22

Hope it helps.

Comment by Mike Anderson [ 19/Feb/12 6:24 PM ]

Awesome, thanks Brian! I must admit that my git-fu is not yet good enough to do things like this, so your help is much appreciated!

Comment by Andy Fingerhut [ 23/Feb/12 1:52 AM ]

clj-894-better-reduce-performance-patch1.txt was created from pulling Brian Taylor's Clojure github repo, creating a diff of it, applying it to latest master as of Feb 22, 2012, and making a properly formatted patch from it. It has only a single commit, and has Mike Anderson's name and email address in it, along with a date of 4 Dec 2011, when it appears he made his final commit of this work. It compiles and tests cleanly, and I have run Mike's timing experiments and seen similar speed improvements.

No docstrings need changing that I can think of. Mike and I have both signed CAs. I don't see Brian Taylor's name on the list of those who signed a CA, but as long as the patch contains only code that Mike wrote, it should be clean. Patch status is "Code and Test".

Comment by Andy Fingerhut [ 23/Feb/12 2:00 AM ]

Correcting Mike's name and email address in patch.

Comment by Brian Taylor [ 23/Feb/12 5:08 AM ]

The patch contains only code that Mike wrote.

Comment by Mike Anderson [ 23/Feb/12 7:32 AM ]

Thanks guys I've reviewed the patch and can confirm that it looks like exactly the changes I intended to make. I am happy for them to be committed as per my signed CA.

Comment by Rich Hickey [ 13/Apr/12 8:05 AM ]

Similar objective now achieved via a protocol:
https://github.com/clojure/clojure/commit/1f90942690d5395330347cb31fdb3d69cea1ec56

Generated at Wed Oct 01 21:55:23 CDT 2014 using JIRA 4.4#649-r158309.