
[UNIFY-2] Remove reflection warnings Created: 05/Jan/12  Updated: 05/Jan/12  Resolved: 05/Jan/12

Status: Resolved
Project: core.unify
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Fogus Assignee: Fogus
Resolution: Completed Votes: 0
Labels: performance, reflection, unify

Approval: Ok

 Description   
Reflection warning, clojure/core/unify.clj:30 - reference to field getClass can't be resolved.
Reflection warning, clojure/core/unify.clj:30 - reference to field isArray can't be resolved.

These reflective calls occur frequently enough that they should be resolved.
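Warnings like these come from interop calls on untyped arguments; a type hint lets the compiler resolve them statically. A minimal sketch of the pattern (the function names and shapes are illustrative, not the actual core.unify code):

```clojure
(set! *warn-on-reflection* true)

;; Untyped argument: .getClass and .isArray trigger reflection warnings.
(defn array?-reflective [x]
  (and x (.isArray (.getClass x))))

;; With a ^Object hint both calls resolve at compile time, no reflection.
(defn array?-hinted [^Object x]
  (and x (.isArray (.getClass x))))
```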



 Comments   
Comment by Fogus [ 05/Jan/12 7:37 AM ]

Fixed in 6b6d1130bf857439d1863f6fc574a7a6541b84b8.





[TRDR-28] cljs.tools.reader performance enhancements Created: 17/Jul/15  Updated: 17/Jul/15  Resolved: 17/Jul/15

Status: Closed
Project: tools.reader
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: David Nolen Assignee: Nicola Mometto
Resolution: Completed Votes: 0
Labels: patch, performance

Attachments: Text File TRDR_28.patch    

 Description   

It's possible to get nearly a two-orders-of-magnitude performance enhancement in the ClojureScript implementation.



 Comments   
Comment by Nicola Mometto [ 17/Jul/15 2:10 PM ]

https://github.com/clojure/tools.reader/commit/ae4518079465fafe2f44685e9c3b85d4cd8f17a9





[TMACRO-8] System Performance Created: 16/May/15  Updated: 16/May/15

Status: Open
Project: tools.macro
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Syed Ibrahimshah Assignee: Konrad Hinsen
Resolution: Unresolved Votes: 0
Labels: performance
Environment:

OS


Attachments: Text File Test file.txt    

 Description   

My system's performance is too slow. When I try to open any application on my system, it gets stuck.
Kindly look into the issue and resolve it as soon as possible.






[TCHECK-66] Make rose trees a deftype Created: 14/Apr/15  Updated: 06/Jun/15  Resolved: 06/Jun/15

Status: Resolved
Project: test.check
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Tom Crayford Assignee: Gary Fredericks
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File deftype_rose_tree.patch     Text File deftype_rose_tree.patch    
Patch: Code

 Description   

This turns rose trees into a custom `deftype` that implements `clojure.lang.Indexed` (for backwards compatibility - it means you can still use nth on them without any changes, and means we can interchange vectors with them).

It's just a performance patch - I noticed when profiling the test.check suite I have at Yeller that a lot of time was spent creating clojure vectors (it was the highest sample in the suite).

This produced a noticeable difference on my test suite (at 95% confidence, as measured across test runs using ministat), but didn't make such a difference on `test.check`'s own suite (no noticeable difference at 95% confidence). I assume that's due to generator complexity.

I've also made a minor change to `root` and `children` in rose-tree.clj - they now call `nth` once instead of twice, which again produced a small but noticeable perf difference. I also tried extracting them into an interface and having `RoseTree` implement it, but the JIT's inlining is apparently smart enough (even in C1) that there's no noticeable difference between that and implementing children/root with nth.
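The idea can be sketched as a two-slot deftype implementing clojure.lang.Indexed (an illustrative reconstruction, not the attached patch):

```clojure
;; Illustrative sketch: a rose tree that still answers nth, so existing
;; callers using (nth rt 0) / (nth rt 1) keep working unchanged.
(deftype RoseTree [root children]
  clojure.lang.Counted
  (count [_] 2)
  clojure.lang.Indexed
  (nth [_ i]
    (case (int i)
      0 root
      1 children
      (throw (IndexOutOfBoundsException.))))
  (nth [_ i not-found]
    (case (int i)
      0 root
      1 children
      not-found)))
```

Whether out-of-range `nth` throws or silently returns a default is the behavioral subtlety discussed in the comments.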



 Comments   
Comment by Gary Fredericks [ 15/Apr/15 8:07 AM ]

I think one difference this patch makes is that (nth rt 2) no longer throws an IndexOutOfBoundsException, which probably doesn't make a difference currently but could make future bugs harder to find. Was this an intentional change?

Otherwise this looks pretty straightforward.

Comment by Tom Crayford [ 15/Apr/15 7:02 PM ]

Sent a second patch (it's the one under 15th April in Jira for me).

Comment by Gary Fredericks [ 06/Jun/15 2:47 PM ]

Applied on master, thanks!





[TBENCH-16] Add performance testing for STM Created: 23/Feb/12  Updated: 23/Feb/12

Status: Open
Project: test.benchmark
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Stefan Kamphausen Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance, test
Environment:

64bit Linux, Quad-core


Attachments: File stm-perf-TBENCH-16.diff    

 Description   

Write a test-suite to detect regressions in STM performance.
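A minimal sketch of the kind of test such a suite could contain (the name and parameters are illustrative, not taken from the attached patch): several threads contend on a shared ref, and the elapsed time is the regression signal.

```clojure
;; Illustrative STM stress test: n-threads threads each perform n-iters
;; transactional increments on a shared ref; `time` reports the cost.
(defn stm-contention-bench [n-threads n-iters]
  (let [counter (ref 0)
        worker  (fn []
                  (dotimes [_ n-iters]
                    (dosync (alter counter inc))))
        threads (doall (repeatedly n-threads #(Thread. ^Runnable worker)))]
    (time
      (do (doseq [^Thread t threads] (.start t))
          (doseq [^Thread t threads] (.join t))))
    @counter))
```

Since every transaction eventually commits, the final counter value is deterministic (n-threads * n-iters), which makes the benchmark double as a correctness check.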



 Comments   
Comment by Stefan Kamphausen [ 23/Feb/12 2:38 PM ]

Patch which adds an stm-namespace containing a collection of performance test functions.





[MCOMB-4] Performance enhancement for sorted-numbers? Created: 12/Jul/14  Updated: 19/Jul/14  Resolved: 19/Jul/14

Status: Resolved
Project: math.combinatorics
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Trivial
Reporter: Daniel Marjenburgh Assignee: Mark Engelberg
Resolution: Completed Votes: 0
Labels: enhancement, patch, performance

Attachments: File sorted-numbers-perf-enh.diff    
Patch: Code
Approval: Accepted

 Description   

Hi,

Came upon this while trying to improve performance. The implementation of sorted-numbers? (only used by permutations) is:

(defn- sorted-numbers?
  "Returns true iff s is a sequence of numbers in non-decreasing order"
  [s]
  (and (every? number? s)
       (every? (partial apply <=) (partition 2 1 s))))

<= and similar operators are variadic, so partitioning into pairs is unneeded.
(apply <= s) is a lot faster, but breaks for empty sequences, so an additional check is needed.

The implementation can be changed to:

(and (every? number? s)
     (or (empty? s) (apply <= s)))

I benchmarked this at 10 to 15 times faster. A regression test with test.check was done to verify that the behaviour is unchanged for any input.
A patch is also included.

Cheers,

Daniel



 Comments   
Comment by Andy Fingerhut [ 13/Jul/14 7:39 PM ]

Thanks for the submission. I don't know whether Mark would be interested in taking the test.check regression test you developed, too, but if you would be willing to share it as a patch, it might be considered for inclusion as well.

Comment by Daniel Marjenburgh [ 18/Jul/14 5:43 AM ]

Hm, I don't know how I could include the regression test as a patch. Because:
1) The current project does not use external libraries or Leiningen (yet). How do I add dependencies when not using Leiningen?
2) I tested the old implementation vs the new one with generated inputs. If the old implementation is gone, there's no test to add...

Comment by Mark Engelberg [ 19/Jul/14 5:06 PM ]

Since this function is only called once at the beginning of the permutations process, on a sequence that is usually 10 or fewer items, I can't imagine this improvement will have any meaningful impact on the overall running time of the permutations algorithm. Nevertheless, it's an improvement, so I've gone ahead and added it.





[LOGIC-177] binding to a deep nested vector takes too much time Created: 24/Dec/15  Updated: 28/Dec/15

Status: Open
Project: core.logic
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Dmitry Geurkov Assignee: David Nolen
Resolution: Unresolved Votes: 0
Labels: performance
Environment:

Ubuntu Linux 12.04 Java: OpenJDK 64-Bit Server VM 1.8.0_45-internal-b14 Clojure: 1.7.0 core.logic "0.8.10"



 Description   

It's probably related to how core.logic is printing out the vector structure

Following is minimal case to reproduce the issue

(ns testclj.logic
  (:require [clojure.core.logic :as l]))

(def data [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[:a]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]])

(time (l/run* [q] (l/== q data)))



 Comments   
Comment by Dmitry Geurkov [ 28/Dec/15 6:48 AM ]

I think the problem is related to following piece of code

(defn walk* [s v]
  (let [v (walk s v)]
    (walk-term v
      (fn [x]
        (let [x (walk s x)]
         (if (tree-term? x)
           (walk* s x)
           x))))))

this on the other hand calls

clojure.lang.IPersistentVector
  (walk-term [v f]
    (with-meta
      (loop [v v r (transient [])]
        (if (seq v)
          (recur (next v) (conj! r (walk-term (f (first v)) f)))
          (persistent! r)))
      (meta v)))

What happens is that walk-term recursively calls walk-term for every element of the vector, and the function it applies calls walk* on each element. walk* in turn calls walk-term again on elements that have already been visited, descending into the structure repeatedly, so the entire structure ends up being traversed multiple times.

The problem is the part of walk* that recursively calls walk* for vector elements that the walk-term call itself has already walked:

(if (tree-term? x)
  (walk* s x)
  x)

I think there is a significant design problem with how this whole part works. I'm not sure what to suggest, but I think walk-term and walk* need to be combined into a single function that walks the entire data structure in one pass. Any suggestions?

A possible solution is the following code, but I'm not sure whether it's correct:

(defn walk* [s v]
  (let [v (walk s v)]
    (walk-term v
        #(walk s %))))




[LOGIC-166] walk* exponential in depth of tree terms? Created: 27/Jan/15  Updated: 03/Feb/15

Status: Open
Project: core.logic
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Tom Jack Assignee: David Nolen
Resolution: Unresolved Votes: 0
Labels: performance
Environment:

[org.clojure/clojure "1.7.0-alpha5"]
[org.clojure/core.logic "0.8.9"]


Attachments: Text File 0001-Walk-less.patch    

 Description   

By 'reification' I mean e.g.:

(run 1 [t]
  (== t <some-constant-to-be-reified>))

Demonstration of exponential performance, using nested vectors ([], [[]], [[[]]]...):
https://www.refheap.com/1e30c198d528300fcba9ef24a

Is this expected, or a bug? I'm guessing the former, but hoping the latter. Feel free to close without comment if this is just the way it is.



 Comments   
Comment by Tom Jack [ 28/Jan/15 4:31 AM ]

Attached patch 0001-Walk-less.patch gets me this nice result. It also makes my real case work quite speedily (25ms now, where before it took so long that I never saw it finish). That's a huge relief!

The problem wasn't reification, but walk-term. In the patch I attempt to avoid doing some walking which appears to be redundant. I hope the fact that most of the tests still pass is good evidence that it's redundant.

However, the patch breaks some unifier tests. I noticed that my refheap test 'passes' as of bd65104ec3~, but fails as of bd65104ec3 (the patch for LOGIC-69). It seems the fix made the unifier rely on the current behavior of walk-term.

I'm going to press on with my patched ./checkouts copy for now, since I'm not using the unifier. I may come back to try to fix it sometime.





[LOGIC-50] Rel relation PersistentHashSet becomes LazySeq after issuing a retraction Created: 01/Sep/12  Updated: 28/Jul/13  Resolved: 05/Sep/12

Status: Closed
Project: core.logic
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Aaron Brooks Assignee: David Nolen
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File core.logic-rel-001.patch     Text File core.logic-rel-002.patch    
Patch: Code

 Description   

The first retraction of facts from a relation turns the internal PersistentHashSet in the -set var+atom, which guarantees uniqueness, into a LazySeq, which allows duplicate facts to be added. Once the first retraction occurs, the LazySeq persists for the rest of the relation's life. Only the value of the primary -set var+atom is affected, not the -index var+atom values.

This is not an external correctness issue, but it does affect the performance of subsequent duplicate fact additions, which grow the size of the relation.

Preparation:

user=> (defrel foo x y)
#<user$eval4287$fn__4288 user$eval4287$fn__4288@1a9d489b>
user=> (facts foo [[:joe :man][:jane :woman][:sue :boy]])
nil
user=> foo_2-set
#<Atom@52aaf223: #{[:joe :man] [:jane :woman] [:sue :boy]}>
user=> (class @foo_2-set)
clojure.lang.PersistentHashSet
user=> (retraction foo :jane :woman)
nil

Without patch applied:

user=> foo_2-set
#<Atom@52aaf223: ([:joe :man] [:sue :boy])>
user=> (class @foo_2-set)
clojure.lang.LazySeq
user=> (facts foo [[:joe :man][:jane :woman][:sue :boy]])
nil
user=> foo_2-set
#<Atom@52aaf223: ([:sue :boy] [:jane :woman] [:joe :man] [:joe :man] [:sue :boy])>
user=> (class @foo_2-set)
clojure.lang.LazySeq

With patch applied:

user=> foo_2-set
#<Atom@31eb9b15: #{[:joe :man] [:sue :boy]}>
user=> (class @foo_2-set)
clojure.lang.PersistentHashSet
user=> (facts foo [[:joe :man][:jane :woman][:sue :boy]])
nil
user=> foo_2-set
#<Atom@31eb9b15: #{[:joe :man] [:jane :woman] [:sue :boy]}>
user=> (class @foo_2-set)
clojure.lang.PersistentHashSet

I've filed this as a Minor issue as it does not affect core.logic correctness and degraded performance can be avoided by not re-asserting duplicate facts.
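The underlying type change can be illustrated at the REPL with hypothetical example data (this is not the patch itself):

```clojure
(def facts-set (atom #{[:joe :man] [:jane :woman]}))

;; remove returns a lazy seq, losing the set's uniqueness guarantee:
(class (remove #{[:jane :woman]} @facts-set))
;; => clojure.lang.LazySeq

;; disj preserves the PersistentHashSet:
(swap! facts-set disj [:jane :woman])
(class @facts-set)
;; => clojure.lang.PersistentHashSet
```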

I will also issue a GitHub pull request which can be used or ignored at your convenience.



 Comments   
Comment by David Nolen [ 02/Sep/12 5:45 PM ]

Thanks! Is this meant to be applied to master?

Comment by Aaron Brooks [ 03/Sep/12 6:56 PM ]

(cf. here and GitHub; I'll keep this thread on JIRA.) Yes, this is targeted for master. I don't know whether this warrants a release unto itself.

Comment by Aaron Brooks [ 04/Sep/12 9:17 AM ]

I forgot to mention that I needed to add "src/test/clojure" to the project.clj :source-paths vector to get lein1/2's test command to run the tests. Is that expected? Is there another way to run the tests that I'm missing? I'm happy to file a separate ticket/patch to address this if project.clj needs to be modified.

Comment by David Nolen [ 04/Sep/12 9:25 AM ]

Thanks for the information. Will apply to master. I actually run the tests with "mvn test"; I've updated :source-paths so other people can run them with lein.

Comment by David Nolen [ 04/Sep/12 9:26 AM ]

I tried to apply the patch but I'm getting a "Patch format detection failed".

Comment by Aaron Brooks [ 04/Sep/12 4:16 PM ]

The attached patch should work fine with git-apply (in either "cat foo.patch |git apply" or "git apply foo.patch" form). I made the GitHub pull request as I figured that would be the easiest path to pull the changes in.

Comment by David Nolen [ 04/Sep/12 6:04 PM ]

Yes, the patch is not properly formatted, and we're not supposed to take pull requests. The patch is also missing attribution info. I normally apply patches with "git am foo.patch". I used the following guide to figure out how to generate patches that can be applied with "git am": http://ariejan.net/2009/10/26/how-to-create-and-apply-a-patch-with-git.

Comment by Aaron Brooks [ 05/Sep/12 9:25 AM ]

My apologies for the extra run-around here. I've attached an -002 version of the patch created with git-format-patch which should be amenable to git-am.

Comment by David Nolen [ 05/Sep/12 11:32 AM ]

fixed, http://github.com/clojure/core.logic/commit/9bc6eb42be28bfd2b493657344f6eea8f5ed657c. pushing out a 0.8.0-alpha3 release as well.





[JDBC-152] Reflection in 0.7.0-beta2 Created: 02/Jul/17  Updated: 04/Jul/17  Resolved: 03/Jul/17

Status: Closed
Project: java.jdbc
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Trivial
Reporter: Jeaye Wilkerson Assignee: Sean Corfield
Resolution: Completed Votes: 0
Labels: performance, reflection


 Description   

Reflection warning, clojure/java/jdbc.clj:980:49 - call to method getColumnLabel can't be resolved (target class is unknown).
Reflection warning, clojure/java/jdbc.clj:985:29 - reference to field next can't be resolved.



 Comments   
Comment by Sean Corfield [ 02/Jul/17 9:13 PM ]

Thanks! I'll take a look on Monday and add something to my test setup so no more reflection issues creep in!

Comment by Jeaye Wilkerson [ 02/Jul/17 11:56 PM ]

Sounds great, thanks!

Comment by Sean Corfield [ 03/Jul/17 12:41 PM ]

Removed check-all from project.clj as you can't run check on Clojure 1.7/1.8 due to spec-using code. Updated my local test-all.sh to run check on Clojure 1.9 so that reflection warnings will be highlighted by my normal testing process with this library.

Comment by Sean Corfield [ 03/Jul/17 12:49 PM ]

Added type hints in two places in reducible-result-set to address reflection warnings. Will be in 0.7.0.

Comment by Sean Corfield [ 04/Jul/17 11:05 PM ]

Available now in 0.7.0-beta3.

Comment by Jeaye Wilkerson [ 04/Jul/17 11:07 PM ]

Thank you!





[JDBC-31] distinct? throws clojure.lang.ArityException, when applied with no arguments Created: 12/May/12  Updated: 01/Jun/16  Resolved: 10/Jun/12

Status: Closed
Project: java.jdbc
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Jürgen Hötzel Assignee: Sean Corfield
Resolution: Completed Votes: 0
Labels: patch, performance

Attachments: Text File 0001-add-make-cols-unique-test-case-empty-cols.patch     Text File 0002-distinct-throws-clojure.lang.ArityException-when-app.patch    
Patch: Code and Test

 Description   

HSQLDB returns an empty ResultSet from (.getGeneratedKeys stmt) when no keys are generated, so distinct? ends up being applied with no arguments and this exception is thrown for each record without generated keys.

While the exception is caught in do-prepared-return-keys, this can cause a huge overhead from JVM exception handling. I did a performance test.

Before Patch:

clojure.java.test-jdbc> (time (sql/with-connection hsqldb-db (count (apply sql/insert-records :dummy  (map #(hash-map :name (str %) :id %) (range 10000))))))
"Elapsed time: 3429.346743 msecs"
10000

After Patch:

 clojure.java.test-jdbc> (time (sql/with-connection hsqldb-db (count (apply sql/insert-records :dummy  (map #(hash-map :name (str %) :id %) (range 10000))))))
"Elapsed time: 1397.444753 msecs"
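The underlying failure is easy to see at the REPL: distinct? has no zero-argument arity, so applying it to an empty collection throws. Guarding on emptiness avoids paying for exception handling on the hot path (the helper name below is illustrative, not from the patch):

```clojure
;; (apply distinct? []) throws clojure.lang.ArityException.
;; Guarding on empty? sidesteps the exception entirely:
(defn all-distinct? [coll]
  (or (empty? coll) (apply distinct? coll)))
```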


 Comments   
Comment by Sean Corfield [ 10/Jun/12 5:24 PM ]

Thanx for the patch!





[JDBC-29] Performance improvement (Remove intermediate lazy sequence in resultset-seq) Created: 10/May/12  Updated: 01/Jun/16  Resolved: 10/May/12

Status: Closed
Project: java.jdbc
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Jürgen Hötzel Assignee: Sean Corfield
Resolution: Completed Votes: 0
Labels: patch, performance

Attachments: Text File 0001-Don-t-create-intermediate-lazy-sequences-of-vectors.patch    

 Description   

This improves performance on large result sets by up to 30%.



 Comments   
Comment by Sean Corfield [ 10/May/12 12:09 PM ]

Nice catch!





[CLJS-2383] Improve perf of select-keys by using keyword-identical? Created: 17/Oct/17  Updated: 18/Oct/17

Status: Open
Project: ClojureScript
Component/s: None
Affects Version/s: 1.9.908
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Mike Fikes Assignee: David Nolen
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: Text File 0001-CLJS-2383-Speed-up-select-keys.patch     Text File 0002-CLJS-2383-Speed-up-select-keys-no-transient.patch     Text File CLJS-2383.patch    
Patch: Code and Test

 Description   

select-keys uses not= to compare keywords. Using keyword-identical? instead results in decent speedups (an average of 1.34x across benchmarks and engines). Note that using identical? and lookup-sentinel doesn't appear to improve perf.
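The shape of the change can be sketched as follows (JVM Clojure shown for illustration, using identical? on the sentinel where the ClojureScript patch would use keyword-identical?; the function name is illustrative):

```clojure
;; Loop over the requested keys, probing the map with a sentinel and
;; comparing against the sentinel by identity rather than with not=.
(defn select-keys-fast [m keyseq]
  (loop [ret {} ks (seq keyseq)]
    (if ks
      (let [k     (first ks)
            entry (get m k ::not-found)]
        (recur (if (identical? entry ::not-found)
                 ret
                 (assoc ret k entry))
               (next ks)))
      ret)))
```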

Speedup Summary:

V8: 1.15, 1.08, 1.08
  SpiderMonkey: 1.71, 1.48, 1.67
JavaScriptCore: 1.33, 1.35, 1.25
       Nashorn: 1.16, 1.04, 0.97
    ChakraCore: 1.59, 1.66, 1.72

Before:

Benchmarking with V8
;;; select-keys
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c]), 200000 runs, 179 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :x]), 200000 runs, 121 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c :x :y :z]), 200000 runs, 183 msecs

Benchmarking with SpiderMonkey
;;; select-keys
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c]), 200000 runs, 251 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :x]), 200000 runs, 201 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c :x :y :z]), 200000 runs, 290 msecs

Benchmarking with JavaScriptCore
;;; select-keys
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c]), 200000 runs, 112 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :x]), 200000 runs, 73 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c :x :y :z]), 200000 runs, 119 msecs

Benchmarking with Nashorn
;;; select-keys
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c]), 200000 runs, 1277 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :x]), 200000 runs, 524 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c :x :y :z]), 200000 runs, 635 msecs

Benchmarking with ChakraCore
;;; select-keys
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c]), 200000 runs, 463 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :x]), 200000 runs, 268 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c :x :y :z]), 200000 runs, 414 msecs

After

Benchmarking with V8
;;; select-keys
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c]), 200000 runs, 155 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :x]), 200000 runs, 112 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c :x :y :z]), 200000 runs, 169 msecs

Benchmarking with SpiderMonkey
;;; select-keys
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c]), 200000 runs, 146 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :x]), 200000 runs, 135 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c :x :y :z]), 200000 runs, 173 msecs

Benchmarking with JavaScriptCore
;;; select-keys
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c]), 200000 runs, 84 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :x]), 200000 runs, 54 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c :x :y :z]), 200000 runs, 95 msecs

Benchmarking with Nashorn
;;; select-keys
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c]), 200000 runs, 1099 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :x]), 200000 runs, 502 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c :x :y :z]), 200000 runs, 648 msecs

Benchmarking with ChakraCore
;;; select-keys
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c]), 200000 runs, 292 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :x]), 200000 runs, 151 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c :x :y :z]), 200000 runs, 240 msecs


 Comments   
Comment by Erik Assum [ 17/Oct/17 2:37 PM ]

Just want to bring your attention to CLJ-1789, where reimplementing `select-keys` in terms of reduce gave a significant speedup.
I've added a patch demonstrating that approach.

Comment by Mike Fikes [ 18/Oct/17 6:46 AM ]

Here are the perf numbers for Erik's patch:

V8: 0.81, 1.14, 1.30
  SpiderMonkey: 1.49, 1.31, 1.58
JavaScriptCore: 1.02, 0.99, 0.96
       Nashorn: 1.13, 1.17, 1.21
    ChakraCore: 1.27, 1.35, 1.28

After:

Benchmarking with V8
;;; select-keys
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c]), 200000 runs, 220 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :x]), 200000 runs, 106 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c :x :y :z]), 200000 runs, 141 msecs

Benchmarking with SpiderMonkey
;;; select-keys
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c]), 200000 runs, 169 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :x]), 200000 runs, 153 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c :x :y :z]), 200000 runs, 183 msecs

Benchmarking with JavaScriptCore
;;; select-keys
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c]), 200000 runs, 110 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :x]), 200000 runs, 74 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c :x :y :z]), 200000 runs, 124 msecs

Benchmarking with Nashorn
;;; select-keys
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c]), 200000 runs, 1128 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :x]), 200000 runs, 447 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c :x :y :z]), 200000 runs, 524 msecs

Benchmarking with ChakraCore
;;; select-keys
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c]), 200000 runs, 366 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :x]), 200000 runs, 199 msecs
[m {:a 1, :b 2, :c 3, :d 4}], (select-keys m [:a :b :c :x :y :z]), 200000 runs, 323 msecs

Comment by Erik Assum [ 18/Oct/17 6:56 AM ]

Uploaded a patch without transients as well.

Comment by Mike Fikes [ 18/Oct/17 7:40 AM ]

Since the CLJ-1789 patch works better with larger maps, here is an additional perf test covering that case using the data from that ticket, testing both the original patch I had attached and Erik's subsequent patch. You can see the CLJ-1789 approach pays off for ClojureScript as well.

Erik, I see you attached a 3rd patch. I'd recommend adding perf numbers with each such patch, so the effect of the patch under advanced optimizations can be more readily assessed.

Engine          keyword-identical?  CLJ-1789
            V8:               1.13      1.29
  SpiderMonkey:               1.89      2.39
JavaScriptCore:               1.02      0.96
       Nashorn:               1.12      1.42
    ChakraCore:               1.68      1.82

Before:

Benchmarking with V8
;;; select-keys
[m {:a "b", :c "d", :b "b", :d "d", :e "e", :f "f", :g "g"}], (select-keys m [:a :c :b :d :e :f :g]), 200000 runs, 373 msecs

Benchmarking with SpiderMonkey
;;; select-keys
[m {:a "b", :c "d", :b "b", :d "d", :e "e", :f "f", :g "g"}], (select-keys m [:a :c :b :d :e :f :g]), 200000 runs, 668 msecs

Benchmarking with JavaScriptCore
;;; select-keys
[m {:a "b", :c "d", :b "b", :d "d", :e "e", :f "f", :g "g"}], (select-keys m [:a :c :b :d :e :f :g]), 200000 runs, 200 msecs

Benchmarking with Nashorn
;;; select-keys
[m {:a "b", :c "d", :b "b", :d "d", :e "e", :f "f", :g "g"}], (select-keys m [:a :c :b :d :e :f :g]), 200000 runs, 2236 msecs

Benchmarking with ChakraCore
;;; select-keys
[m {:a "b", :c "d", :b "b", :d "d", :e "e", :f "f", :g "g"}], (select-keys m [:a :c :b :d :e :f :g]), 200000 runs, 1074 msecs

After (keyword-identical?)

Benchmarking with V8
;;; select-keys
[m {:a "b", :c "d", :b "b", :d "d", :e "e", :f "f", :g "g"}], (select-keys m [:a :c :b :d :e :f :g]), 200000 runs, 330 msecs

Benchmarking with SpiderMonkey
;;; select-keys
[m {:a "b", :c "d", :b "b", :d "d", :e "e", :f "f", :g "g"}], (select-keys m [:a :c :b :d :e :f :g]), 200000 runs, 353 msecs

Benchmarking with JavaScriptCore
;;; select-keys
[m {:a "b", :c "d", :b "b", :d "d", :e "e", :f "f", :g "g"}], (select-keys m [:a :c :b :d :e :f :g]), 200000 runs, 197 msecs

Benchmarking with Nashorn
;;; select-keys
[m {:a "b", :c "d", :b "b", :d "d", :e "e", :f "f", :g "g"}], (select-keys m [:a :c :b :d :e :f :g]), 200000 runs, 1991 msecs

Benchmarking with ChakraCore
;;; select-keys
[m {:a "b", :c "d", :b "b", :d "d", :e "e", :f "f", :g "g"}], (select-keys m [:a :c :b :d :e :f :g]), 200000 runs, 640 msecs

After (CLJ-1789)

Benchmarking with V8
;;; select-keys
[m {:a "b", :c "d", :b "b", :d "d", :e "e", :f "f", :g "g"}], (select-keys m [:a :c :b :d :e :f :g]), 200000 runs, 290 msecs

Benchmarking with SpiderMonkey
;;; select-keys
[m {:a "b", :c "d", :b "b", :d "d", :e "e", :f "f", :g "g"}], (select-keys m [:a :c :b :d :e :f :g]), 200000 runs, 279 msecs

Benchmarking with JavaScriptCore
;;; select-keys
[m {:a "b", :c "d", :b "b", :d "d", :e "e", :f "f", :g "g"}], (select-keys m [:a :c :b :d :e :f :g]), 200000 runs, 209 msecs

Benchmarking with Nashorn
;;; select-keys
[m {:a "b", :c "d", :b "b", :d "d", :e "e", :f "f", :g "g"}], (select-keys m [:a :c :b :d :e :f :g]), 200000 runs, 1578 msecs

Benchmarking with ChakraCore
;;; select-keys
[m {:a "b", :c "d", :b "b", :d "d", :e "e", :f "f", :g "g"}], (select-keys m [:a :c :b :d :e :f :g]), 200000 runs, 591 msecs

Comment by Erik Assum [ 18/Oct/17 7:54 AM ]

Both patches should have benchmarks now

Comment by Erik Assum [ 18/Oct/17 7:56 AM ]

Oh, and as a follow-up to the comment about larger maps: I believe the benefit of the `transient` version depends on the number of selected keys, e.g. the more of the selected keys are found in the map, the more we gain from `conj!`.





[CLJS-2342] Speed up printing of js object by using forEach and regex optimizations Created: 30/Aug/17  Updated: 01/Oct/17

Status: Open
Project: ClojureScript
Component/s: None
Affects Version/s: 1.9.908
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Mike Fikes Assignee: David Nolen
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: Text File CLJS-2342-2.patch     Text File CLJS-2342.patch    
Patch: Code

 Description   

By using goog.object/forEach, along with direct interop for regex checking, it is possible to speed up the printing of JavaScript objects by anywhere between 1.14x and 1.77x.



 Comments   
Comment by Mike Fikes [ 30/Aug/17 10:50 AM ]
Speedup Summary:

        Engine  small, medium, with sub-object
            V8: 1.42, 1.31, 1.19
  SpiderMonkey: 1.14, 1.48, 1.41
JavaScriptCore: 1.48, 1.58, 1.62
       Nashorn: 1.27, 1.36, 1.20
    ChakraCore: 1.49, 1.77, 1.77


Before:

Benchmarking with V8
[obj (js-obj "a" 1 "b" 2)], (pr-str obj), 1000 runs, 74 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (pr-str obj), 1000 runs, 84 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (pr-str obj), 1000 runs, 75 msecs

Benchmarking with SpiderMonkey
[obj (js-obj "a" 1 "b" 2)], (pr-str obj), 1000 runs, 95 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (pr-str obj), 1000 runs, 92 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (pr-str obj), 1000 runs, 134 msecs

Benchmarking with JavaScriptCore
[obj (js-obj "a" 1 "b" 2)], (pr-str obj), 1000 runs, 46 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (pr-str obj), 1000 runs, 41 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (pr-str obj), 1000 runs, 55 msecs

Benchmarking with Nashorn
[obj (js-obj "a" 1 "b" 2)], (pr-str obj), 1000 runs, 1228 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (pr-str obj), 1000 runs, 1048 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (pr-str obj), 1000 runs, 620 msecs

Benchmarking with ChakraCore
[obj (js-obj "a" 1 "b" 2)], (pr-str obj), 1000 runs, 55 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (pr-str obj), 1000 runs, 76 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (pr-str obj), 1000 runs, 129 msecs


After:

Benchmarking with V8
[obj (js-obj "a" 1 "b" 2)], (pr-str obj), 1000 runs, 52 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (pr-str obj), 1000 runs, 64 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (pr-str obj), 1000 runs, 63 msecs

Benchmarking with SpiderMonkey
[obj (js-obj "a" 1 "b" 2)], (pr-str obj), 1000 runs, 83 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (pr-str obj), 1000 runs, 62 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (pr-str obj), 1000 runs, 95 msecs

Benchmarking with JavaScriptCore
[obj (js-obj "a" 1 "b" 2)], (pr-str obj), 1000 runs, 31 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (pr-str obj), 1000 runs, 26 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (pr-str obj), 1000 runs, 34 msecs

Benchmarking with Nashorn
[obj (js-obj "a" 1 "b" 2)], (pr-str obj), 1000 runs, 970 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (pr-str obj), 1000 runs, 769 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (pr-str obj), 1000 runs, 518 msecs

Benchmarking with ChakraCore
[obj (js-obj "a" 1 "b" 2)], (pr-str obj), 1000 runs, 37 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (pr-str obj), 1000 runs, 43 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (pr-str obj), 1000 runs, 73 msecs
Comment by Mike Fikes [ 01/Oct/17 9:27 AM ]

Added CLJS-2342-2.patch which rebases against current master.





[CLJS-2341] Speed up js->clj on objs using forEach and transient volatile Created: 30/Aug/17  Updated: 31/Aug/17

Status: Open
Project: ClojureScript
Component/s: None
Affects Version/s: 1.9.908
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Mike Fikes Assignee: David Nolen
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: Text File CLJS-2341.patch    
Patch: Code

 Description   

It is possible to speed up js->clj on JavaScript objects by revising the implementation to employ goog.object/forEach, accumulating by bashing on a transient volatile map.

The speedups range from 1.5x to 2.1x over the current implementation.

Note: The current implementation uses js-keys. The optimization in CLJS-2340 could help js->clj, but it doesn't appear to provide much speedup in itself (perhaps 1.1?) compared to the change in implementation described above.
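To illustrate the difference, here is a hypothetical JavaScript analog of the two strategies (the helper names are invented, not the actual cljs.core code): the js-keys approach first materializes an array of keys and walks it, while the forEach approach visits each property once with a callback and bashes entries into a single mutable accumulator, which is the role the transient volatile map plays in the patch.

```javascript
// Stand-in for goog.object/forEach: invoke f with (value, key) per own property.
function forEachProp(obj, f) {
  for (const k in obj) {
    if (Object.prototype.hasOwnProperty.call(obj, k)) f(obj[k], k);
  }
}

// js-keys style: allocate a key array up front, then index into the object.
function objToMapViaKeys(obj) {
  const keys = Object.keys(obj);          // extra key-array allocation
  const m = new Map();
  for (let i = 0; i < keys.length; i++) {
    m.set(keys[i], obj[keys[i]]);
  }
  return m;
}

// forEach style: one pass, bashing on a single mutable accumulator
// (the analog of accumulating into a transient map).
function objToMapViaForEach(obj) {
  const m = new Map();
  forEachProp(obj, (v, k) => m.set(k, v));
  return m;
}
```

Both produce the same result; the second avoids the intermediate key array and the repeated property lookups.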



 Comments   
Comment by Mike Fikes [ 30/Aug/17 9:19 AM ]
Speedup Summary:

        Engine  small, medium, with sub-object
            V8: 1.62, 1.50, 1.13
  SpiderMonkey: 1.91, 1.74, 1.59
JavaScriptCore: 1.67, 1.74, 2.10
       Nashorn: 1.54, 2.13, 1.51
    ChakraCore: 1.71, 2.10, 1.95


Before:

Benchmarking with V8
[obj (js-obj "a" 1 "b" 2)], (js->clj obj), 1000 runs, 55 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (js->clj obj), 1000 runs, 63 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (js->clj obj), 1000 runs, 93 msecs

Benchmarking with SpiderMonkey
[obj (js-obj "a" 1 "b" 2)], (js->clj obj), 1000 runs, 155 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (js->clj obj), 1000 runs, 87 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (js->clj obj), 1000 runs, 94 msecs

Benchmarking with JavaScriptCore
[obj (js-obj "a" 1 "b" 2)], (js->clj obj), 1000 runs, 45 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (js->clj obj), 1000 runs, 33 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (js->clj obj), 1000 runs, 42 msecs

Benchmarking with Nashorn
[obj (js-obj "a" 1 "b" 2)], (js->clj obj), 1000 runs, 1123 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (js->clj obj), 1000 runs, 1195 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (js->clj obj), 1000 runs, 773 msecs

Benchmarking with ChakraCore
[obj (js-obj "a" 1 "b" 2)], (js->clj obj), 1000 runs, 65 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (js->clj obj), 1000 runs, 44 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (js->clj obj), 1000 runs, 78 msecs


After:

Benchmarking with V8
[obj (js-obj "a" 1 "b" 2)], (js->clj obj), 1000 runs, 34 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (js->clj obj), 1000 runs, 42 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (js->clj obj), 1000 runs, 82 msecs

Benchmarking with SpiderMonkey
[obj (js-obj "a" 1 "b" 2)], (js->clj obj), 1000 runs, 81 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (js->clj obj), 1000 runs, 50 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (js->clj obj), 1000 runs, 59 msecs

Benchmarking with JavaScriptCore
[obj (js-obj "a" 1 "b" 2)], (js->clj obj), 1000 runs, 27 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (js->clj obj), 1000 runs, 19 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (js->clj obj), 1000 runs, 20 msecs

Benchmarking with Nashorn
[obj (js-obj "a" 1 "b" 2)], (js->clj obj), 1000 runs, 728 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (js->clj obj), 1000 runs, 561 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (js->clj obj), 1000 runs, 513 msecs

Benchmarking with ChakraCore
[obj (js-obj "a" 1 "b" 2)], (js->clj obj), 1000 runs, 38 msecs
[obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6)], (js->clj obj), 1000 runs, 21 msecs
[sub-obj (js-obj "g" 7 "h" 8 "i" 9 "j" 10) obj (js-obj "a" 1 "b" 2 "c" 3 "d" 4 "e" 5 "f" 6 "s" sub-obj)], (js->clj obj), 1000 runs, 40 msecs




[CLJS-2339] Significant code reload slowdown with :npm-deps Created: 29/Aug/17  Updated: 24/Sep/17  Resolved: 24/Sep/17

Status: Resolved
Project: ClojureScript
Component/s: None
Affects Version/s: 1.9.908
Fix Version/s: Next

Type: Enhancement Priority: Minor
Reporter: Petter Eriksson Assignee: David Nolen
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File CLJS-2339-2.patch     Text File CLJS-2339.patch    
Patch: Code
Approval: Accepted

 Description   

After migrating our dependencies from cljsjs to node_modules we noticed that figwheel took a lot longer to reload our code.

I asked in #clojurescript if anyone had an idea of what could be done and @anmonteiro wrote:
@petterik might just be because we're processing all your node dependencies through Closure on every reload feel free to open a JIRA ticket, we could probably employ a caching strategy somewhere

Versions used:
clojurescript "1.9.908"
lein-figwheel "0.5.13"

npm-deps:
:react-helmet "5.1.3"
:react "15.6.1"
:react-dom "15.6.1"
as well as some devDependencies in package.json.



 Comments   
Comment by Petter Eriksson [ 29/Aug/17 1:23 PM ]

If needed to repro, here are the additional devDependencies:
"phantomjs": "1.9.19",
"foundation-cli": "2.2.3",
"bower": "1.8.0"

Comment by David Nolen [ 15/Sep/17 3:32 PM ]

This ticket will need more information. It might just be a Figwheel issue.

Comment by Tony Kay [ 17/Sep/17 9:58 PM ]

OK, so I have some additional interesting facts. It does compile and work (where "work" is defined as "renders a thing from the npm module").

cljs: 1.9.908
cljsbuild: 1.1.7 via lein
Heap size: 2g
npm-deps: react, react-dom, and @blueprintjs/core
cljs deps: Om Next
verbose: true

Project is an om-next app, so cljsjs.react is in the classpath

Building the application without the npm deps and no code that refers to them: 46 seconds (CPU time)
Adding the NPM deps (with install true, but no code that uses them) increases this just by the amount of the install: 59 seconds
using the npm deps increases compile time to: 3 minutes 50 seconds

Critically: with verbose on, I can see that om/next.cljc takes about 15s in the first two cases, and 3 minutes in the final one. In other words: the thing that is slow is compiling next.cljc. Nothing else gets slower.

I'm not sure if this is even going to work "right" when it does compile, since I'm not sure how the cljsjs.react and npm react can co-exist (this is where I assume Om Next probably just needs to be made to use npm instead of cljsjs?)

But, since I saw that Petter was pulling in similar deps, he might be hitting a similar problem with cljsjs.react and npm react both being "in scope" so to speak.

When I use it from figwheel, the time between reloads becomes unusable. I assume it is related, but I have no data to back that up.

EDIT: I had missed the new doc on :global-exports. I'm going to try that and add my results.

Comment by Tony Kay [ 17/Sep/17 10:51 PM ]

So, I fixed the build to be "correct" with `:global-exports` so that I only have the NPM version of react and react-dom around (excluded cljsjs/react and react-dom from Om Next). The compile time for next.cljc is still about 3 minutes (compared to the "normal" 15s).

BUT, I then removed blueprint from my `:npm-deps` (and usage), but kept react, react-dom, and a use of react (the NPM one). The time to compile next.cljc dropped to about a minute! Still 4x slower than "normal", but 4x faster than with blueprint in the npm deps. It seems that a file using an npm dep sees a significant slowdown that is somehow proportional to the amount of total npm deps code "in view".

Obviously, Om Next uses React, but not blueprint. Yet, its compile time is significantly affected.

What Antonio said about processing them all through Closure certainly sounds relevant. Kind of a letdown to go from recent speed-ups in compiler speed to this sudden jolt of a performance hit just when we finally get a better interface to libraries.

Comment by António Nuno Monteiro [ 17/Sep/17 11:35 PM ]

Patch attached.

Comment by Tony Kay [ 17/Sep/17 11:48 PM ]

That patch fixes the build issue on plain cljsbuild.

Figwheel now starts quickly (first complete compile), but a simple file change on a small project takes 12s to hot code reload, making it pretty unusable.

Comment by António Nuno Monteiro [ 18/Sep/17 12:01 AM ]

So it looks like this is a 2-part problem.

The first one, which my patch solved, is related to CLJS compilation (where we were performing unnecessary computations on every symbol analysis – which slowed down proportionally to the number of JS modules processed).

The second problem is that we process every JS module on every code reload: the solution for this one is implementing a strategy for figuring out if we need to process JS modules again (e.g. based on last modified timestamps of source files, just like `cljs.compiler/requires-compilation?`).

Comment by António Nuno Monteiro [ 18/Sep/17 10:42 PM ]

The attached CLJS-2339-2.patch contains the changes in the previous patch and also solves the issues around reloading, only running the foreign libraries that are modules through Closure if any of the source files in the dependency graph have changed.

I'd appreciate it if people who are seeing the issue could try out the patch and report back.

Comment by Tony Kay [ 19/Sep/17 11:01 PM ]

So, I did some profiling with visualvm, and gave the results to Antonio. They were showing the vast majority of the time being consumed by `pipe`, under the direction of the node module discovery functions. His new version of the second patch definitely improves reload speed considerably. My hot code reload went from about 14 seconds (via patch 1) to 2 seconds with the new version of patch 2. This is on a project with Om Next, and some largish node libraries.

Comment by David Nolen [ 24/Sep/17 4:36 AM ]

fixed https://github.com/clojure/clojurescript/commit/245bdee2c35e19a9685b7a0849f26fce8bdaf7ca





[CLJS-2336] Call alength once in areduce and amap Created: 26/Aug/17  Updated: 27/Aug/17  Resolved: 27/Aug/17

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: 1.9.908
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Mike Fikes Assignee: David Nolen
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File CLJS-2336.patch    
Patch: Code
Approval: Accepted

 Description   

Make a single call to alength in areduce and amap, essentially porting relevant aspects of CLJ-1765 and CLJ-1901 to ClojureScript.
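In plain JavaScript terms, the change amounts to hoisting the length lookup out of the loop, as in this sketch (illustrative only, not the macro expansion itself):

```javascript
// Before: the length expression is evaluated on every iteration.
function sumEachIteration(arr) {
  let ret = 0;
  for (let i = 0; i < arr.length; i++) ret += arr[i]; // length read every pass
  return ret;
}

// After: a single length read before the loop, as in the patched
// areduce/amap expansions.
function sumLengthHoisted(arr) {
  const n = arr.length;                               // one alength call
  let ret = 0;
  for (let i = 0; i < n; i++) ret += arr[i];
  return ret;
}
```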



 Comments   
Comment by Mike Fikes [ 26/Aug/17 5:32 PM ]
Before:

Benchmarking with V8
[arr (to-array (range 1000000))], (reset! x (areduce arr i ret 0 (+ ret (aget arr i)))), 1 runs, 63 msecs
[arr (to-array (range 1000000))], (amap arr i ret (* 10 (aget arr i))), 1 runs, 83 msecs

Benchmarking with SpiderMonkey
[arr (to-array (range 1000000))], (reset! x (areduce arr i ret 0 (+ ret (aget arr i)))), 1 runs, 26 msecs
[arr (to-array (range 1000000))], (amap arr i ret (* 10 (aget arr i))), 1 runs, 796 msecs

Benchmarking with JavaScriptCore
[arr (to-array (range 1000000))], (reset! x (areduce arr i ret 0 (+ ret (aget arr i)))), 1 runs, 92 msecs
[arr (to-array (range 1000000))], (amap arr i ret (* 10 (aget arr i))), 1 runs, 98 msecs

Benchmarking with Nashorn
[arr (to-array (range 1000000))], (reset! x (areduce arr i ret 0 (+ ret (aget arr i)))), 1 runs, 1258 msecs
[arr (to-array (range 1000000))], (amap arr i ret (* 10 (aget arr i))), 1 runs, 1433 msecs

Benchmarking with ChakraCore
[arr (to-array (range 1000000))], (reset! x (areduce arr i ret 0 (+ ret (aget arr i)))), 1 runs, 9 msecs
[arr (to-array (range 1000000))], (amap arr i ret (* 10 (aget arr i))), 1 runs, 67 msecs


After:

Benchmarking with V8
[arr (to-array (range 1000000))], (reset! x (areduce arr i ret 0 (+ ret (aget arr i)))), 1 runs, 56 msecs
[arr (to-array (range 1000000))], (amap arr i ret (* 10 (aget arr i))), 1 runs, 79 msecs

Benchmarking with SpiderMonkey
[arr (to-array (range 1000000))], (reset! x (areduce arr i ret 0 (+ ret (aget arr i)))), 1 runs, 23 msecs
[arr (to-array (range 1000000))], (amap arr i ret (* 10 (aget arr i))), 1 runs, 787 msecs

Benchmarking with JavaScriptCore
[arr (to-array (range 1000000))], (reset! x (areduce arr i ret 0 (+ ret (aget arr i)))), 1 runs, 93 msecs
[arr (to-array (range 1000000))], (amap arr i ret (* 10 (aget arr i))), 1 runs, 104 msecs

Benchmarking with Nashorn
[arr (to-array (range 1000000))], (reset! x (areduce arr i ret 0 (+ ret (aget arr i)))), 1 runs, 991 msecs
[arr (to-array (range 1000000))], (amap arr i ret (* 10 (aget arr i))), 1 runs, 1252 msecs

Benchmarking with ChakraCore
[arr (to-array (range 1000000))], (reset! x (areduce arr i ret 0 (+ ret (aget arr i)))), 1 runs, 9 msecs
[arr (to-array (range 1000000))], (amap arr i ret (* 10 (aget arr i))), 1 runs, 71 msecs
Comment by David Nolen [ 27/Aug/17 4:55 PM ]

fixed https://github.com/clojure/clojurescript/commit/998933f5090254611b46a2b86626fb17cabc994a





[CLJS-2314] Eliminate str call on literal strings in str macro Created: 09/Aug/17  Updated: 10/Aug/17  Resolved: 10/Aug/17

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: 1.9.854
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Mike Fikes Assignee: David Nolen
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File CLJS-2314.patch    
Patch: Code
Approval: Accepted

 Description   

The str macro unconditionally calls str on each of its arguments. This can be skipped for literal strings, saving on the unnecessary extra runtime call.

For example, (str "x" x "y") would be compiled down to JavaScript with fewer calls: ["x",u.c(a),"y"].join("")
and silly things like (str "a" "b") compile all the way down to "ab".
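A hypothetical JavaScript illustration of the emitted code, where str1 stands in for the runtime str call (the real emitted names differ under advanced compilation): before the change every argument was wrapped, afterwards literal strings pass through untouched.

```javascript
// Stand-in for the one-argument runtime str call.
const str1 = (x) => x == null ? "" : String(x);

// Before: redundant str calls on the literal arguments.
function strBefore(x) {
  return [str1("x"), str1(x), str1("y")].join("");
}

// After: literals are emitted directly; only the dynamic argument is wrapped.
function strAfter(x) {
  return ["x", str1(x), "y"].join("");
}
```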



 Comments   
Comment by Mike Fikes [ 09/Aug/17 4:17 PM ]

Hey David, on the surface, this seems like a reasonable change, which could even benefit users targeting :simple or :none environments.

If you'd like to see some benchmarks added to the suite that help justify whether it is worth it, let me know.

Comment by David Nolen [ 10/Aug/17 6:27 AM ]

fixed https://github.com/clojure/clojurescript/commit/5c12d660561492ab98bddd39d5e5da1dfb6bb74f





[CLJS-2300] Delegate clojure.string/capitalize to goog.string/capitalize Created: 04/Aug/17  Updated: 01/Oct/17

Status: In Progress
Project: ClojureScript
Component/s: None
Affects Version/s: 1.9.854
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Mike Fikes Assignee: David Nolen
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: Text File CLJS-2300-2.patch     Text File CLJS-2300-3.patch     Text File CLJS-2300.patch    
Patch: Code and Test
Approval: Accepted

 Description   

Since clojure.string/capitalize was implemented (6 years ago), the Closure Library has added an identical function (3 years ago), which is actually much faster:

cljs.user=> (simple-benchmark [s "abc"] (goog.string/capitalize s) 10000000)
[s "abc"], (goog.string/capitalize s), 10000000 runs, 1181 msecs
nil
cljs.user=>  (simple-benchmark [s "abc"] (clojure.string/capitalize s) 10000000)
[s "abc"], (clojure.string/capitalize s), 10000000 runs, 5295 msecs

We could just have ClojureScript's delegate to Closure's.
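For reference, the technique goog.string/capitalize-style code uses can be sketched in two lines of JavaScript: one toUpperCase on the first character and one toLowerCase on the rest, with no intermediate character-sequence allocation (this is a sketch of the approach, not the exact Closure source).

```javascript
// Upper-case the first character, lower-case the remainder.
function capitalize(s) {
  return s.charAt(0).toUpperCase() + s.substr(1).toLowerCase();
}
```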



 Comments   
Comment by Mike Fikes [ 04/Aug/17 9:17 PM ]

Before/After tests look promising:

Before:

Benchmarking with V8
[s "a" f clojure.string/capitalize], (f s), 1000000 runs, 166 msecs
[s "aBcDeF" f clojure.string/capitalize], (f s), 1000000 runs, 380 msecs
Benchmarking with SpiderMonkey
[s "a" f clojure.string/capitalize], (f s), 1000000 runs, 833 msecs
[s "aBcDeF" f clojure.string/capitalize], (f s), 1000000 runs, 1541 msecs
Benchmarking with JavaScriptCore
[s "a" f clojure.string/capitalize], (f s), 1000000 runs, 53 msecs
[s "aBcDeF" f clojure.string/capitalize], (f s), 1000000 runs, 438 msecs
Benchmarking with Nashorn
[s "a" f clojure.string/capitalize], (f s), 1000000 runs, 903 msecs
[s "aBcDeF" f clojure.string/capitalize], (f s), 1000000 runs, 1181 msecs
Benchmarking with ChakraCore
[s "a" f clojure.string/capitalize], (f s), 1000000 runs, 345 msecs
[s "aBcDeF" f clojure.string/capitalize], (f s), 1000000 runs, 727 msecs

After:

Benchmarking with V8
[s "a" f clojure.string/capitalize], (f s), 1000000 runs, 28 msecs
[s "aBcDeF" f clojure.string/capitalize], (f s), 1000000 runs, 28 msecs
Benchmarking with SpiderMonkey
[s "a" f clojure.string/capitalize], (f s), 1000000 runs, 263 msecs
[s "aBcDeF" f clojure.string/capitalize], (f s), 1000000 runs, 263 msecs
Benchmarking with JavaScriptCore
[s "a" f clojure.string/capitalize], (f s), 1000000 runs, 5 msecs
[s "aBcDeF" f clojure.string/capitalize], (f s), 1000000 runs, 5 msecs
Benchmarking with Nashorn
[s "a" f clojure.string/capitalize], (f s), 1000000 runs, 327 msecs
[s "aBcDeF" f clojure.string/capitalize], (f s), 1000000 runs, 327 msecs
Benchmarking with ChakraCore
[s "a" f clojure.string/capitalize], (f s), 1000000 runs, 57 msecs
[s "aBcDeF" f clojure.string/capitalize], (f s), 1000000 runs, 63 msecs
Comment by Mike Fikes [ 04/Aug/17 10:53 PM ]

The benchmarks produced in the previous comment are included with the attached patch.

The patch also includes a generative regression test, using the previous implementation as a reference.

All engines pass the test fine if you crank up the number of values tested, except for ChakraCore. ChakraCore appears to be buggy, and I'll follow up upstream with a ticket for them. At its core, if you capitalize "ß" you should get back "SS". With the original ClojureScript implementation, you get back "ﮰ" on ChakraCore, and with the revision in the patch (essentially the Closure Library implementation) you get back "鯪", neither of which is correct. So this patch doesn't introduce a regression on ChakraCore.

Comment by Mike Fikes [ 05/Aug/17 11:59 AM ]

ChakraCore bug: https://github.com/Microsoft/ChakraCore/issues/421

Comment by Mike Fikes [ 01/Oct/17 9:34 AM ]

CLJS-2300-2.patch rebases against current master.

Comment by Mike Fikes [ 01/Oct/17 11:50 AM ]

I took a look at ChakraCore's upper-case bug, and it is fairly non-trivial for them to fix. (The code relies on upper-casing strings in place and thus can't handle things like ß -> SS.) Additionally, SpiderMonkey does the wrong thing, mapping ß -> ß.

Given that engines are evidently prone to get this wrong, and we don't want to live with unit tests failing forever, attached CLJS-2300-3.patch essentially works around this by limiting the generative test to covering ASCII strings.





[CLJS-2238] Perf regression with node module indexing Created: 14/Jul/17  Updated: 14/Jul/17  Resolved: 14/Jul/17

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: 1.9.854
Fix Version/s: 1.9.854

Type: Defect Priority: Blocker
Reporter: Mike Fikes Assignee: David Nolen
Resolution: Completed Votes: 0
Labels: js-modules, npm-deps, performance

Attachments: Text File CLJS-2238.patch    
Patch: Code
Approval: Accepted

 Description   

On master, I'm observing that, using Figwheel, changes to source take about 12 seconds to appear in a React Native app running in an iOS simulator, whereas with the 1.9.671 such changes take about 2 seconds.

I apologize for not having a minimal repro available. I suspect it is easily reproducible locally using a setup generated using re-natal init FutureApp (see https://github.com/drapanjanas/re-natal)

Of note: I don't have :npm-deps set. My compiler options are:

{:output-to     "target/ios/not-used.js"
 :main          "env.ios.main"
 :output-dir    "target/ios"
 :optimizations :none
 :checked-arrays false 
 :warnings      true}

A git bisect shows that this was introduced with the change in CLJS-2212.



 Comments   
Comment by Mike Fikes [ 14/Jul/17 1:26 PM ]

For reference, in handle-js-modules the top-level set ends up having 14748 elements, and it does appear to take about 12 seconds to produce that number.

My requires set has 77 elements and the node-required (the intersection) has 0 elements.

Comment by David Nolen [ 14/Jul/17 2:03 PM ]

fixed https://github.com/clojure/clojurescript/commit/b17b83121486d6e8ffe9887208cdf105993b73cd

Comment by Mike Fikes [ 14/Jul/17 2:04 PM ]

With the patch, changes appear in about 2 seconds.





[CLJS-2132] Optimize transient vector creation Created: 27/Jun/17  Updated: 27/Jun/17

Status: Open
Project: ClojureScript
Component/s: None
Affects Version/s: 1.9.655
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: A. R Assignee: David Nolen
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: Text File CLJS-2132.patch    
Patch: Code

 Description   

This is a very simple optimization around transient []. It avoids copying the empty array.

Performance improves for mapv on smallish vectors (5-32 elements) by anywhere from 20% up to 100% across FF & Chrome.

(defn faster-editable-root
  [node]
  (if (identical? (.-EMPTY_NODE PersistentVector) node)
    (VectorNode. (js-obj) (make-array 32))
    (VectorNode. (js-obj) (aclone (.-arr node)))))
(def orig-editable-root tv-editable-root)
(enable-console-print!)
(dotimes [_ 2]
  (doseq [size [5 10 40]]
    (let [xs (range size)
          sims 500000]
      (set! tv-editable-root orig-editable-root)
      (prn "Size: " size)
      (simple-benchmark [] (mapv inc xs) sims)
      (set! tv-editable-root faster-editable-root)
      (prn "NEW:")
      (simple-benchmark [] (mapv inc xs) sims))))





[CLJS-2120] Optimize keyword function Created: 24/Jun/17  Updated: 25/Jun/17

Status: Open
Project: ClojureScript
Component/s: None
Affects Version/s: 1.9.562
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: A. R Assignee: David Nolen
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: Text File CLJS-2120.patch    
Patch: Code
Approval: Accepted

 Description   

The keyword function can be sped up. This matters when keywords are created dynamically, such as when handling XHR responses.

The patch now matches Clojure more closely (using substring). It's also optimized for passing strings.

Results:

(enable-console-print!)
(let [sims 1000000]
  (dotimes [_ 2]
    (doseq [x ["foo" "foo/bar" [nil "foo"] ["foo" "bar"] [:foo :bar] [nil :foo]]]
      (prn "Testing keyword with: " x)
      (if (vector? x)
        (do (simple-benchmark [[a0 a1] x] (set! js/FOOO (keyword a0 a1)) sims)
            (simple-benchmark [[a0 a1] x] (set! js/FOOO (keyword2 a0 a1)) sims))
        (do (simple-benchmark [] (set! js/FOOO (keyword x)) sims)
            (simple-benchmark [] (set! js/FOOO (keyword2 x)) sims))))))


"Testing keyword with: " "foo"
[], (set! js/FOOO (keyword x)), 1000000 runs, 194 msecs
[], (set! js/FOOO (keyword2 x)), 1000000 runs, 71 msecs
"Testing keyword with: " "foo/bar"
[], (set! js/FOOO (keyword x)), 1000000 runs, 260 msecs
[], (set! js/FOOO (keyword2 x)), 1000000 runs, 104 msecs
"Testing keyword with: " [nil "foo"]
[[a0 a1] x], (set! js/FOOO (keyword a0 a1)), 1000000 runs, 278 msecs
[[a0 a1] x], (set! js/FOOO (keyword2 a0 a1)), 1000000 runs, 188 msecs
"Testing keyword with: " ["foo" "bar"]
[[a0 a1] x], (set! js/FOOO (keyword a0 a1)), 1000000 runs, 379 msecs
[[a0 a1] x], (set! js/FOOO (keyword2 a0 a1)), 1000000 runs, 215 msecs
"Testing keyword with: " [:foo :bar]
[[a0 a1] x], (set! js/FOOO (keyword a0 a1)), 1000000 runs, 351 msecs
[[a0 a1] x], (set! js/FOOO (keyword2 a0 a1)), 1000000 runs, 207 msecs
"Testing keyword with: " [nil :foo]
[[a0 a1] x], (set! js/FOOO (keyword a0 a1)), 1000000 runs, 376 msecs
[[a0 a1] x], (set! js/FOOO (keyword2 a0 a1)), 1000000 runs, 37 msecs


 Comments   
Comment by A. R [ 24/Jun/17 10:56 AM ]

Changes the behavior of:

((juxt namespace name) (keyword "foo/bar/hmm"))
=> [nil "foo"]
(.-fqn (keyword "foo/bar/hmm"))
=> "foo/bar/hmm"
((juxt namespace name) (keyword2 "foo/bar/hmm"))
=> ["foo" "bar/hmm"]
(.-fqn (keyword2 "foo/bar/hmm"))
=> "foo/bar/hmm"

Clojure 1.9:

((juxt namespace name) (keyword "foo/bar/hmm"))
=> ["foo" "bar/hmm"]

So: yay





[CLJS-2112] Iterator based reduce path Created: 21/Jun/17  Updated: 23/Jun/17  Resolved: 23/Jun/17

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: 1.9.562
Fix Version/s: 1.9.655

Type: Enhancement Priority: Minor
Reporter: Thomas Mulvaney Assignee: Thomas Mulvaney
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File CLJS-2112.1.patch     Text File CLJS-2112.patch    
Patch: Code
Approval: Accepted

 Description   

In Clojure, when reducing collections which do not support IReduce but are Iterable, reduce uses the iter-reduce function to reduce them efficiently. Maps based on PersistentArrayMap and PersistentHashMap, and the PersistentHashSet set type, would immediately benefit from this. PersistentTreeMap and PersistentTreeSet currently do not support IIterator and would continue to fall back to the current seq-reduce.
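The iter-reduce strategy can be sketched in JavaScript (an illustrative analog, not the cljs.core implementation): walk the collection's native iterator directly in a single pass instead of realizing a seq of intermediate cells.

```javascript
// Reduce over anything iterable by driving its iterator directly.
function iterReduce(iterable, f, init) {
  let acc = init;
  for (const x of iterable) {   // one pass, no intermediate seq allocation
    acc = f(acc, x);
  }
  return acc;
}
```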



 Comments   
Comment by Thomas Mulvaney [ 21/Jun/17 7:22 AM ]

Some crude benchmarks suggest a 2-4x improvement under V8:

== Maps ==
Patch:
[r (mk-map 4)], (reduce (fn [_ _] nil) r), 1000000 runs, 416 msecs
[r (mk-map 4)], (reduce (fn [_ _] nil) nil r), 1000000 runs, 508 msecs
[r (mk-map 8)], (reduce (fn [_ _] nil) r), 1000000 runs, 775 msecs
[r (mk-map 8)], (reduce (fn [_ _] nil) nil r), 1000000 runs, 762 msecs
[r (mk-map 100)], (reduce (fn [_ _] nil) r), 1000 runs, 38 msecs
[r (mk-map 100)], (reduce (fn [_ _] nil) nil r), 1000 runs, 31 msecs
[r (mk-map 1000)], (reduce (fn [_ _] nil) r), 1000 runs, 212 msecs
[r (mk-map 1000)], (reduce (fn [_ _] nil) nil r), 1000 runs, 220 msecs
[r (mk-map 10000)], (reduce (fn [_ _] nil) r), 1000 runs, 1784 msecs
[r (mk-map 10000)], (reduce (fn [_ _] nil) nil r), 1000 runs, 1839 msecs

Master:
[r (mk-map 4)], (reduce (fn [_ _] nil) r), 1000000 runs, 1084 msecs
[r (mk-map 4)], (reduce (fn [_ _] nil) nil r), 1000000 runs, 1133 msecs
[r (mk-map 8)], (reduce (fn [_ _] nil) r), 1000000 runs, 1999 msecs
[r (mk-map 8)], (reduce (fn [_ _] nil) nil r), 1000000 runs, 2335 msecs
[r (mk-map 100)], (reduce (fn [_ _] nil) r), 1000 runs, 47 msecs
[r (mk-map 100)], (reduce (fn [_ _] nil) nil r), 1000 runs, 48 msecs
[r (mk-map 1000)], (reduce (fn [_ _] nil) r), 1000 runs, 759 msecs
[r (mk-map 1000)], (reduce (fn [_ _] nil) nil r), 1000 runs, 670 msecs
[r (mk-map 10000)], (reduce (fn [_ _] nil) r), 1000 runs, 7860 msecs
[r (mk-map 10000)], (reduce (fn [_ _] nil) nil r), 1000 runs, 8406 msecs

== Sets ==
Patch:
[r (into #{} (range 4))], (reduce (fn [_ _] nil) r), 1000000 runs, 516 msecs
[r (into #{} (range 4))], (reduce (fn [_ _] nil) nil r), 1000000 runs, 523 msecs
[r (into #{} (range 8))], (reduce (fn [_ _] nil) r), 1000000 runs, 657 msecs
[r (into #{} (range 8))], (reduce (fn [_ _] nil) nil r), 1000000 runs, 811 msecs
[r (into #{} (range 100))], (reduce (fn [_ _] nil) r), 1000 runs, 28 msecs
[r (into #{} (range 100))], (reduce (fn [_ _] nil) nil r), 1000 runs, 19 msecs
[r (into #{} (range 1000))], (reduce (fn [_ _] nil) r), 1000 runs, 267 msecs
[r (into #{} (range 1000))], (reduce (fn [_ _] nil) nil r), 1000 runs, 323 msecs
[r (into #{} (range 10000))], (reduce (fn [_ _] nil) r), 1000 runs, 1568 msecs
[r (into #{} (range 10000))], (reduce (fn [_ _] nil) nil r), 1000 runs, 1852 msecs

Master:
[r (into #{} (range 4))], (reduce (fn [_ _] nil) r), 1000000 runs, 1362 msecs
[r (into #{} (range 4))], (reduce (fn [_ _] nil) nil r), 1000000 runs, 1361 msecs
[r (into #{} (range 8))], (reduce (fn [_ _] nil) r), 1000000 runs, 2361 msecs
[r (into #{} (range 8))], (reduce (fn [_ _] nil) nil r), 1000000 runs, 2295 msecs
[r (into #{} (range 100))], (reduce (fn [_ _] nil) r), 1000 runs, 74 msecs
[r (into #{} (range 100))], (reduce (fn [_ _] nil) nil r), 1000 runs, 150 msecs
[r (into #{} (range 1000))], (reduce (fn [_ _] nil) r), 1000 runs, 884 msecs
[r (into #{} (range 1000))], (reduce (fn [_ _] nil) nil r), 1000 runs, 859 msecs
[r (into #{} (range 10000))], (reduce (fn [_ _] nil) r), 1000 runs, 8202 msecs
[r (into #{} (range 10000))], (reduce (fn [_ _] nil) nil r), 1000 runs, 7649 msecs
Comment by David Nolen [ 21/Jun/17 11:09 AM ]

For patches like this it's important to provide more context to aid assessment. Does this actually match Clojure's approach? Is there any reason for these collection types to not implement IReduce instead?

Comment by Thomas Mulvaney [ 21/Jun/17 12:43 PM ]

The iter-reduce in the patch is basically a 1-1 copy of what is in Clojure's clojure.core.protocols namespace.

In Clojure the reduce pathway looks like: check if coll is an instance of IReduce; if so, use it, otherwise call clojure.core.protocols/coll-reduce, which will fall back on iter-reduce, seq-reduce, etc.

Is there any reason not to implement IReduce for the collections instead? The collections mentioned above don't have IReduce in Clojure - not sure if that is a good reason.

Should implementing IReduce for all collections be chosen it should be fairly simple:

PAM, PHM and PTM's IReduce would be basically a copy of their IKVReduce implementation. PHS and PTS's IReduce implementation would be a call to their backing maps new -reduce method.

Comment by A. R [ 22/Jun/17 5:36 AM ]

I wonder what the performance would be if we just used IKVReduce, like

(-reduce [coll f]
    (reduce-kv
      (fn [xs k v]
        (f xs [k v]))
      coll))

Could be added to all those IKVReduce or to the reduce function more globally.

I think in CLJ the iter reduce was mostly done because it works with the Java collections? (Just a wild guess).

Comment by Thomas Mulvaney [ 22/Jun/17 5:49 AM ]

@aralo That works great for implementing the 3-arity version of reduce. The example you give is for the 2-arity (no initial input), which reduce-kv does not support.

But yes, that would be an elegant implementation for the 3-arity reduce method.

If IKVReduce was to also support 2-arity, both forms of reduce on maps could be implemented this way.

Comment by A. R [ 22/Jun/17 6:40 AM ]

Ah, right, the 2-arity version. I never use that. Way too confusing. I think this implementation using kv-reduce would satisfy the behavior semantics of reduce:

(let [find-max-entry (fn
                       ([] :no-max)
                       ([[max-key max-val :as entry] [k v]]
                        (if (< max-val v)
                          [k v]
                          entry)))
      res (reduce-kv
            (fn [xs k v]
              (if (identical? lookup-sentinel xs)
                [k v]
                (find-max-entry xs [k v])))
            lookup-sentinel
            {:a 1, :b 2})]
  (if (identical? res lookup-sentinel)
    (find-max-entry)
    res))

(Try with an empty map, a one-entry map, and, as given, with 2 map entries.) How is the performance for this?

Comment by David Nolen [ 22/Jun/17 8:44 AM ]

Ok, I'm fine with the basic approach of the patch as is. Minor nit, `.hasNext` method calls need `^boolean` hint for `if` test to be optimized.
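For context, the hint sits on the interop call itself; a sketch (the surrounding loop is just an example shape, with `iter` assumed bound to a JS iterator object):

```clojure
;; Without the hint, (when (.hasNext iter) ...) wraps the call in
;; cljs.core.truth_; with ^boolean the emitted JS tests the result
;; directly in the `if`.
(loop []
  (when ^boolean (.hasNext iter)
    (.next iter)
    (recur)))
```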

Comment by Thomas Mulvaney [ 23/Jun/17 10:38 AM ]

Updated patch attached with .hasNext calls annotated as boolean.

Comment by David Nolen [ 23/Jun/17 10:39 AM ]

fixed https://github.com/clojure/clojurescript/commit/4e48a3448efa24987cfef6fb676439a7b131f017

Added ^boolean hints in the following commit.

Comment by Thomas Mulvaney [ 23/Jun/17 10:40 AM ]

Great, thanks!





[CLJS-2108] Faster set equivalence Created: 20/Jun/17  Updated: 20/Jun/17  Resolved: 20/Jun/17

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Thomas Mulvaney Assignee: David Nolen
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File CLJS-2108.patch    
Patch: Code

 Description   

Equivalence calls on the core sets, PersistentHashSet and PersistentTreeSet, can be made faster by switching from every? to reduce-kv as they are both backed by maps which support IKVReduce.
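The idea can be sketched roughly as follows; the function name and the `.-hash-map` field access are assumptions about cljs.core internals, not the patch itself:

```clojure
;; Walk one set's backing map with reduce-kv, bailing out early via
;; `reduced` as soon as an element is missing from the other set.
(defn set-equiv? [a b]
  (and (set? b)
       (== (count a) (count b))
       (reduce-kv (fn [_ k _]
                    (if (contains? b k)
                      true
                      (reduced false)))
                  true
                  (.-hash-map a))))
```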

A benchmark under V8:

=== PHS ===
Patch:
[a (into #{} (range 1)) b (into #{} (range 1))], (= a b), 1000000 runs, 345 msecs
[a (into #{} (range 2)) b (into #{} (range 2))], (= a b), 1000000 runs, 530 msecs
[a (into #{} (range 4)) b (into #{} (range 4))], (= a b), 1000000 runs, 801 msecs
[a (into #{} (range 8)) b (into #{} (range 8))], (= a b), 1000000 runs, 1497 msecs
[a (into #{} (range 32)) b (into #{} (range 32))], (= a b), 10000 runs, 140 msecs
[a (into #{} (range 100)) b (into #{} (range 100))], (= a b), 10000 runs, 318 msecs
[a (into #{} (range 1000)) b (into #{} (range 1000))], (= a b), 1000 runs, 404 msecs
[a (into #{} (range 100000)) b (into #{} (range 100000))], (= a b), 100 runs, 3411 msecs

Master:
[a (into #{} (range 1)) b (into #{} (range 1))], (= a b), 1000000 runs, 986 msecs
[a (into #{} (range 2)) b (into #{} (range 2))], (= a b), 1000000 runs, 1488 msecs
[a (into #{} (range 4)) b (into #{} (range 4))], (= a b), 1000000 runs, 2255 msecs
[a (into #{} (range 8)) b (into #{} (range 8))], (= a b), 1000000 runs, 3716 msecs
[a (into #{} (range 32)) b (into #{} (range 32))], (= a b), 10000 runs, 328 msecs
[a (into #{} (range 100)) b (into #{} (range 100))], (= a b), 10000 runs, 928 msecs
[a (into #{} (range 1000)) b (into #{} (range 1000))], (= a b), 1000 runs, 1284 msecs
[a (into #{} (range 100000)) b (into #{} (range 100000))], (= a b), 100 runs, 13921 msecs

=== PTS ===
Patch:
[a (into (sorted-set) (range 1)) b (into (sorted-set) (range 1))], (= a b), 1000000 runs, 504 msecs
[a (into (sorted-set) (range 2)) b (into (sorted-set) (range 2))], (= a b), 1000000 runs, 690 msecs
[a (into (sorted-set) (range 4)) b (into (sorted-set) (range 4))], (= a b), 1000000 runs, 1106 msecs
[a (into (sorted-set) (range 8)) b (into (sorted-set) (range 8))], (= a b), 1000000 runs, 2065 msecs
[a (into (sorted-set) (range 32)) b (into (sorted-set) (range 32))], (= a b), 10000 runs, 85 msecs
[a (into (sorted-set) (range 100)) b (into (sorted-set) (range 100))], (= a b), 10000 runs, 315 msecs
[a (into (sorted-set) (range 1000)) b (into (sorted-set) (range 1000))], (= a b), 1000 runs, 381 msecs
[a (into (sorted-set) (range 100000)) b (into (sorted-set) (range 100000))], (= a b), 100 runs, 4877 msecs

Master:
[a (into (sorted-set) (range 1)) b (into (sorted-set) (range 1))], (= a b), 1000000 runs, 1204 msecs
[a (into (sorted-set) (range 2)) b (into (sorted-set) (range 2))], (= a b), 1000000 runs, 2034 msecs
[a (into (sorted-set) (range 4)) b (into (sorted-set) (range 4))], (= a b), 1000000 runs, 3471 msecs
[a (into (sorted-set) (range 8)) b (into (sorted-set) (range 8))], (= a b), 1000000 runs, 7128 msecs
[a (into (sorted-set) (range 32)) b (into (sorted-set) (range 32))], (= a b), 10000 runs, 260 msecs
[a (into (sorted-set) (range 100)) b (into (sorted-set) (range 100))], (= a b), 10000 runs, 643 msecs
[a (into (sorted-set) (range 1000)) b (into (sorted-set) (range 1000))], (= a b), 1000 runs, 718 msecs
[a (into (sorted-set) (range 100000)) b (into (sorted-set) (range 100000))], (= a b), 100 runs, 8302 msecs


 Comments   
Comment by David Nolen [ 20/Jun/17 4:39 PM ]

fixed https://github.com/clojure/clojurescript/commit/fb14b183e6dc2485f67ba7f7f668981dd7cdfd3c





[CLJS-2080] Faster equiv-map Created: 12/Jun/17  Updated: 12/Jun/17  Resolved: 12/Jun/17

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Thomas Mulvaney Assignee: Thomas Mulvaney
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File CLJS-2080.1.patch     Text File CLJS-2080.patch    
Patch: Code

 Description   

equiv-map is a private function used by the core map types (PHM and PTM) to determine map equivalence. The current implementation uses every? to walk all the entries in the first map and compare them against the other map. This has extra overhead, as seqs and one object per entry are created.

Using reduce-kv avoids the creation of seqs and entry objects making equivalence checks faster.
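A hedged sketch of the reduce-kv approach (illustrative names only, not the committed patch; a sentinel distinguishes an absent key from a key mapped to nil):

```clojure
(def ^:private lookup-sentinel (js-obj))

;; Compare every key/value of x against y without building seqs or
;; MapEntry objects, stopping at the first mismatch via `reduced`.
(defn equiv-map* [x y]
  (and (map? y)
       (== (count x) (count y))
       (reduce-kv (fn [_ k v]
                    (if (= v (get y k lookup-sentinel))
                      true
                      (reduced false)))
                  true
                  x)))
```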



 Comments   
Comment by Thomas Mulvaney [ 12/Jun/17 7:54 AM ]

A quick benchmark under V8 gave the following output:

Master PHM equality:
[x (new-phm 9) y (new-phm 9)], (= x y), 1000000 runs, 13432 msecs
[x (new-phm 16) y (new-phm 16)], (= x y), 1000000 runs, 21361 msecs
[x (new-phm 32) y (new-phm 32)], (= x y), 1000000 runs, 51520 msecs
[x (new-phm 100) y (new-phm 100)], (= x y), 100000 runs, 15823 msecs
[x (new-phm 1000) y (new-phm 1000)], (= x y), 10000 runs, 27147 msecs

Patch PHM equality:
[x (new-phm 9) y (new-phm 9)], (= x y), 1000000 runs, 3470 msecs
[x (new-phm 16) y (new-phm 16)], (= x y), 1000000 runs, 5779 msecs
[x (new-phm 32) y (new-phm 32)], (= x y), 1000000 runs, 12951 msecs
[x (new-phm 100) y (new-phm 100)], (= x y), 100000 runs, 3433 msecs
[x (new-phm 1000) y (new-phm 1000)], (= x y), 10000 runs, 6124 msecs

Master PTM equality:
[x (new-ptm 9) y (new-ptm 9)], (= x y), 1000000 runs, 12377 msecs
[x (new-ptm 16) y (new-ptm 16)], (= x y), 1000000 runs, 22793 msecs
[x (new-ptm 32) y (new-ptm 32)], (= x y), 1000000 runs, 48445 msecs
[x (new-ptm 100) y (new-ptm 100)], (= x y), 100000 runs, 15767 msecs
[x (new-ptm 1000) y (new-ptm 1000)], (= x y), 10000 runs, 18207 msecs

Patch PTM equality:
[x (new-ptm 9) y (new-ptm 9)], (= x y), 1000000 runs, 3152 msecs
[x (new-ptm 16) y (new-ptm 16)], (= x y), 1000000 runs, 6092 msecs
[x (new-ptm 32) y (new-ptm 32)], (= x y), 1000000 runs, 14557 msecs
[x (new-ptm 100) y (new-ptm 100)], (= x y), 100000 runs, 5544 msecs
[x (new-ptm 1000) y (new-ptm 1000)], (= x y), 10000 runs, 7848 msecs
Comment by David Nolen [ 12/Jun/17 10:28 AM ]

Not all maps will implement reduce-kv. You need to check that the map implements it first and fall back to the old way if not.

Comment by Thomas Mulvaney [ 12/Jun/17 10:49 AM ]

I figured that as equiv-map was private, other map implementations outside of core would not be using it.

I'll update the patch with the fallback.

Comment by David Nolen [ 12/Jun/17 12:21 PM ]

fixed https://github.com/clojure/clojurescript/commit/76c1a63c6062556e29c87e2b7e2d33d539b47b7b





[CLJS-2073] Don't flush for every emitted line Created: 04/Jun/17  Updated: 16/Jun/17  Resolved: 16/Jun/17

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Mike Fikes Assignee: David Nolen
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File CLJS-2073.patch    
Patch: Code

 Description   

When profiling compilation, java.io.BufferedWriter.flush() shows up near the top of the profile, with the root cause being the println here:

https://github.com/clojure/clojurescript/blob/d4db18970c8eec587b2c9e022034983e29eb8e81/src/main/clojure/cljs/compiler.cljc#L206

It is probably not important to flush every line of emitted JavaScript while compiling, and binding flush-on-newline to false here eliminates this from the top of the profile.
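The shape of the fix, sketched; `emit-js-lines` is a stand-in name, not the actual compiler entry point:

```clojure
;; Suppress the per-line flush that println triggers by default, and
;; flush once after all lines have been emitted.
(defn emit-js-lines [lines]
  (binding [*flush-on-newline* false]
    (doseq [line lines]
      (println line))
    (flush)))
```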



 Comments   
Comment by Mike Fikes [ 05/Jun/17 10:28 AM ]

I tested this and appear to get a 2.9% improvement.

The test involves the JVM ClojureScript portion of the Planck compile, on a non-SSD drive, measuring using :compiler-stats.

Raw measurements of 10 runs (first two discarded to eliminate transient warm-up effects):

Master: 33305.826946, 32951.074239, 33232.689479, 33437.3924, 33084.874104, 33330.858312, 32704.202852, 32971.140835, 32784.698435, 33677.632887
Patched: 32162.290706, 32381.684736, 31956.178709, 31802.413141, 32607.124829, 32149.32672, 32281.441289, 32423.652908, 31935.573276, 32233.427539

Comment by David Nolen [ 16/Jun/17 9:26 AM ]

fixed https://github.com/clojure/clojurescript/commit/55f9bcc5445e8bb6a6534ee6c528c871d0e5e280





[CLJS-2072] Eliminate reflection in cljs.js-deps/build-index Created: 04/Jun/17  Updated: 16/Jun/17  Resolved: 16/Jun/17

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Mike Fikes Assignee: David Nolen
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File CLJS-2072.patch    
Patch: Code

 Description   

There is some reflection in cljs.js-deps/build-index which shows up near the top when profiling incremental compile loops (as opposed to cold compilation of large codebases), owing to the interop calls introduced here:

https://github.com/clojure/clojurescript/commit/ce6c657a751cce5fb1b8e94eb97e74944c0d7fa6

It is certainly not critical to eliminate these.
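Such reflection is typically removed with a type hint; a generic illustration (not the actual build-index code):

```clojure
(set! *warn-on-reflection* true)

;; Reflective: the compiler can't resolve .getName without knowing the
;; type of f, so the method is looked up at runtime.
(defn fname-slow [f]
  (.getName f))

;; Hinted: ^java.io.File lets the compiler resolve the call statically.
(defn fname-fast [^java.io.File f]
  (.getName f))
```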



 Comments   
Comment by David Nolen [ 16/Jun/17 9:33 AM ]

fixed https://github.com/clojure/clojurescript/commit/028fd479799d543f68ab4c02ae86ac32b2c45858





[CLJS-2066] Avoid analyzing named fn literal bodies twice Created: 01/Jun/17  Updated: 02/Jun/17  Resolved: 02/Jun/17

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Mike Fikes Assignee: David Nolen
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File CLJS-2066.patch    
Patch: Code

 Description   

In a named fn literal like (fn foo [] 17), the body will be analyzed twice in order to leverage the self-name in a second analysis pass for further optimization.

A consequence, though, is exponential compilation complexity when such forms are nested. For example, in

(fn foo [] (fn bar [] 17))

The innermost body will be analyzed 4 times, with this doubling for each such nesting.

One case where this kind of nesting occurs is in the macroexpansion of for involving multiple bindings. For example, the analysis of:

(for [a nil b nil c nil d nil e nil f nil g nil h nil i nil j nil k nil l nil m nil] 1)

can take on the order of 10 seconds, and with a fix it can be made to be instantaneous (as it is in Clojure).



 Comments   
Comment by Mike Fikes [ 01/Jun/17 1:55 PM ]

With the attached CLJS-2066.patch some of the tests fail under self-host:

Testing foo.ns-shadow-test

ERROR in (test-shadow) (TypeError:NaN:NaN)
expected: (= (quux 2) 3)
  actual: #object[TypeError TypeError: Cannot read property 'bar' of undefined]

ERROR in (test-shadow) (TypeError:NaN:NaN)
expected: (= (baz) 3)
  actual: #object[TypeError TypeError: Cannot read property 'bar' of undefined]
Comment by Mike Fikes [ 01/Jun/17 2:28 PM ]

The test failures in the previous comment were not caused by this patch, but instead by CLJS-2065 (and covered in CLJS-2067).

I've confirmed that the self-host tests pass if this patch is applied to 1.9.562 along with CLJS-2056.

Comment by Mike Fikes [ 02/Jun/17 12:10 PM ]

I'm seeing a 38% compiler perf improvement when compiling Planck with this patch. These are the results of :compiler-stats true output with the patch:

Compile sources, elapsed time: 13877.434417 msecs

and without the patch:

Compile sources, elapsed time: 22415.116295 msecs

Note that this is an additional improvement on top of the one obtained via CLJS-2065. Without that patch, the baseline 1.9.562 produces:

Compile sources, elapsed time: 27757.805102 msecs

In other words, the two patches together produce a 50% improvement.

Andreas Steffan reports a 47% improvement when focusing on cljs.core$macros: https://twitter.com/deas/status/870667192353927168

Comment by David Nolen [ 02/Jun/17 12:31 PM ]

fixed https://github.com/clojure/clojurescript/commit/3cf1db168c716fb404a466e90cb2affe6fc6865b





[CLJS-2065] Improve analyzer munge performance Created: 31/May/17  Updated: 01/Jun/17  Resolved: 31/May/17

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: 1.9.293
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: A. R Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File CLJS-2065.patch     PNG File cljs-profile.png    

 Description   

Profiling shows that munge in cljs.analyzer is a bottleneck.

Approach:

  1. Get rid of string/split and replace with indexOf
  2. Memoize ns-first-segments
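An illustrative sketch of change 1 (not the committed patch): extracting the first segment of a namespace name with .indexOf instead of string/split avoids allocating a vector of all segments.

```clojure
;; e.g. (first-segment "cljs.core.async") returns "cljs"
(defn first-segment [^String s]
  (let [i (.indexOf s ".")]
    (if (neg? i)
      s
      (subs s 0 i))))
```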


 Comments   
Comment by A. R [ 31/May/17 2:17 PM ]

In case somebody is wondering, and just for the record:

Memoizing munge didn't work. The compiler never finished. Not sure if memory pressure or some other issue. Could be investigated at some point. But a simple memoize definitely didn't work.

Comment by A. R [ 31/May/17 2:22 PM ]

Results of the patch:

  • 17k LOC project went from 21s to 11.9s for a full recompile (optimizations none)

Via Mike Fikes on slack:

Self-hosted Planck: "(require 'cljs.core.async): 48 seconds without the change, 28 seconds with the change."

Self-hosted Lumo: "(require 'cljs.core.async) from 52 seconds to 37 seconds. (29% faster)"

Comment by David Nolen [ 31/May/17 3:58 PM ]

patch looks great! fixed in master https://github.com/clojure/clojurescript/commit/dd7403f8f513c1e774b9a65cade2037451fe7565

Comment by Mike Fikes [ 01/Jun/17 2:17 PM ]

See related regression CLJS-2067





[CLJS-1975] Perf: Compare f with nil in Delay impl Created: 11/Mar/17  Updated: 11/Mar/17

Status: Open
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Mike Fikes Assignee: David Nolen
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: Text File CLJS-1975.patch    
Patch: Code

 Description   

When a Delay has been realized, f is set to nil. We can avoid truth_ and not calls and directly compare with nil for a minor perf boost.

In script/noderepljs, this leads to these benchmarks

(simple-benchmark [x (delay 1)] @x 1e9)
(simple-benchmark [x (delay 1)] (realized? x) 1e9)
(simple-benchmark [x (doto (delay 1) deref)] (realized? x) 1e9)

speeding up by 6%, 11% and 9%, respectively.
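A minimal sketch of the idea on a simplified Delay (not the cljs.core deftype itself):

```clojure
(deftype SimpleDelay [^:mutable f ^:mutable value]
  IDeref
  (-deref [_]
    ;; (when-not (nil? f) ...) compiles to a direct `f == null` test,
    ;; avoiding the truth_/not calls a bare (when f ...) incurs when
    ;; the compiler can't prove f is a boolean.
    (when-not (nil? f)
      (set! value (f))
      (set! f nil))
    value))
```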






[CLJS-1876] Faster PersistentVector, Subvec and ChunkedSeq reduce. Created: 19/Dec/16  Updated: 16/Jun/17  Resolved: 16/Jun/17

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Thomas Mulvaney Assignee: David Nolen
Resolution: Completed Votes: 1
Labels: performance

Attachments: File benchmarks.cljs     Text File CLJS-1876.patch    
Patch: Code

 Description   

This patch improves the performance of the 2-arity `-reduce` of PersistentVector as well as the 2- and 3-arity versions of `-reduce` for `Subvec` and `ChunkedSeq`.

Large (> 32) `Subvecs` and `ChunkedSeqs` saw a 7x speed-up in V8 and up to 11x in JavaScriptCore; smaller collections saw no change.

The 2-arity version of PersistentVector `-reduce` was also improved ~4x and ~7x in V8 and JavaScriptCore respectively.

In the benchmarks below, all runs where N <= 32 were run 1,000,000 times. For the larger collections 100,000 iterations were made.

PersistentVector 3-arity reduce (no code was changed)

N master (v8) patched (v8) master (jsc) patched (jsc)
0 44 44 17 19
1 121 102 29 32
2 151 133 35 37
4 219 199 50 50
8 360 336 79 79
16 606 588 130 131
32 1124 1109 243 250
100 329 338 75 76
1000 3235 3317 704 725
10000 32960 33575 7365 7316

Persistent Vector 2-arity reduce

N master (v8) patched (v8) master (jsc) patched (jsc)
0 65 41 29 12
1 96 58 49 42
2 147 66 87 45
4 246 85 113 44
8 446 142 202 69
16 829 276 362 98
32 1627 506 693 132
100 534 154 236 41
1000 5442 1568 2321 329
10000 58396 15386 26162 3406

ChunkedSeq 3-arity reduce

N master (v8) patched (v8) master (jsc) patched (jsc)
0 57 69 93 88
1 54 60 31 26
2 68 63 27 28
4 94 91 37 39
8 146 152 59 58
16 272 266 121 107
32 479 526 170 174
100 1186 165 459 39
1000 11760 1539 4547 334
10000 121986 16080 48639 3384

ChunkedSeq 2-arity reduce

N master (v8) patched (v8) master (jsc) patched (jsc)
0 62 63 96 97
1 23 31 16 19
2 35 40 20 17
4 68 70 26 29
8 116 120 49 47
16 232 223 83 89
32 467 444 156 158
100 1123 164 417 39
1000 12328 1659 4516 327
10000 126570 15940 47435 3330

Subvec 3-arity reduce

N master (v8) patched (v8) master (jsc) patched (jsc)
0 67 67 51 34
1 185 71 100 35
2 296 84 160 44
4 514 100 259 52
8 958 156 405 77
16 1878 295 733 138
32 3668 565 1433 139
100 1164 165 462 36
1000 12596 1600 4798 337
10000 122919 16108 48093 3511

Subvec 2-arity reduce

N master (v8) patched (v8) master (jsc) patched (jsc)
0 73 49 35 22
1 169 58 75 48
2 289 70 117 54
4 544 103 197 56
8 961 159 378 74
16 1852 288 697 103
32 3644 514 1425 145
100 1245 147 441 39
1000 12065 1537 4556 333
10000 122519 15600 46324 3370

The source code for the benchmarks is attached.



 Comments   
Comment by David Nolen [ 16/Jun/17 10:11 AM ]

fixed https://github.com/clojure/clojurescript/commit/64f126f136270492555b5afe5a6947646a36bdc4





[CLJS-1871] A declare with :arglists should generate static function calls Created: 14/Dec/16  Updated: 15/May/17

Status: Open
Project: ClojureScript
Component/s: None
Affects Version/s: 1.9.293
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: A. R Assignee: Unassigned
Resolution: Unresolved Votes: 3
Labels: performance


 Description   

This is a performance enhancement.

  1. Problem
    After a declare the compiler doesn't know which arities the function will be defined with, and hence generates code that checks whether that arity is defined and then either calls it or uses a slower `xy-fn.call(null, ...)` construct. This is not optimal since it can be slower and also generates slightly more code.

Especially functions which only have one arity are problematic, since they will end up being called with `xy-fn.call`.

  2. Affects
    Code that uses a function that was only declared and not def'ed yet, such as `cons` in `IndexedSeq` or `conj!` in `TransientHashMap`.
    Performance: a preliminary benchmark showed negligible improvements in Chrome v54 but a significant (factor of 2.2) performance benefit in Firefox.
  3. Solution
    Most of the declares should use `(def ^{:declare true, :arglists '([x xs])} cons)`, and the compiler should take the `:arglists` into consideration and emit direct function calls instead.


 Comments   
Comment by A. R [ 12/May/17 8:26 AM ]

Similarly, functions that call themselves recursively don't get invoked optimally, such as:

  • push-tail
  • do-assoc
  • pop-tail
  • tv-push-tail
  • tv-pop-tail

Matters quite a bit for TreeMap kv-reduce + dissoc.

EDIT: Separately addressed: https://dev.clojure.org/jira/browse/CLJS-2038





[CLJS-1866] RangedIterator performance tweaks Created: 08/Dec/16  Updated: 19/Dec/16

Status: Open
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Thomas Mulvaney Assignee: Thomas Mulvaney
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: Text File CLJS-1866.patch     Text File CLJS-1866-updated.patch     Text File CLJS-1866-updated.patch    
Patch: Code

 Description   

The attached patch simplifies and speeds up the RangedIterator.

The benchmarks were run using the following function to test vector iteration:

(defn consume-iterator
  [v]
  (let [iter (-iterator v)]
    (loop []
      (when (.hasNext iter)
        (.next iter)
        (recur)))))

A series of "simple-benchmarks" were setup as follows:

(simple-benchmark [v (into [] (range N))] (consume-iterator v) I)

Where 'N' and 'I' were values from the 'Vector Size' and 'Iterations' columns of the table below.

Vector Size Iterations V8 Speed [msec] (master) V8 Speed [msec] (patch) JSC Speed [msec] (master) JSC Speed [msec] (patch)
1 100,000 15 11 13 7
2 100,000 14 10 7 4
4 100,000 18 10 9 5
8 100,000 27 12 14 6
16 100,000 43 17 19 9
32 100,000 74 24 37 15
100 100,000 217 59 105 45
1000 100,000 2008 524 1032 392
10,000 100,000 20390 5856 10249 4178
100,000 10,000 20334 5324 10324 4387

Javascript engine versions used:

  • V8 version 5.1.281.47
  • JSC version Unknown

The RangedIterator constructor function `ranged-iterator` was also made private.



 Comments   
Comment by David Nolen [ 16/Dec/16 2:04 PM ]

Let's get a patch with the performance change without altering the constructor first. Thanks.

Comment by Thomas Mulvaney [ 17/Dec/16 7:15 AM ]

Rebased and constructor no longer private.

Comment by David Nolen [ 17/Dec/16 9:59 AM ]

Sorry for not being clear. Leave the fields of the deftype alone even if we aren't using them for now. We want the performance enhancements without changing the API at all.

Comment by Thomas Mulvaney [ 19/Dec/16 5:58 AM ]

Thanks that makes sense. Fixed in this patch.





[CLJS-1827] Improve reader performance on Firefox/Windows Created: 20/Oct/16  Updated: 21/Oct/16

Status: Open
Project: ClojureScript
Component/s: None
Affects Version/s: 1.9.76
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Michael Sperber Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance, reader
Environment:

Firefox on Windows


Attachments: Text File cljs-reader.patch    
Patch: Code

 Description   

cljs.reader/read-string speeds up by a factor of 2 on Firefox/Windows through this change without complicating the code.

(Other JS engines, including Firefox on Linux/Mac do not seem to be affected as significantly.)



 Comments   
Comment by David Nolen [ 20/Oct/16 10:33 AM ]

It would be nice to have a bit more information on this ticket as to what Google Closure does that's unnecessary, or whether this patch is actually a faithful port of Clojure behavior (does it copy the implementation of the EDN reader in these hot spots?). Finally, the patch names David Frese; have they submitted a CA?

Thanks!

Comment by Michael Sperber [ 21/Oct/16 5:49 AM ]

I believe the Google functions are too general; they work on strings in addition to characters, etc.

It's not clear to us though why only Firefox on Windows benefits.

(David Frese is a co-worker - yes, has submitted a CA.)





[CLJS-1776] Add fixed arities for mapcat Created: 13/Sep/16  Updated: 13/Sep/16

Status: Open
Project: ClojureScript
Component/s: None
Affects Version/s: 1.9.76
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Robert C Faber Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: Text File CLJS_1776__Add_fixed_arities_for_mapcat.patch     Text File CLJS-1776.patch    
Patch: Code

 Description   

Following the pattern of map, this patch adds three fixed arities for mapcat.



 Comments   
Comment by Alex Miller [ 13/Sep/16 10:25 AM ]

Presumably this is to improve performance. Please include a benchmark showing the difference.





[CLJS-1712] Make PersistentHashSet implement IReduce Created: 21/Jul/16  Updated: 21/Jul/16

Status: Open
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Thomas Mulvaney Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: Text File difference-benchmark.txt     Text File into-benchmark.txt     Text File phs-reduce.patch     Text File union-benchmark.txt    
Patch: Code

 Description   

This improves the speed of many reduce-based operations on sets which were falling back to seq-reduce, including code in the `clojure.set` namespace such as `clojure.set/union`, and `(into [] some-set)`.

I've included a few benchmarks I performed using `simple-benchmark` in a JavaScriptCore environment (Planck REPL).



 Comments   
Comment by Rohit Aggarwal [ 21/Jul/16 3:35 PM ]

I think the code currently is faithful to Clojure's implementation of PersistentHashSet, so any change from that would probably require more thought and/or historical context.

Someone else also raised a similar issue on the ClojureScript mailing list.





[CLJS-1665] Use Javascript array to create a collection in cljs.reader Created: 03/Jun/16  Updated: 10/Aug/16  Resolved: 10/Aug/16

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Rohit Aggarwal Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File CLJS-1665.patch    
Patch: Code

 Description   

For performance, we should be using a JavaScript array to create an ArrayMap/HashMap/Set/List/Vector instead of creating them out of a Vector. Most of the underlying data types have methods to convert from an array to that type.

The big change is that cljs.reader/read-delimited-list now returns an array instead of a vector.
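A hedged sketch of the map-reading case (the exact constructor name and arity are assumptions; the real change is in the linked commit):

```clojure
;; Collect the parsed forms into a JS array and hand it straight to the
;; collection's fromArray-style constructor, instead of first building a
;; vector and then pouring it into the target collection.
(defn read-map [rdr _]
  (let [arr (read-delimited-list "}" rdr true)] ; now returns a JS array
    (.fromArray PersistentArrayMap arr true)))  ; arity assumed
```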

Benchmarking code:

(def nums (range 20))
(def nums-map (into {} (map (fn [i] [i i]) nums)))

(simple-benchmark [s "{:foo 1 :bar 2}"] (reader/read-string s) 1000000)
(simple-benchmark [s (pr-str nums-map)] (reader/read-string s) 100000)
(simple-benchmark [s (pr-str nums)] (reader/read-string s) 100000)
(simple-benchmark [s (pr-str (vec nums))] (reader/read-string s) 100000)
(simple-benchmark [s (pr-str (set nums))] (reader/read-string s) 100000)

Results (All times in msecs):

Engine Benchmark Old New Improvement
v8 Small Map 6620 5516 20.01%
SpiderMonkey Small Map 11929 10606 12.47%
JSC Small Map 5101 4158 22.68%
v8 Large Map 6075 5548 9.50%
SpiderMonkey Large Map 13070 11933 9.53%
JSC Large Map 4777 4273 11.79%
v8 List 2308 2280 1.23%
SpiderMonkey List 6167 4777 29.10%
JSC List 1891 1737 8.87%
v8 Vector 2276 2242 1.52%
SpiderMonkey Vector 5239 4700 11.47%
JSC Vector 1832 1684 8.79%
v8 Set 3362 3271 2.78%
SpiderMonkey Set 7283 6880 5.86%
JSC Set 2771 2619 5.80%


 Comments   
Comment by David Nolen [ 10/Aug/16 2:24 PM ]

fixed https://github.com/clojure/clojurescript/commit/c98e9a768f003f178b36d001a26cbfcd95fae3c6

Comment by Rohit Aggarwal [ 10/Aug/16 3:00 PM ]

Thank you David!





[CLJS-1662] Optimize seq (&) destructuring (as per latest 0aa3467 in Clojure) Created: 02/Jun/16  Updated: 03/Jun/16  Resolved: 02/Jun/16

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Rohit Aggarwal Assignee: David Nolen
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File CLJS-1662-Optimize+seq+destructuring-2.patch     Text File CLJS-1662-Optimize seq destructuring.patch    
Patch: Code

 Description   

I've applied the changes which have been committed by Rich in this commit. All the tests are passing on v8, Spidermonkey and JSC engines on OS X. I've also tested using ./script/test-self-host and ./script/test-self-parity.

Benchmarking code:

(simple-benchmark [v (into [] (range 1000000))]
                  (loop [[x & xs] v]
                    (if xs
                      (recur xs)
                      x))
                  100)

Results (all times are in msecs):

- v8 SpiderMonkey JSC
Old 19070 18053 9116
New 10323 15238 6353


 Comments   
Comment by Rohit Aggarwal [ 02/Jun/16 7:34 AM ]

For reference, the performance improvements for Clojure are mentioned in this post.

Comment by Rohit Aggarwal [ 02/Jun/16 3:31 PM ]

I've updated the benchmarking code after conversation with David Nolen on Slack.

Comment by Rohit Aggarwal [ 02/Jun/16 3:34 PM ]

New benchmarks are below.

Code:

(println "\n")
(println ";; Destructuring a sequence")
(simple-benchmark [v (into [] (range 1000000))]
                  (loop [[x & xs] v]
                    (if-not (nil? xs)
                      (recur xs)
                      x))
                  1000)

Timing: (in msecs)

- v8 SpiderMonkey JSC
Old 160503 131297 123059
New 85183 118355 55245
Comment by David Nolen [ 02/Jun/16 6:18 PM ]

fixed https://github.com/clojure/clojurescript/commit/2f012cec88d05f42dd145338d9c942498d3ceb13

Comment by Rohit Aggarwal [ 03/Jun/16 3:41 AM ]

Thank you!





[CLJS-1574] CLJS string equivalence is very slow in Chrome Created: 16/Feb/16  Updated: 08/Jul/17  Resolved: 08/Jul/17

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: 1.7.228
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Stephen Nelson Assignee: Unassigned
Resolution: Declined Votes: 0
Labels: performance
Environment:

Google Chrome 48.0.2564.109 on Mac OS X 10.11.3
Safari 9.0.3 (11601.4.4) on Mac OS X 10.11.3



 Description   

ClojureScript's equivalence for strings in Google Chrome is ~1000 times slower than the equivalent JavaScript functionality, and ~1000 times slower than the same function in Safari.

Google Chrome
js equiv:  0.005 seconds
cljs equiv:  1.898 seconds
Safari
js equiv:  0.005 seconds
cljs equiv:  0.006 seconds
(def size (* 128 1024))

(defn now []
  (.getTime (js/Date.)))

(defn delta [b a]
  (str (/ (- b a) 1000) " seconds"))

(deftest test-js-eq-perf
  (let [str1  (apply str (repeat size "a"))
        str2  (apply str (repeat size "a"))
        start (now)
        _     (is (js* "~{} == ~{}" str1 str2)
                  "js equivalence")
        end   (now)
        ]
    (println "js equiv: " (delta end start))))

(deftest test-cljs-eq-perf
  (let [str1  (apply str (repeat size "a"))
        str2  (apply str (repeat size "a"))
        start (now)
        _     (is (= str1 str2)
                  "cljs equivalence")
        end   (now)
        ]
    (println "cljs equiv: " (delta end start))))


 Comments   
Comment by Stephen Nelson [ 16/Feb/16 6:06 PM ]

This bug only occurs when cljs.pprint has been required.

Comment by Stephen Nelson [ 16/Feb/16 6:38 PM ]

After a whole lot of binary search, here's a minimal reproduction. When cljs.pprint is loaded it constructs write-option-table. It seems that constructing a hash map with the keys :added and :ns causes a call to (= :added :ns), which is sufficient to cause string equality to become extremely slow.

(ns hello-world.core)

(enable-console-print!)

(def size (* 128 1024))

(defn now []
  (.getTime (js/Date.)))

(defn delta [b a]
  (str (/ (- b a) 1000) " seconds"))

(defn test [] 
  (let [str1  (apply str (repeat size "a"))
        str2  (apply str (repeat size "a"))
        start (now)
        _     (= str1 str2)
        end   (now)
        ]
      (println "cljs equiv: " (delta end start))))

(test)

(= :added :ns)

(test)
Comment by Peter Schuck [ 17/Feb/16 4:50 PM ]

Is the ClojureScript compiled with the options :optimizations :advanced or :static-fns true? Compiling ClojureScript without those options results in call indirection for all function calls, which might explain the slowdown. See http://swannodette.github.io/2015/03/16/optimizing-clojurescript-function-invocation/ for more information.

Comment by Stephen Nelson [ 17/Feb/16 9:06 PM ]

This happens with :advanced, :simple, and without optimisations. Stepping through the generated JavaScript seems to indicate that the slowdown comes from the VM's internal function dispatch. Regardless, I don't think that the extra function calls related to dynamic dispatch in ClojureScript could add minutes of overhead per call. Note that the test case above only uses 128k of string data; the case where I first encountered this issue used ~512K and took about 5 minutes to complete a single function call.

Comment by Francis Avila [ 17/Feb/16 9:14 PM ]

I have reproduced this in Chrome for Linux, on :none and :advanced optimization levels using different test code. I verified the result of the compare so the JIT won't optimize it away and I used performance.mark() and performance.measure() for timing, although none of this should have mattered.

Every subsequent string compare after the first -equiv-invoking use of equal is significantly slower for no reason I can see. There are no intermediate GCs or anything to suggest that it should be slower--it just takes longer! The only thing I can think of is maybe the keyword-equals triggers a deopt because it makes the equal function megamorphic, but the code is run so few times that there should not be JIT optimizations kicking in at all. Also, the keyword compare itself remains fast.

I suspect a Chrome/v8 bug. Possibly a different internal string representation kicks in for some reason which has a slower compare? This is only an issue for largish, non-constant strings, and the slowdown is proportional to string size. I'm going to try and reproduce this with pure JS.

Comment by Francis Avila [ 18/Feb/16 12:33 AM ]

All you need to reproduce this is to use the strict equality operator in a function body non-monomorphically. Subsequent executions of the function with strings (at least) which have not been compared before the polymorphic call will be very slow.

If you replace strict equality (triple-equal) with normal equality (double-equal), this issue goes away.

This is clearly a Chrome/v8 bug, but I'm not sure where to report it.

Minimal pure-javascript reproduction:

function str(size) {
  var s = "";
  for (var i = 0; i < size; i++) s += "a";
  return s;
}

function eq(x, y) {
  performance.mark("start");
  x === y; // No slowdown if use == instead
  performance.mark("end");
}

function print_measures() {
  performance.getEntriesByType("measure")
  .forEach(entry => console.log(entry.name, entry.duration));
}

var s1 = str(64 * 1024);
var s2 = str(64 * 1024);
var s3 = str(64 * 1024);

eq(s1, s2);
performance.measure("eq(s1, s2)", "start", "end");

eq(0, 0);
performance.measure("eq(0, 0)", "start", "end");

eq(s1, s3);
performance.measure("eq(s1, s3)", "start", "end");

eq(s1, s2);
performance.measure("eq(s1, s2)", "start", "end");

eq(s1, s3);
performance.measure("eq(s1, s3)", "start", "end");

print_measures();

Results with Chrome 48.0.2564.109 (64-bit) on a slow iMac with OS X 10.11.3

eq(s1, s2)   4.465000000000003     // fast string compare
eq(0, 0)     0.009999999999990905  // break monomorphism of eq()
eq(s1, s3) 259.665                 // Now string compare is slow
eq(s1, s2)   0.019999999999924967  // Repeated call still fast
eq(s1, s3) 232.52499999999998      // ... but not from after the polymorphic invoke
Comment by Francis Avila [ 22/Feb/16 3:14 AM ]

Issue added to v8: https://bugs.chromium.org/p/v8/issues/detail?id=4773

Comment by Francis Avila [ 03/Jan/17 9:23 AM ]

Update: This bug was fixed upstream today, so it should start showing up in releases eventually.

Comment by David Nolen [ 08/Jul/17 10:39 AM ]

We're not going to work around VM issues like this.





[CLJS-1538] Type hint some cljs.core predicates Created: 06/Jan/16  Updated: 06/Jan/16  Resolved: 06/Jan/16

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: 1.7.145
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Mike Fikes Assignee: David Nolen
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File CLJS-1538-2.patch     Text File CLJS-1538.patch    
Patch: Code

 Description   

There are a handful of predicates in cljs.core that can be type hinted. They all delegate directly to either instance? or satisfies?.

The set comprises: var?, iterable?, cloneable?, volatile?, tagged-literal?



 Comments   
Comment by Mike Fikes [ 06/Jan/16 4:30 PM ]

2nd patch employs ^boolean not ^:boolean

Comment by David Nolen [ 06/Jan/16 5:27 PM ]

fixed https://github.com/clojure/clojurescript/commit/bc6f93e937f3f80cd8c47097863399752189d561





[CLJS-1459] Use inline sourcemaps Created: 25/Sep/15  Updated: 10/Nov/15  Resolved: 10/Nov/15

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: 1.7.48
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Robin Heggelund Hansen Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: enhancement, performance
Environment:

All



 Description   

Just like the summary says. Doing this will reduce the number of files the compiler has to write, and the number of files the browser has to load.



 Comments   
Comment by David Nolen [ 10/Nov/15 10:00 AM ]

Dupe CLJS-1075





[CLJS-1224] cljs.repl: Memoize stack frame mapping Created: 26/Apr/15  Updated: 29/Apr/15  Resolved: 29/Apr/15

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: 0.0-3126
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Mike Fikes Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: performance
Environment:

OS X QuickStart browser REPL against Safari


Attachments: Text File CLJS-1224.patch    

 Description   

When evaluating an expression that results in a stack overflow, and the resulting trace is source mapped by cljs.repl, thousands of frames may be mapped, which can take a few minutes. But oftentimes a stack overflow is the result of recursion, so many of the mapped frames are identical. This is trivial to optimize using memoize, reducing the mapping time to a couple of seconds.

A test case, when using the Quick Start browser REPL against Safari, involves evaluating these 3 forms,

(defn next-results
  "Placeholder for function which computes some intermediate
  collection of results."
  [n]
  (range 1 n))

(defn build-result [n]
  (loop [counter 1
         results []]
    (if (< counter n)
      (recur (inc counter)
             (concat results (next-results counter)))
      results)))

(first (build-result 7000))

Note: The browser REPL sometimes fails to display a stack trace, showing an EOF error that is probably unrelated. When it does display the trace, though, you can see the performance effects. I found that you can get it to do this by starting with maybe 4000 in the call above and increasing by 1000 until it overflows the stack.

Will attach a patch for consideration.
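
The memoization this ticket proposes can be sketched in Java. The class below is a hypothetical stand-in for the frame-mapping step: mapFrameSlow is a placeholder, not the real source-map lookup in cljs.repl.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FrameMapper {
    // Hypothetical stand-in for the expensive per-frame source-map lookup;
    // the "cljs:" prefix is just a placeholder result.
    static String mapFrameSlow(String frame) {
        return "cljs:" + frame;
    }

    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Memoized variant: each distinct frame is mapped at most once, so a
    // deeply recursive overflow trace full of identical frames maps quickly.
    String mapFrame(String frame) {
        return cache.computeIfAbsent(frame, FrameMapper::mapFrameSlow);
    }
}
```

Since the same few frames repeat thousands of times in an overflowing recursion, nearly every lookup after the first becomes a cache hit.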



 Comments   
Comment by David Nolen [ 29/Apr/15 6:52 AM ]

fixed https://github.com/clojure/clojurescript/commit/4773730eda0a5c5d55cbb90adef54bc4bbe6b35e





[CLJS-1191] Update clojure.walk to the current version on clojure Created: 06/Apr/15  Updated: 03/Jul/15  Resolved: 03/Jul/15

Status: Closed
Project: ClojureScript
Component/s: None
Affects Version/s: 0.0-3308
Fix Version/s: 1.7.28

Type: Defect Priority: Major
Reporter: Stuart Mitchell Assignee: Unassigned
Resolution: Completed Votes: 4
Labels: enhancement, performance

Attachments: Text File 50.patch     Text File CLJS-1191.patch     Text File CLJS-1191-rebase.patch     Text File CLJS-1191-tests.patch    
Patch: Code

 Description   

Currently clojure.walk can not handle records

It is using an old version of clojure.walk and clojure has improved the implementation because of

http://dev.clojure.org/jira/browse/CLJ-1105


src/cljs/clojure/walk.cljs | 4 ++++
1 file changed, 4 insertions

diff --git a/src/cljs/clojure/walk.cljs b/src/cljs/clojure/walk.cljs
index f2ebd8d..541ecea 100644
--- a/src/cljs/clojure/walk.cljs
+++ b/src/cljs/clojure/walk.cljs
@@ -43,7 +43,11 @@ the sorting function."}
{:added "1.1"}
[inner outer form]
(cond
+ (list? form) (outer (apply list (map inner form)))
+ (satisfies? IMapEntry form) (outer (vec (map inner form)))
(seq? form) (outer (doall (map inner form)))
+ (satisfies? IRecord form)
+ (outer (reduce (fn [r x] (conj r (inner x))) form form))
(coll? form) (outer (into (empty form) (map inner form)))
:else (outer form)))



 Comments   
Comment by David Nolen [ 07/Apr/15 5:56 AM ]

Please attach the patch as a file. Thanks!

Comment by Stuart Mitchell [ 07/Apr/15 7:18 PM ]

I think this one works

it is a mail formatted patch

Comment by David Nolen [ 08/Apr/15 6:07 AM ]

Please follow the patch conventions described here https://github.com/clojure/clojurescript/wiki/Patches. Thank you.

Comment by David Nolen [ 12/Apr/15 4:53 PM ]

Stuart, I don't see you on the list of contributors. Please submit a CA so I can apply the patch. Thanks!

Comment by Stuart Mitchell [ 13/Apr/15 7:49 PM ]

Hopefully this is the right format

Comment by Stuart Mitchell [ 13/Apr/15 7:50 PM ]

Contributor agreement signed as well

Comment by Sebastian Bensusan [ 01/May/15 1:24 PM ]

Tested the patch on the REPL, works as expected except for walking fns that modify the keys:

> (defrecord Foo [name])
cljs.user/Foo
> (w/prewalk #(if (keyword? %) (str %) %) (Foo. "foo"))
#cljs.user.Foo{:name "foo", ":name" "foo"}

It is not consistent with walking the same fn over a map:

> (w/prewalk #(if (keyword? %) (str %) %) {:name "foo"})

{":name" "foo"}

This behavior was noted in the original Clojure patch as well: http://dev.clojure.org/jira/browse/CLJ-1239 and might be undesirable since it surprises the user.

Comment by Daniel Skarda [ 10/Jun/15 9:59 AM ]

Our project shares many files between server and client (Clojure and ClojureScript). I noticed the same bug today because tests in CLJ passed while the same tests in CLJS failed (then I prepared a patch, only to find this ticket...)

From my perspective, I would prefer that CLJ and CLJS have the same implementation of clojure.walk.

For example - you implement some clever scheme to avoid the kind of surprise you described. If it is not implemented in Clojure in the same way, it will be a surprise for people writing code portable between CLJ and CLJS.

So my proposal is to take the current Clojure implementation as the master and wait to see how the clojure.walk implementation is improved in future releases of Clojure. Then we can port it to ClojureScript.

Comment by Daniel Skarda [ 10/Jun/15 10:10 AM ]

Tests for patch by Stuart

Comment by David Nolen [ 01/Jul/15 1:38 PM ]

Happy to apply these but they need to be rebased to master. Thanks.

Comment by Stuart Mitchell [ 02/Jul/15 9:21 PM ]

Hi, this patch does not include the tests, as that patch still applies cleanly.

Comment by David Nolen [ 03/Jul/15 2:58 PM ]

fixed
https://github.com/clojure/clojurescript/commit/f706fabfd5f952c4dfb4dc2caeea92f9e00d8287
https://github.com/clojure/clojurescript/commit/f1ff78d882bbaaa5135f205dfa19b01895462a3b





[CLJ-2228] Unroll constantly to improve performance of multi-arity calls Created: 28/Aug/17  Updated: 27/Sep/17

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Nathan Marz Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: Text File 0001-CLJ-2228-Improve-performance-of-constantly.patch    
Patch: Code and Test
Approval: Prescreened

 Description   

This was found to be a hot spot when testing some Specter use cases.

A review of common uses of `constantly` shows that the arity-1 case is by far the most common in the wild, so the patch only unrolls to two arguments.
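
The unrolling pattern can be sketched in Java (an assumed shape for illustration; Clojure's real functions extend clojure.lang.AFn): a varargs fallback allocates an argument array on every call, while fixed-arity overloads do not.

```java
// Illustrative sketch of arity unrolling, with hypothetical names.
class Constantly {
    private final Object x;

    Constantly(Object x) { this.x = x; }

    Object invoke()                   { return x; } // unrolled arities:
    Object invoke(Object a)           { return x; } // no array allocation
    Object invoke(Object a, Object b) { return x; }

    // Varargs fallback for rare higher arities; each call boxes the
    // trailing arguments into a fresh Object[].
    Object invoke(Object a, Object b, Object... rest) { return x; }
}
```

Call sites with 0, 1, or 2 arguments bind to the fixed overloads, which is where the ~5x improvement in the benchmark comes from.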

Perf test:

(use 'criterium.core)
(def f (constantly 1))
(bench (dotimes [_ 1000] (f)))
(bench (dotimes [_ 1000] (f 1)))
(bench (dotimes [_ 1000] (f 1 2)))

;; Results:
;; Arity Before         After
;; 0     611.455589 ns  607.800747 ns    (no change was expected)
;; 1     3.098828 µs    611.116510 ns    (~5x improvement)
;; 2     3.508726 µs    620.415032 ns    (~5x improvement)

Patch: 0001-CLJ-2228-Improve-performance-of-constantly.patch

Prescreened by: Alex Miller



 Comments   
Comment by Alex Miller [ 29/Aug/17 7:30 AM ]

I want to call out to our workflow diagram at https://dev.clojure.org/display/community/JIRA+workflow for process here. I've triaged this, but it's unlikely to be reviewed for the next release at this point unless it's been prescreened, and that requires a patch. So it would be helpful if you could:

  • mention your use case where this made a difference (Specter)
  • attach a patch
  • include the benchmarks in the ticket description

One question I have is how many arities to unroll. 10 seems like a lot? My guess is that up to 2 or 3 args would cover 90+% of cases but could poke around in https://crossclj.info/fun/clojure.core/constantly.html or do some other research to get an idea.

Comment by Alex Miller [ 29/Aug/17 7:38 AM ]

What is the max arity for constantly you encounter in Specter?

Comment by Alex Miller [ 29/Aug/17 8:04 AM ]

I did some poking around and I suspect arity 1 is the 98% use case and arity 0 making up most of the rest. I failed to actually find an example of any arity >1 although I'm sure there are a few of them out there. I think unrolling to arity 2 would be more than sufficient.

Comment by Nathan Marz [ 27/Sep/17 3:37 PM ]

Specter uses constantly for arity 1, so this patch is sufficient for Specter.





[CLJ-2210] back referential expressions can cause exponential compilation times Created: 21/Jul/17  Updated: 06/Sep/17  Resolved: 06/Sep/17

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.8
Fix Version/s: Release 1.9

Type: Defect Priority: Critical
Reporter: Nicola Mometto Assignee: Unassigned
Resolution: Completed Votes: 4
Labels: compiler, performance

Attachments: Text File 0001-CLJ-2210-cache-non-trivial-getJavaClass-hasJavaClass.patch    
Patch: Code
Approval: Ok

 Description   

Reported as causing problems in real world code: https://groups.google.com/forum/#!topic/clojure/Z91bhUvSB1g

With init.clj as :

(def expr '(fn [& {:keys [a b c d e f
                          map-0 map-1 map-2]}]
             (cond-> "foo"
               a (str a)
               b (str b)
               c (str c)
               d (str d)
               e (str e)
               f (str f)
               map-0 (cond->
                         (:a map-0) (str (:a map-0))
                         (:b map-0) (str (:b map-0))
                         (:c map-0) (str (:c map-0))
                         (:d map-0) (str (:d map-0)))
               map-1 (cond->
                         (:a map-1) (str (:a map-1))
                         (:b map-1) (str (:b map-1))
                         (:c map-1) (str (:c map-1))
                         (:d map-1) (str (:d map-1)))
               map-2 (cond->
                         (:a map-2) (str (:a map-2))
                         (:b map-2) (str (:b map-2))
                         (:c map-2) (str (:c map-2))
                         (:d map-2) (str (:d map-2))))))

(time (eval expr))

Before patch:

[~]> clj -i init.clj
"Elapsed time: 24233.193238 msecs"

After patch:

[~]> clj -i init.clj
"Elapsed time: 24.719472 msecs"

This is caused by every local binding's type depending on the type of the previous binding, triggering an exponential number of calls to hasJavaClass and getJavaClass. By caching on first occurrence, the complexity becomes linear.
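
The shape of the fix can be illustrated with a classic example (not the compiler's actual code): a recursion whose branches re-derive the same subresults takes exponential time until each result is cached on first occurrence.

```java
import java.util.HashMap;
import java.util.Map;

class ExpDemo {
    static long slowCalls = 0;

    // Uncached: both recursive calls re-derive the same subresults, so the
    // number of calls grows exponentially, the same shape as getJavaClass
    // re-deriving every previous binding's type on each query.
    static long slow(int n) {
        slowCalls++;
        return n < 2 ? n : slow(n - 1) + slow(n - 2);
    }

    static final Map<Integer, Long> cache = new HashMap<>();

    // Cached on first occurrence: each distinct subresult is computed once,
    // so the total work becomes linear in n.
    static long fast(int n) {
        Long hit = cache.get(n);
        if (hit != null) return hit;
        long v = n < 2 ? n : fast(n - 1) + fast(n - 2);
        cache.put(n, v);
        return v;
    }
}
```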

Patch: 0001-CLJ-2210-cache-non-trivial-getJavaClass-hasJavaClass.patch

Prescreened by: Alex Miller






[CLJ-2106] satisfies? is quite slow when returning false Created: 06/Feb/17  Updated: 09/Feb/17  Resolved: 09/Feb/17

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.8
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Mike Kaplinskiy Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: performance


 Description   
(defprotocol SatisfiesTest (testThing [this]))
(defrecord MyRecord [] Object SatisfiesTest (testThing [this]))
(def r (MyRecord.))

(time (dotimes [_ 30000] (instance? user.SatisfiesTest r)))
"Elapsed time: 1.715 msecs"

(time (dotimes [_ 30000] (satisfies? SatisfiesTest r)))
"Elapsed time: 3.944 msecs"

(time (dotimes [_ 30000] (instance? user.SatisfiesTest {})))
"Elapsed time: 2.304 msecs"

(time (dotimes [_ 30000] (satisfies? SatisfiesTest {})))
"Elapsed time: 718.949 msecs"

It would be nice if satisfies? memoized negative return values by class (though that cache would need to be cleared by `extend-type` and friends).



 Comments   
Comment by Alex Miller [ 06/Feb/17 1:56 PM ]

This is covered in an existing ticket - http://dev.clojure.org/jira/browse/CLJ-1814

Comment by Alex Miller [ 06/Feb/17 2:02 PM ]

Actually, read too quickly - CLJ-1814 covers the positive case only. I don't think we're going to cache a negative return value though. Managing and cleaning that cache is likely not to be worth it. If you need this kind of thing, you should probably consider a different kind of conditional check beforehand.

Comment by Alex Miller [ 06/Feb/17 2:05 PM ]

For example, you can just memoize this call like this:

user=> (def check (memoize #(satisfies? SatisfiesTest %)))
#'user/check
user=> (time (dotimes [_ 30000] (check {})))
"Elapsed time: 3.512 msecs"
Comment by Alex Miller [ 06/Feb/17 2:13 PM ]

I tweaked the test to remove the use of range and the construction of the record so the numbers are more useful.

Comment by Nicola Mometto [ 08/Feb/17 6:52 PM ]

Alex Miller CLJ-1814 handles the negative cases too, and doesn't keep a cache of negative return values. It keeps a cache of all the protocols extended by a certain class and invalidates that cache on every call to extend; the benchmarks in the ticket description showcase both a positive and a negative test.

Comment by Alex Miller [ 09/Feb/17 3:09 PM ]

Nicola - cool! I didn't realize that. Will mark this as a dupe then.





[CLJ-2090] Improve clojure.core/distinct perf by using transient set Created: 23/Dec/16  Updated: 04/Jan/17

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Nikita Prokopov Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance, transducers, transient

Attachments: Text File clj-2090-use-transient-set-in-distinct-2.patch    
Patch: Code
Approval: Triaged

 Description   

Current implementation of clojure.core/distinct uses persistent set. This patch improves performance of lazy arity by ~25%-30% and transducer by ~40%-50% by using transient set instead.

10 elements
(doall (distinct coll))      5.773439 µs   => 4.179092 µs    (-27%)
(into [] (distinct) coll)    3.238236 µs   => 1.943254 µs    (-39%)

100 elements
(doall (distinct coll))      67.725764 µs  => 42.129993 µs   (-37%)
(into [] (distinct) coll)    35.702741 µs  => 16.495947 µs   (-53%)

1000 elements
(doall (distinct coll))      540.652739 µs => 399.053873 µs  (-26%)
(into [] (distinct) coll)    301.423077 µs => 164.025500 µs  (-45%)

10000 elements
(doall (distinct coll))      3.439137 ms   => 3.058872 ms    (-11%)
(into [] (distinct) coll)    1.437390 ms   => 848.277178 µs  (-40%)

Benchmarking code: https://gist.github.com/tonsky/97dfe1f9c48eccafc983a49c7042fb21
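
The win is analogous to the following Java sketch (illustrative only, not the patch itself): the transient plays the role of a single mutable seen set, updated in place, whereas a persistent set pays for fresh path copies on every conj of a new element.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Distinct {
    static <T> List<T> distinct(List<T> coll) {
        Set<T> seen = new HashSet<>();   // plays the role of the transient
        List<T> out = new ArrayList<>();
        for (T x : coll) {
            if (seen.add(x)) out.add(x); // add returns false for duplicates
        }
        return out;
    }
}
```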



 Comments   
Comment by Alex Miller [ 23/Dec/16 8:52 AM ]

You can't remove the volatile - you still need it for safe publication in multi-threaded transducing contexts.

Comment by Nikita Prokopov [ 23/Dec/16 11:50 AM ]

Alex Miller How do you mean?

  • I don’t update seen link because transient set can be mutated in-place
  • Are transducers meant to be used from multiple threads? Because even the existing implementation clearly has a race condition. I imagine fixing that would be costly (we’ll need a synchronized section), so maybe it should be a specialized transducer that you use only when needed?
Comment by Alex Miller [ 23/Dec/16 12:26 PM ]

Transient sets can NOT be mutated in place - you must use the return value.

Yes, transducers are used from multiple threads in (for example) transducer chans in core.async go blocks.

Comment by Alex Miller [ 23/Dec/16 12:28 PM ]

I should also say transducers are not expected to be used from more than one thread at a time, so there are no race problems. But being used from multiple threads over time requires proper safe publication.

Comment by Nikita Prokopov [ 24/Dec/16 3:07 AM ]

But being used from multiple threads over time requires proper safe publication.

Does that imply that no transients can be used in transducers (because the underlying arrays on which the transient implementation is based are mutated in place, so different threads could potentially see different states of the transient object)?

Does that also mean that partition-by and partition-all should be fixed (they use java.util.ArrayList which, being an array of references, has no safe-publication semantics)?

Transient sets can NOT be mutated in place - you must use the return value.

I was thinking that clojure/core.clj and clojure.lang.ATransientSet.java are both part of Clojure internals, colocated, so can share a little bit of internal knowledge about each other. It seems safe to do that, because that knowledge does not leak outside, and, if at any point impl of ATransientSet would change, core.clj could be updated accordingly in the same release. I wouldn’t do that in any third-party library, of course.

Comment by Alex Miller [ 24/Dec/16 9:13 AM ]

Does that imply that no transients could be used in transducers (because underlying arrays on which transient impl is based are mutated in place, so different threads could potentially see different states of transient object)?

Transients require only that they are accessed by no more than a single thread at a time and so are safe to use in a transducer. However, the transducing context should guarantee safe publication. core.async channels already do this as an artifact of their implementation, but other transducing contexts may not.

Transients should NEVER be used as "mutate in place", regardless of concurrency. While they will appear to "work" in some circumstances, this is never correct (eventually an update operation will return a new instance and if you are mutating in place, your data will then be missing). This is discussed and correct examples are shown at http://clojure.org/reference/transients.

Does that also mean that partition-by and partition-all should be fixed (they use java.util.ArrayList which, being array of references, has no safe publication semantics)?

That's something Rich and I are discussing but, probably.

Comment by Nikita Prokopov [ 24/Dec/16 12:56 PM ]

Alex Miller Here’s quick test that shows that changes to transient set (which is nothing more but a wrapper around transient map) made in one thread are not always visible from another thread.

https://gist.github.com/tonsky/62a7ec6d539fc013186bee2df0812cf6

That means that if we try to use transients for e.g. distinct it will miss duplicate items

Comment by Nikita Prokopov [ 24/Dec/16 1:02 PM ]

Removed transients from the transducer arity of distinct because transducers might be accessed from multiple threads

Comment by Nikita Prokopov [ 24/Dec/16 1:12 PM ]

Maybe that doc (http://clojure.org/reference/transients) should be updated re: transients not being safe to use from multiple threads, because changes made by one thread are not necessarily visible to another, even if they don’t compete

Comment by Alex Miller [ 31/Dec/16 12:54 PM ]

I would say that test is demonstrating a bug in transient sets/maps and you should file a ticket for that as it's a lot more important than this enhancement.

distinct should be able to use transients in both the transducer and lazy seq impls. The issue with contains? not working on transients is actually a separate ticket - http://dev.clojure.org/jira/browse/CLJ-700 that will likely require some class hierarchy rearrangement. I don't think we would take this change until that is fixed (so that you can avoid relying on the class and Java method variants).

Comment by Nikita Prokopov [ 04/Jan/17 11:47 AM ]

I have to admit my test was demonstrating something else: there was no proper thread isolation. So it was a concurrency issue, not a “safe publication” issue. My current understanding is this:

Transients require thread isolation. Use of a particular transient instance should be controlled either by using it in an single-threaded scope, or in a framework that enforces this.

That guarantee implicitly presumes that there’s a happens-before relation between transient usages from multiple threads. There’s no other way to define “only one thread is in this section at a time”.

That, in turn, means that all writes that happened in thread 1 are visible in thread 2, regardless of the volatility of the variables involved. In fact, we could remove all volatiles from the transients implementation and probably make them faster, because by asking for “no more than one thread at a time” we require users to establish happens-before between sections, and that would give us all the safe-publication guarantees we need.

Is my understanding correct? Am I missing something?

Comment by Nikita Prokopov [ 04/Jan/17 11:55 AM ]

Also, long-lived transients (e.g. in a transducer associated with a queue) will hold a reference to the thread that created them. Is that a bad thing? Should we switch to a boolean flag instead?





[CLJ-2070] Faster clojure.core/delay implementation Created: 29/Nov/16  Updated: 06/Sep/17  Resolved: 06/Sep/17

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.9
Fix Version/s: Release 1.9

Type: Enhancement Priority: Major
Reporter: dennis zhuang Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: performance
Environment:

macOS Sierra
intel Core i7 2.5G Hz
Memory 16GB

java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)


Attachments: Text File fast_delay_with_volatile_fn2.patch    
Patch: Code and Test
Approval: Ok

 Description   

clojure.lang.Delay uses a synchronized lock to protect the deref method, because it must make sure the Delay object is realized exactly once.

As is well known, a synchronized lock performs poorly under heavy contention. Using a volatile field with double-checked locking in this situation improves performance instead. The benchmark code is in test-delay.clj. The benchmark-delay function accepts a thread count as its argument and starts that many threads accessing the delay object concurrently, as in (time (benchmark-delay 1)).

threads   1.9.0-alpha14    + patch         % better
1         0.570196 ms      0.499905 ms     12
10        19.66194 ms      1.313828 ms     93
20        40.740032 ms     2.149794 ms     95
100       184.041421 ms    8.317295 ms     95
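
A minimal Java sketch of the pattern the patch applies (assumed structure; the real clojure.lang.Delay also caches and rethrows exceptions):

```java
import java.util.function.Supplier;

// The volatile on fn is what lets the unsynchronized fast path safely
// observe the value written under the lock.
class CheapDelay<T> {
    private volatile Supplier<T> fn; // becomes null once realized
    private T value;

    CheapDelay(Supplier<T> fn) { this.fn = fn; }

    T deref() {
        if (fn != null) {                 // fast path: one volatile read, no lock
            synchronized (this) {
                Supplier<T> f = fn;       // re-check under the lock
                if (f != null) {
                    value = f.get();      // runs exactly once
                    fn = null;            // volatile write publishes value
                }
            }
        }
        return value;
    }
}
```

After the first realization every deref takes only the volatile read, which is the source of the large improvement under contention shown above.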

Patch: fast_delay_with_volatile_fn2.patch
Prescreened by: Alex Miller



 Comments   
Comment by Alex Miller [ 29/Nov/16 8:52 AM ]

A faster version of delay would be helpful - these are used extensively inside spec so would actually help the average case spec performance.

These whitespace errors need to be cleaned up...

$ git apply ~/Downloads/fast_delay.patch
/Users/alex/Downloads/fast_delay.patch:67: trailing whitespace.
	                try
/Users/alex/Downloads/fast_delay.patch:105: trailing whitespace.

/Users/alex/Downloads/fast_delay.patch:115: space before tab in indent.
        	    (fn []
/Users/alex/Downloads/fast_delay.patch:116: space before tab in indent.
          		    (.await barrier)
/Users/alex/Downloads/fast_delay.patch:117: space before tab in indent.
          		    (dotimes [_ 10000]
warning: squelched 8 whitespace errors
warning: 13 lines add whitespace errors.

More importantly, the double-check is on fn, so it's critical that fn is marked as volatile. You should re-run the perf test afterwards.

Comment by dennis zhuang [ 29/Nov/16 9:13 AM ]

Sorry, the whitespace errors should have been fixed before I attached the patch.

But fn doesn't need to be marked as volatile, because it's protected by synchronized in all the blocks. And setting it to null is fine here:

fn = null;

It's not like the double-checked locking in the classic singleton example; there is no reordering here.

Comment by Alex Miller [ 29/Nov/16 9:25 AM ]

fn is read at the top before the synchronized block - it needs to be volatile or one thread may not see the write inside the synchronized block from another thread.

Comment by dennis zhuang [ 29/Nov/16 9:41 AM ]

Yep, but it's fine here.
If one thread can't see the null write to fn at the top, it will simply enter the locking block.
The double-check on fn != null makes sure fn is called at most once: if fn was already called by another thread and set to null, the current thread will fail the second fn != null check and exit the lock to go on reading the value or exception.

So in the worst case two or more threads may enter the locking block, but all of them except one will fail the second fn != null check.

I don't want to declare fn volatile, because volatile also has a cost on reads. The fn variable may be flushed to main memory too late, but that is acceptable and safe here, and we avoid the cost of a volatile read.

Comment by Alex Miller [ 29/Nov/16 9:45 AM ]

I think you're wrong, and I'm not screening it without it being marked as volatile.

Comment by dennis zhuang [ 29/Nov/16 9:54 AM ]

The patch which marks fn volatile.

Comment by dennis zhuang [ 29/Nov/16 9:54 AM ]

The patch which doesn't mark fn volatile.

Comment by dennis zhuang [ 29/Nov/16 9:59 AM ]

Hi, alex.

I understand your opinion here. Though i still insist that fn doesn't need to be marked as volatile, but it's not a critical concern here.

I uploaded two patches, one is marked fn volatile, the other is not. All of them fixed the whitespace errors and update the benchmark result in ticket description.

Thanks.

Comment by dennis zhuang [ 29/Nov/16 10:15 AM ]

Rebase master.

Comment by Nicola Mometto [ 30/Nov/16 11:53 AM ]

dennis, here's an article describing why fn needs to be volatile: https://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html

Comment by dennis zhuang [ 30/Nov/16 6:01 PM ]

@Nicola

I knew the double-checking issue in old JVM memory model, but it is different here.
There is no instance constructed, it's only assigning fn to be null, so it doesn't have instruments reordering. And we don't have a partial constructed instance that escapes.

But it's not critical concern here, it seems that volatile doesn't compact performance of this patch.

Thanks.





[CLJ-2050] Remove redundant key comparisons in HashCollisionNode Created: 03/Nov/16  Updated: 03/Nov/16

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Kwang Yul Seo Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: collections, performance

Attachments: Text File 0001-Remove-redundant-key-comparisons-in-HashCollisionNod.patch    
Patch: Code
Approval: Prescreened

 Description   

Comparing key to array[idx] is redundant, as findIndex has already performed the key comparison to find the index.

Prescreened by: Alex Miller
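
A simplified sketch of the pattern, with hypothetical names rather than the actual PersistentHashMap.HashCollisionNode source: findIndex only returns a non-negative index when the key already compared equal to array[idx], so a second key.equals(array[idx]) check after it is redundant.

```java
class CollisionNode {
    final Object[] array; // alternating key/value pairs

    CollisionNode(Object... kvs) { this.array = kvs; }

    int findIndex(Object key) {
        for (int i = 0; i < array.length; i += 2) {
            if (key.equals(array[i])) return i; // the key comparison
        }
        return -1;
    }

    Object find(Object key, Object notFound) {
        int idx = findIndex(key);
        if (idx == -1) return notFound;
        // Before the patch this repeated the comparison findIndex just did:
        //   if (key.equals(array[idx])) ...
        return array[idx + 1];
    }
}
```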






[CLJ-2022] Add fixed arities for mapcat Created: 13/Sep/16  Updated: 13/Sep/16

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Robert C Faber Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: Text File CLJ-2022.patch    
Patch: Code

 Description   

Add fixed arities for mapcat, in the pattern of map.

Same change as CLJS-1776.



 Comments   
Comment by Alex Miller [ 13/Sep/16 10:29 AM ]

Presumably this is to improve performance. Please include a benchmark showing the difference.

Comment by Ghadi Shayban [ 13/Sep/16 12:47 PM ]

Please consider interactions with apply and laziness CLJ-1218





[CLJ-1950] cl-format is too slow for production use Created: 05/Jun/16  Updated: 05/Jun/16

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7, Release 1.9
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Alain Picard Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: performance, print
Environment:

Mac OS X - 3GHz i7 16Gb ram


Approval: Triaged

 Description   

Run this example code:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(in-ns 'clojure.pprint)

(println "Basic output using raw str.")
(time
 (doseq [i (range 1000)]
   (apply str (interpose "," [1 2 3]))))

(println "Test 1 - raw cl-format")
(time
 (doseq [i (range 1000)]
   (clojure.pprint/cl-format nil "~{D^,~}" [1 2 3])))
;; ==> "Elapsed time: 231.345 msecs"

(println "Test 2 - call on the compiled format")
(def myx
  (compile-format "~{D^,~}"))

(time
 (doseq [i (range 1000)]
   (clojure.pprint/cl-format nil myx [1 2 3])))

(println "Test 3 - using a formatter")
(def myy
  (formatter "~{D^,~}"))

(time
 (doseq [i (range 1000)]
   (myy nil myx [1 2 3])))

;; Common Lisp version, for reference:
(time
 (dotimes (i 100000)
   (format nil "~{D^,~}" '(1 2 3))))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

It will print something like this:

Basic output using raw str.
"Elapsed time: 2.402 msecs"
Test 1 - raw cl-format
"Elapsed time: 194.405 msecs"
Test 2 - call on the compiled format
"Elapsed time: 87.271 msecs"
Test 3 - using a formatter
"Elapsed time: 199.318 msecs"

So raw `str' is ~ 100X faster.

For reference, on the same hardware, using
SBCL Common Lisp, this test runs in < 1 ms.

There are (at least) 2 problems here:

1. cl-format function begins with a line like:

let [compiled-format (if (string? format-in) (compile-format format-in) format-in)

But there is no public API for passing a compiled format into it (compile-format
is a private function, so it can't be used from outside), which makes this branch effectively useless.

2. Even using a precompiled formatter is way too slow.

Suggested fix: none, except perhaps warning unwary users that this
function is simply not suitable for tight loops, and should only be
used to pretty print user input strings, etc.

Thank you






[CLJ-1917] internal-reduce extended on StringSeq calls `.length` on every iteration step Created: 24/Apr/16  Updated: 06/Sep/17  Resolved: 06/Sep/17

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.8
Fix Version/s: Release 1.9

Type: Enhancement Priority: Minor
Reporter: Dimitrios Piliouras Assignee: Unassigned
Resolution: Completed Votes: 1
Labels: performance
Environment:

n/a


Attachments: Text File clj-1917-2.patch     Text File clj-1917.patch    
Patch: Code
Approval: Ok

 Description   

internal-reduce extended on StringSeq calls `.length` (on the same String object) upon every iteration step [1]. There is absolutely no need for this as the length of a String cannot change. Therefore, it can be bound once (in the `let` a couple of lines up) and used thereafter.

[1]: https://github.com/clojure/clojure/blob/master/src/clj/clojure/core/protocols.clj#L151
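The fix is the standard hoisting transformation; a Java analogue of the before/after shape (illustrative only — the actual change is in the Clojure internal-reduce code linked above):

```java
// Hoisting an invariant: java.lang.String is immutable, so its length
// can be bound once before the loop instead of re-read on every step.
class HoistDemo {
    static int sumCharsNaive(String s) {
        int acc = 0;
        for (int i = 0; i < s.length(); i++) // calls .length() each iteration
            acc += s.charAt(i);
        return acc;
    }

    static int sumCharsHoisted(String s) {
        final int len = s.length();          // bound once, used thereafter
        int acc = 0;
        for (int i = 0; i < len; i++)
            acc += s.charAt(i);
        return acc;
    }
}
```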

Prescreened by: Alex Miller
Patch: clj-1917-2.patch



 Comments   
Comment by Jozef Wagner [ 01/Dec/16 2:00 PM ]

Patch attached

Comment by Rich Hickey [ 06/Sep/17 9:21 AM ]

no variable should ever be called l

Comment by Alex Miller [ 06/Sep/17 9:26 AM ]

Updated patch to use len instead of l.





[CLJ-1912] Optimized version of the '<' and '>' functions for arities larger than 2 Created: 08/Apr/16  Updated: 15/May/17  Resolved: 15/May/17

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.8
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Anton Harald Assignee: Unassigned
Resolution: Duplicate Votes: 1
Labels: performance

Attachments: Text File clj-1912.patch     Text File clj-2075-add-three-arities-to-comparisons.patch.txt    
Patch: Code
Approval: Triaged

 Description   

When looking at the code of the built-in functions '<' and '>', (next more) is invoked twice in each comparison of two neighboring arguments.

Proposed: Compute (next more) only once per 'iteration' for the following set of functions that have the same code pattern: =, ==, <, <=, >, >=.

Perf improvements (see comments for more details):

Function  Arity  Before     After      % improved
<             4  140.7 ns   141.6 ns   -0.6%
<            10  505.3 ns   460.3 ns    8.9%
<           100  9.0 µs     8.6 µs      4.4%
<         10000  885 µs     851 µs      3.8%
=             4  86.1 ns    86.8 ns    -0.8%
=            10  333.4 ns   300.6 ns    9.8%
=           100  4.28 µs    3.65 µs   14.7%
=         10000  397.4 µs   353.3 µs  11.0%
==            4  138.1 ns   135.7 ns    1.7%
==           10  487.9 ns   460.9 ns    5.5%
==          100  5.58 µs    5.27 µs     5.6%
==        10000  565.0 µs   537.7 µs    4.8%

Patch: clj-1912.patch



 Comments   
Comment by Alex Miller [ 08/Apr/16 4:23 PM ]

I don't think there is a particular reason, feel free to make a patch.

Comment by Jozef Wagner [ 01/Dec/16 2:19 PM ]

Patch added. As if-let is defined later in the file, a combination of let and if is used.

Comment by Alex Miller [ 01/Dec/16 8:22 PM ]

Can you include benchmark code and benchmarks?

Comment by Jozef Wagner [ 02/Dec/16 3:25 AM ]

Benchmarks for <

(bench (< 1 2 3 4)) ;; 140.726376 ns
(bench (new< 1 2 3 4)) ;; 141.661964 ns

(bench (< 1 2 3 4 5 6 7 8 9 10)) ;; 505.381596 ns
(bench (new< 1 2 3 4 5 6 7 8 9 10)) ;; 460.331840 ns

(bench (apply < (range 100))) ;; 9.020666 µs
(bench (apply new< (range 100))) ;; 8.604638 µs

(bench (apply < (range 10000))) ;; 885.361898 µs
(bench (apply new< (range 10000))) ;; 851.344031 µs

Benchmarks for =

(bench (= 1 1 1 1)) ;; 86.114371 ns
(bench (new= 1 1 1 1)) ;; 86.874012 ns

(bench (= 1 1 1 1 1 1 1 1 1 1)) ;; 333.438530 ns
(bench (new= 1 1 1 1 1 1 1 1 1 1)) ;; 300.628516 ns

(bench (apply = (repeat 100 1))) ;; 4.282752 µs
(bench (apply new= (repeat 100 1))) ;; 3.650438 µs

(bench (apply = (repeat 10000 1))) ;; 397.401808 µs
(bench (apply new= (repeat 10000 1))) ;; 353.294723 µs

Benchmarks for ==

(bench (== 1 1 1 1)) ;; 138.162620 ns
(bench (new== 1 1 1 1)) ;; 135.759047 ns

(bench (== 1 1 1 1 1 1 1 1 1 1)) ;; 487.963993 ns
(bench (new== 1 1 1 1 1 1 1 1 1 1)) ;; 460.982411 ns

(bench (apply == (repeat 100 1))) ;; 5.587064 µs
(bench (apply new== (repeat 100 1))) ;; 5.273621 µs

(bench (apply == (repeat 10000 1))) ;; 565.031286 µs
(bench (apply new== (repeat 10000 1))) ;; 537.789795 µs

Benchmark code

Comment by Nikita Prokopov [ 15/May/17 3:45 PM ]

The patch from CLJ-2075 gives a much more substantial advantage by introducing real 3-arities instead of just optimizing away one of the two next calls: 70-80% improvement vs 5-10% here.

Comment by Alex Miller [ 15/May/17 4:09 PM ]

see CLJ-2075





[CLJ-1905] loop should retain primitive int or float without widening Created: 29/Mar/16  Updated: 15/Apr/16

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.8
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Renzo Borgatti Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: compiler, performance, primitives
Environment:

Possibly older Clojure versions (but not verified).


Attachments: Text File 0001-CLJ-1905-remove-useless-widening-on-loop-bindings.patch    
Patch: Code and Test
Approval: Prescreened

 Description   

In the following example:

(defn incrementer [n]
  (let [n (int n)]
    (loop [i (int 0)]
      (if (< i n)
        (recur (unchecked-inc-int i))
        i))))

the loop-local starts as an int but is widened to a long in the recur. It should be possible to retain the primitive int (or alternately float) type on the recur, rather than widening to long (or double).

The compiler code that is promoting the int seems to be:
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Compiler.java#L6270-L6282

Proposed: remove useless widening on loop bindings

Patch: 0001-CLJ-1905-remove-useless-widening-on-loop-bindings.patch

Prescreening comments: My main question here is: do we want to support primitive int/float loop vars?



 Comments   
Comment by Alex Miller [ 29/Mar/16 10:54 AM ]

I don't think anything but primitive longs or doubles are intended to be supported in loop/recur. Presuming that is correct, this would be the expected behavior.

The last major round of numeric/primitive refactoring made the decision to focus only on supporting primitive longs and doubles. One consequence of this is that primitive int loops are difficult to optimize - the main time I run into this is when working with Java interop in a tight loop on (for example) String, collection, or array operations (all of which are int-indexed).

Re unchecked-inc vs unchecked-inc-int, the primary reason to have these two variants is not performance but behavior. In particular, hashing operations often expect to get 32-bit int overflow semantics, not 64-bit int overflow semantics.

In summary, I think in the example given I would not write it with either int or unchecked-inc-int but with long and unchecked-inc if you are looking for best performance.

Comment by Nicola Mometto [ 29/Mar/16 11:01 AM ]

Alex Miller I don't think that's correct, as (AFAIK) it's only in fn params/returns that primitive types are supposed to be restricted to longs and doubles.
Note that for example, char, byte, boolean, short etc work just fine in both let and loop, while int and float work fine in let but not in loop.

This is caused by the following 4 lines in Compiler.java https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Compiler.java#L6278-L6281

As far as I can tell, there's no reason for those 4 lines to be there at this point in time, and removing them makes int and float locals to be emitted correctly inside loops

This example on clojure.org itself seems to assume that ints should work in loops: http://clojure.org/reference/java_interop#primitives

Also from that same paragraph:

All Java primitive types are supported
let/loop-bound locals can be of primitive types, having the inferred, possibly primitive type of their init-form

Comment by Alex Miller [ 29/Mar/16 12:07 PM ]

I agree that it should be possible to let-bound primitives of other types - I'm really talking about what happens on recur.

What would you expect to happen for a fn recur target? I wouldn't expect primitives other than long or double to work there since they can't occur in the function signature.

Note that I haven't closed this ticket, still talking through this.

Comment by Alex Miller [ 29/Mar/16 12:10 PM ]

I've definitely run into cases where creating a primitive int loop/recur would be useful for tight interop loops given how common int-indexing is in Java (some of the alioth benchmarks in particular would benefit from this). I think the argument is far weaker for float though.

Comment by Nicola Mometto [ 29/Mar/16 12:19 PM ]

I don't think we need to worry about fn recur targets at all, given that the only possible primitive bindings there are either long or double, and int/floats would get widened anyway, but good point, the tests in a patch for this ticket need to be sure that case is indeed handled.

RE: floats – I recall people complaining about bad support for floats when using clojure for graphical processing.

Even if admittedly a weak argument, I'm always of the idea that we should strive to be as consistent as possible. I don't think anybody would expect let/loop locals to behave differently, or differences between primitive types (other than the documented limitation that long/double are the only working primitive types for fn arguments/return vals)

Comment by Alex Miller [ 29/Mar/16 12:30 PM ]

I'll leave this one in here but I'm going to treat it as an enhancement to current behavior. I think there's a good chance that Rich will just say this is working as intended.

I don't think the example is a very good one though and would welcome a better example. The reservations regarding unchecked-inc-int do not seem correct or valid to me (as usage should be fine on longs and is not designed for perf reasons anyway). A good example would show usage of a Java API in a loop where int-indexing and int-math give better performance.

Comment by Nicola Mometto [ 30/Mar/16 8:51 AM ]

I edited the title as the bug is in `loop`, not `recur`

Comment by Nicola Mometto [ 02/Apr/16 9:55 AM ]

Attached a patch that removes the useless widenings done by the compiler on loop bindings, here's a benchmark demonstrating the speedup gained when areducing over int-arrays:

Before patch:

Clojure 1.8.0
user=> (use 'criterium.core)
nil
user=> (let [xs (int-array (range 1e6))] (bench (areduce xs i ret 0 (+ ret (aget xs i)))))
Evaluation count : 64260 in 60 samples of 1071 calls.
             Execution time mean : 954.009929 µs
    Execution time std-deviation : 20.292809 µs
   Execution time lower quantile : 926.331747 µs ( 2.5%)
   Execution time upper quantile : 1.009189 ms (97.5%)
                   Overhead used : 1.840681 ns

Found 4 outliers in 60 samples (6.6667 %)
	low-severe	 4 (6.6667 %)
 Variance from outliers : 9.4244 % Variance is slightly inflated by outliers
nil

After patch:

Clojure 1.9.0-master-SNAPSHOT
user=> (use 'criterium.core)
nil
user=> (let [xs (int-array (range 1e6))] (bench (areduce xs i ret 0 (+ ret (aget xs i)))))
Evaluation count : 68640 in 60 samples of 1144 calls.
             Execution time mean : 870.462532 µs
    Execution time std-deviation : 13.100790 µs
   Execution time lower quantile : 852.357513 µs ( 2.5%)
   Execution time upper quantile : 896.531529 µs (97.5%)
                   Overhead used : 1.844045 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil




[CLJ-1901] amap calls `alength` at every iteration step Created: 13/Mar/16  Updated: 06/Sep/17  Resolved: 06/Sep/17

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.8
Fix Version/s: Release 1.9

Type: Enhancement Priority: Major
Reporter: Dimitrios Piliouras Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: arrays, performance
Environment:

JVM


Attachments: Text File fix_amap.patch    
Patch: Code
Approval: Ok

 Description   

During the 1.7 => 1.8 upgrade `areduce` was fixed to not call `alength` on the same array at every single iteration step. However, `amap`, which suffers from the same issue, was not fixed (even though exactly the same fix applies).

Example:

(def an-array (long-array 100000 0))
(dotimes [_ 50]
  (time (amap ^longs an-array idx ret (+ 1 (aget ^longs an-array idx)))))

Before (last time): 0.3930 ms
After (last time): 0.3459 ms

Patch: fix_amap.patch

Screened by: Alex Miller



 Comments   
Comment by Alex Miller [ 13/Mar/16 4:39 PM ]

Thanks!

Comment by Dimitrios Piliouras [ 24/Apr/16 1:33 PM ]

Not a problem. I actually noticed a very similar thing in the `internal-reduce` implementation for StringSeq [1]. The `.length()` method is called on the same String on every single iteration step, even though it is a constant. Is that easy enough to be sorted without me submitting another trivial patch? Thanks in advance...

https://github.com/clojure/clojure/blob/master/src/clj/clojure/core/protocols.clj#L151

Comment by Alex Miller [ 24/Apr/16 1:48 PM ]

Separate ticket would be preferred, thanks.

Comment by Dimitrios Piliouras [ 24/Apr/16 2:32 PM ]

Sure thing, I'll create it now.





[CLJ-1895] Remove loading of clojure.string in clojure.java.io Created: 22/Feb/16  Updated: 22/Feb/16

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.8
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: Text File clj-1895.patch    
Patch: Code
Approval: Prescreened

 Description   

clojure.core loads clojure.java.io to define slurp and spit. clojure.java.io loads clojure.string, solely for a single call to replace. This slows down Clojure core startup for no reason.

Approach: Replace clojure.string/replace call with a Java interop call to .replace. This saves about 18 ms during Clojure core startup.
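The swap is possible because the call only needs java.lang.String.replace, which requires no extra namespace load (the exact call site and arguments are in the patch; this just shows the method being leaned on):

```java
// java.lang.String.replace exists in both char/char and
// CharSequence/CharSequence overloads, so plain interop covers the
// single use that previously pulled in clojure.string.
class ReplaceDemo {
    static String dotted(String path) {
        return path.replace('/', '.');
    }
}
```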

Patch: clj-1895.patch






[CLJ-1888] AReference#meta() is synchronized Created: 26/Jan/16  Updated: 16/Mar/16

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.8
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Roger Kapsi Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: PNG File aref-meta-after.png     PNG File aref-meta.png     Text File clj-1888-2.patch     Text File clj-1888.patch    
Patch: Code
Approval: Prescreened

 Description   

We use Clojure for a "rules engine". Each function represents a rule and metadata describes the rule and provides some static configuration for the rule itself. The system is immutable and concurrent.

If two or more Threads invoke the same Var concurrently they end up blocking each other because AReference#meta() is synchronized (see attached screenshot, the red dots).

(defn 
  ^{:rule {:remote-address "127.0.0.1"}}
  example
  [request]
  (let [rule (:rule (meta #'example))]
    (= (:remote-address rule) (:remote-address request))))

Approach: Replace synchronized block with a rwlock for greater read concurrency. This approach removes meta read contention (see real world example in comments). However, it comes with the downsides of:

  • extra field for every AReference (all namespaces, vars, atoms, refs, and agents)
  • adds construction of lock into construction of AReference (affects perf and startup time)

Patch: clj-1888-2.patch replaces synchronized with a rwlock for greater read concurrency

Alternatives:

  • Use volatile for _meta and synchronized for alter/reset. Allow read of _meta just under the volatile - would this be safe enough?
  • Extend AReference from ReentrantReadWriteLock instead of holding one - this is pretty weird but would have a different (potentially better) footprint for memory/construction.


 Comments   
Comment by Alex Miller [ 26/Jan/16 10:19 PM ]

A volatile is not sufficient in alterMeta as you need to read/update/write atomically.

You could however use a ReadWriteLock instead of synchronized. I've attached a patch that does this - if you have a reproducible case I'd be interested to see how it affects what you see in the profiler.

There are potential issues that would need to be looked at - this will increase memory per reference (the lock instance) and slow down construction (lock construction) at the benefit of more concurrent reads.

Comment by Roger Kapsi [ 27/Jan/16 8:34 AM ]

Hey Alex,

I do have a reproducible case. The blocking has certainly disappeared after applying your patch (see attached picture). The remaining blocking code on these "WorkerThreads" is sun.nio.ch.SelectorImpl.select(long) (i.e. not clojure related).

You can repro it yourself by executing something like the code below concurrently in an infinite loop.

(defn 
  ^{:rule {:remote-address "127.0.0.1"}}
  example
  [request]
  (let [rule (:rule (meta #'example))]
    (= (:remote-address rule) (:remote-address request))))

Suggestions for the patch: Make the meta lock a final field and maybe pull the read/write locks into local variables to avoid the double method calls.

alterMeta(...) {
  Lock w = _metaLock.writeLock();
  w.lock();
  try {
    // ...
  } finally {
    w.unlock();
  }
}
Comment by Alex Miller [ 16/Mar/16 3:02 PM ]

Marking pre-screened.





[CLJ-1882] Use transients in merge-with Created: 11/Jan/16  Updated: 11/Jan/16

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Ghadi Shayban Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance, transient


 Description   

This ticket has been broken away from CLJ-1458 for tracking.






[CLJ-1866] Optimise argument boxing during reflection Created: 08/Dec/15  Updated: 11/Dec/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Mike Anderson Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance, reflection

Attachments: Text File clj-1866.patch    

 Description   

Currently argument boxing is clojure.lang.Reflector is inefficient for the following two reasons:
1. It makes an unnecessary call to Class.cast(..) when the parameter type is non-primitive
2. It allocates an unnecessary extra Object[] array when boxing arguments

This patch fixes these issues, without otherwise changing behaviour. All tests pass.
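A hedged sketch of the two costs (names and shapes are illustrative, not the actual clojure.lang.Reflector code):

```java
// Illustrative only: the "before" shape copies into a fresh array and
// routes every argument through Class.cast; the "after" shape reuses
// the incoming array and touches only primitive parameters.
class BoxArgsSketch {
    static Object[] boxBefore(Class<?>[] params, Object[] args) {
        Object[] ret = new Object[args.length];   // extra allocation
        for (int i = 0; i < args.length; i++)
            ret[i] = params[i].cast(args[i]);     // no-op cost for Object params
        return ret;
    }

    static Object[] boxAfter(Class<?>[] params, Object[] args) {
        for (int i = 0; i < args.length; i++)
            if (params[i].isPrimitive())
                args[i] = coerce(params[i], args[i]);
        return args;                              // no extra array
    }

    static Object coerce(Class<?> p, Object a) {
        // placeholder: the real code converts boxed numerics as needed
        return a;
    }
}
```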



 Comments   
Comment by Alex Miller [ 09/Dec/15 7:23 AM ]

Example code where this is an issue?

Benchmark before/after ?

Comment by Mike Anderson [ 09/Dec/15 9:08 PM ]

Hi alex, I'm trying to improve the fast path for reflection, this will be a bunch of changes, together with clj-1784 and a few other ideas I have.

I don't think it will be productive to have an extensive debate / benchmarking over every single change. Would it be better to make a new ticket for all changes together with benchmarking for the overall impact?

Happy to do this, but it would be helpful if you could give an indication that this stuff will be accepted on the assumption that an overall improvement is demonstrated. I don't want to waste effort on Clojure dev if you guys are not interested in performance improvements in this space. And I don't have time to have an extensive debate over every individual change.

Comment by Alex Miller [ 10/Dec/15 7:51 AM ]

Each ticket should identify a specific problem and propose a solution with evidence that it helps. If there are multiple issues it is quite likely that they may move at different rates.

If you identify a hot spot, then we are happy to look at it. But we have to start from a problem to solve not just: "here are some changes". If the change makes things better, then you should be able to demonstrate that.

Comment by Alex Miller [ 10/Dec/15 7:53 AM ]

I would say that I am still dubious that the right answer to a problem with reflection is not just removing the reflection. But I can't evaluate that until you provide a problem.

Comment by Mike Anderson [ 10/Dec/15 7:10 PM ]

The problem is that it is inefficient (and therefore slow for users of reflection), as stated in the description. If you have an alternative way that you would like to see it worded, can you suggest?

Whether or not people should be using reflection is orthogonal to making reflection itself faster. I agree people should use type hints where they can but plenty of people don't have time to figure it out / don't know how to do properly / don't realise it is happening so surely any improvement for these cases should be welcome?

Also you haven't answered my question: would you like everything rolled into a single reflection performance improvement ticket, or not?

Comment by Alex Miller [ 11/Dec/15 10:11 AM ]

The title of this ticket is "Optimise argument boxing during reflection". That is a solution, not a problem. What I'm looking for is a title like "Reflection with boxed args is slow" and a description that starts with some example code demonstrating the problem. (That example code often makes for a particularly good template for a test that should be in the patch as well.)

I am then looking for evidence that the change you are suggesting improves the problem. For a performance issue, I am specifically looking for a before/after benchmark, preferably using a testing tool like criterium that gives me some confidence that the gains are real.

From a prioritization standpoint, I do not consider reflection performance to be a high priority because the best answer is probably: don't use reflection. That said, I'm willing to consider it, particularly if there is a compelling example where it may be difficult to remove the reflection or where it is particularly non-obvious that the reflection is happening.

Regarding your final question, we prefer to consider individual problems rather "a big bunch of changes", so separate would be better.





[CLJ-1814] Make `satisfies?` as fast as a protocol method call Created: 11/Sep/15  Updated: 18/Sep/17

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Critical
Reporter: Nicola Mometto Assignee: Unassigned
Resolution: Unresolved Votes: 10
Labels: performance, protocols

Attachments: Text File 0001-CLJ-1814-cache-protocol-impl-satisfies-as-fast-as-me.patch     Text File 0001-CLJ-1814-cache-protocol-impl-satisfies-as-fast-as-me-v2.patch     Text File 0001-CLJ-1814-cache-protocol-impl-satisfies-as-fast-as-me-v3.patch     Text File 0001-CLJ-1814-cache-protocol-impl-satisfies-as-fast-as-me-v4.patch     Text File 0001-CLJ-1814-cache-protocol-impl-satisfies-as-fast-as-me-v5.patch     Text File CLJ-1814-v6.patch    
Patch: Code and Test
Approval: Vetted

 Description   

Currently `satisfies?` doesn't use the same impl cache used by protocol methods, making it too slow for real world usage.

With:

(defprotocol p (f [_]))
(deftype x [])
(deftype y [])
(extend-type x p (f [_]))

Before patch:

(let [s "abc"] (bench (instance? CharSequence s))) ;; Execution time mean : 1.358360 ns
(let [x (x.)] (bench (satisfies? p x))) ;; Execution time mean : 112.649568 ns
(let [y (y.)] (bench (satisfies? p y))) ;; Execution time mean : 2.605426 µs

Cause: `satisfies?` calls `find-protocol-impl` to see whether an object implements a protocol, which checks for whether x is an instance of the protocol interface or whether x's class is one of the protocol implementations (or if its in an inheritance chain that would make this true). This check is fairly expensive and not cached.

Proposed: Extend the protocol's method impl cache to also handle (and cache) instance checks (including negative results).
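The caching idea can be sketched minimally in Java (hypothetical names; the real patch extends the protocol's existing method impl cache): memoize the per-class answer, caching false as well as true.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: cache "does class c satisfy the protocol?" per
// class, including negative answers, so repeated checks are one map hit.
class SatisfiesCache {
    private final Set<Class<?>> impls; // classes/interfaces extended to the protocol
    private final ConcurrentHashMap<Class<?>, Boolean> cache = new ConcurrentHashMap<>();

    SatisfiesCache(Set<Class<?>> impls) { this.impls = impls; }

    boolean satisfies(Object x) {
        return cache.computeIfAbsent(x.getClass(), c -> {
            for (Class<?> impl : impls)
                if (impl.isAssignableFrom(c))
                    return true;
            return false;          // negative result is cached too
        });
    }
}
```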

After patch:

(let [x (x.)] (bench (satisfies? p x))) ;; Execution time mean : 79.321426 ns
(let [y (y.)] (bench (satisfies? p y))) ;; Execution time mean : 77.410858 ns

Patch: CLJ-1814-v6.patch



 Comments   
Comment by Michael Blume [ 11/Sep/15 4:17 PM ]

Nice. Honeysql used to spend 80-90% of its time in satisfies? calls before we refactored them out.

Comment by Michael Blume [ 24/Sep/15 3:55 PM ]

I realize this is a deeply annoying bug to reproduce, but if I clone core.match, point its Clojure dependency to 1.8.0-master-SNAPSHOT, start a REPL, connect to the REPL from vim, and reload clojure.core.match, I get

|| java.lang.Exception: namespace 'clojure.tools.analyzer.jvm.utils' not found, compiling:(clojure/tools/analyzer/jvm.clj:9:1)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5647| clojure.core$throw_if.invokeStatic
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5733| clojure.core$load_lib.invokeStatic
|| clojure.core$load_lib.doInvoke(core.clj)
|| clojure.lang.RestFn.applyTo(RestFn.java:142)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|647| clojure.core$apply.invokeStatic
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5765| clojure.core$load_libs.invokeStatic
|| clojure.core$load_libs.doInvoke(core.clj)
|| clojure.lang.RestFn.applyTo(RestFn.java:137)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|647| clojure.core$apply.invokeStatic
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5787| clojure.core$require.invokeStatic
|| clojure.core$require.doInvoke(core.clj)
|| clojure.lang.RestFn.invoke(RestFn.java:703)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/tools.analyzer.jvm/0.6.5/tools.analyzer.jvm-0.6.5.jar::clojure/tools/analyzer/jvm.clj|9| clojure.tools.analyzer.jvm$eval4968$loading__5561__auto____4969.invoke
zipfile:/Users/michael.blume/.m2/repository/org/clojure/tools.analyzer.jvm/0.6.5/tools.analyzer.jvm-0.6.5.jar::clojure/tools/analyzer/jvm.clj|9| clojure.tools.analyzer.jvm$eval4968.invokeStatic
|| clojure.tools.analyzer.jvm$eval4968.invoke(jvm.clj)
|| clojure.lang.Compiler.eval(Compiler.java:6934)
|| clojure.lang.Compiler.eval(Compiler.java:6923)
|| clojure.lang.Compiler.load(Compiler.java:7381)
|| clojure.lang.RT.loadResourceScript(RT.java:372)
|| clojure.lang.RT.loadResourceScript(RT.java:363)
|| clojure.lang.RT.load(RT.java:453)
|| clojure.lang.RT.load(RT.java:419)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5883| clojure.core$load$fn__5669.invoke
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5882| clojure.core$load.invokeStatic
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5683| clojure.core$load_one.invokeStatic
|| clojure.core$load_one.invoke(core.clj)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5728| clojure.core$load_lib$fn__5618.invoke
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5727| clojure.core$load_lib.invokeStatic
|| clojure.core$load_lib.doInvoke(core.clj)
|| clojure.lang.RestFn.applyTo(RestFn.java:142)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|647| clojure.core$apply.invokeStatic
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5765| clojure.core$load_libs.invokeStatic
|| clojure.core$load_libs.doInvoke(core.clj)
|| clojure.lang.RestFn.applyTo(RestFn.java:137)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|647| clojure.core$apply.invokeStatic
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5787| clojure.core$require.invokeStatic
|| clojure.core$require.doInvoke(core.clj)
|| clojure.lang.RestFn.invoke(RestFn.java:457)
src/main/clojure/clojure/core/match.clj|1| clojure.core.match$eval4960$loading__5561__auto____4961.invoke
src/main/clojure/clojure/core/match.clj|1| clojure.core.match$eval4960.invokeStatic
|| clojure.core.match$eval4960.invoke(match.clj)
|| clojure.lang.Compiler.eval(Compiler.java:6934)
|| clojure.lang.Compiler.eval(Compiler.java:6923)
|| clojure.lang.Compiler.load(Compiler.java:7381)
|| clojure.lang.RT.loadResourceScript(RT.java:372)
|| clojure.lang.RT.loadResourceScript(RT.java:363)
|| clojure.lang.RT.load(RT.java:453)
|| clojure.lang.RT.load(RT.java:419)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5883| clojure.core$load$fn__5669.invoke
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5882| clojure.core$load.invokeStatic
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5683| clojure.core$load_one.invokeStatic
|| clojure.core$load_one.invoke(core.clj)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5728| clojure.core$load_lib$fn__5618.invoke
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5727| clojure.core$load_lib.invokeStatic
|| clojure.core$load_lib.doInvoke(core.clj)
|| clojure.lang.RestFn.applyTo(RestFn.java:142)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|647| clojure.core$apply.invokeStatic
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5765| clojure.core$load_libs.invokeStatic
|| clojure.core$load_libs.doInvoke(core.clj)
|| clojure.lang.RestFn.applyTo(RestFn.java:137)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|647| clojure.core$apply.invokeStatic
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5787| clojure.core$require.invokeStatic
|| clojure.core$require.doInvoke(core.clj)
|| clojure.lang.RestFn.invoke(RestFn.java:421)
|| clojure.core.match$eval4949.invokeStatic(form-init2494799382238714928.clj:1)
|| clojure.core.match$eval4949.invoke(form-init2494799382238714928.clj)
|| clojure.lang.Compiler.eval(Compiler.java:6934)
|| clojure.lang.Compiler.eval(Compiler.java:6897)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|3096| clojure.core$eval.invokeStatic
|| clojure.core$eval.invoke(core.clj)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/main.clj|240| clojure.main$repl$read_eval_print__7404$fn__7407.invoke
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/main.clj|240| clojure.main$repl$read_eval_print__7404.invoke
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/main.clj|258| clojure.main$repl$fn__7413.invoke
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/main.clj|258| clojure.main$repl.invokeStatic
|| clojure.main$repl.doInvoke(main.clj)
|| clojure.lang.RestFn.invoke(RestFn.java:1523)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/tools.nrepl/0.2.10/tools.nrepl-0.2.10.jar::clojure/tools/nrepl/middleware/interruptible_eval.clj|58| clojure.tools.nrepl.middleware.interruptible_eval$evaluate$fn__637.invoke
|| clojure.lang.AFn.applyToHelper(AFn.java:152)
|| clojure.lang.AFn.applyTo(AFn.java:144)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|645| clojure.core$apply.invokeStatic
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|1874| clojure.core$with_bindings_STAR_.invokeStatic
|| clojure.core$with_bindings_STAR_.doInvoke(core.clj)
|| clojure.lang.RestFn.invoke(RestFn.java:425)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/tools.nrepl/0.2.10/tools.nrepl-0.2.10.jar::clojure/tools/nrepl/middleware/interruptible_eval.clj|56| clojure.tools.nrepl.middleware.interruptible_eval$evaluate.invokeStatic
|| clojure.tools.nrepl.middleware.interruptible_eval$evaluate.invoke(interruptible_eval.clj)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/tools.nrepl/0.2.10/tools.nrepl-0.2.10.jar::clojure/tools/nrepl/middleware/interruptible_eval.clj|191| clojure.tools.nrepl.middleware.interruptible_eval$interruptible_eval$fn__679$fn__682.invoke
zipfile:/Users/michael.blume/.m2/repository/org/clojure/tools.nrepl/0.2.10/tools.nrepl-0.2.10.jar::clojure/tools/nrepl/middleware/interruptible_eval.clj|159| clojure.tools.nrepl.middleware.interruptible_eval$run_next$fn__674.invoke
|| clojure.lang.AFn.run(AFn.java:22)
|| java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
|| java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
|| java.lang.Thread.run(Thread.java:745)

Same thing with reloading a namespace in my own project which depends on clojure.core.match

Comment by Nicola Mometto [ 24/Sep/15 3:59 PM ]

Is it possible that AOT is involved?

Comment by Michael Blume [ 24/Sep/15 5:31 PM ]

Narrowed it down a little: if I check out tools.analyzer.jvm, open a REPL, and do (require 'clojure.tools.analyzer.jvm.utils), I get

|| java.lang.ClassCastException: java.lang.Class cannot be cast to clojure.asm.Type, compiling:(utils.clj:260:13)
|| clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3642)
|| clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3636)
|| clojure.lang.Compiler$DefExpr.eval(Compiler.java:450)
|| clojure.lang.Compiler.eval(Compiler.java:6939)
|| clojure.lang.Compiler.load(Compiler.java:7381)
|| clojure.lang.RT.loadResourceScript(RT.java:372)
|| clojure.lang.RT.loadResourceScript(RT.java:363)
|| clojure.lang.RT.load(RT.java:453)
|| clojure.lang.RT.load(RT.java:419)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5883| clojure.core$load$fn__5669.invoke
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5882| clojure.core$load.invokeStatic
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5683| clojure.core$load_one.invokeStatic
|| clojure.core$load_one.invoke(core.clj)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5728| clojure.core$load_lib$fn__5618.invoke
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5727| clojure.core$load_lib.invokeStatic
|| clojure.core$load_lib.doInvoke(core.clj)
|| clojure.lang.RestFn.applyTo(RestFn.java:142)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|647| clojure.core$apply.invokeStatic
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5765| clojure.core$load_libs.invokeStatic
|| clojure.core$load_libs.doInvoke(core.clj)
|| clojure.lang.RestFn.applyTo(RestFn.java:137)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|647| clojure.core$apply.invokeStatic
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|5787| clojure.core$require.invokeStatic
|| clojure.core$require.doInvoke(core.clj)
|| clojure.lang.RestFn.invoke(RestFn.java:421)
|| clojure.tools.analyzer.jvm.utils$eval4392.invokeStatic(form-init8663423518975891793.clj:1)
|| clojure.tools.analyzer.jvm.utils$eval4392.invoke(form-init8663423518975891793.clj)
|| clojure.lang.Compiler.eval(Compiler.java:6934)
|| clojure.lang.Compiler.eval(Compiler.java:6897)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|3096| clojure.core$eval.invokeStatic
|| clojure.core$eval.invoke(core.clj)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/main.clj|240| clojure.main$repl$read_eval_print__7404$fn__7407.invoke
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/main.clj|240| clojure.main$repl$read_eval_print__7404.invoke
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/main.clj|258| clojure.main$repl$fn__7413.invoke
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/main.clj|258| clojure.main$repl.invokeStatic
|| clojure.main$repl.doInvoke(main.clj)
|| clojure.lang.RestFn.invoke(RestFn.java:1523)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/tools.nrepl/0.2.10/tools.nrepl-0.2.10.jar::clojure/tools/nrepl/middleware/interruptible_eval.clj|58| clojure.tools.nrepl.middleware.interruptible_eval$evaluate$fn__637.invoke
|| clojure.lang.AFn.applyToHelper(AFn.java:152)
|| clojure.lang.AFn.applyTo(AFn.java:144)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|645| clojure.core$apply.invokeStatic
zipfile:/Users/michael.blume/.m2/repository/org/clojure/clojure/1.8.0-master-SNAPSHOT/clojure-1.8.0-master-SNAPSHOT.jar::clojure/core.clj|1874| clojure.core$with_bindings_STAR_.invokeStatic
|| clojure.core$with_bindings_STAR_.doInvoke(core.clj)
|| clojure.lang.RestFn.invoke(RestFn.java:425)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/tools.nrepl/0.2.10/tools.nrepl-0.2.10.jar::clojure/tools/nrepl/middleware/interruptible_eval.clj|56| clojure.tools.nrepl.middleware.interruptible_eval$evaluate.invokeStatic
|| clojure.tools.nrepl.middleware.interruptible_eval$evaluate.invoke(interruptible_eval.clj)
zipfile:/Users/michael.blume/.m2/repository/org/clojure/tools.nrepl/0.2.10/tools.nrepl-0.2.10.jar::clojure/tools/nrepl/middleware/interruptible_eval.clj|191| clojure.tools.nrepl.middleware.interruptible_eval$interruptible_eval$fn__679$fn__682.invoke
zipfile:/Users/michael.blume/.m2/repository/org/clojure/tools.nrepl/0.2.10/tools.nrepl-0.2.10.jar::clojure/tools/nrepl/middleware/interruptible_eval.clj|159| clojure.tools.nrepl.middleware.interruptible_eval$run_next$fn__674.invoke
|| clojure.lang.AFn.run(AFn.java:22)
|| java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
|| java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
|| java.lang.Thread.run(Thread.java:745)

I don't see where AOT would be involved?

Comment by Nicola Mometto [ 27/Sep/15 2:28 PM ]

Michael Blume The updated patch should fix the issue you reported

Comment by Michael Blume [ 28/Sep/15 12:39 PM ]

Cool, thanks =)

New patch no longer deletes MethodImplCache, which is not used – is that deliberate?

Comment by Alex Miller [ 02/Nov/15 3:08 PM ]

It would be cool if there was a bulleted list of the things changed in the patch in the description. For example: "Renamed MethodImplCache to ImplCache", etc. That makes it easier to review.

Comment by Nicola Mometto [ 02/Nov/15 3:35 PM ]

Attached is an updated patch that doesn't replace MethodImplCache with ImplCache but simply reuses MethodImplCache, reducing the impact of this patch and making it easier (and safer) to review.

Comment by Alex Miller [ 07/Jun/16 11:42 AM ]

Bumping priority as this is used in new inst? predicate - see https://github.com/clojure/clojure/commit/58227c5de080110cb2ce5bc9f987d995a911b13e

Comment by Alex Miller [ 29/Jun/17 2:31 PM ]

I ran the before and after tests with the v3 patch. Before times matched pretty closely, but I could not replicate the after results. I got this, which is actually much worse in the not-found case:

(let [x (x.)] (bench (satisfies? p x))) ;; Execution time mean : 76.833504 ns
(let [y (y.)] (bench (satisfies? p y))) ;; Execution time mean : 20.570007 µs
Comment by Nicola Mometto [ 29/Jun/17 4:09 PM ]

The v4 patch fixes the regression in the not-found case; not sure how that happened, apologies.
Here are the timings I'm getting now:

clojure master:

user=> (let [x (x.)] (bench (satisfies? p x)))
Evaluation count : 604961580 in 60 samples of 10082693 calls.
             Execution time mean : 112.649568 ns
    Execution time std-deviation : 12.216782 ns
   Execution time lower quantile : 99.299203 ns ( 2.5%)
   Execution time upper quantile : 144.265205 ns (97.5%)
                   Overhead used : 1.898271 ns

Found 3 outliers in 60 samples (5.0000 %)
	low-severe	 2 (3.3333 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 73.7545 % Variance is severely inflated by outliers
nil
user=> (let [y (y.)] (bench (satisfies? p y)))
Evaluation count : 22676100 in 60 samples of 377935 calls.
             Execution time mean : 2.605426 µs
    Execution time std-deviation : 141.100070 ns
   Execution time lower quantile : 2.487234 µs ( 2.5%)
   Execution time upper quantile : 2.873045 µs (97.5%)
                   Overhead used : 1.898271 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 40.1251 % Variance is moderately inflated by outliers
nil

master + v4:

user=> (let [x (x.)] (bench (satisfies? p x)))
Evaluation count : 731759100 in 60 samples of 12195985 calls.
             Execution time mean : 79.321426 ns
    Execution time std-deviation : 3.959245 ns
   Execution time lower quantile : 75.365187 ns ( 2.5%)
   Execution time upper quantile : 87.986479 ns (97.5%)
                   Overhead used : 1.905711 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 35.2614 % Variance is moderately inflated by outliers
nil
user=> (let [y (y.)] (bench (satisfies? p y)))
Evaluation count : 771220140 in 60 samples of 12853669 calls.
             Execution time mean : 77.410858 ns
    Execution time std-deviation : 1.407926 ns
   Execution time lower quantile : 75.852530 ns ( 2.5%)
   Execution time upper quantile : 80.759226 ns (97.5%)
                   Overhead used : 1.897646 ns

Found 4 outliers in 60 samples (6.6667 %)
	low-severe	 3 (5.0000 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 7.7866 % Variance is slightly inflated by outliers

So to summarize:
master found = 112ns
master not-found = 2.6us

master+v4 found = 79ns
master+v4 not-found = 77ns

Comment by Michael Blume [ 14/Jul/17 5:12 PM ]

For records that have been declared with an implementation of a particular protocol, and which therefore implement the corresponding interface, would it make sense to use an (instance?) check against that interface as a fast path?

Comment by Alex Miller [ 04/Aug/17 11:28 AM ]

Michael - that check is already in there

Nicola - I have a few comments/questions:

1. I don't get what the purpose of the NIL stuff is - could you explain that?
2. In the case where x is an instance of the interface, the old code returned x and the new code in find-protocol-impl* returns interface. Why the change?
3. This: (alter-var-root (:var protocol) assoc :impl-cache (expand-method-impl-cache cache c impl)) is not thread-safe afaict - I think simultaneous misses in two different threads for different impls would cause the cache to only have one of them. This is probably unlikely, and probably not a big deal since the cache will just be updated on the next call (not give a wrong answer), but wanted to mention it. I don't see any easy way to avoid it without a lot of changes.

Comment by Nicola Mometto [ 04/Aug/17 6:14 PM ]

Alex, thanks for looking at this,
1. The NIL object is a placeholder in the method impl cache for nil, as `find-and-cache-protocol-impl` tests for `nil?` to know whether the dispatch has been cached.

2. The change is purely for consistency, making find-and-cache-protocol-impl always return a class/interface rather than either a class/interface or a concrete instance. Behaviourally, it doesn't change anything, since the two consumers of `find-protocol-impl`, namely `find-protocol-method` and `satisfies?`, don't care what that value is in that case.

3. Yes, you are correct that it is not thread safe. However, I think it's a decent tradeoff: it doesn't cause any incorrect behaviour and at worst causes an extra cache miss, while making it thread safe would mean an extra performance penalty on every cache hit/miss.
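The tradeoff described in point 3 — a lock-free read path where a racing update may be lost but can never produce a wrong answer — can be sketched with a plain atom standing in for the protocol's impl cache (all names here are illustrative, not the patch's):

```clojure
;; Illustrative sketch, not the patch: a cache whose updates may race.
;; A lost update only means the value is recomputed on the next call;
;; correctness is preserved because compute is deterministic.
(def impl-cache (atom {}))

(defn lookup [k compute]
  (or (get @impl-cache k)            ; lock-free hit path
      (let [v (compute k)]
        ;; read-modify-write without coordination: two simultaneous
        ;; misses may each publish a snapshot missing the other's entry
        (reset! impl-cache (assoc @impl-cache k v))
        v)))
```

A fully thread-safe variant would use (swap! impl-cache assoc k v) instead of the read-then-reset!, at the cost of CAS retries under contention — the extra penalty the comment refers to.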

Comment by Michael Blume [ 16/Aug/17 1:44 PM ]

Nicola Mometto I'm seeing a change in behavior from this patch

(defprotocol BoolProtocol
  (proto-fn [this]))

(extend-protocol BoolProtocol
  Object
  (proto-fn [x] "Object impl")

  nil
  (proto-fn [x] "Nil impl"))

(proto-fn false)

returns "Object impl" with Clojure master and "Nil impl" with this patch

Comment by Nicola Mometto [ 17/Aug/17 3:08 AM ]

That's not good, I'll take a look later today, thanks

Comment by Nicola Mometto [ 18/Aug/17 3:29 AM ]

This was an issue with how `nil` was cached, I decided to special-case `nil` to skip the method cache, removing the need for all the `NIL` funny business and fixing this bad interaction between `false` and `nil`.

Comment by Michael Blume [ 18/Aug/17 1:17 PM ]

Not sure if it's in scope for this ticket, but given that this wasn't caught, there should probably be more protocol dispatch tests

Comment by Alex Miller [ 20/Aug/17 5:00 PM ]

yes, should definitely add

Comment by Michael Blume [ 21/Aug/17 12:52 PM ]

Patch with test

Comment by Michael Blume [ 21/Aug/17 12:56 PM ]

Verified that test fails with v4 patch:

[java] Testing clojure.test-clojure.protocols
     [java]
     [java] FAIL in (test-default-dispatch) (protocols.clj:695)
     [java] expected: (= "Object impl" (test-dispatch false))
     [java]   actual: (not (= "Object impl" "Nil impl"))
     [java]
     [java] FAIL in (test-default-dispatch) (protocols.clj:695)
     [java] expected: (= "Object impl" (test-dispatch true))
     [java]   actual: (not (= "Object impl" "Nil impl"))
Comment by Michael Blume [ 18/Sep/17 2:19 PM ]

Has this patch missed the 1.9 train? There was a fix we were hoping to make in HoneySQL that I'd hesitate to make with satisfies? as slow as it is.

Comment by Alex Miller [ 18/Sep/17 2:48 PM ]

Not necessarily. We don't add features after 1.9 but perf stuff like this is still possible. It's been vetted by Rich. It's in my list of stuff to screen.





[CLJ-1804] take transducer optimization Created: 25/Aug/15  Updated: 26/Aug/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.8
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Karsten Schmidt Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance, transducers

Attachments: Text File clj-1804-1.patch     Text File clj-1804-2.patch    
Patch: Code
Approval: Triaged

 Description   

A basic refactoring that removes the let form and requires only a single counter check per iteration yields a 25% performance increase. With the patch, 2 checks are only required for the last iteration (to handle the case where the counter arg was <= 0)...
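A sketch of the refactoring described (not the attached patch verbatim; it keeps the volatile counter from the 1.7 implementation, and take-xf is a hypothetical standalone name):

```clojure
;; Single-check take transducer sketch. vswap! dec both advances the
;; counter and supplies the value for the one pos? test; only when the
;; counter is exhausted does a second (neg?) check run, to decide
;; whether this final input is still inside the window.
(defn take-xf [n]
  (fn [rf]
    (let [nv (volatile! n)]
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result input]
         (if (pos? (vswap! nv dec))
           (rf result input)
           (ensure-reduced
            (if (neg? @nv) result (rf result input)))))))))
```

(into [] (take-xf 3) (range 5)) returns [0 1 2]; with (take-xf 0) the counter goes negative on the first input, so rf is never called and the reduction terminates immediately.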

;; master
(quick-bench (into [] (take 1000) (range 2000)))
WARNING: Final GC required 34.82584189073624 % of runtime
Evaluation count : 13050 in 6 samples of 2175 calls.
             Execution time mean : 46.921254 µs
    Execution time std-deviation : 1.904733 µs
   Execution time lower quantile : 45.124921 µs ( 2.5%)
   Execution time upper quantile : 49.427201 µs (97.5%)
                   Overhead used : 2.367243 ns

;; w/ patch
(quick-bench (into [] (take 1000) (range 2000)))
WARNING: Final GC required 34.74448252054369 % of runtime
Evaluation count : 18102 in 6 samples of 3017 calls.
             Execution time mean : 34.301193 µs
    Execution time std-deviation : 1.714105 µs
   Execution time lower quantile : 32.341349 µs ( 2.5%)
   Execution time upper quantile : 37.046851 µs (97.5%)
                   Overhead used : 2.367243 ns


 Comments   
Comment by Karsten Schmidt [ 25/Aug/15 10:35 AM ]

Proposed patch, passes all existing tests

Comment by Alex Miller [ 25/Aug/15 11:28 AM ]

From a superficial skim, I see checks for pos? and then neg? which both exclude 0 and that gives me doubts about that branch. That may actually be ok but I will have to read it more closely.

Comment by Karsten Schmidt [ 25/Aug/15 11:58 AM ]

Hi Alex, try running the tests... AFAICT it's all still working as advertised: for (take 0) or (take -1) the pos? check fails, but we must ensure rf is not called for that iteration. For all other (take n) cases only the pos? check is executed, apart from the last iteration (which causes a single superfluous neg? call). The current impl always does 2 checks for every iteration and hence is (much) slower.

Comment by Alex Miller [ 25/Aug/15 7:22 PM ]

The only time that neg? case will be hit is if take is passed n <= 0, so I think this is correct. However, you might consider a different way to handle that particular case - for example, it could be pulled out of the transducer function and created as a separate transducer entirely. I'm not sure that's worth doing, but it's an idea.

Comment by Karsten Schmidt [ 26/Aug/15 6:01 AM ]

Good idea, Alex! This 2nd patch removes the neg? check and adds a quick bail transducer for cases where n <= 0. It also made it slightly faster still:

(quick-bench (into [] (take 1000) (range 2000)))
Evaluation count : 20370 in 6 samples of 3395 calls.
             Execution time mean : 30.302673 µs

(now ~35% faster than original)





[CLJ-1799] Replace refs in pprint Created: 13/Aug/15  Updated: 28/Sep/17

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance, print

Attachments: Text File CLJ-1799.patch    
Patch: Code

 Description   

I noticed that pprint uses refs and dosync transactions in a number of places, which seems unlikely to be necessary. It seems like these could be replaced by atoms, or even volatiles, given that printing typically happens in a single thread. Presumably this would improve performance of pprint significantly.
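The suggested rewrite, sketched on a hypothetical per-writer field (the real fields live in clojure/pprint/pretty_writer.clj; `pos` here is illustrative):

```clojure
;; Current style in pretty_writer.clj: full STM machinery for state
;; that is only ever touched by the printing thread.
(def pos (ref 0))
(dosync (alter pos + 5))

;; Suggested style: an atom (or volatile!) gives the same read/write
;; shape without transaction overhead.
(def pos' (atom 0))
(swap! pos' + 5)
```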



 Comments   
Comment by dennis zhuang [ 14/Aug/15 11:28 AM ]

I developed a patch to fix this issue. I ran all the tests in Clojure and clojure.data.json, and none fail.

Using criterium for a simple benchmark:

(use 'criterium.core)
(require '[clojure.data.json :as json])
(bench (json/write-str 
  {:a 1 :b 2 :c (range 10) :d "hello world"
   :e (apply hash-set (range 10))}))

before patch:

Evaluation count : 6180060 in 60 samples of 103001 calls.
             Execution time mean : 10.302604 µs
    Execution time std-deviation : 597.958933 ns
   Execution time lower quantile : 9.631444 µs ( 2.5%)
   Execution time upper quantile : 11.618551 µs (97.5%)
                   Overhead used : 1.724553 ns

After patch:

Evaluation count : 6000900 in 60 samples of 100015 calls.
             Execution time mean : 10.212543 µs
    Execution time std-deviation : 564.874941 ns
   Execution time lower quantile : 9.528383 µs ( 2.5%)
   Execution time upper quantile : 11.334033 µs (97.5%)
                   Overhead used : 1.827143 ns




[CLJ-1798] The RetryEx in LockingTransaction should be static Created: 13/Aug/15  Updated: 13/Aug/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: dennis zhuang Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: STM, performance
Environment:

java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)

Mac OSX 10.10.4


Attachments: PNG File profile.png     Text File static_retryex.patch    
Patch: Code
Approval: Triaged

 Description   

We are using clojure.data.json, and we profiled our project with JProfiler; it shows a hotspot in LockingTransaction. Too many RetryEx instances were created during profiling.
The retryex instance variable should be made a static class variable to avoid creating a new instance every time an STM transaction begins.

The attachments are the profiling screenshot and the patch that makes retryex static.
I haven't run a benchmark yet, but a clojure.data.json performance benchmark should show whether the patch improves things.
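The fix itself is a one-line change in LockingTransaction.java, but the idea — allocate the control-flow exception once and rethrow the same instance — can be sketched from Clojure as (illustrative names, not the actual patch):

```clojure
;; One shared exception instance: throwing it repeatedly costs no
;; allocation. Identity comparison distinguishes it from real errors.
(def retry-ex (Error. "retry"))

(defn attempt [f]
  (try (f)
       (catch Error e
         (if (identical? e retry-ex)
           ::retry
           (throw e)))))
```

Note that the stack trace captured in retry-ex is the one from its single construction site, which is fine for an exception used purely as a control-flow signal.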



 Comments   
Comment by Alex Miller [ 13/Aug/15 8:04 AM ]

I think the suggestion here is sound, but I have a hard time believing it will make much of a difference. My real question is why pprint is using dosync.

Comment by dennis zhuang [ 13/Aug/15 9:03 AM ]

I found it uses dosync in many places in the writer, as below:

https://github.com/clojure/clojure/blob/master/src/clj/clojure/pprint/pretty_writer.clj

It seems to use dosync to change some mutable state. Maybe these could be rewritten into other forms, such as an atom or a Java object/lock, etc.
But that's another story.





[CLJ-1789] Use transients with select-keys if possible Created: 28/Jul/15  Updated: 04/Oct/17

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: performance

Attachments: Text File 0001-CLJ-1789-Use-transients-and-reduce-with-select-keys.patch    
Patch: Code

 Description   

Problem: Currently select-keys uses conj to add entries. If the map is editable, conj! could be used instead to improve select-keys performance.

Additionally keyseq is traversed as a seq but could be traversed via reduce instead, which might be faster.

Approach 1: Use a transient map and conj!, keeping loop/recur
Approach 2: Reimplement select-keys to use reduce instead of loop/recur
Approach 3: Combine approach one and two

selected key size | loop/recur | transient | reduce | transient + reduce
1                 | 243 ns     | 256 ns    | 161 ns | 188 ns
7                 | 1.1 µs     | -         | 885 ns | 454 ns

From these numbers, approach 3 was chosen.

Note: In order to implement select-keys in terms of reduce, select-keys needed to be moved to a point after the definition of reduce. This forced a (declare select-keys), since select-keys is used earlier in core.clj than its new definition.

Patch: 0001-CLJ-1789-Use-transients-and-reduce-with-select-keys.patch
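Approach 3 can be sketched as follows (hypothetical name select-keys* to avoid shadowing core; the real patch rewrites clojure.core/select-keys itself and must also fall back for non-editable maps):

```clojure
;; Build the result with a transient, walking the key seq via reduce
;; instead of loop/recur over a seq.
(defn select-keys* [m ks]
  (with-meta
    (persistent!
     (reduce (fn [ret k]
               (if-let [entry (find m k)]
                 (conj! ret entry)   ; conj! the MapEntry directly
                 ret))
             (transient {})
             ks))
    (meta m)))
```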



 Comments   
Comment by Erik Assum [ 28/Sep/17 2:28 PM ]

Standard Clojure select-keys:

(bench (clojure.core/select-keys {:a "b" :c "d"} [:a]))
Evaluation count : 246382440 in 60 samples of 4106374 calls.
             Execution time mean : 243.245536 ns
    Execution time std-deviation : 2.714803 ns
   Execution time lower quantile : 238.473675 ns ( 2.5%)
   Execution time upper quantile : 248.544255 ns (97.5%)
                   Overhead used : 1.845047 ns

With transients:

(bench (select-keys {:a "b" :c "d"} [:a]))
Evaluation count : 232727220 in 60 samples of 3878787 calls.
             Execution time mean : 256.937568 ns
    Execution time std-deviation : 10.025123 ns
   Execution time lower quantile : 249.951872 ns ( 2.5%)
   Execution time upper quantile : 276.251590 ns (97.5%)
                   Overhead used : 1.845047 ns

Found 5 outliers in 60 samples (8.3333 %)
	low-severe	 3 (5.0000 %)
	low-mild	 2 (3.3333 %)
 Variance from outliers : 25.4503 % Variance is moderately inflated by outliers

With reduce

(bench (select-keys {:a "b" :c "d"} [:a]))
Evaluation count : 364807860 in 60 samples of 6080131 calls.
             Execution time mean : 161.582833 ns
    Execution time std-deviation : 2.212659 ns
   Execution time lower quantile : 158.027524 ns ( 2.5%)
   Execution time upper quantile : 167.673682 ns (97.5%)
                   Overhead used : 1.845047 ns

Found 3 outliers in 60 samples (5.0000 %)
	low-severe	 3 (5.0000 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

Reduce + transient

(bench (select-keys {:a "b" :c "d"} [:a]))
Evaluation count : 318075720 in 60 samples of 5301262 calls.
             Execution time mean : 188.656164 ns
    Execution time std-deviation : 3.024952 ns
   Execution time lower quantile : 183.867285 ns ( 2.5%)
   Execution time upper quantile : 195.466784 ns (97.5%)
                   Overhead used : 1.845047 ns

Found 4 outliers in 60 samples (6.6667 %)
	low-severe	 4 (6.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

On bigger map/selection

(bench (clojure.core/select-keys {:a "b" :c "d" :b "b" :d "d" :e "e" :f "f" :g "g"} [:a :c :b :d :e :f :g]))
Evaluation count : 56147160 in 60 samples of 935786 calls.
             Execution time mean : 1.104653 µs
    Execution time std-deviation : 36.366516 ns
   Execution time lower quantile : 1.048257 µs ( 2.5%)
   Execution time upper quantile : 1.142031 µs (97.5%)
                   Overhead used : 1.845047 ns

Found 5 outliers in 60 samples (8.3333 %)
	low-severe	 4 (6.6667 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 19.0389 % Variance is moderately inflated by outliers

reduce

(bench (select-keys {:a "b" :c "d" :b "b" :d "d" :e "e" :f "f" :g "g"} [:a :c :b :d :e :f :g]))
Evaluation count : 67723500 in 60 samples of 1128725 calls.
             Execution time mean : 885.840664 ns
    Execution time std-deviation : 11.503115 ns
   Execution time lower quantile : 864.403495 ns ( 2.5%)
   Execution time upper quantile : 905.721942 ns (97.5%)
                   Overhead used : 1.845047 ns

Transient + reduce

(bench (select-keys {:a "b" :c "d" :b "b" :d "d" :e "e" :f "f" :g "g"} [:a :c :b :d :e :f :g]))
Evaluation count : 134119380 in 60 samples of 2235323 calls.
             Execution time mean : 454.587795 ns
    Execution time std-deviation : 15.681611 ns
   Execution time lower quantile : 439.822498 ns ( 2.5%)
   Execution time upper quantile : 485.797378 ns (97.5%)
                   Overhead used : 1.845047 ns

Found 3 outliers in 60 samples (5.0000 %)
	low-severe	 3 (5.0000 %)
 Variance from outliers : 20.6393 % Variance is moderately inflated by outliers

The attached patch is using both transients and reduce

Comment by Alex Miller [ 04/Oct/17 8:54 AM ]

The proposed approach seems good to me. The description needs to better reflect what was considered and what was chosen.





[CLJ-1765] areduce speed improvements Created: 19/Jun/15  Updated: 12/Oct/15  Resolved: 12/Oct/15

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.8

Type: Enhancement Priority: Major
Reporter: Karsten Schmidt Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: ft, performance
Environment:

OSX 10.8.5, Oracle JDK1.8.0_25-b17


Attachments: Text File clj-1765-2.patch     Text File clj-1765.patch    
Patch: Code
Approval: Ok

 Description   

Two performance improvements for areduce:

1. Call alength once, rather than every iteration
2. Use unchecked-inc-int instead of inc since array indices are limited to int range

Example:

(def a (long-array (range 1000)))
(areduce a i ret 0 (+ ret (aget a i)))
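What the two changes amount to in the expansion, sketched by hand (the actual change is inside the areduce macro; sum-a is an illustrative name):

```clojure
(def a (long-array (range 1000)))   ; same array as the example above

;; Hand-expanded sketch of the patched areduce: alength is evaluated
;; once, outside the loop, and the counter uses unchecked-inc-int
;; since array indices are limited to int range.
(defn sum-a []
  (let [len (alength a)]
    (loop [i 0 ret 0]
      (if (< i len)
        (recur (unchecked-inc-int i) (+ ret (aget a i)))
        ret))))
```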

Criterium quick-bench:

  • 1.7.0-RC2: 15.5 ms
  • RC2+patch: 7.7 ms

Patch: clj-1765-2.patch
Screened by: Alex Miller



 Comments   
Comment by Karsten Schmidt [ 19/Jun/15 7:08 PM ]

added patch w/ changes described above

Comment by Alex Miller [ 09/Oct/15 8:27 AM ]

Updated -2 patch to put ticket id at beginning of commit message and to widen context lines. No semantic changes, attribution retained.





[CLJ-1762] Implement IKVReduce for java.util.map Created: 18/Jun/15  Updated: 18/Jun/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.8
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Chen Guo Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: interop, performance

Attachments: Text File reduce-kv-java-map.patch    
Patch: Code
Approval: Triaged

 Description   

reduce works on Java maps, but reduce-kv does not:

user=> (def clojure-map {1 2 3 4})
#'user/clojure-map
user=> (def java-map (java.util.HashMap. {1 2 3 4}))
#'user/java-map
user=> (reduce (fn [sum [k v]] (+ sum k v)) 0 java-map)
10
user=> (reduce-kv + 0 clojure-map)
10
user=> (reduce-kv + 0 java-map)

IllegalArgumentException No implementation of method: :kv-reduce of protocol: #'clojure.core.protocols/IKVReduce found for class: java.util.HashMap  clojure.core/-cache-protocol-fn (core_deftype.clj:544)

It's trivial to destructure arguments in a regular reduce, but there are performance implications. The following example yields a 7x speedup when run with the implementation of reduce-kv for java.util.Map in this patch:

user=> (def big-clojure-map (into {} (map #(vector % %) (range 10000))))
#'user/big-clojure-map
user=> (def big-java-map (java.util.HashMap. big-clojure-map))
Reflection warning, /tmp/form-init7130245387362554027.clj:1:19 - call to java.util.HashMap ctor can't be resolved.
#'user/big-java-map
user=> (defn reduce-sum [m] (reduce (fn [sum [k v]] (+ sum k v)) 0 m))
#'user/reduce-sum
user=> (defn reduce-sum-kv [m] (reduce-kv (fn [sum k v] (+ sum k v)) 0 m))
#'user/reduce-sum-kv
user=> (time (dotimes [_ 1000] (reduce-sum big-java-map)))
"Elapsed time: 2624.692113 msecs"
nil
user=> (time (dotimes [_ 1000] (reduce-sum-kv big-java-map)))
"Elapsed time: 376.802454 msecs"
nil
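One possible implementation is a sketch along these lines (not necessarily the attached patch): extend IKVReduce to java.util.Map by walking the entry set directly, passing keys and values to f without allocating [k v] pairs.

```clojure
;; extend reduce-kv to any java.util.Map via its entry-set iterator
(extend-protocol clojure.core.protocols/IKVReduce
  java.util.Map
  (kv-reduce [^java.util.Map m f init]
    (let [^java.util.Iterator it (.. m entrySet iterator)]
      (loop [ret init]
        (if (.hasNext it)
          (let [^java.util.Map$Entry e (.next it)
                ret (f ret (.getKey e) (.getValue e))]
            (if (reduced? ret)
              @ret                      ; honor early termination
              (recur ret)))
          ret)))))

(reduce-kv + 0 (java.util.HashMap. {1 2 3 4})) ;=> 10
```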





[CLJ-1730] Improve `refer` performance Created: 13/May/15  Updated: 07/Sep/17

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: ft, performance

Attachments: Text File clj-1730-2.patch     Text File refer-perf.patch    
Patch: Code
Approval: Prescreened

 Description   

refer underlies the require, use, and refer-clojure use cases, and is not particularly efficient at its primary job of copying symbol-to-var mappings from one namespace to another.

Approach: Some improvements that can be made:

  • Go directly to the namespace mappings and avoid creating filtered intermediate maps (ns-publics)
  • Use transients to build map of references to refer
  • Instead of cas'ing each new reference individually, build map of all changes, then cas
  • For (:require :only ...) case - instead of walking all referred vars and looking for matches, walk only the included vars and look up each one
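The transient/batching ideas above can be sketched with an atom standing in for the namespace's mapping table (names here are hypothetical, not the patch's):

```clojure
;; build the full map of changes with a transient, then publish it with
;; a single swap! instead of one compare-and-set per referred var
(defn refer-batched [ns-mappings source-publics syms]
  (let [changes (persistent!
                  (reduce (fn [m sym]
                            (if-let [v (get source-publics sym)]
                              (assoc! m sym v)
                              (throw (IllegalAccessError.
                                       (str sym " does not exist")))))
                          (transient {})
                          syms))]
    (swap! ns-mappings merge changes)))
```

For an :only list this walks just the included symbols, looking each one up, rather than walking every public var and filtering.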

There are undoubtedly more dramatic changes (like immutable namespaces) in how all this works that could further improve performance but I tried to make the scope small-ish for this change.

While individual refer timings are greatly reduced (~50% reduction for (refer clojure.core), ~90% reduction for :only use), refer is only a small component of broader require load times so the improvements in practice are modest.

Performance:

expr in a new repl                                                  1.7.0-beta3   1.7.0-beta3+patch
(in-ns 'foo) (clojure.core/refer 'clojure.core)                     2.65 ms       0.994 ms
(in-ns 'bar) (clojure.core/refer 'clojure.core :only '[inc dec])    1.04 ms       0.113 ms
(use 'criterium.core)                                               0.877 ms      0.762 ms
(require '[clojure.core.async :refer (>!! <!! chan close!)])        3408 ms       3302 ms

Patch: clj-1730-2.patch

Screening Notes

Patch appears correct but we should consider:

  • non-idiomatic use of if instead of when makes branches hard to read
  • non-idiomatic indentation of if (both branches on one line) hard to read
  • I don't think "not found" should be an IllegalAccessError, but the original code did this already, so ...
  • the optimistic concurrency loop around the swap will never give up, is this ok?


 Comments   
Comment by Alex Miller [ 07/Sep/17 10:52 AM ]

Patch updated to address first screening comment. I didn't actually find any case of the second one? Give me a pointer.





[CLJ-1724] Reuse call to seq() in LazySeq/hashcode for else case Created: 04/May/15  Updated: 12/Oct/15  Resolved: 12/Oct/15

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: Release 1.8

Type: Enhancement Priority: Minor
Reporter: Jozef Wagner Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: collections, ft, performance

Attachments: File clj-1724.diff    
Patch: Code
Approval: Ok

 Description   

In LazySeq/hashCode, seq() is called twice for non-empty seqs. First call to seq() can be reused in else case.
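The same pattern in Clojure terms (LazySeq.hashCode itself is Java; this is just an illustration of reusing the first seq call):

```clojure
;; before: seq may be computed twice per call
(defn hash-of-coll [coll]
  (if (seq coll)
    (hash (seq coll))
    1))

;; after: if-let binds the first seq call and reuses it in the else-free path
(defn hash-of-coll' [coll]
  (if-let [s (seq coll)]
    (hash s)
    1))
```

Both return 1 for empty colls and the seq's hash otherwise; the second avoids the redundant seq() call.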






[CLJ-1695] Variadic vector-of overload has poor performance Created: 03/Apr/15  Updated: 10/Apr/15  Resolved: 10/Apr/15

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: Release 1.7

Type: Enhancement Priority: Major
Reporter: Leon Grapenthin Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File clj-1695-2.patch     Text File clj-1695.patch    
Patch: Code
Approval: Ok

 Description   
(time (do (apply vector-of :double (repeat 100000 0))
          nil))

Times after a few runs: ~335 ms.

(time (do (into (vector-of :double) (repeat 100000 0))
          nil))

Times after a few runs: ~5 ms.

Cause: The variadic case for vector-of is missing a type hint and uses reflection - this will be seen in any call to vector-of with more than 4 elements.

Approach: Switch interop call to instead use conj - there is no reason to be using interop here over standard conj on a vector. The after time for the first case above is 6-9 ms depending on GC (the fast reduce path in repeat reduces gc variability I think).
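The conj-based variadic arity can be sketched roughly as follows (a simplification, not the patch itself):

```clojure
;; build the typed vector with conj -- plain protocol dispatch,
;; no unhinted interop call, so no reflection
(defn vector-of-sketch [t & xs]
  (loop [v (vector-of t)
         xs (seq xs)]
    (if xs
      (recur (conj v (first xs)) (next xs))
      v)))

(vector-of-sketch :double 1 2 3) ;=> [1.0 2.0 3.0]
```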

Patch: clj-1695-2.patch



 Comments   
Comment by Leon Grapenthin [ 03/Apr/15 4:57 PM ]

I thought seqs were passed pretty much "as is" with apply. E.g.:

(time (do (apply (fn [& args]
                   (into (vector-of :double) args))
                 (repeat 100000 0))
          nil))

performs as fast as (into (vector-of :double) (repeat 100000 0)).

Comment by Alex Miller [ 06/Apr/15 9:51 AM ]

I'm going to see if we can include this in 1.7, no guarantees. For now a workaround is any use of vector-of that creates, then fills the vector separately as you are doing with into.

The into case actually will get an additional benefit in 1.7 for sources that can reduce themselves faster (repeat happens to be one of those).

Comment by Michael Blume [ 06/Apr/15 11:02 AM ]

If we're adding a type-hint anyway, wouldn't it be more effective to type-hint the return value of vector-of?

Comment by Alex Miller [ 06/Apr/15 11:10 AM ]

I went back-and-forth on that but it didn't seem like that any other code could benefit from the return type hint?

Re-thinking, maybe the hint should actually be more generic and hint to clojure.lang.IPersistentCollection instead.

Comment by Michael Blume [ 06/Apr/15 11:15 AM ]

Actually I'm wrong, type-hinting the return of vector-of (as either Vec or IPersistentCollection) doesn't remove the reflection.

Comment by Alex Miller [ 06/Apr/15 11:37 AM ]

Attached new patch with more generic typehint, placed more specifically where it's needed.

Comment by Alex Miller [ 10/Apr/15 9:52 AM ]

Switched the interop call to just use conj.





[CLJ-1679] Add fast path in seq comparison for structurally sharing seqs Created: 21/Mar/15  Updated: 15/May/17

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Nicola Mometto Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: collections, performance, seq

Attachments: Text File 0001-CLJ-1679-do-pointer-checks-in-seq-equality.patch     Text File CLJ-1679-v2.patch     Text File CLJ-1679-v3.patch    
Patch: Code and Test

 Description   

Currently, comparing two non-identical seqs requires iterating through both, comparing value by value, and ignoring the possibility of seqs `a` and `b` having the same (pointer-equal) rest.

The proposed patch adds a pointer equality check on the seq tails that can make the equality short-circuit if the test returns true, which is helpful when comparing large (or possibly infinite) seqs that share a common subseq.

After this patch, comparisons like

(let [x (range)] (= x (cons 0 (rest x))))
which currently don't terminate, return true in constant time.

Patch: CLJ-1679-v3.patch
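The fast path can be sketched in Clojure (the actual patch is in the Java equality code; `seq=` here is an illustrative stand-in):

```clojure
;; elementwise comparison that short-circuits to true as soon as the
;; two tails become pointer-identical (shared structure)
(defn seq= [a b]
  (loop [a (seq a) b (seq b)]
    (cond
      (identical? a b)        true   ; shared tail (or both exhausted)
      (or (nil? a) (nil? b))  false  ; different lengths
      (= (first a) (first b)) (recur (next a) (next b))
      :else                   false)))

(seq= [1 2 3] '(1 2 3)) ;=> true
```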



 Comments   
Comment by Michael Blume [ 23/Mar/15 12:52 PM ]

When this test fails (it fails on my master, but I've got a bunch of other development patches, I'm still figuring out where the conflict is), it fails by hanging forever. Maybe it'd be better to check equality in a future and time out the future?

Comment by Michael Blume [ 23/Mar/15 1:01 PM ]

like so =)

Comment by Nicola Mometto [ 23/Mar/15 1:11 PM ]

Makes sense, thanks for the updated patch

Comment by Michael Blume [ 23/Mar/15 1:24 PM ]

Hm, the previous patch had a problem where the reporting logic still tried to force the sequence and OOMed; this patch prevents that.

Comment by Michael Blume [ 23/Mar/15 1:41 PM ]

ok, looks like CLJ-1515, CLJ-1603, and this patch, all combine to fail together, though any two of them work fine.

Comment by Michael Blume [ 23/Mar/15 1:43 PM ]

(And really there's nothing wrong with the source of this patch, it still works nicely to short-circuit = where there's structural sharing, it's just that the other two patches break structural sharing for ranges, so the test fails)

Comment by Nicola Mometto [ 23/Mar/15 2:02 PM ]

I see, I guess we'll have to change the test if the patches for those tickets get applied.





[CLJ-1665] take-nth transducer could be faster without rem Created: 20/Feb/15  Updated: 20/Feb/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Steve Miner Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance
Environment:

Mac OS X 10.10.2, JDK 1.8.0_31


Attachments: Text File CLJ-1665-faster-take-nth-transducer-without-rem.patch    
Patch: Code
Approval: Triaged

 Description   

The take-nth transducer is calling rem on each index, which is relatively expensive compared to a zero? test. It could just count down from N instead as the step size is fixed.
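The countdown approach can be sketched as a transducer (a sketch of the proposal, not necessarily the attached patch):

```clojure
;; count down from n and emit on zero, instead of (rem i n) per element
(defn take-nth-xf [n]
  (fn [rf]
    (let [iv (volatile! 1)]           ; 1 so the first element is emitted
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result input]
         (if (zero? (vswap! iv dec))
           (do (vreset! iv n)
               (rf result input))
           result))))))

(into [] (take-nth-xf 3) (range 10)) ;=> [0 3 6 9]
```

This matches the collection version's behavior of keeping indices 0, n, 2n, ...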



 Comments   
Comment by Steve Miner [ 20/Feb/15 12:34 PM ]

Patch attached. It's about 25% faster on a simple test like:

(time (transduce (take-nth 13) + (range 1e7)))
Comment by Steve Miner [ 20/Feb/15 12:41 PM ]

I didn't worry about the (take-nth 0) case, but my patch does give a different result: the current implementation throws a divide-by-zero error (from rem), my patched version returns just the first element once, and the regular collection version returns an infinite sequence of the first element. I doubt anyone expects a sensible answer from the 0 case, so I didn't try to do anything special with it.

Comment by Michael Blume [ 20/Feb/15 12:55 PM ]

Nice =)

I would say that the transducer version really ought to match the collection version as closely as possible, but I don't think there's actually a way to write a transducer that transforms a finite sequence into an infinite sequence, so no luck there.

Maybe while we're at it we should change both the transducer and the collection arities to throw on zero?





[CLJ-1656] Unroll assoc and assoc! for small numbers of arguments Created: 06/Feb/15  Updated: 28/Sep/16

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6, Release 1.7, Release 1.8
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Tom Crayford Assignee: Unassigned
Resolution: Unresolved Votes: 5
Labels: performance

Attachments: File assoc.diff     Text File assoc-gen-test.patch     Text File CLJ-1656-v1.patch     Text File CLJ-1656-v2.patch     Text File CLJ-1656-v3.patch     Text File CLJ-1656-v4.patch     Text File CLJ-1656-v5.patch     Text File CLJ-1656-v6.patch     Text File CLJ-1656-v7.patch     Text File CLJ-1656-v8.patch     File cpuinfo     File javaversion     File output     File uname    
Patch: Code and Test
Approval: Triaged

 Description   

Whilst doing performance work recently, I discovered that unrolling to single assoc calls was significantly faster than using multiple keys (~10% for my particular application). Zachary Tellman then pointed out that clojure.core doesn't unroll assoc at all, even for relatively low numbers of keys.

We already unroll other performance critical functions that call things via `apply`, e.g. `update` https://github.com/clojure/clojure/blob/master/src/clj/clojure/core.clj#L5914, but `assoc` (which is, I think in the critical path for quite a bunch of applications and libraries), would likely benefit from this.
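The shape of the unrolling is simple fixed arities that avoid building and destructuring a varargs seq (a hypothetical helper, not the patch itself):

```clojure
;; each fixed arity is a chain of single-pair assocs; only calls with
;; more pairs than the unrolled arities pay the varargs cost
(defn assoc-unrolled
  ([m k1 v1 k2 v2]
   (-> m (assoc k1 v1) (assoc k2 v2)))
  ([m k1 v1 k2 v2 k3 v3]
   (-> m (assoc k1 v1) (assoc k2 v2) (assoc k3 v3))))

(assoc-unrolled {} :a 1 :b 2) ;=> {:a 1, :b 2}
```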

I have not yet developed patches for this, but I did some standalone benchmarking work:

https://github.com/yeller/unrolling-assoc-benchmarks

benchmark results:

code: https://github.com/yeller/unrolling-assoc-benchmarks/blob/master/src/bench_assoc_unrolling.clj

                                                 1      2      3      4
empty array map (not unrolled)                   23ns   93ns   156ns  224ns
empty array map (unrolled assoc)                 N/A    51ns   80ns   110ns
20 element persistent hashmap (not unrolled)     190ns  313ns  551ns  651ns
20 element persistent hashmap (unrolled assoc)   N/A    250ns  433ns  524ns
record (not unrolled)                            12ns   72ns   105ns  182ns
record (unrolled assoc)                          N/A    21ns   28ns   41ns

Each measurement was made in a separate JVM, to avoid JIT path dependence.

Benchmarks were run on a commodity server (8 CPUs, 32 GB RAM) with Ubuntu 12.04 and a recent release of Java 8. Attached are `cpuinfo`, `uname`, and `java -version` output.

Relatively standard JVM production flags were enabled, and care was taken to disable leiningen's startup time optimizations (which disable many of the JIT optimizations).

Benchmarks can be run by cloning the repository, and running `script/bench`

There's one outstanding question for this patch: How far should we unroll these calls? `update` (which is unrolled in the 1.7 alphas) is unrolled to 3 arguments. Adding more unrolling isn't difficult, but it does impact the readability of assoc.

Patch: CLJ-1656-v5.patch



 Comments   
Comment by Tom Crayford [ 09/Feb/15 12:01 PM ]

Ok, attached `assoc.diff`, which unrolls this one level beyond the current code (supporting two key/value pairs without recursion). The code will get pretty complex for more than the unrolled number of keys if we continue this way, so I'm unsure whether this is a good approach, but the performance benefits seem very compelling.

Comment by Michael Blume [ 09/Feb/15 3:35 PM ]

Since the unroll comes out kind of hairy, why not have a macro write it for us?

Comment by Michael Blume [ 09/Feb/15 4:03 PM ]

Patch v2 includes assoc!

Comment by Tom Crayford [ 09/Feb/15 5:01 PM ]

I benchmarked conj with similar unrolling, across a relatively wide range of datatypes from core (lists, sets, vectors, each one empty and then again with 20 elements):

                                        1        2      3      4
empty vector (not unrolled)             19ns     57ns   114ns  126ns
empty vector (unrolled conj)            N/A      44ns   67ns   91ns
20 element vector (not unrolled)        27.35ns  69ns   111ns  107ns
20 element vector (unrolled conj)       N/A      54ns   79ns   104ns
empty list (not unrolled)               7ns      28ns   53ns   51ns
empty list (unrolled conj)              N/A      15ns   20ns   26ns
twenty element list (not unrolled)      8.9ns    26ns   49ns   49ns
twenty element list (unrolled)          N/A      15ns   19ns   30ns
empty set (not unrolled)                64ns     170ns  286ns  290ns
empty set (unrolled)                    N/A      154ns  249ns  350ns
twenty element set (not unrolled)       33ns     81ns   132ns  130ns
twenty element set (unrolled)           N/A      69ns   108ns  139ns

Benchmarks were run on the same machine as before. There's a less clear benefit here, except for lists, where the overhead of creating seqs and recursing clearly dominates the cost of actually doing the conj (which makes sense - conj on a list of any size should be a very cheap operation). Raw benchmark output is here: https://gist.github.com/tcrayford/51a3cd24b8b0a8b7fd74

Comment by Tom Crayford [ 09/Feb/15 5:04 PM ]

Michael Blume: I like those patches! They read far nicer to me than my original patch. Did you check if any of those macro generated methods blew Hotspot's hot code inlining limit? (it's 235 bytecodes). That'd be my only worry with using macros here - it's easy to generate code that defeats the inliner.

Comment by Michael Blume [ 09/Feb/15 5:57 PM ]

Thanks! This is new for me, so I might be doing the wrong thing, but I just ran nodisassemble over both definitions and the "instruction numbers" next to each line go up to 219 for the varargs arity assoc and up to 251 for assoc!, so, assuming I'm looking at the right thing, maybe that one needs to have an arity taken off? If I remove the highest arity I get 232 for varargs which is just under the line.

I guess alternatively we could call assoc! instead of assoc!* in the varargs arity, which removes a lot of code – in that case it's 176 for varargs and 149 for six pairs.

Comment by Michael Blume [ 09/Feb/15 6:01 PM ]

Gah, I forgot to include coll in the varargs call to assoc!

which reminds me that this patch needs tests.

Comment by Michael Blume [ 09/Feb/15 10:27 PM ]

OK, this has some fixes I made after examining the disassembled output. There's a change to the assoc!* macro to be sure it type-hints correctly – I'm honestly not sure why it didn't type-hint properly before, but it does now. Also, I changed the call to assoc! rolling up the first six entries at the top of the varargs version from a macro call to a function call so it'd fit within the 251 inlineable bytecodes. (This, again, is assuming I'm reading the output correctly).

Comment by Tom Crayford [ 10/Feb/15 6:38 AM ]

Michael: Wanna push a branch with these patches to clojars or something? Then I can rerun the benchmarks with the exact code in the patches.

Comment by Michael Blume [ 10/Feb/15 2:36 PM ]

Hmm, not sure I know how to do that – here's a branch on github though https://github.com/MichaelBlume/clojure/tree/unroll-assoc

Comment by Michael Blume [ 12/Feb/15 1:12 PM ]

v5 marks the helper macros private.

Comment by Tom Crayford [ 13/Feb/15 4:11 AM ]

Michael: was that branch just based off clojure/clojure master? I tried running benchmarks off it, but ran into undefined var errors when building this code (which doesn't happen with alpha5):

(Retrieving com/yellerapp/clojure-unrolled-assoc/1.7.0-unrollassoc-SNAPSHOT/clojure-unrolled-assoc-1.7.0-unrollassoc-20150213.092242-1.pom from clojars)
(Retrieving com/yellerapp/clojure-unrolled-assoc/1.7.0-unrollassoc-SNAPSHOT/clojure-unrolled-assoc-1.7.0-unrollassoc-20150213.092242-1.jar from clojars)
(Retrieving org/clojure/clojure/1.3.0/clojure-1.3.0.jar from central)
Exception in thread "main" java.lang.RuntimeException: Unable to resolve symbol: bench in this context, compiling:(bench_assoc_unrolling.clj:5)
at clojure.lang.Compiler.analyze(Compiler.java:6235)
at clojure.lang.Compiler.analyze(Compiler.java:6177)
at clojure.lang.Compiler$InvokeExpr.parse(Compiler.java:3452)
at clojure.lang.Compiler.analyzeSeq(Compiler.java:6411)
at clojure.lang.Compiler.analyze(Compiler.java:6216)
at clojure.lang.Compiler.analyze(Compiler.java:6177)
at clojure.lang.Compiler$BodyExpr$Parser.parse(Compiler.java:5572)
at clojure.lang.Compiler$FnMethod.parse(Compiler.java:5008)

Comment by Michael Blume [ 23/Feb/15 5:08 PM ]

Ok, how are you building? Why the strange clojure group?

Comment by Michael Blume [ 23/Feb/15 5:09 PM ]

The existing version of assoc does runtime checking that an even number of varargs are passed in, but assoc! does not. Do we want to preserve this behavior or do checks in both?

Comment by Michael Blume [ 23/Feb/15 6:00 PM ]

Also, I'm curious how relevant inlining is here – does HotSpot inlining actually work with Var invocation when there's a getRootBinding step in the way?

Comment by Alex Miller [ 23/Feb/15 7:59 PM ]

Yes, inlining works through var invocation.

Comment by Tom Crayford [ 16/Mar/15 7:05 AM ]

Michael,

That group is just an uploaded version of clojure master with your patches applied, built in just the same way as before (you should be able to check out the repo and replicate).

Comment by Alex Miller [ 29/Apr/15 1:44 PM ]

The patch CLJ-1656-v5.patch doesn't seem to do anything with the old version of assoc (in core.clj around line 179)?

The new one needs to have the arglists and other stuff like that. I'm not sure about the macro/private vars in there either. Did you try leveraging RT.assocN() with a vector?

Are there existing tests in the test suite for assoc with N pairs?

Comment by Michael Blume [ 29/Apr/15 8:46 PM ]

The dependencies in clojure.core were such that assoc needed to be defined well before syntax-quoting, so I just let it be defined twice, once slower, once faster. I'll put up a patch with arglists. Does it need an arglist for every new arity, or are the existing arglists enough? (I'm afraid I'm not 100% solid on what the arglists metadata does.)

There is an annoying lack of existing tests of assoc. I have a generative test in my tree because that seemed more fun than writing cases for all the different arities. I can post it if it seems useful; it might be overkill though.

Comment by Michael Blume [ 29/Apr/15 9:50 PM ]

Here's the test patch I mentioned, it's even more overkill than I remembered

Comment by Michael Blume [ 29/Apr/15 10:01 PM ]

There, code and test.

This also checks that assoc! is passed an even number of kvs in the varargs case, which is the behavior of assoc. The test verifies that both assoc and assoc! throw for odd arg count.

Comment by Alex Miller [ 29/Apr/15 11:10 PM ]

The existing arglist is fine - it just overrides the generated one for doc purposes.

Did you try any of the RT.assocN() stuff?

I guess another question I have is whether people actually do this enough that it matters?

Comment by Michael Blume [ 28/Sep/16 2:14 PM ]

Updated patch to apply to master





[CLJ-1654] Reuse seq in some Created: 04/Feb/15  Updated: 18/Jan/16

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Trivial
Reporter: Gijs Stuurman Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: ft, performance

Attachments: Text File 0000-reuse-seq-in-some.patch     Text File clj-1654.patch    
Patch: Code
Approval: Prescreened

 Description   

By using when-let, up to two seq constructions can be avoided per invocation of some.
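The change amounts to the following (the "before" matches the pre-patch core definition; names are altered here to avoid shadowing clojure.core/some):

```clojure
;; before: seq is called once for the test and first/next re-seq the coll
(defn some-before [pred coll]
  (when (seq coll)
    (or (pred (first coll)) (recur pred (next coll)))))

;; after: when-let binds the seq once and first/next reuse it
(defn some-after [pred coll]
  (when-let [s (seq coll)]
    (or (pred (first s)) (recur pred (next s)))))

(some-after even? [1 3 4 5]) ;=> true
```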

Patch: clj-1654.patch

Screened by: Alex Miller



 Comments   
Comment by Ghadi Shayban [ 04/Feb/15 12:11 PM ]

This is similar to the tweak to dorun/doall in CLJ-1515. It is a good benefit when a collection doesn't cache its seq.

Comment by Alex Miller [ 12/Jan/16 9:40 AM ]

Updated patch to include ticket number in commit message, no other changes, attribution retained.





[CLJ-1646] Small filter performance enhancement Created: 19/Jan/15  Updated: 11/Apr/15  Resolved: 11/Apr/15

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Critical
Reporter: Alex Miller Assignee: Unassigned
Resolution: Duplicate Votes: 1
Labels: performance

Attachments: Text File clj-1646.patch    
Patch: Code
Approval: Triaged

 Description   

I found when working on other tickets for 1.7 that filter repeats each call to .nth on the chunk.
Reusing the result is a small perf boost for all filter uses.

Timing with criterium quick-bench:

What                                   stock     patch applied   comment
(into [] (filter odd? (range 1024)))   54.3 µs   50.2 µs         7.5% reduction

Approach: store the nth value so as to avoid the double lookup.
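The inner chunk loop after the change looks roughly like this (a sketch; `filter-chunk-sketch` is an illustrative name, the real code is inside clojure.core/filter):

```clojure
;; bind (.nth c i) once and reuse it for both the predicate test and
;; the append, instead of calling .nth twice per element
(defn filter-chunk-sketch [pred ^clojure.lang.IChunk c]
  (let [b (chunk-buffer (count c))]
    (dotimes [i (count c)]
      (let [v (.nth c (int i))]
        (when (pred v)
          (chunk-append b v))))
    (chunk b)))
```

For example, given a chunk such as (chunk-first (seq (range 64))), this yields a chunk of the odd (or otherwise matching) elements.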

Patch: clj-1646.patch
Screened by:



 Comments   
Comment by Michael Blume [ 25/Mar/15 5:09 PM ]

This seems to have been rolled into the patch for CLJ-1515 – should it be closed?

Comment by Alex Miller [ 25/Mar/15 5:13 PM ]

I'm not sure yet which patch will be applied for 1515. I will close this if it's included.

Comment by Alex Miller [ 11/Apr/15 6:45 AM ]

included in CLJ-1515 patch





[CLJ-1598] Make if forms compile directly to the appropriate branch expression if the test is a literal Created: 24/Nov/14  Updated: 26/May/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Nicola Mometto Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: compiler, performance, primitives

Attachments: Text File 0001-if-test-expr-of-an-if-statement-is-a-literal-don-t-e.patch    
Patch: Code
Approval: Triaged

 Description   

This allows expressions like `(cond (some-expr) 1 :else 2)` to be eligible for unboxed use, which is not currently possible since the cond macro always ends up with a nil else branch that the compiler takes into account.

With the attached patch, the trailing (if :else 2 nil) in the macroexpansion will be treated as 2 by the Compiler, thus allowing the unboxed usage of the cond expression.
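The shape of the expansion can be seen with macroexpand-all:

```clojure
(require '[clojure.walk :as walk])

(walk/macroexpand-all '(cond (some-expr) 1 :else 2))
;; => (if (some-expr) 1 (if :else 2 nil))
;;
;; With the patch, the compiler folds the truthy literal :else and
;; compiles the trailing (if :else 2 nil) as if it were simply 2, so the
;; whole cond can remain a primitive (unboxed) expression.
```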






[CLJ-1529] Significantly improve compile time by reducing calls to Class.forName Created: 21/Sep/14  Updated: 14/Nov/14  Resolved: 14/Nov/14

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Enhancement Priority: Critical
Reporter: Zach Tellman Assignee: Unassigned
Resolution: Completed Votes: 28
Labels: compiler, performance

Attachments: File class-for-name.diff     File clj-1529-no-cache-2.diff     File clj-1529-no-cache.diff     PNG File clj-1529.png     File clj-1529-with-cache.diff     Text File maybe-class-cache-2.patch     Text File maybe-class-cache.patch    
Patch: Code
Approval: Ok

 Description   

Compilation speed has been a real problem for a number of my projects, especially Aleph [1], which in 1.6 takes 18 seconds to load. Recently I realized that Class.forName is being called repeatedly on symbols which are lexically bound. Hits on Class.forName are cached, but misses are allowed to go through each time, which translates into tens of thousands of calls after calling `(use 'aleph.http)`.

Proposed: Avoid calling Class.forName() on non-namespaced symbols that do not contain "." or start with "[", don't map to a Class in the ns, and are names in the current local env. Also, adjust the ordering of checks in tagToClass to check for hints before checking for class.

[Note that the latest variant of the patch moves the check from the original patch farther down in the logic to avoid changing the semantics. This still yields virtually all of the performance gains. See comments for details.]
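In Clojure terms the guard amounts to something like the following rough approximation (the real change is in Compiler.java's symbol resolution, and `maybe-class` is a hypothetical helper):

```clojure
;; only fall through to Class/forName when the symbol plausibly names a
;; class: non-namespaced, not a local binding, and containing "." or
;; starting with "[" -- otherwise skip the expensive miss entirely
(defn maybe-class [sym locals]
  (let [n (name sym)]
    (when (and (nil? (namespace sym))
               (not (contains? locals sym))
               (or (.contains n ".")
                   (.startsWith n "[")))
      (try (Class/forName n)
           (catch ClassNotFoundException _ nil)))))
```

Lexically bound symbols like loop locals thus never trigger a Class.forName miss, which is where the tens of thousands of wasted calls came from.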

Patch: clj-1529-no-cache-2.diff

Screened by: Stu Halloway. Note that for this change the patch ended up being so small it is easier follow the code than the prose description.

[1] https://github.com/ztellman/aleph



 Comments   
Comment by Ghadi Shayban [ 21/Sep/14 4:30 PM ]

One of our larger projects (not macro-laden) just went from 36 seconds to 23 seconds to start with this patch.

Comment by Ramsey Nasser [ 03/Oct/14 12:34 PM ]

I ported this patch to Clojure-CLR for the Unity integration project and we have seen significant speedups as well. I too agree that this is the behavior I expect as a user.

Comment by Alex Miller [ 06/Oct/14 12:19 PM ]

I ran this on a variety of open-source projects. I didn't find that it produced any unexpected behavior or test errors. Most projects were about 10% faster to run the equivalent of "lein test", with a few as high as 35% faster.

Comment by Alex Miller [ 07/Oct/14 12:52 PM ]

We're interested in comparing this and the class caching in fastload branch to get something in for 1.7. Next step is to extract a patch of the stuff in fastload so we can compare them better.

Comment by Alex Miller [ 07/Oct/14 4:06 PM ]

Add maybe class cache patch from fastload branch

Comment by Alex Miller [ 08/Oct/14 8:57 AM ]

Times below to run "time lein test" on a variety of projects with columns:

  • master = current 1.7.0 master
  • maybe-cache = maybe-class-cache.patch extracted from Rich's fastload branch
  • class-for-name = class-for-name.diff from Zach
  • % maybe-cache = % improvement for maybe-cache over master
  • % class-for-name = % improvement for class-for-name patch over master (sorted desc)

project,master,maybe-cache,class-for-name,% maybe-cache,% class-for-name
aleph,25.605,16.572,14.460,35.278,43.527
riemann,40.550,27.656,24.734,31.798,39.004
lamina,37.247,30.072,29.045,19.263,22.021
trapperkeeper,11.979,11.158,10.3,6.854,14.016
plumbing,73.777,68.388,66.922,7.304,9.292
cheshire,5.583,5.089,5.086,8.848,8.902
tools.analyzer,5.411,5.289,5.023,2.255,7.171
core.async,19.161,18.090,17.942,5.589,6.362
tools.reader,4.686,4.435,4.401,5.356,6.082
clara-rules,43.964,42.140,41.542,4.149,5.509
core.typed,158.885,154.954,151.445,2.474,4.683
instaparse,9.286,8.922,8.859,3.920,4.598
schema,45.3,43.914,43.498,3.060,3.978
mandoline,76.295,74.831,74.425,1.919,2.451

The summary is that both patches improve times on all projects. In most cases, the improvement from either is <10% but the first few projects have greater improvements. The class-for-name patch has a bigger improvement in all projects than the maybe-cache patch (but maybe-cache has no change in semantics).

Comment by Nicola Mometto [ 08/Oct/14 9:03 AM ]

Are the two patches mutually exclusive?

Comment by Alex Miller [ 08/Oct/14 9:35 AM ]

They are non-over-lapping. I have not considered whether they could both be applied or whether that makes any sense.

Comment by Alex Miller [ 08/Oct/14 9:53 AM ]

The two patches both essentially cut off the same hot code path, just at different points (class-for-name is earlier), so applying them both effectively should give you about the performance of class-for-name.

Comment by Alex Miller [ 08/Oct/14 2:14 PM ]

Added a picture of the data for easier consumption.

Comment by Deepak Giridharagopal [ 10/Oct/14 4:35 PM ]

One of our bigger projects saw a reduction of startup time of 16% with class-for-name, 14% with maybe-cache, and a whopping 23% with both patches applied. This was actually starting up the program, as opposed to running "lein test", FWIW.

Maybe it's worth re-running the benchmarks with a "both-patches" variant?

Comment by Alex Miller [ 10/Oct/14 5:28 PM ]

Hey Deepak, I did actually run some of them with both patches and saw times similar to class-for-name.

Were your times consistent across restarts? The times in the data above are the best of 3 trials for every data point (although they were relatively consistent).

Comment by Deepak Giridharagopal [ 10/Oct/14 6:08 PM ]

Hi Alex, the tests I ran did 20-iteration loops, and I took the mean (though it was pretty consistent between restarts). I can redo stuff and upload the raw data for you if that will help.

Comment by Deepak Giridharagopal [ 10/Oct/14 6:43 PM ]

So repeating the experiment several times does in fact behave as you suspected...apologies for my previous LOLDATA.

Comment by Alex Miller [ 24/Oct/14 3:01 PM ]

maybe-class-cache-2.patch removes some debugging stuff

Comment by Alex Miller [ 27/Oct/14 4:41 PM ]

I've done more testing and made mods to both patches and moved them closer together.

On the maybe-class-cache patch (new version = clj-1529-with-cache.diff):
1) I found that adding a final else branch that returned null was an improvement - this avoids caching things that will never hit in the future (Cons, PersistentList, Symbols with namespaces, etc). That's both a perf improvement (avoids hashing those things) and a memory improvement.
2) The tagToClass improvement from Zach's patch is orthogonal and also valid here so I added it.
3) I added Zach's check, but moved the placement lower so that it doesn't alter semantics. It helps avoid caching locals that aren't classes.

On the class-for-name patch (new version = clj-1529-no-cache.diff):
1) Same change as #3 above - moved check lower to avoid semantics change.

With these changes, both patches have tagToClass and local checks, neither patch changes semantics, and the only difference is whether to keep or remove the cache.

aleph timings (for "lein test"):

  • 1.7 master = 25.415 s
  • 1.7 + clj-1529-with-cache.diff = 14.329 s
  • 1.7 + clj-1529-no-cache.diff = 14.808 s

lamina timings (for "lein test"):

  • 1.7 master = 37.340 s
  • 1.7 + clj-1529-with-cache.diff = 28.680 s
  • 1.7 + clj-1529-no-cache.diff = 28.759 s

The cache helps slightly in both cases, but it does not seem worth adding the new dynamic var and unbounded memory use inherent in the cache.

Comment by Alex Miller [ 05/Nov/14 11:40 AM ]

Talked to Rich, focusing on no-cache patch. Added new version that fixes tabbing and restores Zach's name to the patch, which seems appropriate.





[CLJ-1517] Unrolled small vectors Created: 01/Sep/14  Updated: 18/May/16

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Backlog

Type: Enhancement Priority: Critical
Reporter: Zach Tellman Assignee: Unassigned
Resolution: Unresolved Votes: 22
Labels: collections, performance

Attachments: File unrolled-collections-2.diff     File unrolled-collections.diff     Text File unrolled-vector-2.patch     Text File unrolled-vector.patch    
Patch: Code
Approval: Incomplete

 Description   

As discussed on the mailing list [1], this patch has two unrolled variants of vectors and maps, with special inner classes for each cardinality. Currently both grow to six elements before spilling over into the general versions of the data structures, which is based on rough testing but can be easily changed. At Rich's request, I haven't included any integration into the rest of the code, and there are top-level static create() methods for each.

The sole reason for this patch is performance, both in terms of creating data structures and performing operations on them. This can be seen as a more verbose version of the trick currently played with PersistentArrayMap spilling over into PersistentHashMap. Based on the benchmarks, which can be run by cloning cambrian-collections [2] and running 'lein test :benchmark', this should supplant PersistentArrayMap. Performance is at least on par with PAM, and often much faster. Especially noteworthy is the creation time, which is 5x faster for maps of all sizes (lein test :only cambrian-collections.map-test/benchmark-construction), and on par for 3-vectors, but 20x faster for 5-vectors. There are similar benefits for hash and equality calculations, as well as calls to reduce().

This is a big patch (over 5k lines), and will be kind of a pain to review. My assumption of correctness is based on the use of collection-check, and the fact that the underlying approach is very simple. I'm happy to provide a high-level description of the approach taken, though, if that will help the review process.

I'm hoping to get this into 1.7, so please let me know if there's anything I can do to help accomplish that.

[1] https://groups.google.com/forum/#!topic/clojure-dev/pDhYoELjrcs
[2] https://github.com/ztellman/cambrian-collections
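For reviewers unfamiliar with the approach, the unrolling technique can be sketched in a few lines of Java. This is a hypothetical two-element case with made-up names; the actual patch generates classes for cardinalities up to six and implements the full Clojure collection interfaces:

```java
// Sketch of an "unrolled" fixed-arity vector: one final field per element,
// so nth() is a switch over fields rather than an array or tree walk.
// Class and method names here are illustrative, not those in the patch.
final class Vec2 {
    final Object e0, e1;

    Vec2(Object e0, Object e1) { this.e0 = e0; this.e1 = e1; }

    int count() { return 2; }

    Object nth(int i) {
        switch (i) {
            case 0: return e0;
            case 1: return e1;
            default: throw new IndexOutOfBoundsException(String.valueOf(i));
        }
    }

    // Growing past the largest unrolled arity spills over into the general
    // PersistentVector, mirroring the PersistentArrayMap -> PersistentHashMap
    // trick; here we just return the elements as an array to show the shape.
    Object[] spill(Object x) { return new Object[] { e0, e1, x }; }
}
```

Creation is a plain constructor call with no array allocation or copying, which is where much of the reported construction speedup would come from.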

Patch: unrolled-vector-2.patch

Screener Notes: The approach is clear and understandable. Given the volume of generated code, I believe the best way to improve confidence in this code is to get people using it asap, and to add collection-check [3] to the Clojure test suite. I would also like to get the generator [4] included in the Clojure repo. We don't necessarily need to automate running it, but it would be nice to have it nearby if we want to tweak something later.

[3] https://github.com/ztellman/collection-check/blob/master/src/collection_check.clj
[4] https://github.com/ztellman/cambrian-collections/blob/master/generate/cambrian_collections/vector.clj



 Comments   
Comment by Zach Tellman [ 01/Sep/14 10:13 PM ]

Oh, I forgot to mention that I didn't make a PersistentUnrolledSet, since the existing wrappers can use the unrolled map implementation. However, it would be moderately faster and more memory efficient to have one, so let me know if it seems worthwhile.

Comment by Nicola Mometto [ 02/Sep/14 5:23 AM ]

Zach, the patch you added isn't in the correct format; it needs to be created using `git format-patch`

Comment by Nicola Mometto [ 02/Sep/14 5:31 AM ]

Also, I'm not sure if this is in scope for the ticket, but those patches break with *print-dup*, as it expects a static create(x) method for each inner class.

I'd suggest adding a create(Map x) static method for the inner PersistentUnrolledMap classes and a create(ISeq x) one for the inner PersistentUnrolledVector classes

Comment by Alex Miller [ 02/Sep/14 8:14 AM ]

Re making patches, see: http://dev.clojure.org/display/community/Developing+Patches

Comment by Jozef Wagner [ 02/Sep/14 9:16 AM ]

I wonder what is the overhead of having meta and 2 hash fields in the class. Have you considered a version where the hash is computed on the fly and where you have two sets of collections, one with meta field and one without, using former when the actual metadata is attached to the collection?

Comment by Zach Tellman [ 02/Sep/14 12:13 PM ]

I've attached a patch using the proper method. Somehow I missed the detailed explanation for how to do this, sorry. I know the guidelines say not to delete previous patches, but since the first one isn't useful I've deleted it to minimize confusion.

I did the print-dup friendly create methods, and then realized that once these are properly integrated, 'pr' will just emit these as vectors. I'm fairly sure the create methods aren't necessary, so I've commented them out, but I'm happy to add them back in if they're useful for some reason I can't see.

I haven't given a lot of thought to memory efficiency, but I think caching the hashes is worthwhile. I can see an argument for creating a "with-meta" version of each collection, but since that would double the size of an already enormous patch, I think that should probably wait.

Comment by Zach Tellman [ 03/Sep/14 4:31 PM ]

I found a bug! Like PersistentArrayMap, I have a special code path for comparing keywords, but my generators for collection-check were previously using only integer keys. There was an off-by-one error in the transient map implementation [1], which was not present for non-keyword lookups.

I've taken a close look for other gaps in my test coverage, and can't find any. I don't think this substantively changes the risk of this patch (an updated version of which has been uploaded as 'unrolled-collections-2.diff'), but obviously where there's one bug, there may be others.

[1] https://github.com/ztellman/cambrian-collections/commit/eb7dfe6d12e6774512dbab22a148202052442c6d#diff-4bf78dbf5b453f84ed59795a3bffe5fcR559

Comment by Zach Tellman [ 03/Oct/14 2:34 PM ]

As an additional data point, I swapped out the data structures in the Cheshire JSON library. On the "no keyword-fn decode" benchmark, the current implementation takes 6us, the unrolled data structures take 4us, and no data structures at all (just lexing the JSON via Jackson) takes 2us. Other benchmarks had similar results. So at least in this scenario, it halves the overhead.

Benchmarks can be run by cloning https://github.com/dakrone/cheshire, unrolled collections can be tested by using the 'unrolled-collections' branch. The pure lexing benchmark can be reproduced by messing around with the cheshire.parse namespace a bit.

Comment by Zach Tellman [ 06/Oct/14 1:31 PM ]

Is there no way to get this into 1.7? It's an awfully big win to push off for another year.

Comment by Alex Miller [ 07/Oct/14 2:08 PM ]

Hey Zach, it's definitely considered important but we have decided to drop almost everything not fully done for 1.7. Timeframe for following release is unknown, but certainly expected to be significantly less than a year.

Comment by John Szakmeister [ 30/Oct/14 2:53 PM ]

You are all free to determine the time table, but I thought I'd point out that Zach is not entirely off-base. Clojure 1.4.0 was released April 5th, 2012. Clojure 1.5.0 was released March 1st, 2013 with 1.6.0 showing up March 25th, 2014. So it appears that the current cadence is around a year.

Comment by Alex Miller [ 30/Oct/14 3:40 PM ]

John, there is no point to comments like this. Let's please keep issue comments focused on the issue.

Comment by Zach Tellman [ 13/Nov/14 12:23 PM ]

I did a small write-up on this patch which should help in the eventual code review: http://blog.factual.com/using-clojure-to-generate-java-to-reimplement-clojure

Comment by Zach Tellman [ 07/Dec/14 10:34 PM ]

Per my conversation with Alex at the Conj, here's a patch that only contains the unrolled vectors, and uses the more efficient constructor for PersistentVector when spilling over.

Comment by Alex Miller [ 08/Dec/14 1:10 PM ]

Zach, I created a new placeholder for the map work at http://dev.clojure.org/jira/browse/CLJ-1610.

Comment by Jean Niklas L'orange [ 09/Dec/14 1:52 PM ]

It should probably be noted that this patch will break core.rrb-vector for small vectors, as it peeks into the underlying structure. It will also break other libraries that peek into the vector implementation internals, although I'm not aware of any others – certainly no other contrib library.

Also, two comments on unrolled-vector.patch:

private transient boolean edit = true;
in the Transient class should probably be
private volatile boolean edit = true;
as transient means something entirely different in Java.

conj in the Transient implementation could invalidate itself without any problems (edit = false;) if it is converted into a TransientVector (i.e. spills over) – unless it has a notable overhead. The invalidation can prevent some subtle bugs related to erroneous transient usage.
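The distinction being drawn is between two unrelated Java keywords: `transient` only excludes a field from default serialization, while `volatile` is what guarantees cross-thread visibility of writes. A minimal demonstration (hypothetical class, not code from the patch):

```java
import java.io.Serializable;

// Java's `transient` only excludes a field from default serialization; it
// says nothing about thread visibility. `volatile` is the keyword that makes
// writes visible across threads, which is what a transient-collection edit
// flag actually needs.
class EditFlag implements Serializable {
    transient boolean droppedOnSerialize = true;  // reads back as false after a round trip
    volatile boolean visibleAcrossThreads = true; // survives serialization; safe to flip concurrently
}
```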

Comment by Alex Miller [ 09/Dec/14 1:58 PM ]

Jean - understanding the scope of the impact will certainly be part of the integration process for this patch. I appreciate the heads-up. While we try to minimize breakage for things like this, it may be unavoidable for libraries that rely on implementation internals.

Comment by Michał Marczyk [ 09/Dec/14 2:03 PM ]

I'll add support for unrolled vectors to core.rrb-vector the moment they land on master. (Probably with some conditional compilation so as not to break compatibility with earlier versions of Clojure – we'll see when the time comes.)

Comment by Michał Marczyk [ 09/Dec/14 2:06 PM ]

I should say that it'd be possible to add generic support for any "vector lookalikes" by pouring them into regular vectors in linear time. At first glance it seems to me that that'd be out of line with the basic promise of the library, but I'll give it some more thought before the changes actually land.

Comment by Zach Tellman [ 09/Dec/14 5:43 PM ]

Somewhat predictably, the day after I cut the previous patch, someone found an issue [1]. In short, my use of the ArrayChunk wrapper applied the offset twice.

This was not caught by collection-check, which has been updated to catch this particular failure. It was, however, uncovered by Michael Blume's attempts to merge the change into Clojure, which tripped a bunch of alarms in Clojure's test suite. My own attempt to do the same to "prove" that it worked was before I added in the chunked seq functionality, hence this issue persisting until now.

As always, there may be more issues lurking. I hope we can get as many eyeballs on the code between now and 1.8 as possible.

[1] https://github.com/ztellman/cambrian-collections/commit/2e70bbd14640b312db77590d8224e6ed0f535b43
[2] https://github.com/MichaelBlume/clojure/tree/test-vector

Comment by Zach Tellman [ 10/Jul/15 1:54 PM ]

As a companion to the performance analysis in the unrolled map issue, I've run the benchmarks and posted the results at https://gist.github.com/ztellman/10e8959501fb666dc35e. Some notable results:

Comment by Alex Miller [ 13/Jul/15 9:02 AM ]

Stu: I do not think this patch should be marked "screened" until the actual integration and build work (if the generator is integrated) has been completed.

Comment by Alex Miller [ 14/Jul/15 4:33 PM ]

FYI, we have "reset" all big features for 1.8 for the moment (except the socket repl work). We may still include it - that determination will be made later.

Comment by Zach Tellman [ 14/Jul/15 4:43 PM ]

Okay, any idea when the determination will be made? I was excited that we seemed to be finally moving forward on this.

Comment by Alex Miller [ 14/Jul/15 4:51 PM ]

No, but it is now on my work list.

Comment by Rich Hickey [ 15/Jul/15 8:17 AM ]

I wonder if all of the overriding of APersistentVector yields important benefits - e.g. iterator, hashcode etc.

Comment by Zach Tellman [ 15/Jul/15 11:51 AM ]

In the case of hashcode, definitely: https://gist.github.com/ztellman/10e8959501fb666dc35e#file-gistfile1-txt-L1013-L1076. This was actually one of the original things I wanted to speed up.

In the case of the iterator, probably not. I'd be fine removing that.

Comment by Zach Tellman [ 16/Jul/15 5:17 PM ]

So am I to infer from https://github.com/clojure/clojure/commit/36d665793b43f62cfd22354aced4c6892088abd6 that this issue is defunct? If so, I think there's a lot of improvements being left on the table for no particular reason.

Comment by Rich Hickey [ 16/Jul/15 6:34 PM ]

Yes, that commit covers this functionality. It takes a different approach from the patch in building up from a small core, and maximizing improvements to the bases rather than having a lot of redundant definitions per class. That also allowed for immediate integration without as much concern for correctness, as there is little new code. It also emphasizes the use case for tuples, e.g. small vectors used as values that won't be changed, thus de-emphasizing the 'mutable' functions. I disagree that many necessary improvements are being left out. The patch 'optimized' many things that don't matter. Further, there are not big improvements to the pervasive inlining. In addition, the commit includes the integration work at a fraction of the size of the patch. In all, it would have taken much more back and forth to get the patch to conform with this approach than to just get it all done, but I appreciate the inspiration and instigation - thanks!

Comment by Rich Hickey [ 16/Jul/15 6:46 PM ]

That said, this commit need not be the last word - it can serve as a baseline for further optimization. But I'd rather that be driven by need. Clojure could become 10x as large optimizing things that don't matter.

Comment by Zach Tellman [ 19/Jul/15 1:36 PM ]

What is our reference for "relevant" performance? I (or anyone else) can provide microbenchmarks for calculating hashes or whatever else, but that doesn't prove that it's an important improvement. I previously provided benchmarks for JSON decoding in Cheshire, but that's just one of many possible real-world benchmarks. It might be useful to have an agreed-upon list of benchmarks that we can use when debating what is and isn't useful.

Comment by Mike Anderson [ 19/Jul/15 11:14 PM ]

I was interested in this implementation, so I created a branch that integrates Zach's unrolled vectors on top of Clojure 1.8.0-alpha2. I also simplified some of the code (I don't think the metadata handling or unrolled seqs are worthwhile, for example).

Github branch: https://github.com/mikera/clojure/tree/clj-1517

Then I ran a set of micro-benchmarks created by Peter Taoussanis

Results: https://gist.github.com/mikera/72a739c84dd52fa3b6d6

My findings from this testing:

  • Performance is comparable (within +/- 20%) on the majority of tests
  • Zach's approach is noticeably faster (by 70-150%) for 4 operations (reduce, mapv, into, equality)

My view is that these additional optimisations are worthwhile. In particular, I think that reduce and into are very important operations. I also care about mapv quite a lot for core.matrix (It's fundamental to many numerical operations on arrays implemented using Clojure vectors).

Happy to create a patch along these lines if it would be acceptable.

Comment by Zach Tellman [ 19/Jul/15 11:45 PM ]

The `reduce` improvements are likely due to the unrolled reduce and kvreduce impls, but the others are probably because of the unrolled transient implementation. The extra code required to add these would be pretty modest.

Comment by Mike Anderson [ 20/Jul/15 9:20 PM ]

I actually condensed the code down to a single implementation for `Transient` and `TupleSeq`. I don't think these really need to be fully unrolled for each Tuple type. That helps by making the code even smaller (and probably also helps performance, given JVM inline caching etc.)

Comment by Peter Taoussanis [ 21/Jul/15 11:46 AM ]

Hey folks,

Silly question: is there actually a particular set of reference benchmarks that everyone's currently using to test the work on tuples? It took me a while to notice how bad the variance was with my own set of micro benchmarks.

Bumping all the run counts up until the noise starts dying down, I'm actually seeing numbers now that don't seem to agree with others here.

Google Docs link: https://docs.google.com/spreadsheets/d/1QHY3lehVF-aKrlOwDQfyDO5SLkGeb_uaj85NZ7tnuL0/edit?usp=sharing
gist with the benchmarks: https://gist.github.com/ptaoussanis/0a294809bc9075b6b02d

Thanks, cheers!

Comment by Zach Tellman [ 21/Jul/15 6:52 PM ]

Hey Peter, I can't reproduce your results, and some of them are so far off what I'd expect that I have to think there was some data gathering error. For instance, the assoc operation being slower is kind of inexplicable, considering the unrolled version doesn't do any copying, etc. Also, all of your numbers are significantly slower than the ones on my 4 year old laptop, which is also a bit strange.

Just to make sure that we're comparing apples to apples, I've adapted your benchmarks into something that pretty-prints the mean runtime and variance for 1.7, 1.8-alpha2, and Mike's 1517 fork. It can be found at https://github.com/ztellman/tuple-benchmark, and the results of a run at https://gist.github.com/ztellman/3701d965228fb9eda084.

Comment by Mike Anderson [ 22/Jul/15 2:24 AM ]

Hey Zach just looked at your benchmarks and they are definitely more consistent with what I am seeing. The overall nanosecond timings look about right from my experience with similar code (e.g. working with small vectors in vectorz-clj).

Comment by Peter Taoussanis [ 22/Jul/15 2:41 AM ]

Hi Zach, thanks for that!

Have updated the results -
Gist: https://gist.github.com/ptaoussanis/0a294809bc9075b6b02d
Google docs: https://goo.gl/khgT83

Note that I've added an extra sheet/tab to the Google doc for your own numbers at https://gist.github.com/ztellman/3701d965228fb9eda084.

Am still struggling to produce results that show any consistent+significant post-JIT benefit to either of the tuple implementations against the micro benchmarks and one larger small-vec-heavy system I had handy.

It looks to me like the JIT may actually be optimising away most of the non-tuple inefficiencies in practice.

Of course it's very possible that my results are off, or my expectations wrong. The numbers have been difficult to pin down.

It definitely helped to have a standardised reference micro benchmark to work against (https://github.com/ztellman/tuple-benchmark). Could I perhaps suggest a similar reference macro benchmark (maybe something from core.matrix, Mike?)

Might also be a good idea to define a worthwhile target performance delta for ops like these that run in the nanosecond range (or for the larger reference macro benchmark)?

Just some thoughts from someone passing through, in case they're at all useful; I know many of you have been deeply involved in this for some time, so please feel free to ignore any input that's not helpful.

Appreciate all the efforts, cheers!

Comment by Rich Hickey [ 22/Jul/15 9:24 AM ]

I think everyone should back off on their enthusiasm for this approach. After much analysis, I am seeing definite negative impacts to tuples, especially the multiple class approach proposed by Zach. What happens in real code is that the many tuple classes cause call sites that see different sized vectors to become megamorphic, and nothing gets adequately optimized. In particular, call sites that will see tuple-sized and large vectors (i.e. a lot of library code) will optimize differently depending upon which they see often first. So, if you run your first tight loop on vector code that sees tuples, that code could later do much worse (50-100%) on large vectors than before the tuples patch was in place. Being much slower on large collections is a poor tradeoff for being slightly faster on small ones.

Certainly different tuple classes for arity 0-6 is a dead idea. You get as good or better optimization (at some cost in size) from a single class e.g. one with four fields, covering sizes 0-4. I have a working version of this in a local branch. It is better in that sites that include pvectors are only bi-morphic, but I am still somewhat skittish given what I've seen.

The other takeaway is that the micro benchmarks are nearly worthless for detecting these issues.
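The single-class shape described above can be sketched roughly as follows (a hedged illustration with made-up names, not the code in the working branch): one Tuple type with four fields and a count, so a call site that sees tuples of any size from 0 to 4 stays monomorphic on this one class, and at worst bimorphic once ordinary PersistentVectors show up.

```java
// One class covering cardinalities 0-4: unused fields are simply null, and
// nth() bounds-checks against the stored count. Names are illustrative only.
final class Tuple {
    final int cnt;
    final Object a, b, c, d;

    Tuple(int cnt, Object a, Object b, Object c, Object d) {
        this.cnt = cnt; this.a = a; this.b = b; this.c = c; this.d = d;
    }

    static Tuple of(Object... xs) {
        if (xs.length > 4) throw new IllegalArgumentException("spill to PersistentVector");
        return new Tuple(xs.length,
                xs.length > 0 ? xs[0] : null, xs.length > 1 ? xs[1] : null,
                xs.length > 2 ? xs[2] : null, xs.length > 3 ? xs[3] : null);
    }

    int count() { return cnt; }

    Object nth(int i) {
        if (i < 0 || i >= cnt) throw new IndexOutOfBoundsException(String.valueOf(i));
        switch (i) {
            case 0: return a;
            case 1: return b;
            case 2: return c;
            default: return d;
        }
    }
}
```

The trade-off versus the per-arity classes is a little wasted space (unused fields) in exchange for keeping vector call sites from going megamorphic.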

Comment by Zach Tellman [ 22/Jul/15 11:07 AM ]

I'm pretty sure that all of my real-world applications of the tuples (via clj-tuple) have been fixed cardinality, and wouldn't have surfaced any such issue. Thanks for putting the idea through its paces.

Comment by Mike Anderson [ 22/Jul/15 10:37 PM ]

Rich, these are good insights – do you have a benchmark that you are using as representative of real world code?

I agree that it is great if we can avoid call sites becoming megamorphic, though I also believe the ship has sailed on that one already when you consider the multiple types of IPersistentVector that already exist (MapEntry, PersistentVector, SubVector plus any library-defined IPersistentVector instances such as clojure.core.rrb-vector). As a consequence, the JVM is usually not going to be able to prove that a specific IPersistentVector interface invocation is monomorphic, which is when the really big optimisations happen.

In most of the real world code that I've been working with, the same size/type of vector gets used repeatedly (Examples: iterating over map entries, working with a sequence of size-N vectors), so in such cases we should be able to rely on the polymorphic inline cache doing the right thing.

The idea of a single Tuple class for sizes 0-4 is interesting, though I can't help thinking that a lot of the performance gain from this may stem from the fact that a lot of code does stuff like (reduce conj [] .....) or the transient equivalent which is a particularly bad use case for Tuples, at least from the call site caching perspective. There may be a better way to optimise such cases rather than simply trying to make Tuples faster.... e.g. calling asTransient() on a Tuple0 could perhaps switch straight into the PersistentVector implementation.

Comment by Colin Fleming [ 18/May/16 8:32 PM ]

Here's a relevant issue from Google's Guava project, where they also found serious performance degradation with fixed-size collections: https://github.com/google/guava/issues/1268. It has a lot of interesting detail about how the performance breaks down.





[CLJ-1493] Fast keyword intern Created: 06/Aug/14  Updated: 14/Aug/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: dennis zhuang Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: keywords, performance
Environment:

Mac OS X 10.9.4 / 2.6 GHz Intel Core i5 / 8 GB 1600 MHz DDR3
java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)


Attachments: File fast_keyword_intern.diff    
Patch: Code
Approval: Triaged

 Description   

Keyword's intern(Symbol) method uses recursive invocation to get a valid keyword instance. I think it can be rewritten as a loop
to reduce method invocation cost.
So I developed this patch and ran some simple benchmarks. Run the following command line three times after 'ant jar':

java -Xms64m -Xmx64m -cp test:clojure.jar clojure.main -e "(time (dotimes [n 10000000] (keyword (str n))))"

Before patched:

"Elapsed time: 27343.827 msecs"
"Elapsed time: 26172.653 msecs"
"Elapsed time: 25673.764 msecs"

After patched:

"Elapsed time: 24884.142 msecs"
"Elapsed time: 23933.423 msecs"
"Elapsed time: 25382.783 msecs"

It looks like the patch makes keyword interning a little faster.

The patch is attached and tested.

Thanks.

P.S. I've signed the contributor agreement, and my email is killme2008@gmail.com .
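The recursion being replaced is the retry that happens when a cached WeakReference to a Keyword has been cleared by the GC. A simplified sketch of the loop form (hypothetical names, with a plain Object standing in for Keyword; not the actual patch):

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the retry logic in a Keyword-style intern: the cache holds weak
// references, so a hit may have been cleared by the GC and must be retried.
// The original retried by calling intern() recursively; the patch's idea is
// to retry in place with a loop.
final class KeywordCache {
    private static final ConcurrentHashMap<String, WeakReference<Object>> table =
            new ConcurrentHashMap<>();

    static Object intern(String name) {
        while (true) { // loop instead of a recursive call to intern()
            WeakReference<Object> ref = table.get(name);
            if (ref == null) {
                Object k = new Object(); // stands in for new Keyword(symbol)
                ref = table.putIfAbsent(name, new WeakReference<>(k));
                if (ref == null) return k; // we won the race
            }
            Object existing = ref.get();
            if (existing != null) return existing;
            table.remove(name, ref); // cleared reference: evict and retry
        }
    }
}
```

The loop avoids one stack frame per cleared-reference retry; the benchmark numbers above suggest the win is real but small.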



 Comments   
Comment by Alex Miller [ 07/Aug/14 9:01 AM ]

Looks intriguing (and would be a nice change, imo). I ran this on a JSON parsing benchmark I used for the keyword changes and saw ~3% improvement.

Comment by dennis zhuang [ 07/Aug/14 9:54 PM ]

Updated the patch to remove the 'k == null' check in the for loop; it's not necessary.

Comment by Andy Fingerhut [ 11/Aug/14 1:29 AM ]

Dennis, while JIRA can handle multiple patches with the same name, it can be confusing for people discussing the patches, and for some scripts I have to evaluate them. Please consider giving the patches different names (e.g. with version numbers in them), or removing older ones if they are obsolete.

Comment by dennis zhuang [ 11/Aug/14 9:19 AM ]

Hi Andy,

Thank you for reminding me. I've deleted the old patch.

Comment by dennis zhuang [ 11/Sep/14 10:34 AM ]

I'm glad to see it is helpful. I benchmarked the patch against the current master branch; it's fine there too.

Comment by dennis zhuang [ 14/Aug/15 9:12 AM ]

Can this patch be merged, or has it been rejected?

Comment by Alex Miller [ 14/Aug/15 9:41 AM ]

As a minor enhancement, this patch has not been high enough priority to be considered yet.

Comment by dennis zhuang [ 14/Aug/15 11:31 AM ]

All right. I hope it gets merged. Thanks.





[CLJ-1469] Emit KeywordInvoke callsites only when keyword is not namespaced Created: 18/Jul/14  Updated: 15/May/17

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Trivial
Reporter: Ghadi Shayban Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: performance

Attachments: Text File kwinvoke.patch    
Patch: Code
Approval: Triaged

 Description   

Summary: Don't emit KeywordLookup thunks and machinery for namespaced keyword access

Description: When the compiler sees a keyword at the beginning of a sexpr, (:foo x), it emits some machinery that takes into account that 'x' could be a defrecord with a defined 'foo' field. This exists to fast-path it into a field lookup. Here is the supporting code from the target defrecord: https://github.com/clojure/clojure/blob/master/src/clj/clojure/core_deftype.clj#L185-L198
The compiler currently emits the same machinery for (:foo/bar x), a namespaced keyword access, but defrecords don't have any fast path field access for that. This trivial patch turns that scenario into a normal invocation.

Here is the disassembly for (fn [x] (:foo/bar x))
https://gist.github.com/anonymous/d94fc56fba4a1665f73f

There are two static fields on the IFn also for every kw access.

With the trivial patch, it turns into a normal invoke (emit the fn, i.e. the namespaced keyword, then the args, i.e. the target, and call IFn invoke: kw.invoke(target)).
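The machinery in question can be sketched roughly as a per-callsite thunk that fast-paths one target class and faults back to a generic lookup for everything else (a hedged illustration of the shape, not the compiler's actual emitted code):

```java
import java.util.Map;

// A callsite thunk specialized to one target class fast-paths a record
// field read; anything else "faults" back to a generic lookup. For a
// namespaced keyword no record field can ever match, so (per the patch)
// the compiler should emit a plain keyword invoke instead of all this.
interface LookupThunk { Object get(Object target); }

final class Callsite {
    static final Object FAULT = new Object();
    final String key;
    LookupThunk thunk = t -> Callsite.FAULT; // starts unspecialized

    Callsite(String key) { this.key = key; }

    @SuppressWarnings("unchecked")
    Object lookup(Object target) {
        Object v = thunk.get(target);
        if (v != FAULT) return v; // fast path: specialized field read hit
        // Slow path: generic lookup. A real callsite would also install a
        // new thunk specialized to target's class here.
        return ((Map<String, Object>) target).get(key);
    }
}
```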






[CLJ-1458] Enhance the performance of map merges Created: 04/Jul/14  Updated: 14/Dec/16

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Yongqian Li Assignee: Unassigned
Resolution: Unresolved Votes: 8
Labels: performance

Attachments: Text File 0001-very-simple-test-of-the-merge-function.patch     Text File clj-1458-4.patch     Text File CLJ-1458-5.patch     Text File CLJ-1458-6.patch     Text File CLJ-1458-7.patch     Text File CLJ-1458-transient-merge2.patch     Text File CLJ-1458-transient-merge3.patch     Text File CLJ-1458-transient-merge.patch     Text File merge-test-2.patch     File transient-merge.diff    
Patch: Code and Test
Approval: Triaged

 Description   

It would be nice if merge used transients.

Patch

  • clj-1458-7.patch

Approach
Migrate c.c/merge later in core, after transients & reduce. Leave the older version as merge1 for use in cases that precede the newer definition. Make APersistentMap/conj & ATransientMap/cons aware of IKVReduce.

The attached patch preserves two existing behaviors of merge

  • metadata propagation
  • the right hand side of the merges can be a Map.Entry, an IPersistentVector where size=2, and regular maps.
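In Java terms, the transient/persistent! pattern the approach relies on looks roughly like this (an analogy only, not the patch itself): accumulate into one mutable builder and freeze it once at the end, instead of producing an intermediate immutable map per merged entry.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Java analogy for transient merge: one mutable accumulator (the
// "transient"), frozen exactly once at the end (the "persistent!").
final class Merge {
    @SafeVarargs
    static Map<String, Object> merge(Map<String, Object>... maps) {
        Map<String, Object> acc = new HashMap<>(); // plays the transient
        for (Map<String, Object> m : maps) {
            if (m != null) acc.putAll(m); // right side wins, as in c.c/merge
        }
        return Collections.unmodifiableMap(acc); // plays persistent!
    }
}
```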

Screened by:



 Comments   
Comment by Jason Wolfe [ 13/Sep/14 5:09 PM ]

I will take a crack at a patch today.

Comment by Jason Wolfe [ 13/Sep/14 5:42 PM ]

This patch (transient-merge.diff) makes merge, merge-with, and zipmap (since it was right there and could obviously benefit from transients as well) use transients.

Three potential issues:

  • I had to move the functions, since they depend on transient and friends. I assume this is preferable to a forward declaration. This was the best place I could find, but happy to move them elsewhere.
  • I added multiple arities, to avoid potential performance cost of transient-ing a single argument. Happy to undo this if desired.
  • I had to slightly alter the logic in merge-with, since transient maps don't support contains? (or find).

Comment by Michał Marczyk [ 14/Sep/14 12:43 PM ]

I posted a separate ticket for zipmap, with patch, on 30/May/12: CLJ-1005.

Comment by Jason Wolfe [ 14/Sep/14 5:28 PM ]

Ah, sorry if I overstepped. Happy to remove that change from this patch if that will simplify things – just let me know.

Comment by Ghadi Shayban [ 28/Dec/14 10:07 PM ]

alternate approach attached delaying merge until after protocols load, and then using transducers.

Comment by Michael Blume [ 28/Dec/14 11:50 PM ]

Looks like you're doing (get m k) twice – shouldn't that be thrown in a local?

Comment by Michael Blume [ 29/Dec/14 1:41 PM ]

um, put, in a local, I mean, 'throw' was a bad choice of word.

Comment by Ghadi Shayban [ 29/Dec/14 2:14 PM ]

Yeah there's that – won't be using get anyways after CLJ-700 gets committed.

We should add performance tests too: merging two maps, three maps, and many maps; also varying the sizes of the maps and, for merge-with, varying the percentage of collisions.

Need to go back to the (some identity) logic, otherwise metadata is propagated from maps other than the first provided. I'll fix later.

Comment by Michael Blume [ 29/Dec/14 2:49 PM ]

I don't know if this is supposed to be allowed, but this breaks

(merge {} [:foo 'bar])

which is used in the wild by compojure-api

Comment by Michael Blume [ 29/Dec/14 2:49 PM ]

https://github.com/metosin/compojure-api/blob/0.16.6/src/compojure/api/meta.clj#L198

Comment by Michael Blume [ 29/Dec/14 2:54 PM ]

Ghadi, contains? uses get under the covers, so it's still two gets, right? It seems like it'd be more performant to stick with the ::none trick.

Comment by Nicola Mometto [ 29/Dec/14 5:36 PM ]

This calls for if-let + find.

Comment by Ghadi Shayban [ 29/Dec/14 10:37 PM ]

new patch addressing concerns so far

Comment by Ghadi Shayban [ 29/Dec/14 10:48 PM ]

CLJ-1458-transient-merge3.patch removes silly inlining macro, uses singleton fns instead.

Comment by Michael Blume [ 29/Dec/14 11:14 PM ]

Nice =)

This should come with tests. If we want to preserve the ability to merge with a MapEntry, we should test it. This isn't so much a weakness of the patch as of the existing tests. I see merge and merge-with being used a few times in the test suite, but I see no test whose purpose is to test their behavior.

Comment by Michael Blume [ 29/Dec/14 11:17 PM ]

Extremely simple merge test, we need more than this, but this is a start

Comment by Alex Miller [ 22/Jun/15 10:11 AM ]

clj-1458-4.patch refreshed to apply to master, no changes.

Comment by Ghadi Shayban [ 09/Jan/16 5:09 PM ]

I'd like to reevaluate the scope of this ticket. Can we address 'merge' only and defer 'merge-with'? It's by far the more common function. I've attached a new simplified patch.

Comment by Ghadi Shayban [ 09/Jan/16 9:50 PM ]

CLJ-1458-6.patch is yet another cleaner approach

Comment by Alex Miller [ 01/Feb/16 5:17 AM ]

Can you update the ticket approach section to discuss the APersistentMap.cons / ASSOC changes. Also, can you add a before / after perf test for one or more common cases?

Comment by Michael Blume [ 28/Sep/16 1:55 PM ]

Updated patch to handle use of merge in core_print before it's defined in core

Comment by Ghadi Shayban [ 28/Sep/16 2:22 PM ]

If anyone wants to take stewardship of this, go ahead. I had trouble getting consistent performance improvements on this. Obviously this needs fresh benchmarks.

Comment by Alex Miller [ 28/Sep/16 2:28 PM ]

Yes, this needs a benchmark showing demonstrable improvement. The whole goal here is improved perf - if we can't prove it's consistently faster, then there is no point in even reviewing it.





[CLJ-1439] Reduce keyword cache lookup cost Created: 05/Jun/14  Updated: 07/Oct/14  Resolved: 01/Aug/14

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: Release 1.7

Type: Enhancement Priority: Minor
Reporter: Kyle Kingsbury Assignee: Unassigned
Resolution: Completed Votes: 1
Labels: keywords, performance

Attachments: Text File 0001-Improve-Keyword.intern-performance.patch    
Patch: Code
Approval: Ok

 Description   

Background: Symbol is composed of name and namespace strings. Symbol construction interns both of these strings - this reduces memory usage and allows for string == checks inside Symbol. Keywords wrap a Symbol and have an additional cache to reuse Keyword instances.

Problem: Certain applications make heavy use of keywords (in particular when parsing or transforming JSON, XML, or other data into Clojure maps with keyword keys). Constructing the same keyword from a string over and over again causes the string to be interned, a Symbol to be constructed, and a lookup to occur in the keyword cache. In the case where the keyword already exists, this is more work than necessary, making this path slower than it could be.

Reproduce: The following test simulates rounds of creating many keywords - the unique? flag indicates whether to use new or the same set of keywords each rep. unique?=false should be more similar to parsing a similar JSON record format over and over.

(set! *unchecked-math* true)

(defn kw-new [n unique?]
  (let [base (if unique? (str (rand)) "abcdef")]
    (loop [i 0
           kws (transient [])]
      (if (< i n)
        (recur (inc i) (conj! kws (keyword (str base i))))
        (persistent! kws)))))

(defn bench-kw [reps n unique?]
  (dotimes [_ reps]
    (let [begin (System/nanoTime)]
        (kw-new n unique?)
        (let [end (System/nanoTime)
              elapsed (/ (- end begin) 1000000.0)]
          (println elapsed "ms")))))

(bench-kw 50 10000 false)  ;; expected similar to JSON use case
(bench-kw 50 10000 true)   ;; for comparison

On 1.6, we see about 5.5 ms for repeated and 134 ms for unique after warmup.
With the patch, we see about 2.2 ms for repeated and 120 ms for unique after warmup.

Cause: Keyword construction based on a string involves:

  • Interning string(s) in new kw
  • Constructing Symbol with interned strings
  • Clearing Keywords from the Keyword cache if GC has reclaimed them
  • Constructing a new Keyword
  • Wrapping the Keyword in a WeakReference
  • CHM putIfAbsent on the cache
  • If new, return. If exists, get the old one and return.
  • In the event the Keyword is reclaimed by GC between the last 2 steps, retry.

This process involves a fair amount of speculative interning and object creation if the keyword already exists.

Proposal: Streamline the keyword construction process by reworking the cache implementation and the Keyword.intern() process. The patch changes the cache to key by string name instead of symbol, deferring interning and Symbol creation until we know a new keyword actually needs to be constructed. The various Keyword.intern() methods are also reworked to take advantage of being called with an existing Symbol, avoiding its re-creation.
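The string-keyed cache idea can be sketched in Java. This is a hypothetical illustration, not the actual patch: the KeywordCache class is invented here, and String.intern() merely stands in for the expensive Symbol/Keyword construction step. The point is that the fast path is a single map lookup with no interning and no allocation.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a string-keyed keyword cache. String.intern()
// stands in for Symbol/Keyword construction.
class KeywordCache {
    static final ConcurrentHashMap<String, Reference<String>> table =
            new ConcurrentHashMap<>();

    static String intern(String name) {
        // Fast path: key by the raw string, so a hit costs one lookup
        // with no interning and no Symbol allocation.
        Reference<String> ref = table.get(name);
        if (ref != null) {
            String existing = ref.get();
            if (existing != null) return existing;
        }
        // Slow path: construct the value, then race to publish it.
        String k = name.intern();
        ref = table.putIfAbsent(name, new WeakReference<>(k));
        if (ref != null) {
            String existing = ref.get();
            if (existing != null) return existing;
            table.remove(name, ref);   // stale entry was GC'd; retry
            return intern(name);
        }
        return k;
    }
}
```

Repeated calls with the same name hit the fast path and return the cached instance without touching the slow path again.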

Patch: 0001-Improve-Keyword.intern-performance.patch

Related: CLJ-1415



 Comments   
Comment by Alex Miller [ 01/Aug/14 11:48 AM ]

Alternate changes were committed today to improve both symbol and keyword creation times.

https://github.com/clojure/clojure/commit/c9e70649d2652baf13b498c4c3ebb070118c4573

Comment by Alex Miller [ 07/Oct/14 4:42 PM ]

related patch was applied





[CLJ-1437] Keyword cache lookup could be faster Created: 05/Jun/14  Updated: 05/Jun/14  Resolved: 05/Jun/14

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: keywords, performance


 Description   

Breaking this piece out from CLJ-1415. Keyword cache lookup includes symbol creation and interning which could be avoided, making cache lookup faster.






[CLJ-1430] Improve performance of partial Created: 23/May/14  Updated: 05/Sep/14  Resolved: 29/Aug/14

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.7

Type: Enhancement Priority: Minor
Reporter: Timothy Baldridge Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: performance

Attachments: File partial-perf.diff    
Patch: Code
Approval: Ok

 Description   

This patch improves the performance of partial by using apply only when needed. The code structure follows that of juxt.
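The idea can be illustrated with a rough Java analog (IFn2 and partialPlus are invented names for this sketch; the actual patch adds fixed-arity bodies to the fn returned by partial): work that can happen once happens at partial-creation time, and common call arities get a fixed-signature path that avoids the varargs array allocation of apply.

```java
// Toy functional interface standing in for a fixed-arity Clojure fn body.
interface IFn2 {
    long invoke(long x, long y);
}

class PartialAdd {
    // Analog of (partial + a b): the applied args are captured once, and a
    // two-argument call goes through a fixed-arity method with no array
    // allocation; a real implementation would fall back to a varargs
    // apply only for higher arities.
    static IFn2 partialPlus(long a, long b) {
        long base = a + b;               // computed once, at creation time
        return (x, y) -> base + x + y;   // fixed-arity fast path
    }
}
```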

Performance benchmark:

(ns partial-test.core
  (:require [criterium.core :refer [bench]])
  (:gen-class))

(defn -main []
  (let [f (partial + 1 1)]
    (println "Starting")
    (bench (f 1 1))
    (println "Done")))

Results for 1.6.0:

Evaluation count : 228751140 in 60 samples of 3812519 calls.
             Execution time mean : 266.700063 ns
    Execution time std-deviation : 2.966851 ns
   Execution time lower quantile : 262.641023 ns ( 2.5%)
   Execution time upper quantile : 274.207916 ns (97.5%)
                   Overhead used : 1.610513 ns

Found 3 outliers in 60 samples (5.0000 %)
	low-severe	 3 (5.0000 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

Results for 1.7.0 with this patch:

 Evaluation count : 348208140 in 60 samples of 5803469 calls.
              Execution time mean : 171.210533 ns
     Execution time std-deviation : 2.011660 ns
    Execution time lower quantile : 168.819526 ns ( 2.5%)
    Execution time upper quantile : 176.015584 ns (97.5%)
                    Overhead used : 2.644128 ns

 Found 3 outliers in 60 samples (5.0000 %)
 	low-severe	 3 (5.0000 %)
  Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

Benchmarks were performed by building with lein uberjar and running the jar from the command line.

Patch: partial-perf.diff

Screened by: Alex Miller



 Comments   
Comment by Alex Miller [ 23/May/14 10:46 AM ]

Screened, looks as expected.

Comment by Andy Fingerhut [ 02/Jun/14 10:50 AM ]

Timothy, just a nit I would not have noticed except that my program checks the name and email address of patch authors against my contributors list: do you really have both of the email addresses tbaldridge@gmail.com and tbaldidge@gmail.com (note the spelling difference)? The latter is the one on this patch.

Comment by Timothy Baldridge [ 02/Jun/14 11:04 AM ]

fixed email

Comment by Timothy Baldridge [ 02/Jun/14 11:05 AM ]

nice catch! it was a typo in my .gitconfig defaults. I've fixed the patch.

Comment by Alex Miller [ 02/Jun/14 11:19 AM ]

Tim (and anyone really) - please let someone know if you need to change a screened patch! Looks fine here, but screener should be notified so they can re-screen.

Comment by Alex Baranosky [ 05/Sep/14 9:11 PM ]

Very nice patch. I've gotten into the habit of not using partial anymore for performance sensitive code. Perhaps this change means I need to rethink that.





[CLJ-1429] Cache unknown multimethod value default dispatch Created: 22/May/14  Updated: 29/Aug/14  Resolved: 29/Aug/14

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: Release 1.7

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Completed Votes: 2
Labels: performance

Attachments: Text File clj-1429.patch    
Patch: Code
Approval: Ok

 Description   

Multimethods maintain a cache from dispatch value (result of the dispatch function) to dispatch method. If the dispatch value does not find a match in the available methods, it falls through to a lookup using the default dispatch value and returns that method. This default dispatch case is NOT recorded in the cache. This means that every case that falls through to the default case incurs a scan of the methodTable (and the class inheritance checks that involves).
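A minimal Java sketch of the caching change (this MultiFn is a toy, not Clojure's class, and it omits the isa?-hierarchy walk and cache invalidation on defmethod): the key line is that a miss which falls through to the default method still records that result in the cache, so subsequent calls with the same unknown dispatch value skip the method-table scan.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Toy multimethod: dispatch value -> method, with a call-site cache.
class MultiFn {
    final Map<Object, Function<Object, Object>> methodTable = new HashMap<>();
    final ConcurrentHashMap<Object, Function<Object, Object>> cache =
            new ConcurrentHashMap<>();
    Function<Object, Object> defaultMethod = v -> v;

    Object invoke(Object dispatchVal, Object arg) {
        Function<Object, Object> m = cache.get(dispatchVal);
        if (m == null) {
            // Real impl also walks the isa? hierarchy here.
            m = methodTable.get(dispatchVal);
            if (m == null) m = defaultMethod;
            cache.put(dispatchVal, m);   // the fix: cache the default hit too
        }
        return m.apply(arg);
    }
}
```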

Perf test:

(defmulti mm class)
(defmethod mm String [s] s)
(defmethod mm Long [l] l)
(defmethod mm :default [v] v)

(defn perf [reps size]
  (let [data (take size (cycle ["abc" 5 :k]))]
    (dotimes [_ reps]
      (time (doall (map mm data))))))

And results:

;; Without patch:
user=> (perf 5 100000)
"Elapsed time: 1301.262 msecs"
"Elapsed time: 928.888 msecs"
"Elapsed time: 942.905 msecs"
"Elapsed time: 858.513 msecs"
"Elapsed time: 832.314 msecs"

;; With patch:
user=> (perf 5 100000)
"Elapsed time: 134.169 msecs"
"Elapsed time: 28.859 msecs"
"Elapsed time: 45.452 msecs"
"Elapsed time: 13.189 msecs"
"Elapsed time: 13.42 msecs"

Attached patch caches the mapping of unknown value -> default dispatch method and significantly improves the performance for this case.

Patch: clj-1429.patch
Screened by: Stu






[CLJ-1423] Applying a var to an infinite arglist consumes all available memory Created: 15/May/14  Updated: 19/Aug/16  Resolved: 19/Aug/16

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.9

Type: Defect Priority: Major
Reporter: Alan Malloy Assignee: Unassigned
Resolution: Completed Votes: 4
Labels: performance

Attachments: Text File apply-var.patch     Text File clj-1423.patch    
Patch: Code and Test
Approval: Ok

 Description   

It is possible to apply a function to an infinite argument list: for example, (apply distinct? (repeat 1)) immediately returns false, after realizing just a few elements of the infinite sequence (repeat 1). However, (apply #'distinct? (repeat 1)) attempts to realize all of (repeat 1) into memory at once.

This happens because Var.applyTo delegates to AFn.applyToHelper to decide which arity of Var.invoke to dispatch to; but AFn doesn't expect infinite arglists (variadic functions normally go through RestFn instead), so it uses RT.seqToArray, which tries to realize the entire sequence into an array.

Instead, Var.applyTo(args) can just dispatch to deref().applyTo(args), and let the function being stored figure out what to do with the arglist.

I've changed Var.applyTo to do this, and added a test (which fails before my patch is applied, and passes afterwards).
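The fix can be illustrated with a minimal Java analog (Fn, VarLike, and the distinct?-style function below are invented for this sketch; the real classes are IFn, Var, and AFn): the var-like wrapper forwards applyTo to the held function instead of copying the arguments into an array first, so a function that exits early terminates even on an infinite argument sequence.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.stream.Stream;

// Toy analog of IFn: a function applied to a (possibly infinite) arg seq.
interface Fn {
    Object applyTo(Iterator<Object> args);
}

// Toy analog of Var after the fix: delegate applyTo to the held function
// rather than materializing the args into an array.
class VarLike implements Fn {
    final Fn root;
    VarLike(Fn root) { this.root = root; }
    public Object applyTo(Iterator<Object> args) {
        return root.applyTo(args);   // the function decides how much to realize
    }
}

class Demo {
    // distinct?-style function: returns false as soon as it sees a repeat.
    static final Fn DISTINCT = args -> {
        Set<Object> seen = new HashSet<>();
        while (args.hasNext()) {
            if (!seen.add(args.next())) return false;
        }
        return true;
    };

    public static void main(String[] argv) {
        Iterator<Object> ones = Stream.generate(() -> (Object) 1).iterator();
        // Terminates immediately despite the infinite argument sequence.
        System.out.println(new VarLike(DISTINCT).applyTo(ones)); // prints false
    }
}
```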

Patch: clj-1423.patch

Screened by: Alex Miller



 Comments   
Comment by Alex Miller [ 18/Jan/16 5:29 PM ]

I also did a quick perf test on this:

(dotimes [x 20] (time (dotimes [y 10000] (apply #'+ (range 100)))))

which showed times ~82 ms per rep before the patch and ~10 ms per rep after the patch (on par with applying the function directly).

I added a new patch that squashes the commits, adds the ticket number to the commit message, attribution was retained.





[CLJ-1422] Recur around try boxes primitives Created: 14/May/14  Updated: 28/Jul/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5, Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Kyle Kingsbury Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: compiler, performance, typehints


 Description   

Primitive function and recur variables can't pass through a (try) cleanly; they're boxed to Object instead. This causes reflection warnings for fns or loops that use primitive types.

user=> (set! *warn-on-reflection* true)
true
 
user=> (fn [] (loop [t 0] (recur t)))
#<user$eval676$fn__677 user$eval676$fn__677@3d80023a>
 
user=> (fn [] (loop [t 0] (recur (try t))))
NO_SOURCE_FILE:1 recur arg for primitive local: t is not matching primitive, had: Object, needed: long
Auto-boxing loop arg: t
#<user$eval680$fn__681 user$eval680$fn__681@5419323a>

user=> (fn [^long x] (recur (try x)))
NO_SOURCE_FILE:1 recur arg for primitive local: x is not matching primitive, had: Object, needed: long

CompilerException java.lang.IllegalArgumentException:  recur arg for primitive local: x is not matching primitive, had: Object, needed: long, compiling:(NO_SOURCE_PATH:1:1)


 Comments   
Comment by David James [ 15/Jun/14 10:27 PM ]

Without commenting on the most desirable behavior, the following code does not cause reflection warnings:

user=> (set! *warn-on-reflection* true)
true
user=> (fn [] (loop [t 0] (recur (long (try t)))))
#<user$eval673$fn__674 user$eval673$fn__674@4e56c411>
Comment by Nicola Mometto [ 16/Jun/14 6:33 AM ]

Similar ticket http://dev.clojure.org/jira/browse/CLJ-701

Comment by Kevin Downey [ 21/Jul/14 6:59 PM ]

try/catch in the compiler only implements Expr, not MaybePrimitiveExpr. Extending TryExpr with MaybePrimitiveExpr seems simple enough, but it turns out recur analyzes its arguments in the statement context, which causes (try ...) to essentially wrap itself in a function like ((fn [] (try ...))). At that point it is an InvokeExpr, which is much harder to add MaybePrimitiveExpr to, and it reduces to the same case as CLJ-701.

Comment by Kevin Downey [ 22/Jul/14 9:27 PM ]

http://dev.clojure.org/jira/browse/CLJ-701 has a patch that I think solves this

Comment by Alex Miller [ 28/Jul/14 1:56 PM ]

Should I dupe this to CLJ-701?

Comment by Kevin Downey [ 28/Jul/14 5:22 PM ]

If you want the fixes for try outside the return context to be part of CLJ-701, then yes, it is a dupe. If you are unsure, or would prefer 701 to stay more focused (my patch may not be acceptable, or may be too large and doing too much), then no, it wouldn't be. I sort of took it on myself to solve both in the patch on CLJ-701 because I came to CLJ-701 via Nicola's comment here, and the same compiler machinery can be used for both.

I think the status is pending on the status of CLJ-701.





[CLJ-1420] ThreadLocalRandom instead of Math/random Created: 11/May/14  Updated: 29/Aug/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Backlog

Type: Enhancement Priority: Minor
Reporter: Linus Ericsson Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: math, performance
Environment:

Requires Java >=1.7!


Attachments: Text File 0001-rand-using-ThreadLocalRandom-and-tests-for-random.patch    
Patch: Code
Approval: Vetted

 Description   

The standard Math.random() is thread-safe through being declared as a synchronized static method.

The patch uses java.util.concurrent.ThreadLocalRandom, which actually seems to be about two times faster than the ordinary Math.random() in a simple single-threaded criterium.core/bench.

The reason I investigated the function at all was to be sure random-number generation was not a bottleneck when performance testing multithreaded load generation.

If necessary, one could of course make a conditional declaration (like in fj-reducers) based on the existence of the class java.util.concurrent.ThreadLocalRandom, if Clojure 1.7 is to be compatible with Java versions < 1.7
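A minimal sketch of the proposal, assuming Java >= 1.7 (the Rand class and method names are invented for illustration; the patch changes clojure.core/rand itself): each thread gets its own generator, avoiding the contention of the synchronized Math.random(), and the bounded case uses multiplication because ThreadLocalRandom.nextDouble(bound) rejects non-positive bounds while (rand n) must accept negative n.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch: per-thread generator instead of the synchronized
// static Math.random().
class Rand {
    // Uniform double in [0, 1).
    static double rand() {
        return ThreadLocalRandom.current().nextDouble();
    }

    // Scaled by multiplication rather than nextDouble(bound), because
    // nextDouble(bound) throws IllegalArgumentException for bounds <= 0.
    static double rand(double n) {
        return ThreadLocalRandom.current().nextDouble() * n;
    }
}
```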



 Comments   
Comment by Linus Ericsson [ 11/May/14 11:05 AM ]

Benchmark on current rand (clojure 1.6.0), $ java -version
java version "1.7.0_51"
OpenJDK Runtime Environment (IcedTea 2.4.4) (7u51-2.4.4-0ubuntu0.13.10.1)
OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)

:jvm-opts ^:replace [] (ie no arguments to the JVM)

(bench (rand 10))
Evaluation count : 1281673680 in 60 samples of 21361228 calls.
Execution time mean : 43.630075 ns
Execution time std-deviation : 0.420801 ns
Execution time lower quantile : 42.823363 ns ( 2.5%)
Execution time upper quantile : 44.456267 ns (97.5%)
Overhead used : 3.194591 ns

Found 1 outliers in 60 samples (1.6667 %)
low-severe 1 (1.6667 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

Clojure 1.7.0-master-SNAPSHOT.

(bench (rand 10))
Evaluation count : 2622694860 in 60 samples of 43711581 calls.
Execution time mean : 20.474605 ns
Execution time std-deviation : 0.248034 ns
Execution time lower quantile : 20.129894 ns ( 2.5%)
Execution time upper quantile : 21.009303 ns (97.5%)
Overhead used : 2.827337 ns

Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

I had similar results on Clojure 1.6.0, and ran several different tests with similar results. java.util.Random.nextInt is surprisingly bad. The ThreadLocalRandom version of .nextInt is better, but rand-int can take negative integers, which would lead to some argument conversion for (.nextInt (ThreadLocalRandom/current) n), since it needs upper and lower bounds instead of a simple multiplication of a random number in [0,1).

CHANGE:

The (.nextDouble (ThreadLocalRandom/current) argument) is very quick, but cannot handle negative arguments. The speed given a plain multiplication is about 30 ns.

Comment by Linus Ericsson [ 11/May/14 12:44 PM ]

Added some simplistic tests to be sure that rand and rand-int accept ratios, doubles, and negative numbers of various kinds. A real test would likely include repeated generative testing; these tests are mostly for verifying that various kinds of arguments work.

Comment by Linus Ericsson [ 11/May/14 1:38 PM ]

0001-rand-using-ThreadLocalRandom-and-tests-for-random.patch contains the changed (rand) AND the test cases.

Comment by Alex Miller [ 11/May/14 5:45 PM ]

Clojure requires Java 1.6.0 so this will need to be reconsidered at a later date. We do not currently have any plans to bump the minimum required JDK in Clojure 1.7 although that could change of course.

Comment by Gary Fredericks [ 11/May/14 5:54 PM ]

I've always thought that the randomness features in general are of limited utility due to the inability to seed the PRNG, and that a clojure.core/rand dynamic var would be a reasonable way to do that.

Maybe both of these problems could be partially solved with a standard library? I started one at https://github.com/fredericksgary/four, but presumably a contrib library would be easier for everybody to standardize on.

Comment by Linus Ericsson [ 12/May/14 2:17 AM ]

Gary, I'm all for creating some well-thought out random-library, which could be a candidate for some library clojure.core.random if that would be useful.

Please have a look at http://code.google.com/p/javarng/ since that seems to do what your library four does (and more). Probably we could salvage either APIs, algorithms, or both from this library.

I'll contact you via mail!

Comment by Gary Fredericks [ 20/Jun/14 10:21 AM ]

Come to think of it, a rand var in clojure.core shouldn't be a breaking change, so I'll just make a ticket for that to see how it goes. That should at the very least allow solving the concurrency issue with binding. The only objection I can think of is perf issues with dynamic vars?

Comment by Gary Fredericks [ 20/Jun/14 10:42 AM ]

New issue is at CLJ-1452.

Comment by Andy Fingerhut [ 29/Aug/14 4:50 PM ]

Patch 0001-rand-using-ThreadLocalRandom-and-tests-for-random.patch dated May 11 2014 no longer applied cleanly to latest master after some commits were made to Clojure on Aug 29, 2014. It did apply cleanly before that day.

I have not checked how easy or difficult it might be to update this patch. See section "Updating Stale Patches" on this wiki page for some tips on updating patches: http://dev.clojure.org/display/community/Developing+Patches





[CLJ-1415] Keyword cache cleanup incurs linear scan of cache Created: 06/May/14  Updated: 08/Oct/15  Resolved: 01/Aug/14

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: Release 1.7

Type: Enhancement Priority: Minor
Reporter: Kyle Kingsbury Assignee: Alex Miller
Resolution: Not Reproducible Votes: 6
Labels: keywords, performance

Attachments: File faster-keywords.diff     File keyword-cache.diff     Text File kw-clean-future.patch     File unified-kw-patch.diff    
Patch: Code
Approval: Vetted

 Description   

If the GC reclaims a keyword, any subsequent attempt to create a keyword requires an O(n) scan over the entire keyword table via Util.clearCache. This is a significant performance cost in keyword-heavy operations; e.g. JSON parsing.

Patch: keyword-cache.diff - patch to defer cleaning till portion of the table is dead and avoid multiple threads cleaning simultaneously.

Patch: kw-clean-future.patch - patch to spin cache cleaning into a future. We found that 1) this introduces some ordering constraints and circularity between Agent and Keyword (fixable), and 2) using the future pool for this means shutdown-agents would always need to be called (in the patch I avoided this by changing Agent's soloExecutor to use daemon threads).

Patch: unified-kw-patch.diff - Alternative to keyword-cache and clean-future.patch. Combines all keyword-perf changes, including the EDN kw parser improvement, improved table lookup performance, and has threads cooperate to empty the table refqueue with a minimum of contention.
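The deferred-cleaning idea in these patches can be sketched with a ReferenceQueue (CleaningCache and its fields are invented for this sketch; the real patches work on Keyword's table and also apply a dead-entry threshold): cleared references accumulate in the queue, and at most one thread drains it, removing exactly the dead entries instead of scanning the whole table.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of cooperative, queue-driven cache cleaning.
class CleaningCache<K, V> {
    final ConcurrentHashMap<K, Ref<K, V>> table = new ConcurrentHashMap<>();
    final ReferenceQueue<V> rq = new ReferenceQueue<>();
    final AtomicBoolean cleaning = new AtomicBoolean(false);

    // Weak reference that remembers its key, so a cleared entry can be
    // removed directly without scanning the table.
    static class Ref<K, V> extends WeakReference<V> {
        final K key;
        Ref(K key, V val, ReferenceQueue<V> q) { super(val, q); this.key = key; }
    }

    V get(K key) {
        Ref<K, V> r = table.get(key);
        return r == null ? null : r.get();
    }

    void put(K key, V val) {
        table.put(key, new Ref<>(key, val, rq));
        maybeClean();
    }

    @SuppressWarnings("unchecked")
    void maybeClean() {
        // Only one thread cleans at a time; others skip and move on.
        if (!cleaning.compareAndSet(false, true)) return;
        try {
            Reference<? extends V> r;
            while ((r = rq.poll()) != null) {
                Ref<K, V> dead = (Ref<K, V>) r;
                table.remove(dead.key, dead); // remove only if still this entry
            }
        } finally {
            cleaning.set(false);
        }
    }
}
```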



 Comments   
Comment by Alex Miller [ 06/May/14 5:53 PM ]

Any perf-related ticket will need some clear before/after timings (with good methodology and how to repro) and also a consideration of cases where the change may introduce any perf degradation in normal usage.

Comment by Kyle Kingsbury [ 07/May/14 9:54 PM ]

I've experimented with a patch reducing the cache clearing cost and removing the need for String.intern. Preliminary results are good, but I want to try a few alternative approaches for cache keys. For instance, could we use pure strings like "foo" and "clojure.core/foo" as the cache keys, removing a level of memory indirection? If we're being really sneaky, we could share those same strings with the Symbol _str field to halve our memory use, assuming it's OK to reach in and mutate it.

https://gist.github.com/aphyr/f72e72992dade4578232
http://imgur.com/a/YSgUa#2

Comment by Alex Miller [ 08/May/14 12:29 PM ]

Great start on this - having the perf data is hugely important. One thing I don't see you've covered yet is what the corresponding memory increase you're incurring with CacheKey to get the benefit - we need to quantify both sides of the tradeoff here (latency/throughput vs memory) to fully judge.

Questions/comments on your patch...

1) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L101 - do we need the (o instanceof CacheKey) check? If the usage of this is constrained then we might be able to skip it (and it will blow up on the next line if something is wrong).

2) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L110 - shouldn't we precompute and save the hash code!? The only thing we're making this for is fast hash comparisons. That hash computation is string length dependent - it ain't cheap.

3) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L126 - have you tested with other values here? Sho