<< Back to previous view

[UNIFY-2] Remove reflection warnings Created: 05/Jan/12  Updated: 05/Jan/12  Resolved: 05/Jan/12

Status: Resolved
Project: core.unify
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Fogus Assignee: Fogus
Resolution: Completed Votes: 0
Labels: performance, reflection, unify

Approval: Ok

 Description   
Reflection warning, clojure/core/unify.clj:30 - reference to field getClass can't be resolved.
Reflection warning, clojure/core/unify.clj:30 - reference to field isArray can't be resolved.

These reflective calls occur frequently enough that they should be resolved.



 Comments   
Comment by Fogus [ 05/Jan/12 7:37 AM ]

Fixed in 6b6d1130bf857439d1863f6fc574a7a6541b84b8.





[TBENCH-16] Add performance testing for STM Created: 23/Feb/12  Updated: 23/Feb/12

Status: Open
Project: test.benchmark
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Stefan Kamphausen Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance, test
Environment:

64bit Linux, Quad-core


Attachments: File stm-perf-TBENCH-16.diff    

 Description   

Write a test-suite to detect regressions in STM performance.



 Comments   
Comment by Stefan Kamphausen [ 23/Feb/12 2:38 PM ]

Patch which adds an stm-namespace containing a collection of performance test functions.





[LOGIC-50] Rel relation PersistentHashSet becomes LazySeq after issuing a retraction Created: 01/Sep/12  Updated: 28/Jul/13  Resolved: 05/Sep/12

Status: Closed
Project: core.logic
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Aaron Brooks Assignee: David Nolen
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File core.logic-rel-001.patch     Text File core.logic-rel-002.patch    
Patch: Code

 Description   

The first retraction of facts from a relation transforms the internal PersistentHashSet -set var+atom, which guarantees uniqueness, into a LazySeq which allows duplicate facts to be added. Once the first retraction occurs, a LazySeq persists for the rest of the life of the relation. Only the value of the primary -set var+atom is affected, not the -index var+atom values.

This issue is not an external correctness issue but does affect performance of subsequent duplicate fact additions which grow the size relation.

Preparation:

user=> (defrel foo x y)
#<user$eval4287$fn__4288 user$eval4287$fn__4288@1a9d489b>
user=> (facts foo [[:joe :man][:jane :woman][:sue :boy]])
nil
user=> foo_2-set
#<Atom@52aaf223: #{[:joe :man] [:jane :woman] [:sue :boy]}>
user=> (class @foo_2-set)
clojure.lang.PersistentHashSet
user=> (retraction foo :jane :woman)
nil

Without patch applied:

user=> foo_2-set
#<Atom@52aaf223: ([:joe :man] [:sue :boy])>
user=> (class @foo_2-set)
clojure.lang.LazySeq
user=> (facts foo [[:joe :man][:jane :woman][:sue :boy]])
nil
user=> foo_2-set
#<Atom@52aaf223: ([:sue :boy] [:jane :woman] [:joe :man] [:joe :man] [:sue :boy])>
user=> (class @foo_2-set)
clojure.lang.LazySeq

With patch applied:

user=> foo_2-set
#<Atom@31eb9b15: #{[:joe :man] [:sue :boy]}>
user=> (class @foo_2-set)
clojure.lang.PersistentHashSet
user=> (facts foo [[:joe :man][:jane :woman][:sue :boy]])
nil
user=> foo_2-set
#<Atom@31eb9b15: #{[:joe :man] [:jane :woman] [:sue :boy]}>
user=> (class @foo_2-set)
clojure.lang.PersistentHashSet

I've filed this as a Minor issue as it does not affect core.logic correctness and degraded performance can be avoided by not re-asserting duplicate facts.

I will also issue a GitHub pull request which can be used or ignored at your convenience.



 Comments   
Comment by David Nolen [ 02/Sep/12 5:45 PM ]

Thanks! Is this meant to be applied to master?

Comment by Aaron Brooks [ 03/Sep/12 6:56 PM ]

(cf. here and GitHub I'll keep this thread on JIRA.) Yes, this is targeted for master. I don't know if this warrants a release unto itself.

Comment by Aaron Brooks [ 04/Sep/12 9:17 AM ]

I forgot to mention that I needed to add "src/test/clojure" to the project.clj :source-paths vector to get lein1/2's test command to run the tests. Is that expected? Is there another way to run the tests that I'm missing? I'm happy to file a separate ticket/patch to address this if project.clj needs to be modified.

Comment by David Nolen [ 04/Sep/12 9:25 AM ]

Thanks for the information. Will apply to master. I actually run tests with "mvn test", I've updated :source-paths so other people can run the tests with lein.

Comment by David Nolen [ 04/Sep/12 9:26 AM ]

I tried to apply the patch but I'm getting a "Patch format detection failed".

Comment by Aaron Brooks [ 04/Sep/12 4:16 PM ]

The attached patch should work fine with git-apply (in either "cat foo.patch |git apply" or "git apply foo.patch" form). I made the GitHub pull request as I figured that would be the easiest path to pull the changes in.

Comment by David Nolen [ 04/Sep/12 6:04 PM ]

Yes the patch is not properly formatted and we're not supposed to take pull requests. The patch is missing attribution info. I normally apply patches with "git am foo.patch". I used the following guide for figuring out how to generate patches that can be applied with "git am", http://ariejan.net/2009/10/26/how-to-create-and-apply-a-patch-with-git.

Comment by Aaron Brooks [ 05/Sep/12 9:25 AM ]

My apologies for the extra run-around here. I've attached an -002 version of the patch created with git-format-patch which should be amenable to git-am.

Comment by David Nolen [ 05/Sep/12 11:32 AM ]

fixed, http://github.com/clojure/core.logic/commit/9bc6eb42be28bfd2b493657344f6eea8f5ed657c. pushing out a 0.8.0-alpha3 release as well.





[JDBC-31] distinct? throws clojure.lang.ArityException, when applied with no arguments Created: 12/May/12  Updated: 10/Jun/12  Resolved: 10/Jun/12

Status: Resolved
Project: java.jdbc
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Jürgen Hötzel Assignee: Sean Corfield
Resolution: Completed Votes: 0
Labels: patch,, performance

Attachments: Text File 0001-add-make-cols-unique-test-case-empty-cols.patch     Text File 0002-distinct-throws-clojure.lang.ArityException-when-app.patch    
Patch: Code and Test

 Description   

HSQLDB returns an empty ResultSet when using (.getGeneratedKeys stmt)
and no keys are generated. So this Exception is thrown for each record
without generated keys.

While this Exception is caught in do-prepared-return-keys, this can lead to a huge overhead caused by the JVM exception handling. I did a performance test.

Before Patch:

clojure.java.test-jdbc> (time (sql/with-connection hsqldb-db (count (apply sql/insert-records :dummy  (map #(hash-map :name (str %) :id %) (range 10000))))))
"Elapsed time: 3429.346743 msecs"
10000

After Patch:

 clojure.java.test-jdbc> (time (sql/with-connection hsqldb-db (count (apply sql/insert-records :dummy  (map #(hash-map :name (str %) :id %) (range 10000))))))
"Elapsed time: 1397.444753 msecs"


 Comments   
Comment by Sean Corfield [ 10/Jun/12 5:24 PM ]

Thanx for the patch!





[JDBC-29] Performance improvement (Remove intermediate lazy sequence in resultset-seq) Created: 10/May/12  Updated: 10/May/12  Resolved: 10/May/12

Status: Resolved
Project: java.jdbc
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Jürgen Hötzel Assignee: Sean Corfield
Resolution: Completed Votes: 0
Labels: patch,, performance

Attachments: Text File 0001-Don-t-create-intermediate-lazy-sequences-of-vectors.patch    

 Description   

This improves performance on large result sets by up to 30%.



 Comments   
Comment by Sean Corfield [ 10/May/12 12:09 PM ]

Nice catch!





[CLJ-1402] sort-by calls keyfn more times than is necessary Created: 11/Apr/14  Updated: 11/Apr/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Steve Kim Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance


 Description   

clojure.core/sort-by evaluates keyfn for every pairwise comparison. This is wasteful when keyfn is expensive to compute.

user=> (def keyfn-calls (atom 0))
#'user/keyfn-calls
user=> (defn keyfn [x] (do (swap! keyfn-calls inc) x))
#'user/keyfn
user=> @keyfn-calls
0
user=> (sort-by keyfn (repeatedly 10 rand))
(0.1647483850582695 0.2836687590331822 0.3222305842748623 0.3850390922996001 0.41965440953966326 0.4777580378736771 0.6051704988802923 0.659376178201709 0.8459820304223701 0.938863131161208)
user=> @keyfn-calls
44


 Comments   
Comment by Steve Kim [ 11/Apr/14 11:46 AM ]

CLJ-99 is a similar issue





[CLJ-1384] clojure.core/set should use transients Created: 15/Mar/14  Updated: 11/Apr/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5, Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Gary Fredericks Assignee: Unassigned
Resolution: Unresolved Votes: 3
Labels: performance

Attachments: Text File CLJ-1384-p1.patch     File set-bench.tar    
Patch: Code
Approval: Triaged

 Description   

clojure.core/vec calls (more or less) PersistentVector.create(...), which uses a transient vector to build up the result.

clojure.core/set on the other hand, calls PersistentHashSet.create(...), which repeatedly calls .cons on a PersistentHashSet, with all the associated speed/GC issues.

Operation count now w/transients
set 5 1.771924 µs 1.295637 µs
into 5 1.407925 µs 1.402995 µs
set 1000000 2.499264 s 1.196653 s
into 1000000 0.977555 s 1.006951 s


 Comments   
Comment by Gary Fredericks [ 15/Mar/14 10:13 PM ]

PersistentHashSet has six methods for creating sets – one for each combination of {with check, without check} and {array (varargs), List, ISeq}. Each of them does not use transients but could.

I believe clojure.core/set only depends on the (without check, ISeq) version.

Should all of these be changed? Three of them? One of them?

Comment by Andy Fingerhut [ 15/Mar/14 10:21 PM ]

I believe that the 'with check' versions are only intended to be used when reading set literals in Clojure source code, and give an error if there are duplicate elements. If you find examples where those set creation functions are called in other situations, I would be interested to learn about them to find out where my misunderstanding lies, or whether that is a problem with the current code.

If the belief above is correct, I would suggest not changing the 'with check' versions, since their speed isn't as critical.

Comment by Gary Fredericks [ 15/Mar/14 10:23 PM ]

Thanks Andy, I'll submit a patch that changes the three non-checked methods.

Comment by Gary Fredericks [ 15/Mar/14 10:46 PM ]

Attached CLJ-1384-p1.patch, which updates the three non-check create methods.

I also added benchmarks. It's about 2-3 times faster for large collections.

Comment by Ambrose Bonnaire-Sergeant [ 11/Apr/14 11:15 AM ]

Added benchmark suite (set-bench.tar).

FWIW results are similar to gfrederick's on my machine:

Clojure 1.6

Small collections (5 elements)

set averages 1.220601 µs
into averages 1.597991 µs

Large collections (1,000,000 elements)

set averages 2.429066 sec
into averages 1.006249 sec

After transients

Small collections (5 elements)

set averages 999.248325 ns
into averages 1.162889 µs

Large collections (1,000,000 elements)

set averages 1.003792 sec
into averages 889.993185 ms

Add full output to the tar.

Comment by Ghadi Shayban [ 11/Apr/14 11:35 AM ]

CLJ-1192 is related to this, but and Andy seems to be indicating the use of reduce as the means to better performance there.

Comment by Gary Fredericks [ 11/Apr/14 11:41 AM ]

Oh that's a good point about reduce. The difference should only apply to chunked seqs, right? It's worth noting that the benchmarks above used range which creates chunked seqs, so that might be why into looks faster on the large collections?

So this change only makes set act like vec; I think whether either/both of them should use reduce is a different question.





[CLJ-1376] Initialize internal maps to more efficient version Created: 11/Mar/14  Updated: 11/Mar/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: performance


 Description   

In reviewing some hashing stuff, I noticed that there are many places internal to Clojure that use maps initialized with PersistentHashMap.EMPTY. Many of these maps are likely to have a small number of entries such that a PersistentArrayMap might be more efficient.

These are the candidates:

src/jvm/clojure/lang/ARef.java
19:private volatile IPersistentMap watches = PersistentHashMap.EMPTY;

src/jvm/clojure/lang/Compiler.java
3009:				IPersistentMap m = PersistentHashMap.EMPTY;
3819:					       KEYWORDS, PersistentHashMap.EMPTY,
3820:					       VARS, PersistentHashMap.EMPTY,
3964:	IPersistentMap closes = PersistentHashMap.EMPTY;
3977:	IPersistentMap keywords = PersistentHashMap.EMPTY;
3978:	IPersistentMap vars = PersistentHashMap.EMPTY;
5121:                            ,CLEAR_SITES, PersistentHashMap.EMPTY
7259:			       KEYWORDS, PersistentHashMap.EMPTY,
7260:			       VARS, PersistentHashMap.EMPTY
7418:			IPersistentMap opts = PersistentHashMap.EMPTY;
7475:			IPersistentMap fmap = PersistentHashMap.EMPTY;
7522:					       KEYWORDS, PersistentHashMap.EMPTY,
7523:					       VARS, PersistentHashMap.EMPTY,
7912:                            ,CLEAR_SITES, PersistentHashMap.EMPTY

src/jvm/clojure/lang/LispReader.java
755:					RT.map(GENSYM_ENV, PersistentHashMap.EMPTY));

src/jvm/clojure/lang/MultiFn.java
39:	this.methodTable = PersistentHashMap.EMPTY;
41:	this.preferTable = PersistentHashMap.EMPTY;
49:		methodTable = methodCache = preferTable = PersistentHashMap.EMPTY;

src/jvm/clojure/lang/Var.java
48:	final static Frame TOP = new Frame(PersistentHashMap.EMPTY, null);
175:	setMeta(PersistentHashMap.EMPTY);
341:	IPersistentMap ret = PersistentHashMap.EMPTY;

Approach: Two possible approaches - initialize to PersistentArrayMap.EMPTY or call RT.map(). The latter requires function invocation so is slightly slower, but has the benefit of localizing map construction into a single place.






[CLJ-1373] LazySeq should utilize cached hash from its underlying seq. Created: 09/Mar/14  Updated: 20/Mar/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Jozef Wagner Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: collections, performance
Environment:

1.6.0 master SNAPSHOT


Attachments: File clj-1373.diff    
Patch: Code
Approval: Triaged

 Description   

Even if underlying seq contains a cached hash, LazySeq computes it every time.

user=> (def a (range 100000))
#'user/a
user=> (time (hash a))
"Elapsed time: 46.904273 msecs"
375952610
user=> (time (hash a)) ;; RECOMPUTE
"Elapsed time: 10.879098 msecs"
375952610
user=> (def b (seq a))
#'user/b
user=> (time (hash b))
"Elapsed time: 10.572522 msecs"
375952610
user=> (time (hash b)) ;; CACHED HASH
"Elapsed time: 0.024927 msecs"
375952610
user=> (def c (lazy-seq b))
#'user/c
user=> (time (hash c))
"Elapsed time: 12.207651 msecs"
375952610
user=> (time (hash c))
"Elapsed time: 10.995798 msecs"
375952610


 Comments   
Comment by Jozef Wagner [ 09/Mar/14 9:20 AM ]

Added patch which checks if underlying seq implements IHashEq and if yes, uses that hash instead of recomputing.





[CLJ-1340] Emit unboxed cohercions from int/long to float/double Created: 05/Feb/14  Updated: 05/Feb/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Christophe Grand Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: enhancement, performance

Attachments: File primitive-cohercion.diff    
Patch: Code

 Description   

Currently when an int or long is used where a float or double is expected, boxed conversion happens instead of emitting [IL]2[FD] instructions.






[CLJ-1334] Improve performance of the bean function with caching Created: 30/Jan/14  Updated: 31/Jan/14  Resolved: 31/Jan/14

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Ron Pressler Assignee: Unassigned
Resolution: Declined Votes: 0
Labels: performance


 Description   

The bean function is a very useful Java interop feature that provides a read-only view of a Java Bean as a Clojure map.
As it stands, the function performs introspection on the bean's class whenever the function is called. We can, however, cache the mapping from keywords to getters using JDK 7's handy ClassValue. The proposed function will look like this:

(def ^:private ^java.lang.ClassValue bean-class-value
(proxy [java.lang.ClassValue]
[]
(computeValue [c]
(reduce1 (fn [m ^java.beans.PropertyDescriptor pd]
(let [name (. pd (getName))
method (. pd (getReadMethod))
type (.getPropertyType pd)]
(if (and method (zero? (alength (. method (getParameterTypes)))))
(assoc m (keyword name) (fn [x] (clojure.lang.Reflector/prepRet type (. method (invoke x nil)))))
m)))
{}
(seq (.. java.beans.Introspector
(getBeanInfo c)
(getPropertyDescriptors)))))))

(defn bean
"Takes a Java object and returns a read-only implementation of the
map abstraction based upon its JavaBean properties."
{:added "1.0"}
[^Object x]
(let [c (. x (getClass))
pmap (.get bean-class-value c)
v (fn [k] ((pmap k) x))
snapshot (fn []
(reduce1 (fn [m e]
(assoc m (key e) ((val e) x)))
{} (seq pmap)))]
(proxy [clojure.lang.APersistentMap]
[]
(containsKey [k] (contains? pmap k))
(entryAt [k] (when (contains? pmap k) (new clojure.lang.MapEntry k (v k))))
(valAt ([k] (when (contains? pmap k) (v k)))
([k default] (if (contains? pmap k) (v k) default)))
(cons [m] (conj (snapshot) m))
(count [] (count pmap))
(assoc [k v] (assoc (snapshot) k v))
(without [k] (dissoc (snapshot) k))
(seq [] ((fn thisfn [plseq]
(lazy-seq
(when-let [pseq (seq plseq)]
(cons (new clojure.lang.MapEntry (first pseq) (v (first pseq)))
(thisfn (rest pseq)))))) (keys pmap))))))



 Comments   
Comment by Jozef Wagner [ 30/Jan/14 9:25 AM ]

associated discussion

Comment by Alex Miller [ 31/Jan/14 8:45 AM ]

Curious if anyone is using this inside production code? I use it at the REPL when exploring Java stuff sometimes but have never used it inside my actual code.

Comment by Stuart Halloway [ 31/Jan/14 12:28 PM ]

This is a cool idea, but doesn't need to be in core. How about https://github.com/clojure/java.data ?

Comment by Ron Pressler [ 31/Jan/14 12:48 PM ]

This might be good for java.data, and I'm certainly not saying it should be in the core except for the fact that it already is. This is a proposal for a performance improvement of a core function.





[CLJ-1296] locking expressions cause vars to be dereferenced, even if not executed, unless wrapped in let Created: 17/Nov/13  Updated: 18/Apr/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Andy Fingerhut Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: performance


 Description   

Description of one example with poor performance discovered by Michał Marczyk in the discussion thread linked below.

The difference between the compiled versions of:

(defn foo [x]
  (if (> x 0)
    (inc x)
    (locking o
      (dec x))))

and

(defn bar [x]
  (if (> x 0)
    (inc x)
    (let [res (locking o
                (dec x))]
      res)))

is quite significant. foo gets compiled to a single class, with invocations handled by a single invoke method; bar gets compiled to a class for bar + an extra class for an inner function which handles the (locking o (dec x)) part – probably very similar to the output for the version with the hand-coded locking-part (although I haven't really looked at that yet). The inner function is a closure, so calling it involves an allocation of a closure object; its ctor receives the closed-over locals as arguments and stores them in two fields (lockee and x). Then they get loaded from the fields in the body of the closure's invoke method etc.

Note: The summary line may be too narrow a description of the root cause, and simply the first example of a case where this issue was noticed and examined. Please make the summary and this description more accurate if you diagnose this issue.

See discussion thread on Clojure group here: https://groups.google.com/forum/#!topic/clojure/x86VygZYf4Y



 Comments   
Comment by Kevin Downey [ 18/Apr/14 2:17 AM ]

maybe it is already clear to others, but this was not immediately clear to me:

the reason

(defn bar [x]
  (if (> x 0)
    (inc x)
    (let [res (locking o
                (dec x))]
      res)))

generates a second class is locking is a macro that contains a try/finally form in it's expansion.

binding the result of a try/finally form to a result (as in the let) would require some real tricky code gen without adding the extra function, so of course the clojure compile adds the extra function.





[CLJ-1295] Speed up dissoc on array-maps Created: 15/Nov/13  Updated: 15/Nov/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Andy Fingerhut Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: performance

Attachments: File clj-1295-1.diff    
Patch: Code
Approval: Triaged

 Description   

In latest Clojure master as of Nov 15 2013, the method without() in PersistentArrayMap.java first searches for a matching key using indexOf(key) and saves the result in i.

If a matching key was found, the code then copies the old array to the new smaller one, but unnecessarily repeats the comparison of every key in the map to the key being removed, even though its location is already stored in i.



 Comments   
Comment by Andy Fingerhut [ 15/Nov/13 7:05 PM ]

The patch clj-1295-1.diff changes PersistentArrayMap's without() to use System.arraycopy to copy only the necessary parts from the current array to newArray, similar to PersistentHashMap's method removePair().

Benchmark 1 has strings for keys, which are relatively slow to compare to each other.

(def m1 (array-map "abcdef" 1 "abcdeg" 2 "abcdeh" 3 "abcdei" 4))
(time (dotimes [i 100000000] (dissoc m1 "abcdei")))

1.6.0-alpha2 with no changes:
"Elapsed time: 29663.443 msecs"
"Elapsed time: 29490.225 msecs"
"Elapsed time: 29600.138 msecs"
"Elapsed time: 29627.948 msecs"

1.6.0-alpha2 with patch clj-1295-1.diff:
"Elapsed time: 6362.006 msecs"
"Elapsed time: 6121.006 msecs"
"Elapsed time: 6163.377 msecs"
"Elapsed time: 6155.299 msecs"
"Elapsed time: 6395.224 msecs"

Averages about 21% of the run time before the change.

Benchmark 2 has keywords for keys, which are compared via Java ==, so as fast as comparison can get.

(def m2 (array-map :abcdef 1 :abcdeg 2 :abcdeh 3 :abcdei 4))
(time (dotimes [i 100000000] (dissoc m2 :abcdei)))

1.6.0-alpha2 with no changes:
"Elapsed time: 5033.863 msecs"
"Elapsed time: 5028.327 msecs"
"Elapsed time: 5045.019 msecs"
"Elapsed time: 5004.751 msecs"
"Elapsed time: 5039.143 msecs"

1.6.0-alpha2 with patch clj-1295-1.diff:
"Elapsed time: 2874.748 msecs"
"Elapsed time: 2862.878 msecs"
"Elapsed time: 2887.778 msecs"
"Elapsed time: 2874.196 msecs"
"Elapsed time: 2861.807 msecs"

Averages about 57% of the run time before the change.





[CLJ-1289] aset-* and aget perform poorly on multi-dimensional arrays even with type hints. Created: 01/Nov/13  Updated: 14/Feb/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Michael O. Church Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: arrays, performance
Environment:

Clojure 1.5.1.

Dependencies: criterium


Attachments: Text File CLJ-1289-p1.patch    
Patch: Code
Approval: Triaged

 Description   

Here's a transcript of the behavior. I don't know for sure that reflection is being done, but the performance penalty (about 1300x) suggests it.

user=> (use 'criterium.core)
nil
user=> (def b (make-array Double/TYPE 1000 1000))
#'user/b
user=> (quick-bench (aget ^"[[D" b 304 175))
WARNING: Final GC required 3.5198021166354323 % of runtime
WARNING: Final GC required 29.172288684474303 % of runtime
Evaluation count : 63558 in 6 samples of 10593 calls.
             Execution time mean : 9.457308 µs
    Execution time std-deviation : 126.220954 ns
   Execution time lower quantile : 9.344450 µs ( 2.5%)
   Execution time upper quantile : 9.629202 µs (97.5%)
                   Overhead used : 2.477107 ns

One workaround is to use multiple agets.

user=> (quick-bench (aget ^"[D" (aget ^"[[D" b 304) 175))
WARNING: Final GC required 40.59820310542545 % of runtime
Evaluation count : 62135436 in 6 samples of 10355906 calls.
             Execution time mean : 6.999273 ns
    Execution time std-deviation : 0.112703 ns
   Execution time lower quantile : 6.817782 ns ( 2.5%)
   Execution time upper quantile : 7.113845 ns (97.5%)
                   Overhead used : 2.477107 ns

Cause: The inlined version only applies to arity 2, and otherwise it reflects.



 Comments   
Comment by Gary Fredericks [ 08/Dec/13 9:28 PM ]

A glance at the source makes it obvious that the hypothesis is correct – the inlined version only applies to arity 2, and otherwise it reflects.

I thought this would be as simple as converting the inline function to be variadic (using reduce), but after trying it I realized this is tricky as you have to generate the correct type hints for each step. E.g., given ^"[[D" the inline function needs to type-hint the intermediate result with ^"[D". This isn't difficult if we're just dealing with strings that begin with square brackets, but I don't know for sure that those are the only possibilities.

Comment by Yaron Peleg [ 13/Feb/14 4:44 AM ]

Bump. I just got bitten bad by this.

There are two seperate issues here:
1) (aget 2d-array-doubles 0 0 ) doesn't emit a reflection warning.
2) It seems like the compiler has enough information to avoid the reflective call here.

Note this gets exp. worse as number of dimensions grows, i.e (get doubles3d 0 0 0)
will be 1M slower, etc' Not true, unless you iterate over all elements. it's
simply n_dims*1000x per lookup.

Nasty surprise, especially considering you often go to primitive arrays for speed,
and a common use case is an inner loop(s) that iterate over arrays.

Comment by Gary Fredericks [ 13/Feb/14 7:08 AM ]

I can probably take a stab at this.

Comment by Gary Fredericks [ 13/Feb/14 8:34 PM ]

I think the reflection warning problem is pretty much impossible to solve without changing code elsewhere in the compiler, because the reflection done in aget is a different kind than normal clojure reflection – it's explicitly in the function body rather than emitted by the compiler. Since the compiler isn't emitting it, it doesn't reasonably know it's even there. So even if aget is fixed for other arities, you still won't get the warning when it's not inlined.

I can imagine some sort of metadata that you could put on a function telling the compiler that it will reflect if not inlined. Or maybe a more generic not-inlined warning?

The global scope of adding another compiler flag seems about balanced by the seriousness of array functions not being able to warn on reflection.

Comment by Gary Fredericks [ 13/Feb/14 8:52 PM ]

Attached CLJ-1289-p1.patch which simply inlines variadic calls to aget. It assumes that if it sees a :tag on the array arg that is a string beginning with [, it can assume that the return value from one call to aget can be tagged with the same string with the leading [ stripped off.

I'm not a jvm expert, but having read through the spec a little bit I think this is a reasonable assumption.

Comment by Alex Miller [ 14/Feb/14 3:34 PM ]

I think this probably is actually true, but a more official way to ask that question would be to get the array class and ask for Class.getComponentType() (and less janky than string munging).

Comment by Gary Fredericks [ 14/Feb/14 3:40 PM ]

How would you get the array class based on the :tag type hint?

Comment by Gary Fredericks [ 14/Feb/14 7:05 PM ]

I see (-> s (Class/forName) (.getComponentType) (.getName)) does the same thing – is that route preferred, or is there another one?





[CLJ-1288] aset-* and aget: on multi-dimensional arrays (e.g. double[][]) these fn reflect (and, thus, perf. poorly) even with type hints. Created: 01/Nov/13  Updated: 01/Nov/13  Resolved: 01/Nov/13

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Michael O. Church Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: arrays, performance
Environment:

Clojure 1.5.1.

Dependencies: criterium



 Description   

Here's a transcript of the behavior. I don't know for sure that reflection is being done, but the performance penalty (about 1300x) suggests it.

user=> (use 'criterium.core)
nil
user=> (def b (make-array Double/TYPE 1000 1000))
#'user/b
user=> (quick-bench (aget ^"[[D" b 304 175))
WARNING: Final GC required 3.5198021166354323 % of runtime
WARNING: Final GC required 29.172288684474303 % of runtime
Evaluation count : 63558 in 6 samples of 10593 calls.
Execution time mean : 9.457308 µs
Execution time std-deviation : 126.220954 ns
Execution time lower quantile : 9.344450 µs ( 2.5%)
Execution time upper quantile : 9.629202 µs (97.5%)
Overhead used : 2.477107 ns
nil

A (n ugly) workaround is to use multiple agets.

user=> (quick-bench (aget ^"[D" (aget ^"[[D" b 304) 175))
WARNING: Final GC required 40.59820310542545 % of runtime
Evaluation count : 62135436 in 6 samples of 10355906 calls.
Execution time mean : 6.999273 ns
Execution time std-deviation : 0.112703 ns
Execution time lower quantile : 6.817782 ns ( 2.5%)
Execution time upper quantile : 7.113845 ns (97.5%)
Overhead used : 2.477107 ns
nil



 Comments   
Comment by Alex Miller [ 01/Nov/13 1:09 PM ]

Dupe of CLJ-1289.





[CLJ-1277] Speed up printing of time instants Created: 10/Oct/13  Updated: 10/Oct/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Andy Fingerhut Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: performance

Attachments: Text File clj-1277-1.txt    
Patch: Code
Approval: Triaged

 Description   

There are several occurrences of reflection in instant.clj that slow down the printing of time instants.

Clojure Google group conversation link: https://mail.google.com/mail/u/0/?shva=1#label/clojure/1419e1e6f6cc5b3d

The addition of a few type hints is enough to speed the printing of time instants by a factor of about 3 to 4.5, in a few small benchmarks.



 Comments   
Comment by Andy Fingerhut [ 10/Oct/13 12:07 AM ]

Patch clj-1277-1.txt adds 4 type hints that eliminate all reflection occurrences in source file instant.clj. Benchmarks show that it speeds up printing of java.util.Date and java.sql.Timestamp objects by a factor of about 3 to 4.5.

Latest Clojure master as of Oct 9 2013:

user=> (time (let [d (java.util.Date.)] (dotimes [i 3000000] (pr-str d))))
"Elapsed time: 24094.282 msecs"
user=> (import 'java.sql.Timestamp)
user=> (time (let [d (java.sql.Timestamp. 1300000000000)] (dotimes [i 2000000] (pr-str d))))
"Elapsed time: 20856.957 msecs"

That version of Clojure plus the patch clj-1277-1.txt:

user=> (time (let [d (java.util.Date.)] (dotimes [i 3000000] (pr-str d))))
"Elapsed time: 5085.847 msecs"
user=> (time (let [d (java.sql.Timestamp. 1300000000000)] (dotimes [i 2000000] (pr-str d))))
"Elapsed time: 7582.233 msecs"

Comment by Alexander Kiel [ 10/Oct/13 4:54 AM ]

Thanks for the patch Andy. But I don't like the (set! warn-on-reflection true). I think its better to use it only in the dev profile in leiningen. Not in real production code.

Comment by Andy Fingerhut [ 10/Oct/13 4:06 PM ]

Alexander, Leiningen is not used for building Clojure itself. The two supported choices are Maven and ant. Several Clojure source files, e.g. core/protocols.clj and core/reducers.clj, set warn-on-reflection to true, I believe so that if code is changed in such a way as to introduce a warning, it will be caught more quickly.

If the screeners or Rich think it is inappropriate, it is easy enough to remove.

Comment by Alex Miller [ 10/Oct/13 4:48 PM ]

Setting that is not uncommon in core clojure code and seems fine to me here.





[CLJ-1266] Better primitive support for floats Created: 26/Sep/13  Updated: 05/Feb/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Christophe Grand Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: performance

Attachments: File floats.diff     File floats-intrinsics.diff    
Patch: Code

 Description   

Clojure offers optimized arithmetic functions for long, int and doubles but none for floats.
Plus converting from integers (ints or longs) to floating point numbers (float or double) doesn't use the specialized bytecode.
This patch adds float-add/subtract/multiply/divide and more efficient coversion from integers to floating points numbers.



 Comments   
Comment by Alex Miller [ 05/Feb/14 12:29 PM ]

I think it's unlikely the arithmetic float ops will be accepted.
However, the intrinsics changes could be useful - could you split those into a new ticket?

Comment by Christophe Grand [ 05/Feb/14 4:20 PM ]

I attached a new patch with only intrinsics and more comprehensive primitive coercion. Is it the split you expected?

Comment by Alex Miller [ 05/Feb/14 5:22 PM ]

No, but totally my fault for saying the wrong words.

I think the changes in Compiler to get access to I2D, L2D, I2F, and L2F are potentially useful (most particularly L2D) - these would make sense in a new ticket.

The other changes in Intrinsics and Numbers to support float math are unlikely to be accepted.

Comment by Christophe Grand [ 05/Feb/14 5:40 PM ]

Godd thing I had already split coercions in a sparate commit then

See http://dev.clojure.org/jira/browse/CLJ-1340





[CLJ-1259] Speed up pprint Created: 09/Sep/13  Updated: 13/Sep/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Andy Fingerhut Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance, print

Attachments: Text File clj-1259-1.txt    
Patch: Code
Approval: Triaged

 Description   

There are many occurrences of reflection in the pprint implementation.

By eliminating all of them, I ran one benchmark of pprint'ing a Clojure map that resulted in a 300 Kbyte output. After eliminating reflection, the elapsed time to pprint was reduced by 18% (about 14.0 sec down to about 11.5 sec) on a recent model MacBook Pro.



 Comments   
Comment by Andy Fingerhut [ 09/Sep/13 11:36 PM ]

Patch clj-1259-1.txt eliminates all occurrences of reflection in pprint, and all files loaded from pprint.clj. It also sets warn-on-reflection to true for those files, in hopes of making it more obvious if a new use of reflection is added there.





[CLJ-1224] Records do not cache hash like normal maps Created: 24/Jun/13  Updated: 14/Feb/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Brandon Bloom Assignee: Unassigned
Resolution: Unresolved Votes: 7
Labels: defrecord, performance

Attachments: Text File 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records.patch    
Patch: Code
Approval: Triaged

 Description   

Records do not cache their hash codes like normal Clojure maps, which affects their performance. This problem has been fixed in CLJS, but still affects JVM CLJ.

Approach: Cache hash values in record definitions, similar to maps.

Patch: 0001-CLJ-1224-cache-hasheq-and-hashCode-for-records.patch

Also see: http://dev.clojure.org/jira/browse/CLJS-281



 Comments   
Comment by Nicola Mometto [ 14/Feb/14 5:46 PM ]

I want to point out that my patch breaks ABI compatibility.
A possible approach to avoid this would be to have 3 constructors instead of 2, I can write the patch to support this if desired.





[CLJ-1192] vec function is substantially slower than into function Created: 06/Apr/13  Updated: 18/Apr/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: Release 1.7

Type: Defect Priority: Major
Reporter: Luke VanderHart Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance

Approval: Vetted

 Description   

(vec coll) and (into [] coll) do exactly the same thing. However, due to into using transients, it is substantially faster. On my machine:

(time (dotimes [_ 100] (vec (range 100000))))
"Elapsed time: 732.56 msecs"

(time (dotimes [_ 100] (into [] (range 100000))))
"Elapsed time: 491.411 msecs"

This is consistently repeatable.

Since vec's sole purpose is to transform collections into vectors, it should do so at the maximum speed available.



 Comments   
Comment by Andy Fingerhut [ 07/Apr/13 5:50 PM ]

I am pretty sure that Clojure 1.5.1 also uses transient vectors for (vec (range n)) (probably also some earlier versions of Clojure, too).

Look at vec in core.clj. It checks whether its arg is a java.util.Collection, which lazy seqs are, so calls (clojure.lang.LazilyPersistentVector/create coll).

LazilyPersistentVector's create method checks whether its argument is an ISeq, which lazy seqs are, so it calls PersistentVector.create(RT.seq(coll)).

All 3 of PersistentVector's create() methods use transient vectors to build up the result.

I suspect the difference in run times are not because of transients or not, but because of the way into uses reduce, and perhaps may also have something to do with the perhaps-unnecessary call to RT.seq in LazilyPersistentVector's create method (in this case, at least – it is likely needed for other types of arguments).

Comment by Alan Malloy [ 14/Jun/13 2:17 PM ]

I'm pretty sure the difference is that into uses reduce: since reducers were added in 1.5, chunked sequences know how to reduce themselves without creating unnecessary cons cells. PersistentVector/create doesn't use reduce, so it has to allocate a cons cell for each item in the sequence.

Comment by Gary Fredericks [ 08/Sep/13 1:55 PM ]

Is there any downside to (defn vec [coll] (into [] coll)) (or the inlined equivalent)?

Comment by Ghadi Shayban [ 11/Apr/14 5:13 PM ]

While I agree that there are improvements and possibly low-hanging fruit, FWIW https://github.com/clojure/tools.analyzer/commit/cf7dda81a22f4c9c1fe64c699ca17e7deed61db4#commitcomment-5989545

showed a 5% slowdown from a few callsites in tools.analyzer.

This ticket's benchmark is incomplete in that it covers a single type of argument (chunked range), and flawed as it timing the expense of realizing the range. (That could be a legit benchmark case, but it shouldn't be the only one).

Sorry to rain on a parade. I promise like speed too!





[CLJ-1096] Make destructuring emit direct keyword lookups Created: 29/Oct/12  Updated: 04/Sep/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.4
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Christophe Grand Assignee: Christophe Grand
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: File desctructure-keyword-lookup.diff     File inline-get-keyword.diff    
Patch: Code

 Description   

Currently associative destructuring emits calls to get. The attached patch modify desctruture to emit direct keyword lookups when possible.

Approved here https://groups.google.com/d/msg/clojure-dev/MaYcHQck8VA/nauMus4mzPgJ



 Comments   
Comment by Christophe Grand [ 04/Sep/13 3:40 AM ]

Rethinking about this patch now, it may be too specific: get's inline expansion should be modified when the key is a literal keyword.

Comment by Christophe Grand [ 04/Sep/13 3:41 AM ]

More generic patch (inline-get-keyword.diff): all get calls with literal keywords as keys are inlined to direct keyword lookup.





[CLJ-1090] Indirect function calls through Var instances fail to clear locals Created: 22/Oct/12  Updated: 25/Oct/13  Resolved: 25/Oct/13

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.4
Fix Version/s: Release 1.6

Type: Defect Priority: Minor
Reporter: Spencer Tipping Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: performance
Environment:

Probably all, but observed on Ubuntu 12.04, OpenJDK 6


Attachments: File var-clear-locals.diff     File var-clear-locals-patch-v2.diff     Text File var-clear-locals-patch-v2.txt    
Patch: Code
Approval: Ok

 Description   

If you make a function call indirectly by invoking a Var object (which derefs itself and invokes the result), the invocation parameters remain in the thread's local stack for the duration of the function call, even though they are needed only long enough to be passed into the deref'd function. As a result, passing a lazy seq into a function invoked in its Var form may run out of memory if the seq is forced inside that function. For example:

(defn total [xs] (reduce + 0 xs))
(total (range 1000000000))   ; this works, though takes a while
(#'total (range 1000000000)) ; this dies with out of memory error

Solution: Similar to RestFn, wrap each argN in Var inside a Util.ret1(argN, argN = null).

Patch: var-clear-locals-patch-v2.diff

Screened by: Alex Miller



 Comments   
Comment by Spencer Tipping [ 23/Oct/12 1:37 PM ]

Sorry, I typo'd the example. (defn total ...) should be (defn sum ...).

Comment by Timothy Baldridge [ 27/Nov/12 11:45 AM ]

fixed typeo in example

Comment by Timothy Baldridge [ 27/Nov/12 11:47 AM ]

Couldn't reproduce the exception, but the 2nd example did chew through about 4x the amount of memory. Vetting.

Comment by Timothy Baldridge [ 29/Nov/12 2:57 PM ]

adding a patch. Since most of Clojure ends up running this code in one way or another, I'd assert that tests are included as part of the normal Clojure test process.

Patch simply calls Util.ret1(argx, argx=null) on all invoke calls.

Comment by Timothy Baldridge [ 29/Nov/12 3:17 PM ]

And as a note, both examples in the original report now have extremely similar memory usages.

Comment by Spencer Tipping [ 30/Nov/12 2:22 PM ]

Sounds great, and the patch looks good too. Let me know if I need to do anything else.

Comment by Alex Miller [ 22/Aug/13 10:35 PM ]

Switching back to Triaged as afaict Rich has not Vetted this one.

Comment by Andy Fingerhut [ 05/Sep/13 6:18 PM ]

Patch var-clear-locals-patch-v2.txt is identical to var-clear-locals.diff (and preserves credit to its author), except it eliminates trailing whitespace in some added lines that cause git to give warnings when applying the patch.





[CLJ-1087] clojure.data/diff uses set union on key seqs Created: 15/Oct/12  Updated: 03/Sep/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Tom Jack Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: Text File clj-1087-diff-perf-enhance-patch-v1.txt    
Patch: Code

 Description   

clojure.data/diff, on line 118, defines:

java.util.Map
(diff-similar [a b]
  (diff-associative a b (set/union (keys a) (keys b))))

Since keys returns a key seq, this seems like an error. clojure.set/union has strange and inconsistent behavior with regard to non-sets, and in this case the two key seqs are concatenated. Based on a cursory benchmark, it seems that this bug is a slight performance gain when the maps have no common keys, and a significant performance loss when the maps have the same keys. The results are still correct because of the merging reduce in diff-associative.

The patch is easy (just call set on each key seq).



 Comments   
Comment by Andy Fingerhut [ 15/Oct/12 2:52 PM ]

clj-1087-diff-perf-enhance-patch-v1.txt dated Oct 15 2012 implements Tom's suggested performance enhancement, although not exactly in the way he suggested. It does calculate the union of the two key sequences.





[CLJ-1080] Eliminate many uses of reflection in Clojure code Created: 30/Sep/12  Updated: 10/Oct/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.4, Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Andy Fingerhut Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance, typehints

Attachments: Text File clj-1080-v5.txt    
Patch: Code

 Description   

There are dozens of uses of reflection in Clojure code that can be eliminated by adding suitable type hints. This patch adds the necessary type hints for most of those.



 Comments   
Comment by Andy Fingerhut [ 30/Sep/12 11:39 AM ]

Patch clj-1080-eliminate-many-reflection-warnings-patch-v1.txt dated Sep 30 2012 adds type hints to eliminate many uses of reflection in Clojure core code.

Comment by Andy Fingerhut [ 14/Nov/12 1:26 PM ]

clj-1080-eliminate-many-reflection-warnings-patch-v2.txt dated Nov 14 2012 is identical to the previous patch (to be removed soon), except it applies cleanly to latest master.

Comment by Andy Fingerhut [ 07/Feb/13 8:54 AM ]

clj-1080-eliminate-many-reflection-warnings-patch-v3.txt dated Feb 7 2013 is identical to the previous patch (to be removed soon), except it applies cleanly to latest master. One type hint in the patch was added due to a different change, and was no longer needed in the patch.

Comment by Andy Fingerhut [ 09/Sep/13 11:37 PM ]

Patch clj-1080-v4.txt eliminates many, but not all, uses of reflection. To avoid overlap with CLJ-1259, it does not touch pprint or any of the files loaded from pprint.clj. See CLJ-1259 for those.

Comment by Andy Fingerhut [ 10/Oct/13 12:22 AM ]

Patch clj-1080-v5.txt eliminates many, but not all, uses of reflection. It does not touch pprint or any of the files loaded from pprint.clj – see CLJ-1259 for those. Similarly see CLJ-1277 for elimination of reflection in instant.clj





[CLJ-1050] Remove a lock in the common case of deref of delay Created: 26/Aug/12  Updated: 18/Sep/12  Resolved: 18/Sep/12

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.4
Fix Version/s: None

Type: Enhancement Priority: Trivial
Reporter: Nicolas Oury Assignee: Unassigned
Resolution: Declined Votes: 0
Labels: enhancement, patch, performance

Attachments: Text File 0001-No-lock-in-the-common-path-for-delays.patch    

 Description   

Currently, when deref is called in Delay.java, a lock on the Delay is always acquired.
This is wasteful as most of the time you just want to read the val.

The attached patch changes this behaviour to the following:

  • val is initialised to a known secret value. (During its constructor so it is visible from any thread).
  • When deref is called, val is checked to the known secret value.
  • If it is not the secret value, then it has to be the value computed by the function and we return it.
  • If it is the secret value, then we lock this object and revert to the current behaviour.

This is faster than what is done currently and can be very much faster if there is contention on the delay.



 Comments   
Comment by Nicolas Oury [ 27/Aug/12 2:37 AM ]

Please close and reject. The patch is not working if val has non final fields.





[CLJ-1005] Use transient map in zipmap Created: 30/May/12  Updated: 03/Sep/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Michał Marczyk Assignee: Aaron Bedra
Resolution: Unresolved Votes: 3
Labels: performance

Attachments: Text File 0001-Use-transient-map-in-zipmap.2.patch     Text File 0001-Use-transient-map-in-zipmap.patch    
Patch: Code
Approval: Triaged

 Description   

The attached patch changes zipmap to use a transient map internally. The definition is also moved so that it resides below that of #'transient. The original definition is commented out (like that of #'into).



 Comments   
Comment by Aaron Bedra [ 14/Aug/12 9:24 PM ]

Why is the old implementation left and commented out? If we are going to move to a new implementation, the old one should be removed.

Comment by Michał Marczyk [ 15/Aug/12 4:17 AM ]

As mentioned in the ticket description, the previously attached patch follows the pattern of into whose non-transient-enabled definition is left in core.clj with a #_ in front – I wasn't sure if that's something desirable in all cases.

Here's a new patch with the old impl removed.

Comment by Andy Fingerhut [ 15/Aug/12 10:37 AM ]

Thanks for the updated patch, Michal. Sorry to raise such a minor issue, but would you mind using a different name for the updated patch? I know JIRA can handle multiple attached files with the same name, but my prescreening code isn't quite that talented yet, and it can lead to confusion when discussing patches.

Comment by Michał Marczyk [ 15/Aug/12 10:42 AM ]

Thanks for the heads-up, Andy! I've reattached the new patch under a new name.

Comment by Andy Fingerhut [ 16/Aug/12 8:24 PM ]

Presumptuously changing Approval from Incomplete back to None after the Michal's updated patch was added, addressing the reason the ticket was marked incomplete.

Comment by Aaron Bedra [ 11/Apr/13 5:32 PM ]

The patch looks good and applies cleanly. Are there additional tests that we should run to verify that this is providing the improvement we think it is. Also, is there a discussion somewhere that started this ticket? There isn't a lot of context here.

Comment by Michał Marczyk [ 11/Apr/13 6:19 PM ]

Hi Aaron,

Thanks for looking into this!

From what I've been able to observe, this change hugely improves zipmap times for large maps. For small maps, there is a small improvement. Here are two basic Criterium benchmarks (transient-zipmap defined at the REPL as in the patch):

;;; large map
user=> (def xs (range 16384))
#'user/xs
user=> (last xs)
16383
user=> (c/bench (zipmap xs xs))
Evaluation count : 13920 in 60 samples of 232 calls.
             Execution time mean : 4.329635 ms
    Execution time std-deviation : 77.791989 us
   Execution time lower quantile : 4.215050 ms ( 2.5%)
   Execution time upper quantile : 4.494120 ms (97.5%)
nil
user=> (c/bench (transient-zipmap xs xs))
Evaluation count : 21180 in 60 samples of 353 calls.
             Execution time mean : 2.818339 ms
    Execution time std-deviation : 110.751493 us
   Execution time lower quantile : 2.618971 ms ( 2.5%)
   Execution time upper quantile : 3.025812 ms (97.5%)

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 25.4675 % Variance is moderately inflated by outliers
nil

;;; small map
user=> (def ys (range 16))
#'user/ys
user=> (last ys)
15
user=> (c/bench (zipmap ys ys))
Evaluation count : 16639020 in 60 samples of 277317 calls.
             Execution time mean : 3.803683 us
    Execution time std-deviation : 88.431220 ns
   Execution time lower quantile : 3.638146 us ( 2.5%)
   Execution time upper quantile : 3.935160 us (97.5%)
nil
user=> (c/bench (transient-zipmap ys ys))
Evaluation count : 18536880 in 60 samples of 308948 calls.
             Execution time mean : 3.412992 us
    Execution time std-deviation : 81.338284 ns
   Execution time lower quantile : 3.303888 us ( 2.5%)
   Execution time upper quantile : 3.545549 us (97.5%)
nil

Clearly the semantics are preserved provided transients satisfy their contract.

I think I might not have started a ggroup thread for this, sorry.





[CLJ-1000] Performance drop in PersistentHashMap.valAt(...) in v.1.4 -- Util.hasheq(...) ? Created: 21/May/12  Updated: 01/Mar/13  Resolved: 11/Dec/12

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.4
Fix Version/s: Release 1.5

Type: Enhancement Priority: Major
Reporter: Oleksandr Shyshko Assignee: Timothy Baldridge
Resolution: Completed Votes: 1
Labels: performance
Environment:

Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)


Attachments: File caching-hasheq-v3.diff     PNG File clj_13.png     PNG File clj_14.png    
Patch: Code
Approval: Ok

 Description   

It seems there is a 30-40% performance degradation of PersistentHashMap.valAt(...) in Clojure 1.4.
Possibly due to references to new CPU-hungry implementation of Util.hasheq(...).

I have created a demo project with more details and some profiling information here:
https://github.com/oshyshko/clj-perf



 Comments   
Comment by Christophe Grand [ 27/Nov/12 8:30 AM ]

I added a patch consisting of three commits:

  • one which adds caching to seqs, sets, maps, vectors and queues
  • one that aligns the shape of Util/hasheq on the one Util/equiv (to have a consistent behavior from the JIT compiler: without that deoptimization was more penalizing for hasheq than for equiv)
  • one that fix hasheq on records (which was non consistent with its equiv impl.) – and this commit relies on a static method introduced in the "caching hasheq" commit
Comment by Timothy Baldridge [ 30/Nov/12 12:10 PM ]

In the process of screening this, I'm not seeing much of a performance difference after applying the patch.

before patch:
user=> (-main)

Version: 1.5.0-master-SNAPSHOT
"Elapsed time: 6373.752 msecs"
"Elapsed time: 6578.037 msecs"
"Elapsed time: 6476.399 msecs"

after patch:
user=> (-main)

Version: 1.5.0-master-SNAPSHOT
"Elapsed time: 6182.699 msecs"
"Elapsed time: 6548.086 msecs"
"Elapsed time: 6496.711 msecs"

clojure 1.4:
user=> (-main)

Version: 1.4.0
"Elapsed time: 6484.234 msecs"
"Elapsed time: 6243.672 msecs"
"Elapsed time: 6248.898 msecs"

clojure 1.3
user=> (-main)

Version: 1.3.0
"Elapsed time: 3584.966 msecs"
"Elapsed time: 3618.189 msecs"
"Elapsed time: 3372.979 msecs"

I blew away my local clojure repo and re-applied the patch just to make sure, but the results are the same. Does this fix not optimize the case given in the original test project?

For reference I'm running this code:

(defn -main
[& args]

(println)
(println "Version: " (clojure-version))

(def mm 10000)

(def str-keys (map str (range mm)))
(def m (zipmap str-keys (range mm)))
(time (dotimes [i mm] (doseq [k str-keys] (m k))))

(def kw-keys (map #(keyword (str %)) (range mm)))
(def m (zipmap kw-keys (range mm)))
(time (dotimes [i mm] (doseq [k kw-keys] (m k))))

(def sym-keys (map #(symbol (str %)) (range mm)))
(def m (zipmap sym-keys (range mm)))
(time (dotimes [i mm] (doseq [k sym-keys] (m k))))

(println))

Comment by Christophe Grand [ 30/Nov/12 2:10 PM ]

Sorry, I was too quick to react on the ML (someone said it was related to hasheq caching and since I had the patch almost ready: on a project I noticed too much time spent computing hasheq on vectors).
So no, my patch doesn't improve anything with kws, syms or strs as keys. However when keys are collections, it fares better.

In 1.4, for a "regular" object, it must fails two instanceof tests before calling .hashCode().
If we make keywords and symbols implement IHashEq and reverse the test order (first IHashEq, then Number) it should improve the performance of the above benchmark – except for Strings.

Comment by Timothy Baldridge [ 30/Nov/12 2:16 PM ]

Marking as incomplete, should we also delete the patch as it seems like it should be in a different ticket?

Comment by Christophe Grand [ 03/Dec/12 10:00 AM ]

In 1.3, #'hash was going through Object.hashCode and thus was simple and fast. Plus collections hashes were cached.
In 1.4 and master, #'hash goes now through two instanceof test (Number and IHasheq in this order) before trying Object.hashCode in last resort. Plus collections hashes are not cached.
As such I'm not sure Util.hasheq inherent slowness (compared to Util.hash) and hasheq caching should be separated in two issues.

The caching-hasheq-v2.diff patchset reintroduces hashes caching for collections/hasheq and reorders the instanceof tests (to test for IHashEq before Number) and makes Keyword and Symbol implement IHashEq to branch fast in Util.hasheq.

I recommend adding a collection test to the current benchmark:

(defn -main
[& args]

(println)
(println "Version: " (clojure-version))

(def mm 10000)

(def str-keys (map str (range mm)))
(def m (zipmap str-keys (range mm)))
(time (dotimes [i mm] (doseq [k str-keys] (m k))))

(def kw-keys (map #(keyword (str %)) (range mm)))
(def m (zipmap kw-keys (range mm)))
(time (dotimes [i mm] (doseq [k kw-keys] (m k))))

(def sym-keys (map #(symbol (str %)) (range mm)))
(def m (zipmap sym-keys (range mm)))
(time (dotimes [i mm] (doseq [k sym-keys] (m k))))

(def vec-keys (map (comp (juxt keyword symbol identity) str) (range mm)))
(def m (zipmap vec-keys (range mm)))
(time (dotimes [i mm] (doseq [k vec-keys] (m k))))

(println))
Comment by Timothy Baldridge [ 03/Dec/12 10:38 AM ]

For some reason I can't get v2 to build against master. It applies cleanly, but fails to build.

Comment by Christophe Grand [ 03/Dec/12 11:30 AM ]

Timothy: I inadvertently deleted a "public" modifier before commiting... fixed in caching-hasheq-v3.diff

Comment by Timothy Baldridge [ 10/Dec/12 11:00 AM ]

I now get the following results:

Version: 1.4.0
"Elapsed time: 6281.345 msecs"
"Elapsed time: 6344.321 msecs"
"Elapsed time: 6108.55 msecs"
"Elapsed time: 36172.135 msecs"

Version: 1.5.0-master-SNAPSHOT (pre-patch)
"Elapsed time: 6126.337 msecs"
"Elapsed time: 6320.857 msecs"
"Elapsed time: 6237.251 msecs"
"Elapsed time: 18167.05 msecs"

Version: 1.5.0-master-SNAPSHOT (post-patch)
"Elapsed time: 6501.929 msecs"
"Elapsed time: 3861.987 msecs"
"Elapsed time: 3871.557 msecs"
"Elapsed time: 5049.067 msecs"

Marking as screened

Comment by Oleksandr Shyshko [ 10/Dec/12 3:53 PM ]

Please, could you add as a comment the bench result using 1.3 vs 1.5-master-post-patch?

Comment by Oleksandr Shyshko [ 11/Dec/12 2:13 PM ]

The performance with 1.5-master is now very close to 1.3 for 3/4 of the benchmark.

However, this code is still showing 43% performance drop (3411 ms VS 6030 ms – 1.3 VS 1.5-master):

(def str-keys (map str (range mm)))
(def m (zipmap str-keys (range mm)))
(time (dotimes [i mm] (doseq [k str-keys] (m k))))

Version: 1.3.0
"Elapsed time: 3411.353 msecs" <---
"Elapsed time: 3459.992 msecs"
"Elapsed time: 3365.182 msecs"
"Elapsed time: 3813.637 msecs"

Version: 1.4.0
"Elapsed time: 5710.073 msecs" <---
"Elapsed time: 5817.356 msecs"
"Elapsed time: 5774.856 msecs"
"Elapsed time: 18754.482 msecs"

Version: 1.5.0-master-SNAPSHOT
"Elapsed time: 6030.247 msecs" <---
"Elapsed time: 3372.637 msecs"
"Elapsed time: 3267.481 msecs"
"Elapsed time: 3852.927 msecs"

To reproduce:
$ git clone https://github.com/clojure/clojure.git
$ cd clojure
$ mvn install -Dmaven.test.skip=true

$ cd ..

$ git clone https://github.com/oshyshko/clj-perf.git
$ cd clj-perf
$ lein run-all





[CLJ-988] the locking in MultiFn.java (synchronized methods) can cause lots of contention in multithreaded programs Created: 08/May/12  Updated: 21/Sep/12  Resolved: 21/Sep/12

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.5

Type: Enhancement Priority: Major
Reporter: Kevin Downey Assignee: Stuart Sierra
Resolution: Completed Votes: 10
Labels: bug, performance

Attachments: Text File clj-988-tests-only-patch-v1.txt     File issue988-lockless-multifn+tests-120817.diff     File issue988-lockless-multifn+tests.diff    
Patch: Code and Test
Approval: Ok

 Description   

if you call a single multimethod a lot in multithreaded code you get lots of contention for the lock on the multimethod. this contention slows things down a lot.

this is due to getMethod being synchronized. it would be great if there was some fast non-locking path through the multimethod.



 Comments   
Comment by Kevin Downey [ 08/May/12 11:30 AM ]

http://groups.google.com/group/clojure-dev/browse_thread/thread/6a8219ae3d4cd0ae?hl=en

Comment by David Santiago [ 11/May/12 6:38 AM ]

Here's a stab in the dark attempt at rewriting MultiFn to use atoms to swap between immutable copies of its otherwise mutable state.

The four pieces of the MultiFn's state that are mutable and protected by locks are now contained in the MultiFn.State class, which is immutable and contains convenience functions for creating a new one with one field changed. An instance of this class is now held in an atom in the MultiFn called state. Changes to any of these four members are now done with an atomic swap of these State objects.

The getMethod/findAndCacheBestMethod complex was rewritten to better enable the atomic logic. findAndCacheBestMethod was replaced with findBestMethod, which merely looks up the method; the caching logic was moved to getMethod so that it can be retried easily as part of the work that method does.

As it was findAndCacheBestMethod seemingly had the potential to cause a stack overflow in a perfect storm of heavy concurrent modification, since it calls itself recursively if it finds that the hierarchy has changed while it has done its work. This logic is now done in the CAS looping of getMethod, so hopefully that is not even an unlikely possibility anymore.

There is still seemingly a small race here, since the check is done of a regular variable against the value in a ref. Now as before, the ref could be updated just after you do the check, but before the MultiFn's state is updated. Of course, only the method lookup part of a MultiFn call was synchronized before; it could already change after the lookup but before the method itself executed, having a stale method finish seemingly after the method had been changed. Things are no different now in general, with the atom-based approach, so perhaps this race is not a big deal, as a stale value can't persist for long.

The patch passes the tests and Clojure and multimethods seems to work.

Comment by Kevin Downey [ 12/May/12 8:59 PM ]

this patch gets rid of the ugly lock contention issues. I have not been able to benchmark it vs. the old multimethod implementation, but looking at it, I would not be surprised if it is faster when the system is in a steady state.

Comment by Stuart Halloway [ 08/Jun/12 12:11 PM ]

This looks straightforward, except for getMethod. Can somebody with context add more discussion of how method caching works?

Also, it would be great to have tests with this one.

Comment by David Santiago [ 15/Jun/12 4:44 AM ]

Obviously I didn't write the original code, so I'm not the ideal
person to explain this stuff. But I did work with it a bit recently,
so in the hopes that I can be helpful, I'm writing down my
understanding of the code as I worked with it. Since I came to the
code and sort of reverse engineered my way to this understanding,
hopefully laying this all out will make any mistakes or
misunderstandings I may have made easier to catch and correct. To
ensure some stability, I'll talk about the original MultiFn code as it
stands at this commit:
https://github.com/clojure/clojure/blob/8fda34e4c77cac079b711da59d5fe49b74605553/src/jvm/clojure/lang/MultiFn.java

There are four pieces of state that change as the multimethod is either
populated with methods or called.

  • methodTable: A persistent map from a dispatch value (Object) to
    a function (IFn). This is the obvious thing you think it is,
    determining which dispatch values call which function.
  • preferTable: A persistent map from a dispatch value (Object) to
    another value (Object), where the key "is preferred" to the value.
  • methodCache: A persistent map from a dispatch value (Object) to
    function (IFn). By default, the methodCache is assigned the same
    map value as the methodTable. If values are calculated out of the
    hierarchy during a dispatch, the originating value and the
    ultimate method it resolves to are inserted as additional items in
    the methodCache so that subsequent calls can jump right to the
    method without recursing down the hierarchy and preference table.
  • cachedHierarchy: An Object that refers to the hierarchy that is
    reflected in the latest cached values. It is used to check if the
    hierarchy has been updated since we last updated the cache. If it
    has been updated, then the cache is flushed.

I think reset(), addMethod(), removeMethod(), preferMethod(),
prefers(), isA(), dominates(), and resetCache() are extremely
straightforward in both the original code and the patch. In the
original code, the first four of those are synchronized, and the other
four are only called from methods that are synchronized (or from
methods only called from methods that are synchronized).

Invoking a multimethod through its invoke() function will call
getFn(). getFn() will call the getMethod() function, which is
synchronized. This means any call of the multimethod will wait for and
take a lock as part of method invocation. The goal of the patch in
this issue is to remove this lock on calls into the multimethod. It in
fact removes the locks on all operations, and instead keeps its
internal mutable state by atomically swapping a private immutable
State object held in an Atom called state.

The biggest change in the patch is to the
getFn()>getMethod()>findAndCacheBestMethod() complex from the
original code. I'll describe how that code works first.

In the original code, getFn() does nothing but bounce through
getMethod(). getMethod() tries three times to find a method to call,
after checking that the cache is up to date and flushing it if it
isn't:

1. It checks if there's a method for the dispatch value in the
methodCache.

2. If not, it calls findAndCacheBestMethod() on the
dispatch value. findAndCacheBestMethod() does the following:

1. It iterates through every map entry in the method table,
keeping at each step the best entry it has found so far
(according to the hierarchy and preference tables).

2. If it did not find a suitable entry, it returns null.

3. Otherwise, it checks if the hierarchy has been changed since the cache
was last updated. If it has not changed, it inserts the method
into the cache and returns it. If it has been changed, it
resets the cache and calls itself recursively to repeat the process.

3. Failing all else, it will return the method for the default
dispatch value.

Again, remember everything in the list above happens in a call to a
synchronized function. Also note that as it is currently written,
findAndCacheBestMethod() uses recursion for iteration in a way that
grows the stack. This seems unlikely to cause a stack overflow unless
the multimethod is getting its hierarchy updated very rapidly for a
sustained period while someone else tries to call it. Nonetheless, the
hierarchy is held in an IRef that is updated independently of the
locking of the MultiFn. Finally, note that the multimethod is locked
only while the method is being found. Once it is found, the lock is
released and the method actually gets called afterwards without any
synchronization, meaning that by the time the method actually
executes, the multimethod may have already been modified in a way that
suggests a different method should have been called. Presumably this
is intended, understood, and not a big deal.

Moving on now to the patch in this issue. As mentioned, the main
change is updating this entire apparatus to work with a single atomic
swap to control concurrency. This means that all updates to the
multimethod's state have to happen at one instant in time. Where the
original code could make multiple changes to the state at different
times, knowing it was safely protected by an exclusive lock, rewriting
for atom swaps requires us to reorganize the code so that all updates
to the state happen at the same time with a single CAS.

To implement this change, I pulled the implicit looping logic from
findAndCacheBestMethod() up into getMethod() itself, and broke the
"findBestMethod" part into its own function, findBestMethod(), which
makes no update to any state while implementing the same
logic. getMethod() now has an explicit loop to avoid stack-consuming
recursion on retries. This infinite for loop covers all of the logic
in getMethod() and retries until a method is successfully found and a
CAS operation succeeds, or we determine that the method cannot be
found and we return the default dispatch value's implementation.

I'll now describe the operation of the logic in the for loop. The
first two steps in the loop correspond to things getMethod() does
"before" its looping construct in the original code, but we have to do
in the loop to get the latest values.

1. First we dereference our state, and assign this value to both
oldState and newState. We also set a variable called needWrite to
false; this is so we can avoid doing a CAS (they're not free) when
we have not actually updated the state.

2. Check if the cache is stale, and flush it if so. If the cache
gets flushed, set needWrite to true, as the state has changed.

3. Check if the methodCache has an entry for this dispatch
value. If so, we are "finished" in the sense that we found the
value we wanted. However, we may need to update the state. So,

  • If needWrite is false, we can return without a CAS, so just
    break out of the loop and return the method.
  • Otherwise, we need to update the state object with a CAS. If
    the CAS is successful, break out of the loop and return the
    target function. Otherwise, continue on the next iteration
    of the loop, skipping any other attempts to fetch the method
    later in the loop (useless work, at this point).

4. The value was not in the methodCache, so call the function
findBestMethod() to attempt to find a suitable method based on the
hierarchy and preferences. If it does find us a suitable method,
we now need to cache it ourselves. We create a new state object
with the new method cache and attempt to update the state atom
with a CAS (we definitely need a write here, so no need to check
needWrite at this point).

The one thing that is possibly questionable is the check at this
point to make sure the hierarchy has not been updated since the
beginning of this method. I inserted this here to match the
identical check at the corresponding point in
findAndCacheBestMethod() in the original code. That is also a
second check, since the cache is originally checked for freshness
at the very beginning of getMethod() in the original code. That
initial check happens at the beginning of the loop in the
patch. Given that there is no synchronization with the method
hierarchy, it is not clear to me that this second check is needed,
since we are already proceeding with a snapshot from the beginning
of the loop. Nonetheless, it can't hurt as far as I can tell, it
is how the original code worked, and I assume there was some
reason for that, so I kept the second check.

5. Finally, if findBestMethod() failed to find us a method for the
dispatch value, find the method for the default dispatch value and
return that by breaking out of the loop.

So the organization of getMethod() in the patch is complicated by two
factors: (1) making the retry looping explicit and stackless, (2)
skipping the CAS when we don't need to update state, and (3) skipping
needless work later in the retry loop if we find a value but are
unable to succeed in our attempt to CAS. Invoking a multimethod that
has a stable hierarchy and a populated cache should not even have a
CAS operation (or memory allocation) on this code path, just a cache
lookup after the dispatch value is calculated.

Comment by David Santiago [ 15/Jun/12 4:45 AM ]

I've updated this patch (removing the old version, which is entirely superseded by this one). The actual changes to MultiFn.java are identical (modulo any thing that came through in the rebase), but this newer patch has tests of basic multimethod usage, including defmulti, defmethod, remove-method, prefer-method and usage of these in a hierarchy that updates in a way interleaved with calls.

Comment by David Santiago [ 15/Jun/12 6:38 AM ]

I created a really, really simple benchmark to make sure this was an improvement. The following tests were on a quad-core hyper-threaded 2012 MBP.

With two threads contending for a simple multimethod:
The lock-based multifns run at an average of 606ms, with about 12% user, 15% system CPU at around 150%.
The lockless multifns run at an average of 159ms, with about 25% user, 3% system CPU at around 195%.

With four threads contending for a simple multimethod:
The lock-based multifns run at an average of 1.2s, with about 12% user, 15% system, CPU at around 150%.
The lockless multifns run at an average of 219ms, with about 50% user, 4% system, CPU at around 330%.

You can get the code at https://github.com/davidsantiago/multifn-perf

Comment by David Santiago [ 14/Aug/12 10:02 PM ]

It's been a couple of months, and so I just wanted to check in and see if there was anything else needed to move this along.

Also, Alan Malloy pointed out to me that my benchmarks above did not mention single-threaded performance. I actually wrote this into the tests above, but I neglected to report them at the time. Here are the results on the same machine as above (multithreaded versions are basically the same as the previous comment).

With a single thread performing the same work:
The lock-based multifns run at an average of 142ms.
The lockless multifns run at an average of 115ms.

So the lockless multimethods are still faster even in a single-threaded case, although the speedup is more modest compared to the speedups in the multithreaded cases above. This is not surprising, but it is good to know.

Comment by Stuart Sierra [ 17/Aug/12 2:58 PM ]

Screened. The approach is sound.

I can confirm similar performance measurements using David Santiago's benchmark, compared with Clojure 1.5 master as of commit f5f4faf.

Mean runtime (ms) of a multimethod when called repeatedly from N threads:

|            | N=1 | N=2 | N=4 |
|------------+-----+-----+-----|
| 1.5 master |  80 | 302 | 765 |
| lockless   |  63 |  88 | 125 |

My only concern is that the extra allocations of the State class will create more garbage, but this is probably not significant if we are already using persistent maps. It would be interesting to compare this implementation with one using thread-safe mutable data structures (e.g. ConcurrentHashMap) for the cache.

Comment by David Santiago [ 17/Aug/12 7:05 PM ]

I think your assessment that it's not significant compared to the current implementation using persistent maps is correct. Regarding the extra garbage, note that the new State is only created when the hierarchy has changed or there's a cache miss (related, obviously)... situations where you're already screwed. Then it won't have to happen again for the same method (until another change to the multimethod). So for most code, it won't happen very often.

ConcurrentHashMap might be faster, it'd be interesting to see. My instinct is to keep it as close to being "made out of Clojure" as possible. In fact, it's hard to see why this couldn't be rewritten in Clojure itself some day, as I believe Chas Emerick has suggested. Also, I would point out that two of the three maps are used from the Clojure side in core.clj. I assume they would be happier if they were persistent maps.

Funny story: I was going to point out the parts of the code that were called from the clojure side just now, and alarmingly cannot find two of the functions. I think I must have misplaced them while rewriting the state into an immutable object. Going to attach a new patch with the fix and some tests for it in a few minutes.

Comment by David Santiago [ 17/Aug/12 7:44 PM ]

Latest patch for this issue. Supersedes issue988-lockless-multifn+tests.diff as of 08/17/2012.

Comment by David Santiago [ 17/Aug/12 7:49 PM ]

As promised, I reimplemented those two functions. I also added more multimethod tests to the test suite. The new tests should definitely prevent a similar mistake. While I was at it, I went through core.clj and found all the multimethod API functions I could and ensured that there were at least some basic functionality tests for all of them. The ones I found were: defmulti, defmethod, remove-all-methods, remove-method, prefer-method, methods, get-method, prefers (Some of those already had tests from the earlier version of the patch).

Really sorry about catching this right after you vetted the patch. 12771 test assertions were apparently not affected by prefers and methods ceasing to function, but now there are 12780 to help prevent a similar error. Since you just went through it, I'm leaving the older version of the patch up so you can easily see the difference to what I've added.

Comment by Rich Hickey [ 15/Sep/12 9:05 AM ]

https://github.com/clojure/clojure/commit/83ebf814d5d6663c49c1b2d0d076b57638bff673 should fix these issues. The patch here was too much of a change to properly vet.

If you could though, I'd appreciate a patch with just the multimethod tests.

Comment by Andy Fingerhut [ 15/Sep/12 10:59 AM ]

Patch clj-988-tests-only-patch-v1.txt dated Sep 15 2012 is a subset of David Santiago's
patch issue988-lockless-multifn+tests-120817.diff dated Aug 17 2012. It includes only the tests from that patch. Applies cleanly and passes tests with latest master after Rich's read/write lock change for multimethods was committed.

Comment by Rich Hickey [ 17/Sep/12 9:20 AM ]

tests-only patch ok





[CLJ-958] Make APersistentVector.iterator slightly more efficient Created: 23/Mar/12  Updated: 03/Sep/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Chris Gray Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: Text File 0001-Make-APersistentVector.iterator-slightly-more-effici.patch    
Patch: Code

 Description   

The attached patch uses a sequence to create an iterator for persistent vectors. It should be slightly more efficient for PersistentVector objects than repeatedly calling nth (for reasons that I outline in the commit message).






[CLJ-910] [Patch] Allow for type-hinting the method receiver in memfn Created: 13/Jan/12  Updated: 01/Sep/12  Resolved: 01/Sep/12

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.5

Type: Enhancement Priority: Minor
Reporter: Tassilo Horn Assignee: Unassigned
Resolution: Completed Votes: 1
Labels: performance, reflection

Attachments: Text File 0001-Make-memfn-allow-for-type-hinting-the-method-receive.patch    
Patch: Code
Approval: Ok

 Description   

The attached patch copies metadata given to the method name symbol of memfn to the method receiver in the expansion. That way, memfn is able to be used even for type-hinted calls resulting in a big performance win.

user> (time (dotimes [i 1000000] ((memfn intValue) 1)))
Reflection warning, NO_SOURCE_FILE:1 - call to intValue can't be resolved.
"Elapsed time: 2579.229115 msecs"
nil
user> (time (dotimes [i 1000000] ((memfn ^Number intValue) 1)))
"Elapsed time: 12.015235 msecs"
nil


 Comments   
Comment by Tassilo Horn [ 08/Mar/12 3:56 AM ]

Updated patch.

Comment by Aaron Bedra [ 21/Aug/12 6:06 PM ]

Applies properly against d4170e65d001c8c2976f1bd7159484056b9a9d6d. Things look good to me.





[CLJ-858] Improve speed of STM by removing System.currentTimeMillis Created: 17/Oct/11  Updated: 22/Nov/13  Resolved: 22/Nov/13

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.3
Fix Version/s: Release 1.6

Type: Enhancement Priority: Minor
Reporter: Stefan Kamphausen Assignee: Unassigned
Resolution: Completed Votes: 2
Labels: performance
Environment:

Tested on Ubuntu and OSX


Attachments: File stm-rm-msecs-patch.diff    
Patch: Code
Approval: Ok

 Description   

The inner class TVal in LockingTransaction and Ref has an unused milliseconds field (that is populated with System.currentTimeMillis()).

This patch removes the milliseconds from inner class TVal in LockingTransaction.java and Ref.java. Using a little test suite[1] a increase of performance by up to 25% could be measured.

Original post: https://groups.google.com/d/topic/clojure/kc99LcUK8Tk/discussion

References:
[1] https://github.com/ska2342/clj-stm-perf-test/

Screened by: Alex Miller



 Comments   
Comment by Alex Miller [ 08/Aug/13 2:48 AM ]

Patch applies cleanly. Author is a contributor.

Did not verify performance claims but I can't see how removing fields and calls to System.currentTimeMillis() could do anything but make STM usage smaller and faster.

Marking Screened.





[CLJ-669] clojure.java.io/do-copy: use java.nio for Files Created: 01/Nov/10  Updated: 22/Nov/13  Resolved: 22/Nov/13

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.3
Fix Version/s: Release 1.6

Type: Enhancement Priority: Major
Reporter: Jürgen Hötzel Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: performance

Attachments: Text File 0001-use-java.nio-in-do-copy-method-for-Files.patch     Text File clj-669-use-java.nio-in-do-copy-for-files-patch-v2.txt     File clj-669-use-java.nio-in-do-copy-for-files-patch-v3.diff     Text File clj-669-use-java.nio-in-do-copy-for-files-patch-v3.txt    
Patch: Code
Approval: Ok

 Description   

NIO Channels reduce CPU/Disk load when copying Files (by using syscalls like sendfile internally on Linux/Solaris).

CPU-Load goes from 100% to 0% on my system when copying large files, because no userspace-copying is involved.

Patch: clj-669-use-java.nio-in-do-copy-for-files-patch-v3.diff

Screened by: Alex Miller



 Comments   
Comment by Jürgen Hötzel [ 27/Nov/10 5:05 AM ]

Patch instead of linking to github

Comment by Andy Fingerhut [ 24/May/13 12:45 PM ]

Patch clj-669-use-java.nio-in-do-copy-for-files-patch-v2.txt dated May 24 2013 is identical to patch 0001-use-java.nio-in-do-copy-method-for-Files.patch dated Nov 27 2010 except it applies cleanly to latest master. The older patch conflicts with a recent commit for CLJ-1072 that updates the obsolete #^ syntax for metadata.

Comment by Stuart Sierra [ 02/Aug/13 2:05 PM ]

Marked as incomplete pending a question: do we know that transferTo will always copy the entire file in one call? Nothing in the Javadoc for FileChannel.transferTo guarantees that it will.

Just to be safe, I think we should call transferTo in a loop, keeping track of how many bytes have been copied.

As a side note, I verified that the transferTo method is available in JDK 1.5.

Comment by Andy Fingerhut [ 10/Aug/13 3:37 AM ]

Good catch, Stuart. I verified by testing the -v2.txt patch on a 7 Gbyte file, and at least with a couple of OS/JDK combos transferTo never copied more than just under 2 Gbytes per call. It definitely needs a loop around transferTo calls to be correct.

Patch clj-669-use-java.nio-in-do-copy-for-files-patch-v3.txt dated Aug 10 2013 implements such a loop. I've tested it with OSX10.8.4+OracleJDK1.7.0_15, Ubuntu 12.04.1+OpenJDK1.6.0_27, and Ubuntu 12.04.1+OracleJDK1.7.0_25. In all cases it worked correctly on the 7 Gbyte file, and it reduced CPU load in the Java process as compared to the original version of do-copy.





[CLJ-668] Improve slurp performance by using native Java StringWriter and jio/copy Created: 01/Nov/10  Updated: 21/Apr/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.3
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Jürgen Hötzel Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: io, newbie, performance

Approval: Triaged

 Description   

Instead of copying each character from InputReader to StringBuffer.

Performance improvement:

From:
user> (time (count (slurp "/home/juergen/test.dat")))
"Elapsed time: 4269.980863 msecs"
104857600
To:
user> (time (count (slurp "/home/juergen/test.dat")))
"Elapsed time: 1140.854023 msecs"
104857600

Patch: http://github.com/juergenhoetzel/clojure/commit/af1a5713cd485ba5f1db25c37906ebaf7d474b47



 Comments   
Comment by Alex Miller [ 21/Apr/14 3:28 PM ]

This is double-better with the changes in Clojure 1.6 to improve jio/copy performance by using the NIO impl. Rough timing difference on a 25M file: old= 2316.021 msecs, new= 93.319 msecs.

Filer did not supply a patch and is not a contributor. If someone wants to make a patch (and better timing info demonstrating performance improvements), that would be great.





[CLJ-400] A faster flatten Created: 13/Jul/10  Updated: 03/Sep/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Anonymous Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance


 Description   

As discussed in this thread, I am submitting a more performant version of flatten for review. It has the same semantics as the current core/flatten. I have also updated the doc string to say that "(flatten nil) returns the empty list", because that's what the current version of core/flatten does as well.

I haven't mailed in a CA yet, but I will tomorrow morning.

Edit: Of course I'd mess my first ticket up . I am not immediately seeing an option to edit this to add the "patch" tag or add the "ready to test" action. Sorry folks



 Comments   
Comment by Assembla Importer [ 24/Aug/10 12:19 AM ]

Converted from http://www.assembla.com/spaces/clojure/tickets/400
Attachments:
flatten-enhancement.diff - https://www.assembla.com/spaces/clojure/documents/c5chtAJQir353NeJe5cbLr/download/c5chtAJQir353NeJe5cbLr





[CLJ-15] Incremental hashcode calculation for collections Created: 17/Jun/09  Updated: 14/Feb/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Rich Hickey Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: performance

Attachments: File lazy-incremental-hashes.diff    
Patch: Code

 Description   
Reported by richhickey, Dec 17, 2008
So hachCode can be final, more efficient to calc as you go.
Formerly Google Code Issue 11


 Comments   
Comment by Assembla Importer [ 24/Aug/10 3:44 AM ]

Converted from http://www.assembla.com/spaces/clojure/tickets/15

Comment by Christophe Grand [ 08/Mar/13 6:20 AM ]

Wouldn't the naive approach incur realizing lazy sequences when adding them to a list or a vector or as values in a map?

Comment by Christophe Grand [ 26/Aug/13 3:40 AM ]

The lazy-incremental-hashes.diff introduces lazy incremental hashes based on structural sharing.

Comment by Christophe Grand [ 26/Aug/13 3:42 AM ]

Why is this identified as Blocker?

Comment by Christophe Grand [ 26/Aug/13 3:43 AM ]

setting priority to minor

Comment by Andy Fingerhut [ 26/Aug/13 7:46 AM ]

I've seen this "edit a ticket, it changes to Priority=Blocker" behavior before. I believe some of the older tickets have no Priority field at all, and when you edit any of their properties, it creates the priority field with a default value of Blocker.

Comment by Alex Miller [ 26/Aug/13 8:06 AM ]

Yes, concur with Andy's explanation on priority change. I just bulk-edited all open CLJ tickets with null priority and set their priority.

Comment by Andy Fingerhut [ 30/Jan/14 8:01 PM ]

Patch lazy-incremental-hashes.diff still applies cleanly as of Jan 30 2014 latest Clojure master, but now fails tests due to recent commits involving hash changes. I have not checked how difficult or easy it might be to update the patch to pass tests again.





[CCACHE-27] Missing LU and LRU is linear complexity - performance Created: 04/Sep/12  Updated: 12/Sep/12

Status: Open
Project: core.cache
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Ghadi Shayban Assignee: Fogus
Resolution: Unresolved Votes: 0
Labels: enhancement, performance
Environment:

Mac Oracle JDK, Linux OpenJDK


Attachments: Text File priority-map-fixed.patch     Text File priority-map.patch    
Patch: Code

 Description   

Profiling some code with YourKit showed a hotspot in cache statistics on (miss) for LU and LRU caches.

Basically two issues: (count (keys {....})) is a count of a seq, not efficient. Replaced with a count of the map.

Secondly, (apply f anything) is slow. Profiler showed that the (apply min-key) was really slow. This is mitigated by using a c.d.priority-map instead. On a priority-map, (ffirst {}) or (first (peek {}).

Also inverted the logic for threshold comparison. Since there is a (build-leastness-queue) that populates statistics, should caches should favor codepaths for the state of being full all the time?



 Comments   
Comment by Ghadi Shayban [ 06/Sep/12 10:49 PM ]

I take back the part about (apply) being slow. I think it's more the fact that it's doing a linear scan on keys every single time.

(apply) does show up a lot in a profiler, but I haven't figured out why. (apply + (range big)) seems to even be slightly faster than (reduce +) on the same range.

Comment by Ghadi Shayban [ 12/Sep/12 9:27 AM ]

Patch in proper format





[ALGOM-4] algo.monad state-m fetch-val bug and efficiency issue Created: 08/Sep/12  Updated: 05/Feb/14  Resolved: 05/Feb/14

Status: Closed
Project: algo.monads
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Sacha De Vos Assignee: Konrad Hinsen
Resolution: Completed Votes: 0
Labels: bug, performance
Environment:

irrelevant



 Description   

;; the bug

(defn fetch-val
"Return a state-monad function that assumes the state to be a map and
returns the value corresponding to the given key. The state is not modified."
[key]
(domonad state-m
[s (fetch-state)]
(key s))) ;; does not work for integer or string keys

;; I propose replacing it with (get s key)

;; the efficiency issue :
;;
;; domonad with monad parameter binds all the monad functions,
;; looking these up in the state-m map on each call
;;
;; solution :

(defn fetch-val
"Return a state-monad function that assumes the state to be a map and
returns the value corresponding to the given key. The state is not modified."
[key]
(fn [s]
[(get s key) s]))

;; - we avoid the monad map lookups
;; - coding style brought up to par with the rest of state-m functions



 Comments   
Comment by Konrad Hinsen [ 05/Feb/14 5:52 AM ]

This was handled long ago: https://github.com/clojure/algo.monads/commit/90909e30965bb59fccc8e527d31f8aaabc9d404b





Generated at Wed Apr 23 15:36:25 CDT 2014 using JIRA 4.4#649-r158309.