[CLJ-1090] Indirect function calls through Var instances fail to clear locals Created: 22/Oct/12 Updated: 30/Nov/12 |
|
| Status: | Open |
| Project: | Clojure |
| Component/s: | None |
| Affects Version/s: | Release 1.4 |
| Fix Version/s: | None |
| Type: | Defect | Priority: | Minor |
| Reporter: | Spencer Tipping | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | performance | ||
| Environment: |
Probably all, but observed on Ubuntu 12.04, OpenJDK 6 |
||
| Attachments: |
|
| Patch: | Code |
| Approval: | Vetted |
| Description |
|
If you make a function call indirectly by invoking a Var object (which derefs itself and invokes the result), the invocation parameters remain in the thread's local stack for the duration of the function call, even though they are needed only long enough to be passed into the deref'd function. As a result, passing a lazy seq into a function invoked in its Var form may run out of memory if the seq is forced inside that function. For example: (defn total [xs] (reduce + 0 xs)) I can provide a patch if it would be useful. The fix should be trivial, something along the lines of wrapping each argN in clojure/lang/Var.java inside a Util.ret1(argN, argN = null) as is done in RestFn.java. |
| Comments |
| Comment by Spencer Tipping [ 23/Oct/12 1:37 PM ] |
|
Sorry, I typo'd the example. (defn total ...) should be (defn sum ...). |
| Comment by Timothy Baldridge [ 27/Nov/12 11:45 AM ] |
|
fixed typeo in example |
| Comment by Timothy Baldridge [ 27/Nov/12 11:47 AM ] |
|
Couldn't reproduce the exception, but the 2nd example did chew through about 4x the amount of memory. Vetting. |
| Comment by Timothy Baldridge [ 29/Nov/12 2:57 PM ] |
|
adding a patch. Since most of Clojure ends up running this code in one way or another, I'd assert that tests are included as part of the normal Clojure test process. Patch simply calls Util.ret1(argx, argx=null) on all invoke calls. |
| Comment by Timothy Baldridge [ 29/Nov/12 3:17 PM ] |
|
And as a note, both examples in the original report now have extremely similar memory usages. |
| Comment by Spencer Tipping [ 30/Nov/12 2:22 PM ] |
|
Sounds great, and the patch looks good too. Let me know if I need to do anything else. |
[CLJ-1087] clojure.data/diff uses set union on key seqs Created: 15/Oct/12 Updated: 15/Oct/12 |
|
| Status: | Open |
| Project: | Clojure |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Enhancement | Priority: | Minor |
| Reporter: | Tom Jack | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | bug, performance | ||
| Attachments: |
|
| Patch: | Code |
| Description |
|
clojure.data/diff, on line 118, defines: java.util.Map (diff-similar [a b] (diff-associative a b (set/union (keys a) (keys b)))) Since keys returns a key seq, this seems like an error. clojure.set/union has strange and inconsistent behavior with regard to non-sets, and in this case the two key seqs are concatenated. Based on a cursory benchmark, it seems that this bug The patch is easy (just call set on each key seq). |
| Comments |
| Comment by Andy Fingerhut [ 15/Oct/12 2:52 PM ] |
|
clj-1087-diff-perf-enhance-patch-v1.txt dated Oct 15 2012 implements Tom's suggested performance enhancement, although not exactly in the way he suggested. It does calculate the union of the two key sequences. |
[CLJ-1050] Remove a lock in the common case of deref of delay Created: 26/Aug/12 Updated: 18/Sep/12 Resolved: 18/Sep/12 |
|
| Status: | Closed |
| Project: | Clojure |
| Component/s: | None |
| Affects Version/s: | Release 1.4 |
| Fix Version/s: | None |
| Type: | Enhancement | Priority: | Trivial |
| Reporter: | Nicolas Oury | Assignee: | Unassigned |
| Resolution: | Declined | Votes: | 0 |
| Labels: | enhancement, patch, performance | ||
| Attachments: |
|
| Description |
|
Currently, when deref is called in Delay.java, a lock on the Delay is always acquired. The attached patch changes this behaviour to the following:
This is faster than what is done currently and can be very much faster if there is contention on the delay. |
| Comments |
| Comment by Nicolas Oury [ 27/Aug/12 2:37 AM ] |
|
Please close and reject. The patch is not working if val has non final fields. |
[CLJ-1000] Performance drop in PersistentHashMap.valAt(...) in v.1.4 -- Util.hasheq(...) ? Created: 21/May/12 Updated: 01/Mar/13 Resolved: 11/Dec/12 |
|
| Status: | Closed |
| Project: | Clojure |
| Component/s: | None |
| Affects Version/s: | Release 1.4 |
| Fix Version/s: | Release 1.5 |
| Type: | Enhancement | Priority: | Major |
| Reporter: | Oleksandr Shyshko | Assignee: | Timothy Baldridge |
| Resolution: | Completed | Votes: | 1 |
| Labels: | performance | ||
| Environment: |
Java(TM) SE Runtime Environment (build 1.7.0_04-b21) |
||
| Attachments: |
|
| Patch: | Code |
| Approval: | Ok |
| Description |
|
It seems there is a 30-40% performance degradation of PersistentHashMap.valAt(...) in Clojure 1.4. I have created a demo project with more details and some profiling information here: |
| Comments |
| Comment by Christophe Grand [ 27/Nov/12 8:30 AM ] |
|
I added a patch consisting of three commits:
|
| Comment by Timothy Baldridge [ 30/Nov/12 12:10 PM ] |
|
In the process of screening this, I'm not seeing much of a performance difference after applying the patch. before patch: Version: 1.5.0-master-SNAPSHOT after patch: Version: 1.5.0-master-SNAPSHOT clojure 1.4: Version: 1.4.0 clojure 1.3 Version: 1.3.0 I blew away my local clojure repo and re-applied the patch just to make sure, but the results are the same. Does this fix not optimize the case given in the original test project? For reference I'm running this code: (defn -main (println) (def mm 10000) (def str-keys (map str (range mm))) (def kw-keys (map #(keyword (str %)) (range mm))) (def sym-keys (map #(symbol (str %)) (range mm))) (println)) |
| Comment by Christophe Grand [ 30/Nov/12 2:10 PM ] |
|
Sorry, I was too quick to react on the ML (someone said it was related to hasheq caching and since I had the patch almost ready: on a project I noticed too much time spent computing hasheq on vectors). In 1.4, for a "regular" object, it must fails two instanceof tests before calling .hashCode(). |
| Comment by Timothy Baldridge [ 30/Nov/12 2:16 PM ] |
|
Marking as incomplete, should we also delete the patch as it seems like it should be in a different ticket? |
| Comment by Christophe Grand [ 03/Dec/12 10:00 AM ] |
|
In 1.3, #'hash was going through Object.hashCode and thus was simple and fast. Plus collections hashes were cached. The caching-hasheq-v2.diff patchset reintroduces hashes caching for collections/hasheq and reorders the instanceof tests (to test for IHashEq before Number) and makes Keyword and Symbol implement IHashEq to branch fast in Util.hasheq. I recommend adding a collection test to the current benchmark: (defn -main
[& args]
(println)
(println "Version: " (clojure-version))
(def mm 10000)
(def str-keys (map str (range mm)))
(def m (zipmap str-keys (range mm)))
(time (dotimes [i mm] (doseq [k str-keys] (m k))))
(def kw-keys (map #(keyword (str %)) (range mm)))
(def m (zipmap kw-keys (range mm)))
(time (dotimes [i mm] (doseq [k kw-keys] (m k))))
(def sym-keys (map #(symbol (str %)) (range mm)))
(def m (zipmap sym-keys (range mm)))
(time (dotimes [i mm] (doseq [k sym-keys] (m k))))
(def vec-keys (map (comp (juxt keyword symbol identity) str) (range mm)))
(def m (zipmap vec-keys (range mm)))
(time (dotimes [i mm] (doseq [k vec-keys] (m k))))
(println))
|
| Comment by Timothy Baldridge [ 03/Dec/12 10:38 AM ] |
|
For some reason I can't get v2 to build against master. It applies cleanly, but fails to build. |
| Comment by Christophe Grand [ 03/Dec/12 11:30 AM ] |
|
Timothy: I inadvertently deleted a "public" modifier before commiting... fixed in caching-hasheq-v3.diff |
| Comment by Timothy Baldridge [ 10/Dec/12 11:00 AM ] |
|
I now get the following results: Version: 1.4.0 Version: 1.5.0-master-SNAPSHOT (pre-patch) Version: 1.5.0-master-SNAPSHOT (post-patch) Marking as screened |
| Comment by Oleksandr Shyshko [ 10/Dec/12 3:53 PM ] |
|
Please, could you add as a comment the bench result using 1.3 vs 1.5-master-post-patch? |
| Comment by Oleksandr Shyshko [ 11/Dec/12 2:13 PM ] |
|
The performance with 1.5-master is now very close to 1.3 for 3/4 of the benchmark. However, this code is still showing 43% performance drop (3411 ms VS 6030 ms – 1.3 VS 1.5-master): (def str-keys (map str (range mm))) Version: 1.3.0 Version: 1.4.0 Version: 1.5.0-master-SNAPSHOT To reproduce: $ cd .. $ git clone https://github.com/oshyshko/clj-perf.git |
[CLJ-988] the locking in MultiFn.java (synchronized methods) can cause lots of contention in multithreaded programs Created: 08/May/12 Updated: 21/Sep/12 Resolved: 21/Sep/12 |
|
| Status: | Closed |
| Project: | Clojure |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Release 1.5 |
| Type: | Enhancement | Priority: | Major |
| Reporter: | Kevin Downey | Assignee: | Stuart Sierra |
| Resolution: | Completed | Votes: | 10 |
| Labels: | bug, performance | ||
| Attachments: |
|
| Patch: | Code and Test |
| Approval: | Ok |
| Description |
|
if you call a single multimethod a lot in multithreaded code you get lots of contention for the lock on the multimethod. this contention slows things down a lot. this is due to getMethod being synchronized. it would be great if there was some fast non-locking path through the multimethod. |
| Comments |
| Comment by Kevin Downey [ 08/May/12 11:30 AM ] |
|
http://groups.google.com/group/clojure-dev/browse_thread/thread/6a8219ae3d4cd0ae?hl=en |
| Comment by David Santiago [ 11/May/12 6:38 AM ] |
|
Here's a stab in the dark attempt at rewriting MultiFn to use atoms to swap between immutable copies of its otherwise mutable state. The four pieces of the MultiFn's state that are mutable and protected by locks are now contained in the MultiFn.State class, which is immutable and contains convenience functions for creating a new one with one field changed. An instance of this class is now held in an atom in the MultiFn called state. Changes to any of these four members are now done with an atomic swap of these State objects. The getMethod/findAndCacheBestMethod complex was rewritten to better enable the atomic logic. findAndCacheBestMethod was replaced with findBestMethod, which merely looks up the method; the caching logic was moved to getMethod so that it can be retried easily as part of the work that method does. As it was findAndCacheBestMethod seemingly had the potential to cause a stack overflow in a perfect storm of heavy concurrent modification, since it calls itself recursively if it finds that the hierarchy has changed while it has done its work. This logic is now done in the CAS looping of getMethod, so hopefully that is not even an unlikely possibility anymore. There is still seemingly a small race here, since the check is done of a regular variable against the value in a ref. Now as before, the ref could be updated just after you do the check, but before the MultiFn's state is updated. Of course, only the method lookup part of a MultiFn call was synchronized before; it could already change after the lookup but before the method itself executed, having a stale method finish seemingly after the method had been changed. Things are no different now in general, with the atom-based approach, so perhaps this race is not a big deal, as a stale value can't persist for long. The patch passes the tests and Clojure and multimethods seems to work. |
| Comment by Kevin Downey [ 12/May/12 8:59 PM ] |
|
this patch gets rid of the ugly lock contention issues. I have not been able to benchmark it vs. the old multimethod implementation, but looking at it, I would not be surprised if it is faster when the system is in a steady state. |
| Comment by Stuart Halloway [ 08/Jun/12 12:11 PM ] |
|
This looks straightforward, except for getMethod. Can somebody with context add more discussion of how method caching works? Also, it would be great to have tests with this one. |
| Comment by David Santiago [ 15/Jun/12 4:44 AM ] |
|
Obviously I didn't write the original code, so I'm not the ideal There are four pieces of state that change as the multimethod is either
I think reset(), addMethod(), removeMethod(), preferMethod(), Invoking a multimethod through its invoke() function will call The biggest change in the patch is to the In the original code, getFn() does nothing but bounce through 1. It checks if there's a method for the dispatch value in the 2. If not, it calls findAndCacheBestMethod() on the 1. It iterates through every map entry in the method table, 2. If it did not find a suitable entry, it returns null. 3. Otherwise, it checks if the hierarchy has been changed since the cache 3. Failing all else, it will return the method for the default Again, remember everything in the list above happens in a call to a Moving on now to the patch in this issue. As mentioned, the main To implement this change, I pulled the implicit looping logic from I'll now describe the operation of the logic in the for loop. The 1. First we dereference our state, and assign this value to both 2. Check if the cache is stale, and flush it if so. If the cache 3. Check if the methodCache has an entry for this dispatch
4. The value was not in the methodCache, so call the function The one thing that is possibly questionable is the check at this 5. Finally, if findBestMethod() failed to find us a method for the So the organization of getMethod() in the patch is complicated by two |
| Comment by David Santiago [ 15/Jun/12 4:45 AM ] |
|
I've updated this patch (removing the old version, which is entirely superseded by this one). The actual changes to MultiFn.java are identical (modulo any thing that came through in the rebase), but this newer patch has tests of basic multimethod usage, including defmulti, defmethod, remove-method, prefer-method and usage of these in a hierarchy that updates in a way interleaved with calls. |
| Comment by David Santiago [ 15/Jun/12 6:38 AM ] |
|
I created a really, really simple benchmark to make sure this was an improvement. The following tests were on a quad-core hyper-threaded 2012 MBP. With two threads contending for a simple multimethod: With four threads contending for a simple multimethod: You can get the code at https://github.com/davidsantiago/multifn-perf |
| Comment by David Santiago [ 14/Aug/12 10:02 PM ] |
|
It's been a couple of months, and so I just wanted to check in and see if there was anything else needed to move this along. Also, Alan Malloy pointed out to me that my benchmarks above did not mention single-threaded performance. I actually wrote this into the tests above, but I neglected to report them at the time. Here are the results on the same machine as above (multithreaded versions are basically the same as the previous comment). With a single thread performing the same work: So the lockless multimethods are still faster even in a single-threaded case, although the speedup is more modest compared to the speedups in the multithreaded cases above. This is not surprising, but it is good to know. |
| Comment by Stuart Sierra [ 17/Aug/12 2:58 PM ] |
|
Screened. The approach is sound. I can confirm similar performance measurements using David Santiago's benchmark, compared with Clojure 1.5 master as of commit f5f4faf. Mean runtime (ms) of a multimethod when called repeatedly from N threads: | | N=1 | N=2 | N=4 | |------------+-----+-----+-----| | 1.5 master | 80 | 302 | 765 | | lockless | 63 | 88 | 125 | My only concern is that the extra allocations of the State class will create more garbage, but this is probably not significant if we are already using persistent maps. It would be interesting to compare this implementation with one using thread-safe mutable data structures (e.g. ConcurrentHashMap) for the cache. |
| Comment by David Santiago [ 17/Aug/12 7:05 PM ] |
|
I think your assessment that it's not significant compared to the current implementation using persistent maps is correct. Regarding the extra garbage, note that the new State is only created when the hierarchy has changed or there's a cache miss (related, obviously)... situations where you're already screwed. Then it won't have to happen again for the same method (until another change to the multimethod). So for most code, it won't happen very often. ConcurrentHashMap might be faster, it'd be interesting to see. My instinct is to keep it as close to being "made out of Clojure" as possible. In fact, it's hard to see why this couldn't be rewritten in Clojure itself some day, as I believe Chas Emerick has suggested. Also, I would point out that two of the three maps are used from the Clojure side in core.clj. I assume they would be happier if they were persistent maps. Funny story: I was going to point out the parts of the code that were called from the clojure side just now, and alarmingly cannot find two of the functions. I think I must have misplaced them while rewriting the state into an immutable object. Going to attach a new patch with the fix and some tests for it in a few minutes. |
| Comment by David Santiago [ 17/Aug/12 7:44 PM ] |
|
Latest patch for this issue. Supersedes issue988-lockless-multifn+tests.diff as of 08/17/2012. |
| Comment by David Santiago [ 17/Aug/12 7:49 PM ] |
|
As promised, I reimplemented those two functions. I also added more multimethod tests to the test suite. The new tests should definitely prevent a similar mistake. While I was at it, I went through core.clj and found all the multimethod API functions I could and ensured that there were at least some basic functionality tests for all of them. The ones I found were: defmulti, defmethod, remove-all-methods, remove-method, prefer-method, methods, get-method, prefers (Some of those already had tests from the earlier version of the patch). Really sorry about catching this right after you vetted the patch. 12771 test assertions were apparently not affected by prefers and methods ceasing to function, but now there are 12780 to help prevent a similar error. Since you just went through it, I'm leaving the older version of the patch up so you can easily see the difference to what I've added. |
| Comment by Rich Hickey [ 15/Sep/12 9:05 AM ] |
|
https://github.com/clojure/clojure/commit/83ebf814d5d6663c49c1b2d0d076b57638bff673 should fix these issues. The patch here was too much of a change to properly vet. If you could though, I'd appreciate a patch with just the multimethod tests. |
| Comment by Andy Fingerhut [ 15/Sep/12 10:59 AM ] |
|
Patch clj-988-tests-only-patch-v1.txt dated Sep 15 2012 is a subset of David Santiago's |
| Comment by Rich Hickey [ 17/Sep/12 9:20 AM ] |
|
tests-only patch ok |
[CLJ-910] [Patch] Allow for type-hinting the method receiver in memfn Created: 13/Jan/12 Updated: 01/Sep/12 Resolved: 01/Sep/12 |
|
| Status: | Closed |
| Project: | Clojure |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Release 1.5 |
| Type: | Enhancement | Priority: | Minor |
| Reporter: | Tassilo Horn | Assignee: | Unassigned |
| Resolution: | Completed | Votes: | 1 |
| Labels: | performance, reflection | ||
| Attachments: |
|
| Patch: | Code |
| Approval: | Ok |
| Description |
|
The attached patch copies metadata given to the method name symbol of memfn to the method receiver in the expansion. That way, memfn is able to be used even for type-hinted calls resulting in a big performance win. user> (time (dotimes [i 1000000] ((memfn intValue) 1))) Reflection warning, NO_SOURCE_FILE:1 - call to intValue can't be resolved. "Elapsed time: 2579.229115 msecs" nil user> (time (dotimes [i 1000000] ((memfn ^Number intValue) 1))) "Elapsed time: 12.015235 msecs" nil |
| Comments |
| Comment by Tassilo Horn [ 08/Mar/12 3:56 AM ] |
|
Updated patch. |
| Comment by Aaron Bedra [ 21/Aug/12 6:06 PM ] |
|
Applies properly against d4170e65d001c8c2976f1bd7159484056b9a9d6d. Things look good to me. |