Details
- Type: Defect
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: Release 1.4
- Fix Version/s: None
- Component/s: None
- Labels: None
Description
There is a PermGen memory leak that we have tracked down to protocol methods and multimethods called inside an eval, because of the caches these methods use. The problem only arises when the value being cached is an instance of a class (such as a function or reify) that was defined inside the eval. Thus extending IFn or dispatching a multimethod on an IFn are likely triggers.
My fellow LonoClouder, Jeff Dik, describes how to reproduce and work around the problem:
The easiest way that I have found to test this is to set "-XX:MaxPermSize" to a reasonable value so you don't have to wait too long for the PermGen space to fill up, and to use "-XX:+TraceClassLoading" and "-XX:+TraceClassUnloading" to see the classes being loaded and unloaded.
(defproject permgen-scratch "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.5.0-RC1"]]
  :jvm-opts ["-XX:MaxPermSize=32M"
             "-XX:+TraceClassLoading"
             "-XX:+TraceClassUnloading"])
You can start the project with lein swank 45678 and connect to it from Emacs with SLIME via M-x slime-connect.
To monitor the PermGen usage, you can find the Java process to watch with "jps -lmvV" and then run "jstat -gcold <PROCESS_ID> 1s". According to the jstat docs, the first column (PC) is the "Current permanent space capacity (KB)" and the second column (PU) is the "Permanent space utilization (KB)". VisualVM is also a nice tool for monitoring this.
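As a complement to jstat, PermGen usage can also be read from inside the running REPL through the standard java.lang.management API. The following is a minimal sketch, not part of the original report: the helper name class-pool-usage is hypothetical, and the pool-name match is an assumption about how the JVM names its pools.

```clojure
(import '(java.lang.management ManagementFactory MemoryPoolMXBean))

(defn class-pool-usage
  "Returns usage, in KB, of memory pools that hold class metadata.
  ASSUMPTION: pool names contain \"Perm Gen\" (HotSpot before Java 8)
  or \"Metaspace\" (Java 8 and later); other JVMs may name them differently."
  []
  (vec
    (for [^MemoryPoolMXBean pool (ManagementFactory/getMemoryPoolMXBeans)
          :let [pool-name (.getName pool)
                usage     (.getUsage pool)]
          ;; getUsage returns nil for a pool that is no longer valid.
          :when (and usage (re-find #"(?i)perm gen|metaspace" pool-name))]
      {:pool    pool-name
       :used-kb (quot (.getUsed usage) 1024)
       ;; getMax returns -1 when the pool has no defined maximum.
       :max-kb  (let [m (.getMax usage)] (when (pos? m) (quot m 1024)))})))
```

Calling (class-pool-usage) repeatedly while the reproduction loop runs should show :used-kb climbing toward :max-kb, mirroring the PU and PC columns reported by jstat.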
Multimethod leak
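Evaluating the following code will run a loop that repeatedly evals (take* (fn foo [])).
;; Dispatch on the type of the argument; for a plain fn this is its class.
(defmulti take* (fn [a] (type a)))

(defmethod take* clojure.lang.Fn
  [a]
  '())

;; Set to true with (reset! stop true) to end the loop.
(def stop (atom false))
(def sleep-duration (atom 1000))

(defn run-loop []
  (when-not @stop
    ;; Each eval compiles a new foo class and dispatches on it.
    (eval '(take* (fn foo [])))
    (Thread/sleep @sleep-duration)
    (recur)))

(future (run-loop))

;; Remove the delay so that PermGen fills up quickly.
(reset! sleep-duration 0)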
In the lein swank session, you will see many lines like below listing the classes being created and loaded.
[Loaded user$eval15802$foo__15803 from __JVM_DefineClass__]
[Loaded user$eval15802 from __JVM_DefineClass__]
These lines will stop once the PermGen space fills up.
In the jstat monitoring, you'll see the amount of used PermGen space (PU) increase to the max and stay there.
     PC       PU        OC       OU  YGC  FGC   FGCT    GCT
31616.0  31552.7  365952.0      0.0    4    0  0.000  0.129
32000.0  31914.0  365952.0      0.0    4    0  0.000  0.129
32768.0  32635.5  365952.0      0.0    4    0  0.000  0.129
32768.0  32767.6  365952.0   1872.0    5    1  0.000  0.177
32768.0  32108.2  291008.0  23681.8    6    2  0.827  1.006
32768.0  32470.4  291008.0  23681.8    6    2  0.827  1.006
32768.0  32767.2  698880.0  24013.8    8    4  1.073  1.258
32768.0  32767.2  698880.0  24013.8    8    4  1.073  1.258
32768.0  32767.2  698880.0  24013.8    8    4  1.073  1.258
A workaround is to run prefer-method before the PermGen space is all used up, e.g.
(prefer-method take* clojure.lang.Fn java.lang.Object)
Then, when the used PermGen space is close to the max, you will see the classes created by the eval calls being unloaded in the lein swank session.
[Unloading class user$eval5950$foo__5951]
[Unloading class user$eval3814]
[Unloading class user$eval2902$foo__2903]
[Unloading class user$eval13414]
In the jstat monitoring, there will be a long pause while the used PermGen space stays close to the max. It will then drop, and start increasing again as more evals occur.
     PC       PU        OC       OU  YGC  FGC    FGCT     GCT
32768.0  32767.9  159680.0  24573.4    6    2   0.167   0.391
32768.0  32767.9  159680.0  24573.4    6    2   0.167   0.391
32768.0  17891.3  283776.0  17243.9    6    2  50.589  50.813
32768.0  18254.2  283776.0  17243.9    6    2  50.589  50.813
The defmulti defines a cache that uses the dispatch values as keys. Each eval call in the loop defines a new foo class which is then added to the cache when take* is called, preventing the class from ever being GCed.
The prefer-method workaround works because it calls clojure.lang.MultiFn.preferMethod, which calls the private MultiFn.resetCache method, which completely empties the cache.
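The cache growth and the effect of prefer-method described above can be observed directly. The sketch below is illustrative only and rests on a loud assumption: it reads the non-public field clojure.lang.MultiFn.methodCache by reflection, which is an implementation detail that may change between Clojure versions. The names multifn-cache-size and demo-take are hypothetical and chosen so that the example does not interfere with take*.

```clojure
(defn multifn-cache-size
  "Returns the number of entries currently held in mf's dispatch cache.
  ASSUMPTION: relies on the non-public field MultiFn.methodCache."
  [^clojure.lang.MultiFn mf]
  (let [^java.lang.reflect.Field field
        (doto (.getDeclaredField clojure.lang.MultiFn "methodCache")
          (.setAccessible true))]
    (count (.get field mf))))

(defmulti demo-take (fn [a] (type a)))
(defmethod demo-take clojure.lang.Fn [a] '())

;; Cache size right after the method is defined.
(def size-before (multifn-cache-size demo-take))

;; Each eval defines a new foo class and dispatches on it.
(dotimes [_ 5] (eval '(demo-take (fn foo []))))
(def size-grown (multifn-cache-size demo-take))

;; The workaround: prefer-method resets the cache as a side effect.
(prefer-method demo-take clojure.lang.Fn java.lang.Object)
(def size-reset (multifn-cache-size demo-take))
```

On a version affected by this issue, size-grown is expected to exceed size-before by one entry per eval, and size-reset is expected to fall back to the size of the method table.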
Protocol leak
The leak with protocol methods similarly involves a cache. You see essentially the same behavior as the multimethod leak if you run the following code using protocols.
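(defprotocol ITake
  (take* [a]))

(extend-type clojure.lang.Fn
  ITake
  (take* [this] '()))

;; Set to true with (reset! stop true) to end the loop.
(def stop (atom false))
(def sleep-duration (atom 1000))

(defn run-loop []
  (when-not @stop
    ;; Each eval compiles a new foo class and calls the protocol method on it.
    (eval '(take* (fn foo [])))
    (Thread/sleep @sleep-duration)
    (recur)))

(future (run-loop))

;; Remove the delay so that PermGen fills up quickly.
(reset! sleep-duration 0)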
Again, the cache is in the take* method itself, using each new foo class as a key.
A workaround is to run -reset-methods on the protocol before the PermGen space is all used up, e.g.
(-reset-methods ITake)
This works because -reset-methods replaces the cache with an empty MethodImplCache.
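A small sketch can make the cache replacement visible. It assumes that the protocol function carries its cache in the public __methodImplCache field of clojure.lang.AFunction, which is an implementation detail rather than a documented API. The names IDemoTake, demo-take*, and current-cache are hypothetical.

```clojure
(defprotocol IDemoTake
  (demo-take* [a]))

(extend-type clojure.lang.Fn
  IDemoTake
  (demo-take* [this] '()))

(defn current-cache
  "Returns the MethodImplCache attached to the current demo-take* fn.
  ASSUMPTION: reads the implementation-level field __methodImplCache."
  []
  (.__methodImplCache ^clojure.lang.AFunction demo-take*))

;; Call the method once so the cache records the class of this fn.
(demo-take* (fn foo []))
(def cache-before (current-cache))

;; The workaround: rebuild the protocol fns around fresh caches.
(-reset-methods IDemoTake)
(def cache-after (current-cache))

;; True when the old cache, and the class keys it held, was swapped out.
(def cache-replaced? (not (identical? cache-before cache-after)))
```

If the explanation above holds, cache-replaced? should be true, and the classes referenced only by the old cache become eligible for unloading.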