
[DFRS-2] Make writing footer checksums less expensive or optional Created: 17/Dec/13  Updated: 18/Dec/13

Status: Open
Project: data.fressian
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Ghadi Shayban Assignee: Stuart Halloway
Resolution: Unresolved Votes: 0
Labels: None

Approval: Incomplete

 Description   

Problem:
A JVM profiler indicates that checksums, as implemented, are a significant bottleneck.

Cause:
impl.RawOutput wraps the provided OutputStream with a CheckedOutputStream. Every time a rawInt is written, CheckedOutputStream calls on its checksum to update itself.

Adler32's update method happens to be native, which may not be germane to the problem.
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/zip/Adler32.java#91

The read side of data.fressian already exposes a knob for checksums to be ignored in RawInput. No such knob exists on the write side.

Checksums are used in the footer methods. They may be extremely useful for data at rest, but may be redundant with other out-of-band mechanisms.

Possible solutions
Buffering so that checksums don't recalculate frequently.
Exposing a knob to control whether write checksums are enabled. This would potentially involve changes to the footer.
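
The cost described under "Cause" and the buffering idea can be illustrated outside Fressian. The sketch below is not Fressian's RawOutput; the class names (ChecksumBufferingSketch, CountingAdler32) are hypothetical, and it assumes each int reaches the stream as four single-byte writes. It counts how often the checksum is asked to update when bytes go straight into a CheckedOutputStream, and when a BufferedOutputStream sits in front of it.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.Adler32;
import java.util.zip.CheckedOutputStream;
import java.util.zip.Checksum;

final class ChecksumBufferingSketch {

    /** Hypothetical helper: an Adler32 that counts how often it is asked to update. */
    static final class CountingAdler32 implements Checksum {
        private final Adler32 delegate = new Adler32();
        long updateCalls = 0;

        @Override public void update(int b) { updateCalls++; delegate.update(b); }
        @Override public void update(byte[] b, int off, int len) { updateCalls++; delegate.update(b, off, len); }
        @Override public long getValue() { return delegate.getValue(); }
        @Override public void reset() { delegate.reset(); }
    }

    /**
     * Writes intCount ints, one byte at a time, through a CheckedOutputStream.
     * When buffered is true, a BufferedOutputStream collects the bytes first,
     * so the checksum sees large chunks instead of single bytes.
     * Returns {number of checksum update calls, final checksum value}.
     */
    static long[] run(boolean buffered, int intCount) {
        CountingAdler32 sum = new CountingAdler32();
        OutputStream checked = new CheckedOutputStream(new ByteArrayOutputStream(), sum);
        OutputStream out = buffered ? new BufferedOutputStream(checked, 8192) : checked;
        try {
            for (int i = 0; i < intCount; i++) {
                out.write((i >>> 24) & 0xFF);
                out.write((i >>> 16) & 0xFF);
                out.write((i >>> 8) & 0xFF);
                out.write(i & 0xFF);
            }
            out.flush(); // push any buffered bytes through the checksum
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] { sum.updateCalls, sum.getValue() };
    }

    public static void main(String[] args) {
        long[] direct = run(false, 10_000);
        long[] viaBuffer = run(true, 10_000);
        System.out.println("unbuffered update calls: " + direct[0]);
        System.out.println("buffered update calls:   " + viaBuffer[0]);
        System.out.println("same checksum: " + (direct[1] == viaBuffer[1]));
    }
}
```

Both layouts produce the same Adler32 value; the unbuffered one updates the checksum once per byte, the buffered one only when the buffer is flushed. Whether this removes the bottleneck in a real workload is not something the sketch can show; that needs a benchmark.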



 Comments   
Comment by Stuart Halloway [ 18/Dec/13 8:33 AM ]

It is definitely possible that the checksum calculation dings perf. (And if so, another possible solution is just removing checksums entirely from Fressian.)

That said, I don't want to trust a profiler. To move this forward, I would like to see a benchmark of a real-world use case without the profiler in play.





[CLJ-1773] Support for REPL commands for tooling Created: 01/Jul/15  Updated: 01/Jul/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.8

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Alex Miller
Resolution: Unresolved Votes: 0
Labels: repl

Approval: Incomplete

 Description   

Per http://dev.clojure.org/display/design/Socket+Server+REPL, want to enhance repl to support "commands" useful for nested repls or for parallel tooling repls communicating over sockets (CLJ-1671).

Commands are defined as keywords in the "repl" namespace. The REPL will trap these after reading but before evaluation. Currently this is a closed set, but perhaps it could be open - the server socket repl could then install new ones if desired when repl is invoked.

Commands:

  • :repl/quit - same as Ctrl-D but works in terminal environments where that's not feasible. Allows for backing out of a nested REPL.
  • :repl/push - push the current repl "state" (tbd what that is, but at least ns) to a stateful map in the runtime. Returns coordinates that can be used to retrieve it elsewhere.
  • :repl/pull <coords> - retrieves the repl state defined by the coordinates

In the tooling scenario, it is expected that there are two repls - the client repl and the tooling repl. The tooling can send :repl/push over the client repl before startup and retrieve the coordinates (which don't change). Then the tooling repl can call :repl/pull at any time to retrieve the state of the client repl.
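
The command flow above can be modelled in a few lines. This is a language-neutral sketch written in Java, not the proposed Clojure implementation: every name in it (ReplCommandSketch, Session, trap) is hypothetical, forms are plain strings, and the only "state" tracked is the current namespace. It also assumes that :repl/push registers the live session, so a later :repl/pull sees the client's current namespace; the ticket leaves that detail open.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class ReplCommandSketch {

    /** Per-REPL state; only the current namespace is modelled here. */
    static final class Session {
        volatile String ns = "user";
        private final String coords = "repl-" + NEXT_ID.incrementAndGet();
    }

    private static final AtomicLong NEXT_ID = new AtomicLong();

    /** Runtime-wide map from coordinates to the session that pushed them. */
    private static final Map<String, Session> PUSHED = new ConcurrentHashMap<>();

    /**
     * Called after a form is read and before it is evaluated.
     * Returns the command's result, or null when the form is not a
     * command and should be evaluated as usual.
     */
    static String trap(Session session, String form) {
        String f = form.trim();
        if (f.equals(":repl/quit")) {
            return "quit";
        }
        if (f.equals(":repl/push")) {
            PUSHED.put(session.coords, session);
            return session.coords; // stable for the life of the session
        }
        if (f.startsWith(":repl/pull ")) {
            Session pushed = PUSHED.get(f.substring(":repl/pull ".length()).trim());
            return pushed == null ? "nil" : pushed.ns;
        }
        return null; // not a command
    }
}
```

In the two-REPL scenario, the tooling would send :repl/push once over the client session, keep the returned coordinates, and then issue :repl/pull with those coordinates from its own session whenever it needs the client's namespace.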






[CLJ-1741] deftype class literals and instances loaded from different classloaders when recompiling namespace Created: 30/May/15  Updated: 18/Jun/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.8

Type: Defect Priority: Major
Reporter: Stephen Nelson Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: aot, classloader, compiler

Attachments: Text File 0001-CLJ-1714-Don-t-load-AOT-class-when-compiling-already.patch    
Patch: Code
Approval: Incomplete

 Description   

Scenario: Given two files:

src/dispatch/core.clj:

(ns dispatch.core (:require [dispatch.dispatch]))

src/dispatch/dispatch.clj:

(ns dispatch.dispatch)
(deftype T [])
(def t (->T))
(println "T = (class t):" (= T (class t)))

Compile first core, then dispatch:

java -cp src:target/classes:clojure.jar -Dclojure.compile.path=target/classes clojure.main
user=> (compile 'dispatch.core)
T = (class t): true
dispatch.core
user=> (compile 'dispatch.dispatch)
T = (class t): false     ;; expected true
dispatch.dispatch

This scenario more commonly occurs in a Leiningen project with :aot :all. Files are compiled in alphabetical order with :all. In this case, dispatch.core will be compiled first, then dispatch.dispatch.

Cause:

(compile 'dispatch.core)

  • transitively compiles dispatch.dispatch
  • writes .class files to compile-path (which is on the classpath)
  • assertion passes

(compile 'dispatch.dispatch)

  • due to the prior compile, dispatch.dispatch__init is loaded via the appclassloader
  • ->T constructor will use new bytecode to instantiate a T instance - this uses appclassloader, loaded from compiled T on disk
  • however, T class literals are resolved with RT.classForName, which checks the dynamic classloader cache, so uses old runtime version of T, instead of on-disk version

In 1.6, RT.classForName() did not check dynamic classloader cache, so loaded T from disk as with instances. This was changed in CLJ-979 to support other redefinition and AOT mixing usages.
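
The underlying JVM rule is that a class is identified by its name and its defining classloader together, so two loaders that each define "T" produce two unrelated classes. The following sketch shows only that rule; it does not involve the Clojure compiler, and LoaderIdentitySketch, DefiningLoader and emptyClassBytes are hypothetical names. To stay self-contained it builds the bytes of an empty class T by hand instead of reading a .class file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class LoaderIdentitySketch {

    /** A loader that defines its own copy of a class from raw bytes. */
    static final class DefiningLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Builds a minimal class file: a public class with no fields or methods. */
    static byte[] emptyClassBytes(String name) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);                  // magic
            out.writeShort(0);                         // minor version
            out.writeShort(52);                        // major version (Java 8)
            out.writeShort(5);                         // constant pool count (entries 1..4)
            out.writeByte(7); out.writeShort(2);       // #1 Class, name at #2
            out.writeByte(1); out.writeUTF(name);      // #2 Utf8: this class
            out.writeByte(7); out.writeShort(4);       // #3 Class, name at #4
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 Utf8: superclass
            out.writeShort(0x0021);                    // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                         // this_class
            out.writeShort(3);                         // super_class
            out.writeShort(0);                         // interfaces count
            out.writeShort(0);                         // fields count
            out.writeShort(0);                         // methods count
            out.writeShort(0);                         // attributes count
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns {names are equal, Class objects are identical}. */
    static boolean[] compare() {
        byte[] bytes = emptyClassBytes("T");
        Class<?> first = new DefiningLoader().define("T", bytes);   // stands in for the runtime T
        Class<?> second = new DefiningLoader().define("T", bytes);  // stands in for the on-disk T
        return new boolean[] { first.getName().equals(second.getName()), first == second };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println("same name: " + r[0] + ", same class: " + r[1]);
    }
}
```

The two classes share a name and identical bytes yet are not the same Class object, which is the shape of the failing (= T (class t)) check: the class literal and the instance's class come from different loaders.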

Approaches:

1) Compile in reverse dependency order to avoid compiling twice.

Either swap the order of compilation in the first example or specify the order in project.clj:

:aot [dispatch.dispatch dispatch.core]

This is a short-term workaround.

2) Move the deftype into a separate namespace from where it is used so it is not redefined on the second compile. This is another short-term workaround.

3) Do not put compile-path on the classpath (this violates current expectations, but avoids loading dispatch__init)

(set! *compile-path* "foo")
(compile 'dispatch.core)
(compile 'dispatch.dispatch)

This is not easy to set up via Leiningen currently.

4) Compile each file with an independent Clojure runtime - avoids using cached classes in DCL for class literals.

Probably too annoying to actually do right now in Leiningen or otherwise.

5) Make compilation non-transitive. This is in the ballpark of CLJ-322, which is another can of worms. Also possibly where we should be headed though.

Screening: I do not believe the proposed patch is a good idea - it papers over the symptom without addressing the root cause. I think we need to re-evaluate how compilation works with regard to compile-path (#3) and transitivity (CLJ-322) (#5), but I think we should do this after 1.7. - Alex

See also: CLJ-1650



 Comments   
Comment by Alex Miller [ 30/May/15 8:50 PM ]

Pulling into 1.7 for consideration.

Comment by Stephen Nelson [ 30/May/15 8:55 PM ]

I've added a debug flag to my example that causes type instance hashcodes and their class-loaders to be printed.

Compiling dispatch.core
deftype => 652433136 (clojure.lang.DynamicClassLoader@23c30a20)
defmethod => 652433136 (clojure.lang.DynamicClassLoader@23c30a20)
instance => 652433136 (clojure.lang.DynamicClassLoader@23c30a20)
dispatch:  :pass
Compiling dispatch.dispatch
deftype => 652433136 (clojure.lang.DynamicClassLoader@23c30a20)
defmethod => 652433136 (clojure.lang.DynamicClassLoader@23c30a20)
instance => 760357227 (sun.misc.Launcher$AppClassLoader@42a57993)
dispatch:  :fail

Comment by Nicola Mometto [ 01/Jun/15 7:23 AM ]

The compiler has weird loading rules when using `compile` and both a clj file and a class file are present in the classpath.

This bug happens because RT.load will load the AOT class file rebinding the ->Ctor to use the AOT deftype instance.

A fix for this would be making load "loaded libs" aware to avoid unnecessary/harmful reloadings.

Comment by Nicola Mometto [ 01/Jun/15 10:55 AM ]

The attached patch fixes this bug by keeping track of what has already been loaded and loading the AOT class only if necessary

Comment by Alex Miller [ 16/Jun/15 2:24 PM ]

Original description (since replaced):

Type-dispatching multimethods are defined using the wrong type instance

When using a multimethod that dispatches on types, such as print-dup/print-method, the type reference passed to addMethod in the presence of aot is incorrect on the second load of the namespace. This means that if the namespace has already been loaded as a dependency of another file, the second load when the namespace is loaded for aot compilation will produce a multimethod that fails to dispatch correctly.

I've created an example repository:
https://github.com/sfnelson/clj-mm-dispatch

To reproduce independently, create a namespace that contains a deftype and a multimethod dispatching on the type, and a second namespace that requires the first and sorts alphabetically before the first. Aot-compile both namespaces. When the type-defining namespace is loaded via require it produces a class file for the deftype. When it is loaded the second time for aot-compilation, the type corresponding to the existing class file is given to the defmethod, instead of the new class constructed by loading the namespace. This causes the multimethod to fail to dispatch correctly.

To me this issue seems similar to CLJ-979: the type passed to the multimethod is retrieved using the wrong classloader. This suggests that it might have wider implications than AOT and multimethod dispatch.

Comment by Nicola Mometto [ 18/Jun/15 11:09 AM ]

I just realized this ticket is a duplicate of CLJ-1650





[CLJ-1671] Clojure socket server Created: 09/Mar/15  Updated: 08/Jul/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.8

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Alex Miller
Resolution: Unresolved Votes: 0
Labels: repl

Attachments: Text File clj-1671-2.patch     Text File clj-1671-3.patch     Text File clj-1671-4.patch     Text File clj-1671-5.patch     Text File clj-1671-6.patch    
Patch: Code and Test
Approval: Incomplete

 Description   

Programs often want to provide REPLs to users in contexts when a) network communication is desired, b) capturing stdio is difficult, or c) when more than one REPL session is desired. In addition, tools that want to support REPLs and simultaneous conversations with the host are difficult to build on the single stdio REPL currently provided by Clojure.

Tooling and users often need to enable a REPL on a program without changing the program, e.g. without asking author or program to include code to start a REPL host of some sort. Thus a solution must be externally and declaratively configured (no user code changes). A REPL is just a special case of a socket service. Rather than provide a socket server REPL, provide a built-in socket server that composes with the existing repl function.

For design background, see: http://dev.clojure.org/display/design/Socket+Server+REPL

Start a socket server by supplying an extra system property:

java -cp clojure.jar:app.jar -Dclojure.server.repl="{:address \"127.0.0.1\" :port 5555 :accept clojure.repl/repl}" my.app

where options are:

  • address = host or address, defaults to loopback
  • port = port, required
  • accept = namespaced function to invoke on socket accept, required
  • args = sequential collection of args to pass to accept
  • bind-err = defaults to true, binds err to out stream
  • server-daemon = defaults to true, socket server thread doesn't block exit
  • client-daemon = defaults to true, socket client threads don't block exit
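
To make the options concrete, here is a rough Java model of a socket server that hands each connection to an "accept" function. It is not the patch: SocketServerSketch, Handler, start and roundTrip are hypothetical names, only port, accept, server-daemon and client-daemon are modelled, and the address is fixed to loopback (the stated default). The args and bind-err options are left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class SocketServerSketch {

    /** Stand-in for the "accept" function: gets the connection's input and output. */
    interface Handler {
        void handle(BufferedReader in, PrintWriter out) throws IOException;
    }

    /** Binds to loopback and serves each client on its own thread. Port 0 picks a free port. */
    static ServerSocket start(int port, boolean serverDaemon, boolean clientDaemon, Handler accept) {
        final ServerSocket server;
        try {
            server = new ServerSocket(port, 0, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Thread acceptor = new Thread(() -> {
            while (!server.isClosed()) {
                try {
                    Socket client = server.accept();
                    Thread t = new Thread(() -> serve(client, accept));
                    t.setDaemon(clientDaemon); // daemon client threads do not block JVM exit
                    t.start();
                } catch (IOException e) {
                    return; // socket closed or I/O failure; this sketch just stops accepting
                }
            }
        });
        acceptor.setDaemon(serverDaemon); // daemon server thread does not block JVM exit
        acceptor.start();
        return server;
    }

    private static void serve(Socket client, Handler accept) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(c.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(c.getOutputStream(), StandardCharsets.UTF_8), true)) {
            accept.handle(in, out);
        } catch (IOException e) {
            // Client went away; nothing to report in this sketch.
        }
    }

    /** Minimal client standing in for telnet: sends one line, returns the first reply line. */
    static String roundTrip(ServerSocket server, String line) {
        try (Socket s = new Socket(server.getInetAddress(), server.getLocalPort());
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(line);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A REPL is then one possible handler: a loop that reads a form from the reader, evaluates it and prints the result to the writer. Any other line-oriented service could be plugged in the same way, which is the composition the description argues for.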

If you want to test a repl client with a repl server, telnet works:

$ telnet 127.0.0.1 5555
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
user=> (+ 1 1)
2
user=> (/ 1 0)
#error {:cause "Divide by zero",
 :via
 [{:type java.lang.ArithmeticException,
   :message "Divide by zero",
   :at [clojure.lang.Numbers divide "Numbers.java" 158]}],
 :trace
 [[clojure.lang.Numbers divide "Numbers.java" 158]  
  [clojure.lang.Numbers divide "Numbers.java" 3808]
  [user1$eval1 invoke "NO_SOURCE_FILE" 1]
  [clojure.lang.Compiler eval "Compiler.java" 6784]
  [clojure.lang.Compiler eval "Compiler.java" 6747]
  [clojure.core$eval invoke "core.clj" 3078]
  [clojure.main$repl$read_eval_print__8287$fn__8290 invoke "main.clj" 265]
  [clojure.main$repl$read_eval_print__8287 invoke "main.clj" 265]
  [clojure.main$repl$fn__8296 invoke "main.clj" 283]
  [clojure.main$repl doInvoke "main.clj" 283]
  [clojure.lang.RestFn invoke "RestFn.java" 619]
  [clojure.main$socket_repl_server$fn__8342$fn__8344 invoke "main.clj" 450]
  [clojure.lang.AFn run "AFn.java" 22]
  [java.lang.Thread run "Thread.java" 724]]}
user1=> (println "hello")
hello
nil

Patch: clj-1671-6.patch (wip - not yet complete)



 Comments   
Comment by Timothy Baldridge [ 09/Mar/15 5:50 PM ]

Could we perhaps keep this as a contrib library? This ticket simply states "The goal is to provide a simple streaming socket repl as part of Clojure." What is the rationale for the "part of Clojure" bit?

Comment by Alex Miller [ 09/Mar/15 7:33 PM ]

We want this to be available as a clojure.main option. It's all additive - why wouldn't you want it in the box?

Comment by Timothy Baldridge [ 09/Mar/15 10:19 PM ]

It never has really been too clear to me why some features are included in core, while others are kept in contrib. I understand that some are simply for historical reasons, but aside from that there doesn't seem to be too much of a philosophy behind it.

However it should be noted that since patches to clojure are much more guarded it's sometimes nice to have certain features in contrib, that way they can evolve with more rapidity than the one release a year that clojure has been going through.

But aside from those issues, I've found that breaking functionality into modules forces the core of a system to become more configurable. Perhaps I would like to use this repl socket feature, but pipe the data over a different communication protocol, or through a different serializer. If this feature were to be coded as a contrib library it would expose extension points that others could use to add additional functionality.

So I guess, all that to say, I'd prefer a tool I can compose rather than a pre-built solution.

Comment by Rich Hickey [ 10/Mar/15 6:25 AM ]

Please move discussions on the merits of the idea to the dev list. Comments should be about the work of resolving the ticket, approach taken by the patch, quality/perf issues etc.

Comment by Colin Jones [ 11/Mar/15 1:33 PM ]

I see that context (a) of the rationale is that network communication is desired, which sounds to me like users of this feature may want to communicate across hosts (whether in VMs or otherwise). Is that the case?

If so, it seems like specifying the address to bind to (e.g. "0.0.0.0", "::", "127.0.0.1", etc.) may become important as well as the existing port option. This way, someone who wants to communicate across hosts (or conversely, lock down access to local-only) can make that decision.

Comment by Alex Miller [ 11/Mar/15 2:07 PM ]

Colin - agreed. There are many ways to potentially customize what's in there so we need to figure out what's worth doing, both in the function and via the command line.

I think address is clearly worth having via the function and possibly in the command line too.

Comment by Kevin Downey [ 11/Mar/15 5:49 PM ]

I find the exception printing behavior really odd. For a machine you want an exception as data, but you also want some indication of whether the data is an error or not; for a human you want a pretty-printed stacktrace. Making the socket repl default to printing errors this way seems to optimize for neither.

Comment by Rich Hickey [ 12/Mar/15 12:29 PM ]

Did you miss the #error tag? That indicates the data is an error. It is likely we will pprint the error data, making it not bad for both purposes

Comment by Alex Miller [ 13/Mar/15 11:29 AM ]

New -4 patch changes:

  • clojure.core/throwable-as-map now public and named clojure.core/Throwable->map
  • catch and ignore SocketException without printing in socket server repl (for client disconnect)
  • functions to print as message and as data are now: clojure.main/err-print and clojure.main/err->map. All defaults and docs updated.

Comment by David Nolen [ 18/Mar/15 12:44 PM ]

Is there any reason to not allow supplying :eval in addition to :use-prompt? In the case of projects like ClojureCLR + Unity eval generally must happen on the main thread. With :eval as something which can be configured, REPL sessions can queue forms to be eval'ed with the needed context (current ns etc.) to the main thread.

Comment by Kevin Downey [ 20/Mar/15 2:12 PM ]

I did see the #error tag, but throwables print with that tag regardless of if they are actually thrown or if they are just the value returned from a function. Admittedly returning exceptions as values is not something generally done, but the jvm does distinguish between a return value and a thrown exception. Having a repl that doesn't distinguish between the two strikes me as an odd design. The repl you get from clojure.main currently prints the message from a thrown uncaught throwable, and on master prints with #error if you have a throwable value, so it distinguishes between an uncaught thrown throwable and a throwable value. That obviously isn't great for tooling because you don't get a good data representation in the uncaught case.

It looks like the most recent patch does pretty print uncaught throwables, which is helpful for humans to distinguish between a returned value and an uncaught throwable.

Comment by Kevin Downey [ 25/Mar/15 1:10 PM ]

alex: saying this is all additive, when it has driven changes to how things are printed, using the global print-method, rings false to me

Comment by Sam Ritchie [ 25/Mar/15 1:15 PM ]

This seems like a pretty big last minute addition for 1.7. What's the rationale for adding it here vs deferring to 1.8, or trying it out as a contrib first?

Comment by Alex Miller [ 25/Mar/15 2:13 PM ]

Kevin: changing the fallthrough printing for things that are unreadable to be readable should be useful regardless of the socket repl. It shouldn't be a change for existing programs (unless they're relying on the toString of objects without print formats).

Sam: Rich wants it in the box as a substrate for tools.

Comment by Alex Miller [ 26/Mar/15 10:03 AM ]

Marking incomplete, pending at least the repl exit question.

Comment by Laurent Petit [ 29/Apr/15 2:18 PM ]

Hello, I intend to work on this, if it appears it still has a good probability of being included in clojure 1.7.
There hasn't been much visible activity on it lately.
What is the current status of the pending question, and do you think it will still make it in 1.7?

Comment by Alex Miller [ 29/Apr/15 2:29 PM ]

This has been pushed to 1.8 and is on my plate. The direction has diverged quite a bit from the original description and we don't expect to modify clojure.main as is done in the prior patches. So, I would recommend not working on it as described here.

Comment by Laurent Petit [ 01/May/15 7:24 AM ]

OK thanks for the update.

Is the discussion about the new design / goal (you say the direction has diverged) available somewhere so that I can keep in touch with what the Hammock Time is producing? Because on my own hammock time I'm doing some mental projections for CCW support of this, based on what is publicly available here -

Also, as soon as you have something available for testing please don't hesitate to ping me, I'll see what I can do to help depending on my schedule. Cheers.

Comment by Alex Miller [ 01/May/15 8:44 AM ]

Some design work is here - http://dev.clojure.org/display/design/Socket+Server+REPL.

Comment by Laurent Petit [ 05/May/15 11:41 AM ]

Thanks for the link. It seems that the design is totally revamped indeed. Better to wait then.

Comment by Andy Fingerhut [ 04/Jul/15 1:21 PM ]

Alex, just a note that the Java method getLoopbackAddress [1] appears to have been added with Java 1.7, so your patches that use that method do not compile with Java 1.6. If the plan was for the next release of Clojure to drop support for Java 1.6 anyway, then no worries.

[1] http://docs.oracle.com/javase/7/docs/api/java/net/InetAddress.html#getLoopbackAddress%28%29
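
If Java 1.6 compatibility had to be kept, one possible workaround (an assumption on my part, not something taken from the patches) is InetAddress.getByName(null), which is documented to return an address of the loopback interface and predates Java 7. LoopbackCompat is a hypothetical helper name.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class LoopbackCompat {

    /** Loopback lookup that avoids InetAddress.getLoopbackAddress() (added in Java 7). */
    static InetAddress loopback() {
        try {
            // A null host is documented to yield an address of the loopback interface.
            return InetAddress.getByName(null);
        } catch (UnknownHostException e) {
            throw new IllegalStateException("loopback address unavailable", e);
        }
    }
}
```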

Comment by Stuart Halloway [ 08/Jul/15 8:31 AM ]

Lifecycle concerns

1. atoms are weak when actions (starting threads / sockets) are coordinated with recorded state (the map)
2. I see why the plumbing needs access to socket, but what is the motivation for exposing it to outside code, seems like opportunity to break stuff
3. starting a server has a race condition
4. what happens if somebody wants to call start-server explicitly – how do they know whether that happens before or after config-driven process launches
5. what guarantees about when config-driven launch happens, vis-a-vis other startup-y things
6. is there a use for stop-servers other than shutdown?
7. does Clojure now need a shutdown-clojure-resources? I don't want to have to remember shutdown-agents, plus stop-servers, plus whatever we add next year
8. what happens on misconfiguration (e.g. nonexistent namespace)? will other servers still launch? what thread dies, and where does it report? will the app main process die before even reaching the user?

Comment by Alex Miller [ 08/Jul/15 9:12 AM ]

1. it's not in the patch, but the intention is that in the runtime on startup there is a call to (start-servers (System/getProperties)) and generally you're not starting servers on the fly (although it's broken out to make that possible).
2. just trying to make resources available, not sure how much locking down we want/need
3. I'm expecting this to be a startup thing primarily
4. I'm assuming the config-driven process will start in the runtime and thus it will happen first. Not sure how much we need to support the manual stuff - it just seemed like a good idea to make it possible.
5. dunno, haven't looked at where to do that yet. Probably somewhere similar to data_readers stuff?
6. no
7. the threads are daemons by default meaning that it will shut down regardless. If you set the daemon properties to false, then you're in control and need to call stop-servers where it's appropriate.
8. I thought about these questions and do not have good answers. Lots more of that stuff needs to be handled.
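
One reading of lifecycle concerns 1 and 3: an atom's update function may be retried under contention, so a side effect such as opening a socket should not live inside it, and a separate check-then-start sequence can let two callers both start the same server. The sketch below shows a lock-based alternative in Java. It is only an illustration of the concern, not the patch's design; ServerRegistrySketch and startOnce are hypothetical names, and the "server" is whatever object the starter returns.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class ServerRegistrySketch {

    private final Map<String, Object> servers = new HashMap<>();

    /**
     * Starts the named server at most once. The existence check, the side
     * effect (running starter) and the recording of the result all happen
     * under one lock, so concurrent callers cannot both start it.
     * Returns true only for the call that actually started the server.
     */
    synchronized boolean startOnce(String name, Supplier<?> starter) {
        if (servers.containsKey(name)) {
            return false;
        }
        servers.put(name, starter.get());
        return true;
    }

    /** Number of servers recorded so far. */
    synchronized int count() {
        return servers.size();
    }
}
```

The trade-off is that the lock is held while the starter runs, so a slow start blocks other callers of the registry; for a startup-time operation that may be acceptable.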





[CLJ-1620] Constants are leaked in case of a reentrant eval Created: 18/Dec/14  Updated: 19/May/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.8

Type: Defect Priority: Critical
Reporter: Christophe Grand Assignee: Unassigned
Resolution: Unresolved Votes: 3
Labels: aot, compiler

Attachments: Text File 0001-CLJ-1620-avoid-constants-leak-in-static-initalizer.patch     Text File 0001-CLJ-1620-avoid-constants-leak-in-static-initalizer-v2.patch     Text File 0001-CLJ-1620-avoid-constants-leak-in-static-initalizer-v3.patch     Text File 0001-CLJ-1620-avoid-constants-leak-in-static-initalizer-v4.patch     Text File clj-1620-v5.patch     Text File eval-bindings.patch    
Patch: Code
Approval: Incomplete

 Description   

Compiling a function that references a class that is not yet loaded (or initialized) triggers that class's static initializer. When the static initializer loads Clojure code, some constants (source code, I think) are leaked into the constants pool of the function under compilation.

It prevented CCW from working in some environments (Rational) because the static init of the resulting function was over 64K.

Steps to reproduce:

Load the leak.main ns and run the code in comments: the first function has 15 extra fields despite being identical to the second one.

(ns leak.main)

(defn first-to-load []
  leak.Klass/foo)

(defn second-to-load []
  leak.Klass/foo)

(comment
=> (map (comp count #(.getFields %) class) [first-to-load second-to-load])
(16 1)
)

package leak;
 
import clojure.lang.IFn;
import clojure.lang.RT;
import clojure.lang.Symbol;
 
public class Klass {
  static {
    RT.var("clojure.core", "require").invoke(Symbol.intern("leak.leaky"));
  }
  public static IFn foo = RT.var("leak.leaky", "foo");
}

(ns leak.leaky)

(defn foo
  "Some doc"
  []
  "hello")

(def unrelated 42)

https://gist.github.com/cgrand/5dcb6fe5b269aecc6a5b#file-main-clj-L10

Patch: clj-1620-v5.patch



 Comments   
Comment by Christophe Grand [ 18/Dec/14 3:56 PM ]

Patch from Nicola Mometto

Comment by Nicola Mometto [ 18/Dec/14 4:01 PM ]

Attached the same patch with a more informative commit message

Comment by Laurent Petit [ 18/Dec/14 4:03 PM ]

I'd like to thank Christophe and Alex for their invaluable help in understanding what was happening, formulating the right hypothesis and then finding a fix.

I would also mention that even if non-IBM Rational environments were not affected by the bug to the point where CCW would not work, they were still affected. For instance, the class for a one-liner function wrapping an interop call weighs 700 bytes once the patch is applied, whereas it weighed 90 kbytes with current 1.6 or 1.7.

Comment by Laurent Petit [ 18/Dec/14 5:07 PM ]

In CCW for the initial problematic function, the -v2 patch produces exactly the same bytecode as if the referenced class does not load any namespace in its static initializers.
That is, the patch is valid. I will test it live in the IBM Rational environment ASAP.

Comment by Laurent Petit [ 19/Dec/14 12:10 AM ]

I confirm the patch fixes the issue detected initially in the IBM Rational environment

Comment by Michael Blume [ 06/Jan/15 4:03 PM ]

I have absolutely no idea why, but if I apply this patch, and the patch for CLJ-1544 to master, and then try to build a war from this test project https://github.com/pdenhaan/extend-test I get a scary-looking traceback:

$ lein do clean, war!
Exception in thread "main" java.lang.NoSuchFieldError: __thunk__0__, compiling:(route.clj:1:1)
	at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3606)
	at clojure.lang.Compiler.compile1(Compiler.java:7299)
	at clojure.lang.Compiler.compile1(Compiler.java:7289)
	at clojure.lang.Compiler.compile(Compiler.java:7365)
	at clojure.lang.RT.compile(RT.java:398)
	at clojure.lang.RT.load(RT.java:438)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5415.invoke(core.clj:5823)
	at clojure.core$load.doInvoke(core.clj:5822)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5613)
	at clojure.core$load_lib$fn__5362.invoke(core.clj:5668)
	at clojure.core$load_lib.doInvoke(core.clj:5667)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$load_libs.doInvoke(core.clj:5706)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$require.doInvoke(core.clj:5789)
	at clojure.lang.RestFn.invoke(RestFn.java:436)
	at extend_test.core.handler$loading__5301__auto____66.invoke(handler.clj:1)
	at clojure.lang.AFn.applyToHelper(AFn.java:152)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3601)
	at clojure.lang.Compiler.compile1(Compiler.java:7299)
	at clojure.lang.Compiler.compile1(Compiler.java:7289)
	at clojure.lang.Compiler.compile(Compiler.java:7365)
	at clojure.lang.RT.compile(RT.java:398)
	at clojure.lang.RT.load(RT.java:438)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5415.invoke(core.clj:5823)
	at clojure.core$load.doInvoke(core.clj:5822)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5613)
	at clojure.core$load_lib$fn__5362.invoke(core.clj:5668)
	at clojure.core$load_lib.doInvoke(core.clj:5667)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$load_libs.doInvoke(core.clj:5706)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$require.doInvoke(core.clj:5789)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at extend_test.core.servlet$loading__5301__auto____7.invoke(servlet.clj:1)
	at clojure.lang.AFn.applyToHelper(AFn.java:152)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3601)
	at clojure.lang.Compiler.compile1(Compiler.java:7299)
	at clojure.lang.Compiler.compile1(Compiler.java:7289)
	at clojure.lang.Compiler.compile1(Compiler.java:7289)
	at clojure.lang.Compiler.compile(Compiler.java:7365)
	at clojure.lang.RT.compile(RT.java:398)
	at clojure.lang.RT.load(RT.java:438)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5415.invoke(core.clj:5823)
	at clojure.core$load.doInvoke(core.clj:5822)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5613)
	at clojure.core$compile$fn__5420.invoke(core.clj:5834)
	at clojure.core$compile.invoke(core.clj:5833)
	at user$eval5.invoke(form-init180441230737245034.clj:1)
	at clojure.lang.Compiler.eval(Compiler.java:6776)
	at clojure.lang.Compiler.eval(Compiler.java:6765)
	at clojure.lang.Compiler.eval(Compiler.java:6766)
	at clojure.lang.Compiler.load(Compiler.java:7203)
	at clojure.lang.Compiler.loadFile(Compiler.java:7159)
	at clojure.main$load_script.invoke(main.clj:274)
	at clojure.main$init_opt.invoke(main.clj:279)
	at clojure.main$initialize.invoke(main.clj:307)
	at clojure.main$null_opt.invoke(main.clj:342)
	at clojure.main$main.doInvoke(main.clj:420)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at clojure.lang.Var.invoke(Var.java:383)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.Var.applyTo(Var.java:700)
	at clojure.main.main(main.java:37)
Caused by: java.lang.NoSuchFieldError: __thunk__0__
	at instaparse.core__init.load(Unknown Source)
	at instaparse.core__init.<clinit>(Unknown Source)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:344)
	at clojure.lang.RT.loadClassForName(RT.java:2141)
	at clojure.lang.RT.load(RT.java:430)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5415.invoke(core.clj:5823)
	at clojure.core$load.doInvoke(core.clj:5822)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5613)
	at clojure.core$load_lib$fn__5362.invoke(core.clj:5668)
	at clojure.core$load_lib.doInvoke(core.clj:5667)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$load_libs.doInvoke(core.clj:5706)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$require.doInvoke(core.clj:5789)
	at clojure.lang.RestFn.invoke(RestFn.java:436)
	at clout.core$loading__5301__auto____273.invoke(core.clj:1)
	at clout.core__init.load(Unknown Source)
	at clout.core__init.<clinit>(Unknown Source)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:344)
	at clojure.lang.RT.loadClassForName(RT.java:2141)
	at clojure.lang.RT.load(RT.java:430)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5415.invoke(core.clj:5823)
	at clojure.core$load.doInvoke(core.clj:5822)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5613)
	at clojure.core$load_lib$fn__5362.invoke(core.clj:5668)
	at clojure.core$load_lib.doInvoke(core.clj:5667)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$load_libs.doInvoke(core.clj:5706)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$require.doInvoke(core.clj:5789)
	at clojure.lang.RestFn.invoke(RestFn.java:482)
	at compojure.core$loading__5301__auto____68.invoke(core.clj:1)
	at compojure.core__init.load(Unknown Source)
	at compojure.core__init.<clinit>(Unknown Source)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:344)
	at clojure.lang.RT.loadClassForName(RT.java:2141)
	at clojure.lang.RT.load(RT.java:430)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5415.invoke(core.clj:5823)
	at clojure.core$load.doInvoke(core.clj:5822)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5613)
	at clojure.core$load_lib$fn__5362.invoke(core.clj:5668)
	at clojure.core$load_lib.doInvoke(core.clj:5667)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$load_libs.doInvoke(core.clj:5706)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$require.doInvoke(core.clj:5789)
	at clojure.lang.RestFn.invoke(RestFn.java:457)
	at compojure.route$loading__5301__auto____1508.invoke(route.clj:1)
	at clojure.lang.AFn.applyToHelper(AFn.java:152)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3601)
	... 75 more
Subprocess failed
Comment by Michael Blume [ 06/Jan/15 4:06 PM ]

https://github.com/MichaelBlume/clojure/tree/no-field
https://github.com/MichaelBlume/extend-test/tree/no-field

mvn clean install in the one, lein ring uberwar in the other.

Comment by Nicola Mometto [ 06/Jan/15 6:09 PM ]

Michael, thanks for the report. I've tried investigating this a bit, but the large number of moving parts involved makes it really hard to figure out why the combination of the two patches causes this issue.

A helpful minimal case would require no lein and no external dependencies, I'd appreciate some help in debugging this issue if anybody has time.

Comment by Michael Blume [ 06/Jan/15 10:56 PM ]

Ok, looks like the minimal case is

(ns foo (:require [instaparse.core]))

(ns bar (:require [foo]))

and then attempt to AOT-compile both foo and bar.

I don't yet know what's special about instaparse.core.

Comment by Michael Blume [ 06/Jan/15 11:30 PM ]

Well, not a minimal case, of course, but one without lein, at least.

Comment by Michael Blume [ 06/Jan/15 11:51 PM ]

ok, problem is instaparse's defclone macro, I've extracted it to a test repo

https://github.com/MichaelBlume/thunk-fail

lein do clean, compile will get you a failure, but the repo has no dependencies so I'm sure there's a way to do that without lein.

Comment by Ghadi Shayban [ 06/Jan/15 11:56 PM ]

Sorry for the barrage of questions, but these classloader bugs are subtle (and close to being solved I hope). Your report is immensely valuable, and yet it will help to be even more specific. There is a cluster of these bugs – and keeping them laser-focused is key.

The minimal case to which you refer is the NoSuchFieldError?
How is this being invoked without lein?
What are you calling to AOT? (compile 'bar) ?
What is the classpath? When you invoke originally, is ./target/classes empty?
Does the problem go away with CLJ-979-7 applied?

Comment by Michael Blume [ 07/Jan/15 12:16 AM ]

I have tried and failed to replicate without leiningen. When I just run

java -Dclojure.compile.path=target -cp src:../clojure/target/clojure-1.7.0-aot-SNAPSHOT.jar clojure.lang.Compile thunk-fail.first thunk-fail.second

everything works fine.

Comment by Ghadi Shayban [ 07/Jan/15 12:30 AM ]

The NoSuchFieldError is related to the keyword lookup sites.

Replacing defclone's body with
`(do (:foo {})) is enough to trigger it, with the same ns structure.

Comment by Nicola Mometto [ 07/Jan/15 4:47 AM ]

I have updated the patch for CLJ-1544, now the combination of the new patch + the patch from this ticket should not cause any exception.

That said, a bug in this patch still exists since while the patch for CLJ-1544 had a bug, it was causing a perfectly valid (albeit hardly reproducible) compilation scenario so we should keep debugging this patch with the help of the bugged patch for CLJ-1544.

I guess the first thing to do is figure out what lein compile is doing differently than clojure.lang.Compile

Comment by Nicola Mometto [ 07/Jan/15 4:49 AM ]

Also Ghadi is right, in fact replacing the whole body of thunk-fail.core with (:foo {}) is enough.

It would seem like the issue is with AOT (re)compiling top-level keyword lookup sites, my guess is that for some reason this patch is preventing correct generation of the __init static initializer.

Comment by Nicola Mometto [ 07/Jan/15 5:35 AM ]

I still have absolutely no idea what lein compile is doing but I figured out the issue.
The updated patch binds (in eval) the appropriate vars only when they are already bound.

Comment by Alex Miller [ 07/Jan/15 9:00 AM ]

Would it be worth using transients on the bindings map now?

Comment by Nicola Mometto [ 07/Jan/15 9:11 AM ]

Makes sense, updated the patch to use a transient map

Comment by Michael Blume [ 07/Jan/15 12:25 PM ]

Is there a test we can add that'll fail in the presence of the v2 patch? preferably independent of the CLJ-1544 patch? I can try to write one myself, but I don't have a lot of familiarity with the Clojure compiler internals.

Comment by Nicola Mometto [ 07/Jan/15 12:32 PM ]

I'll have to think about a way to reproduce that bug, it's not a simple scenario to reproduce.
It involves compiling a namespace from an evaluated context.

Comment by Laurent Petit [ 15/Apr/15 11:14 AM ]

Hello, is there any chance left that this issue will make it to 1.7 ?

Comment by Alex Miller [ 15/Apr/15 11:18 AM ]

Wasn't planning on it - what's the impact for you?

Comment by Laurent Petit [ 29/Apr/15 2:14 PM ]

The impact is that I need to use a patched version of Clojure for CCW.
While it's currently not that hard to follow clojure's main branch and regularly rebase on it or reapply the patch, it's still a waste of time.

Comment by Alex Miller [ 29/Apr/15 2:31 PM ]

I will check with Rich whether it can be screened for 1.7 before we get to RC.

Comment by Alex Miller [ 29/Apr/15 3:49 PM ]

same as v4 patch, but just has more diff context

Comment by Laurent Petit [ 01/May/15 7:25 AM ]

the file mentioned in the patch field is not the right one IMHO

Comment by Alex Miller [ 01/May/15 8:42 AM ]

which one is?

Comment by Laurent Petit [ 01/May/15 8:58 AM ]

I think your previous comment relates to clj-1620-v5.patch, but at the end of the description there's the following line:

Patch: 0001-CLJ-1620-avoid-constants-leak-in-static-initalizer-v4.patch

Comment by Alex Miller [ 01/May/15 9:30 AM ]

Those patches are equivalent with respect to the change they introduce; they just differ in how much diff context they have.

Comment by Alex Miller [ 18/May/15 2:25 PM ]

Rich has ok'ed screening this one for 1.7 but I do not feel that I can mark it screened without understanding it much better than I do. The description, code, and cause information here is not sufficient for me to understand what the problem actually is or why the fix is the right one. The fix seems to address the symptom but I worry that it is just a symptom and that a better understanding of the actual cause would lead to a different or better fix.

The evolution of the patches was driven by bugs in CLJ-1544 (a patch which has been pulled out for being suspect for other reasons). Starting fresh, were those modifications necessary and correct?

Why does this set of vars need to push clean impls into the bindings? Why not some of the other vars (like those pushed in load())? The set chosen here seems to match that from the ReifyParser - why? Why should they only be pushed if they are bound (that is, why is "not bound" not the same as "bound but empty")? Are we affecting performance?

Popping all the way out, is the thing being done by CCW even a thing that should be doable? The description says "Compiling a function that references a non loaded (or uninitialized) class triggers its init static" - should this load even happen? Can we get an example that actually demonstrates what CCW was doing originally?

Comment by Laurent Petit [ 19/May/15 7:12 AM ]

Alex, the question of "should what CCW is doing be doable" can be answered if you answer it on the given example, I think.

The question "should the initialization of the class occur when it could just be loaded" is a good one. Several reports have been made on the Clojure list about this problem, and I guess there is at least one CLJ issue about changing some more classForName into classForNameNonLoading here and there in Clojure.
For instance, it prevents referencing Java classes which have code in their static initializers as soon as that code makes assumptions about the runtime it is initialized in. This is a problem with Eclipse / SWT, and it is a problem with Cursive, as I remember Colin mentioning a similar issue. It will probably appear each time one tries to AOT-compile Clojure code interoperating with Java classes that happen to have, somewhere within static initializers triggered by the compilation (and this is transitive), assumptions that they are initialized in the proper target runtime environment.

What I don't know is whether preventing the initialization from occurring in the first place would be sufficient to get rid of the class of problems this bug and the proposed patch tried to solve. I do not claim to totally understand what is happening either (Christophe and Nicola were of great help in analyzing the issue and creating the patch), but as I understand it, it's a kind of "Inception-the-movie-like" bug: compiling a fn triggers compiling another fn (here through the loading of Clojure namespaces via a Java initializer).

If preventing the running of class static initializers when classes are referenced (through interop calls: constructor, field, method, static field, static method) is the last remaining bit that could cause such a "compilation during compilation" scenario, then yes, protecting the compilation process like Nicola tried to do may not be necessary, and just fixing the undesired loading may be enough.
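To make the load-versus-initialize distinction above concrete, here is a minimal sketch using only the JDK's three-argument Class.forName. It does not reproduce Clojure's own RT helpers; the class names NonInitializingLoadDemo and Probe are hypothetical stand-ins for "a Java class whose static initializer has side effects".

```java
import java.util.concurrent.atomic.AtomicBoolean;

class NonInitializingLoadDemo {
    // Flipped by Probe's static initializer. It lives outside Probe so that
    // reading it does not itself trigger Probe's initialization.
    static final AtomicBoolean PROBE_INITIALIZED = new AtomicBoolean(false);

    // Hypothetical class with a side-effecting static initializer.
    static class Probe {
        static { PROBE_INITIALIZED.set(true); }
    }

    // Loads the class but asks the JVM NOT to run its static initializer.
    static Class<?> loadOnly(String name) {
        try {
            return Class.forName(name, false, NonInitializingLoadDemo.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Loads the class and runs its static initializer if it has not run yet.
    static Class<?> loadAndInit(String name) {
        try {
            return Class.forName(name, true, NonInitializingLoadDemo.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A class literal does not initialize the class it names.
        String name = Probe.class.getName();
        loadOnly(name);
        System.out.println("initialized after loadOnly: " + PROBE_INITIALIZED.get());
        loadAndInit(name);
        System.out.println("initialized after loadAndInit: " + PROBE_INITIALIZED.get());
    }
}
```

Whether switching every reference in the compiler to the non-initializing form is enough to avoid the nested-compilation scenario is exactly the open question in this comment; the sketch only shows that the JVM offers both behaviours.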





[CLJ-1544] AOT bug involving namespaces loaded before AOT compilation started Created: 01/Oct/14  Updated: 20/Feb/15

Status: Reopened
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.8

Type: Defect Priority: Critical
Reporter: Allen Rohner Assignee: Unassigned
Resolution: Unresolved Votes: 9
Labels: aot

Attachments: Text File 0001-CLJ-1544-force-reloading-of-namespaces-during-AOT-co.patch     Text File 0001-CLJ-1544-force-reloading-of-namespaces-during-AOT-co-v2.patch     Text File 0001-CLJ-1544-force-reloading-of-namespaces-during-AOT-co-v3.patch     Text File 0001-CLJ-1641-disallow-circular-dependencies-even-if-the-.patch    
Patch: Code
Approval: Incomplete

 Description   

If namespace "a" that is being AOT compiled requires a namespace "b" that has been loaded but not AOT compiled, the classfile for that namespace will never be emitted on disk, causing errors when compiling uberjars or in other cases.

A minimal reproducible case is described in the following comment: http://dev.clojure.org/jira/browse/CLJ-1544?focusedCommentId=36734&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-36734

Other examples of the bug:
https://github.com/arohner/clj-aot-repro
https://github.com/methylene/class-not-found

A real issue triggered by this bug: https://github.com/cemerick/austin/issues/23

Related ticket: CLJ-1641 contains descriptions and comments about some potentially unwanted consequences of applying proposed patch 0001-CLJ-1544-force-reloading-of-namespaces-during-AOT-co-v3.patch

Approach: The approach taken by the attached patch is to force reloading of namespaces during AOT compilation if no matching classfile is found in the compile-path or in the classpath

Patch: 0001-CLJ-1544-force-reloading-of-namespaces-during-AOT-co-v3.patch

Screened by: Alex Miller



 Comments   
Comment by Alex Miller [ 04/Dec/14 12:45 PM ]

Possibly related: CLJ-1457

Comment by Nicola Mometto [ 05/Dec/14 4:51 AM ]

Has anyone been able to reproduce this bug from a bare clojure repl? I have been trying to take lein out of the equation for an hour but I don't seem to be able to reproduce it – this makes me think that it's possible that this is a lein/classlojure/nrepl issue rather than a compiler/classloader bug

Comment by Nicola Mometto [ 06/Dec/14 4:20 PM ]

I was actually able to reproduce and understand this bug thanks to a minimal example reduced from a testcase for CLJ-1413.

>cat error.sh
#!/bin/sh

rm -rf target && mkdir target

java -cp src:clojure.jar clojure.main - <<EOF
(require 'myrecord)
(set! *compile-path* "target")
(compile 'core)
EOF

java -cp target:clojure.jar clojure.main -e "(use 'core)"

> cat src/core.clj
(in-ns 'core)
(clojure.core/require 'myrecord)
(clojure.core/import myrecord.somerecord)

>cat src/myrecord.clj
(in-ns 'myrecord)
(clojure.core/defrecord somerecord [])

> ./error.sh
Exception in thread "main" java.lang.ExceptionInInitializerError
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:344)
	at clojure.lang.RT.classForName(RT.java:2113)
	at clojure.lang.RT.classForName(RT.java:2122)
	at clojure.lang.RT.loadClassForName(RT.java:2141)
	at clojure.lang.RT.load(RT.java:430)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5403.invoke(core.clj:5808)
	at clojure.core$load.doInvoke(core.clj:5807)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5613)
	at clojure.core$load_lib$fn__5352.invoke(core.clj:5653)
	at clojure.core$load_lib.doInvoke(core.clj:5652)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$load_libs.doInvoke(core.clj:5691)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:630)
	at clojure.core$use.doInvoke(core.clj:5785)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at user$eval212.invoke(NO_SOURCE_FILE:1)
	at clojure.lang.Compiler.eval(Compiler.java:6767)
	at clojure.lang.Compiler.eval(Compiler.java:6730)
	at clojure.core$eval.invoke(core.clj:3076)
	at clojure.main$eval_opt.invoke(main.clj:288)
	at clojure.main$initialize.invoke(main.clj:307)
	at clojure.main$null_opt.invoke(main.clj:342)
	at clojure.main$main.doInvoke(main.clj:420)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at clojure.lang.Var.invoke(Var.java:383)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.Var.applyTo(Var.java:700)
	at clojure.main.main(main.java:37)
Caused by: java.io.FileNotFoundException: Could not locate myrecord__init.class or myrecord.clj on classpath.
	at clojure.lang.RT.load(RT.java:443)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5403.invoke(core.clj:5808)
	at clojure.core$load.doInvoke(core.clj:5807)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5613)
	at clojure.core$load_lib$fn__5352.invoke(core.clj:5653)
	at clojure.core$load_lib.doInvoke(core.clj:5652)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$load_libs.doInvoke(core.clj:5691)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$require.doInvoke(core.clj:5774)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at core__init.load(Unknown Source)
	at core__init.<clinit>(Unknown Source)
	... 33 more

This bug has also affected Austin: https://github.com/cemerick/austin/issues/23

Essentially this bug manifests itself when a namespace defining a protocol or a type/record has been JIT loaded and a namespace that needs the protocol/type/record class is being AOT compiled later. Since the namespace defining the class has already been loaded the class is never emitted on disk.

Comment by Nicola Mometto [ 06/Dec/14 6:51 PM ]

I've attached a tentative patch fixing the issue in the only way I found reasonable: forcing the reloading of namespaces during AOT compilation if the compiled classfile is not found in the compile-path or in the classpath

Comment by Nicola Mometto [ 06/Dec/14 7:30 PM ]

Updated patch forces reloading of the namespace even if a classfile exists in the compile-path but the source file is newer, mimicking the logic of clojure.lang.RT/load
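The reload rule described in the last two comments can be sketched as a small decision function. This is an illustration of the rule as stated here, not the code from the attached patch; the class name AotReloadDecision, the method shouldReload, and the MISSING sentinel are all hypothetical.

```java
class AotReloadDecision {
    // Hypothetical sentinel meaning "no such file was found".
    static final long MISSING = -1L;

    /**
     * Decides whether a required namespace should be (re)loaded.
     *
     * @param compiling         true when AOT compilation is in progress
     * @param alreadyLoaded     true when the namespace was loaded earlier
     * @param classFileModified timestamp of the compiled classfile, or MISSING
     * @param sourceModified    timestamp of the source file, or MISSING
     */
    static boolean shouldReload(boolean compiling, boolean alreadyLoaded,
                                long classFileModified, long sourceModified) {
        if (!alreadyLoaded) {
            return true;   // ordinary case: nothing loaded yet
        }
        if (!compiling) {
            return false;  // outside AOT, a loaded namespace is left alone
        }
        if (classFileModified == MISSING) {
            return true;   // loaded in memory, but no classfile was ever emitted
        }
        // A classfile exists; reload only if the source is newer than it.
        return sourceModified != MISSING && sourceModified > classFileModified;
    }

    public static void main(String[] args) {
        // Namespace already loaded, AOT in progress, no classfile on disk.
        System.out.println(shouldReload(true, true, MISSING, 10L));
    }
}
```

The third branch is the case reported in this ticket: the namespace is in memory, so without a forced reload its classfile never reaches the compile path.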

Comment by Nicola Mometto [ 06/Dec/14 7:39 PM ]

Further testing demonstrated that this bug is not only scoped to deftypes/defprotocols but can manifest itself in the general case of a namespace "a" requiring a namespace "b" already loaded, and AOT compiling the namespace "a"

Comment by Tassilo Horn [ 08/Dec/14 4:46 AM ]

I'm also affected by this bug. Is there some workaround I can apply in the meantime, e.g., by dictating the order in which namespaces are going to be loaded/compiled in project.clj?

Comment by Nicola Mometto [ 15/Dec/14 10:58 AM ]

Tassilo, if you don't have control over whether or not a namespace that an AOT namespace depends on has already been loaded before compilation starts, requiring those namespaces with :reload-all should be enough to work around this issue

Comment by Tassilo Horn [ 15/Dec/14 11:36 AM ]

Nicola, thanks! But in the meantime I've switched to using clojure.java.api and omit AOT-compilation. That works just fine, too.

Comment by Michael Blume [ 15/Dec/14 5:05 PM ]

Tassilo, that's often a good solution, another is to use a shim clojure class

(ns myproject.main-shim (:gen-class))

(defn -main [& args]
  (require 'myproject.main)
  ;; resolve the real -main var at runtime and pass the arguments through
  (apply (resolve 'myproject.main/-main) args))

then your shim namespace is AOT-compiled but nothing else in your project is.

Comment by Tassilo Horn [ 16/Dec/14 1:07 AM ]

Thanks Michael, that's a very good suggestion. In fact, I've always used AOT only as a means to export some functions to Java-land. Basically, I did as you suggest but required the to-be-exported fn's namespace in the ns-form which then causes AOT-compilation of that namespace and its own deps recursively. So your approach seems to be as convenient from the Java side (no need to clojure.java.require `require` in order to require the namespace with the fn I wanna call ) while still omitting AOT. Awesome!

Comment by Nicola Mometto [ 06/Jan/15 6:07 PM ]

I'm marking this as incomplete to prevent further screening until the bug reported here: http://dev.clojure.org/jira/browse/CLJ-1620?focusedCommentId=37232&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-37232 is figured out

Comment by Nicola Mometto [ 07/Jan/15 4:43 AM ]

Fixed the patch, I'm re-marking the ticket as Vetted as it was before.

Comment by Alex Miller [ 16/Jan/15 12:54 PM ]

This patch is being rolled back for 1.7.0-alpha6 pending further investigation into underlying problems and possible solutions.

Comment by Colin Fleming [ 19/Jan/15 4:41 AM ]

I'm not 100% sure, but this looks a lot like Cursive issue 369. It had a case that I could reproduce with JDK 7 but not JDK 8, has the same mysterious missing namespace class symptom, and involves mixed AOT/non-AOT namespaces. However it's happening at runtime, not at compile time, which doesn't seem consistent.

Comment by Alex Miller [ 19/Jan/15 7:29 AM ]

My error report above was incorrectly tied to this issue (see CLJ-1636). I will delete the comment.

Comment by Nicola Mometto [ 29/Jan/15 12:23 PM ]

Since ticket CLJ-1641 has been closed, I'll repost here a comment I posted in that ticket + the patch I proposed, arguing why I think the patch I proposed for this ticket should not have been reverted:

Zach, I agree that having different behaviour between AOT and JIT is wrong.

But I also don't agree that having clojure error out on circular dependencies should be considered a bug, I would argue that the way manifold used to implement the circular dependency between manifold.stream and manifold.stream.graph was just a hack around lack of validation in require.

My proposal to fix this disparity between AOT and JIT is by making require/use check for circular dependencies before checking for already-loaded namespaces.

This way, both under JIT and AOT code like

(ns foo.a (:require foo.b))
(ns foo.b)
(require 'foo.a)

will fail with a circular dependency error.

This is what the patch I just attached (0001-CLJ-1641-disallow-circular-dependencies-even-if-the-.patch) does.
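The effect of the ordering proposed above can be sketched with a toy loader. This is a simplified model, not Clojure's require implementation: it assumes a namespace is registered as loaded before the rest of its body runs, and the names RequireOrderDemo, cyclicGraph and cycleCheckFirst are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RequireOrderDemo {
    private final Map<String, List<String>> deps;            // namespace -> namespaces it requires
    private final Set<String> loaded = new HashSet<>();      // namespaces registered as loaded
    private final Deque<String> pending = new ArrayDeque<>(); // namespaces currently being loaded
    private final boolean cycleCheckFirst;                    // true models the proposed ordering

    RequireOrderDemo(Map<String, List<String>> deps, boolean cycleCheckFirst) {
        this.deps = deps;
        this.cycleCheckFirst = cycleCheckFirst;
    }

    boolean isLoaded(String ns) { return loaded.contains(ns); }

    void require(String ns) {
        if (cycleCheckFirst && pending.contains(ns)) {
            throw new IllegalStateException("Cyclic load dependency: " + ns);
        }
        if (loaded.contains(ns)) {
            return; // already-loaded check; hides the cycle if it runs first
        }
        if (pending.contains(ns)) {
            throw new IllegalStateException("Cyclic load dependency: " + ns);
        }
        pending.push(ns);
        try {
            loaded.add(ns); // modelling assumption: registered before the body's requires run
            for (String dep : deps.getOrDefault(ns, Collections.<String>emptyList())) {
                require(dep);
            }
        } finally {
            pending.pop();
        }
    }

    // foo.a requires foo.b, and foo.b requires foo.a back.
    static Map<String, List<String>> cyclicGraph() {
        Map<String, List<String>> g = new HashMap<>();
        g.put("foo.a", Arrays.asList("foo.b"));
        g.put("foo.b", Arrays.asList("foo.a"));
        return g;
    }

    public static void main(String[] args) {
        new RequireOrderDemo(cyclicGraph(), false).require("foo.a"); // cycle goes unnoticed
        try {
            new RequireOrderDemo(cyclicGraph(), true).require("foo.a");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, checking the loaded set first lets the back-reference return silently, while checking the pending stack first rejects it regardless of how far loading has progressed.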





[CLJ-1517] Unrolled small vectors Created: 01/Sep/14  Updated: 31/Jul/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Backlog

Type: Enhancement Priority: Critical
Reporter: Zach Tellman Assignee: Unassigned
Resolution: Unresolved Votes: 20
Labels: collections, performance

Attachments: File unrolled-collections-2.diff     File unrolled-collections.diff     Text File unrolled-vector-2.patch     Text File unrolled-vector.patch    
Patch: Code
Approval: Incomplete

 Description   

As discussed on the mailing list [1], this patch has two unrolled variants of vectors and maps, with special inner classes for each cardinality. Currently both grow to six elements before spilling over into the general versions of the data structures, which is based on rough testing but can be easily changed. At Rich's request, I haven't included any integration into the rest of the code, and there are top-level static create() methods for each.

The sole reason for this patch is performance, both in terms of creating data structures and performing operations on them. This can be seen as a more verbose version of the trick currently played with PersistentArrayMap spilling over into PersistentHashMap. Based on the benchmarks, which can be run by cloning cambrian-collections [2] and running 'lein test :benchmark', this should supplant PersistentArrayMap. Performance is at least on par with PAM, and often much faster. Especially noteworthy is the creation time, which is 5x faster for maps of all sizes (lein test :only cambrian-collections.map-test/benchmark-construction), and on par for 3-vectors, but 20x faster for 5-vectors. There are similar benefits for hash and equality calculations, as well as calls to reduce().
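As a rough illustration of the "one inner class per cardinality, then spill over" idea described above, here is a much-reduced sketch. It is not taken from the patch: it unrolls only up to two elements instead of six, TinyVector and its nested classes are hypothetical names, and the General class is a naive copying stand-in for the real general-purpose vector.

```java
import java.util.ArrayList;
import java.util.List;

abstract class TinyVector {
    abstract int count();
    abstract Object nth(int i);
    abstract TinyVector conj(Object x);

    static TinyVector empty() { return Card0.INSTANCE; }

    // Zero elements: a shared singleton.
    static final class Card0 extends TinyVector {
        static final Card0 INSTANCE = new Card0();
        int count() { return 0; }
        Object nth(int i) { throw new IndexOutOfBoundsException("index " + i); }
        TinyVector conj(Object x) { return new Card1(x); }
    }

    // One element stored in a plain field, no backing array.
    static final class Card1 extends TinyVector {
        final Object e0;
        Card1(Object e0) { this.e0 = e0; }
        int count() { return 1; }
        Object nth(int i) {
            if (i == 0) return e0;
            throw new IndexOutOfBoundsException("index " + i);
        }
        TinyVector conj(Object x) { return new Card2(e0, x); }
    }

    // Two elements; adding a third spills over into the general version.
    static final class Card2 extends TinyVector {
        final Object e0, e1;
        Card2(Object e0, Object e1) { this.e0 = e0; this.e1 = e1; }
        int count() { return 2; }
        Object nth(int i) {
            switch (i) {
                case 0: return e0;
                case 1: return e1;
                default: throw new IndexOutOfBoundsException("index " + i);
            }
        }
        TinyVector conj(Object x) {
            List<Object> items = new ArrayList<>(3);
            items.add(e0);
            items.add(e1);
            items.add(x);
            return new General(items);
        }
    }

    // Naive general-purpose fallback: copies on every conj.
    static final class General extends TinyVector {
        private final List<Object> items;
        General(List<Object> items) { this.items = items; }
        int count() { return items.size(); }
        Object nth(int i) { return items.get(i); }
        TinyVector conj(Object x) {
            List<Object> copy = new ArrayList<>(items);
            copy.add(x);
            return new General(copy);
        }
    }

    public static void main(String[] args) {
        TinyVector v = TinyVector.empty().conj("a").conj("b").conj("c");
        System.out.println(v.getClass().getSimpleName() + " with " + v.count() + " elements");
    }
}
```

The small cases avoid array allocation and bounds arithmetic entirely, which is the kind of saving the construction benchmarks above are measuring; the cost is one hand-written (or generated) class per size.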

This is a big patch (over 5k lines), and will be kind of a pain to review. My assumption of correctness is based on the use of collection-check, and the fact that the underlying approach is very simple. I'm happy to provide a high-level description of the approach taken, though, if that will help the review process.

I'm hoping to get this into 1.7, so please let me know if there's anything I can do to help accomplish that.

[1] https://groups.google.com/forum/#!topic/clojure-dev/pDhYoELjrcs
[2] https://github.com/ztellman/cambrian-collections

Patch: unrolled-vector-2.patch

Screener Notes: The approach is clear and understandable. Given the volume of generated code, I believe that the best way to improve confidence in this code is to get people using it asap, and add collection-check [3] to the Clojure test suite. I would also like to get the generator [4] included in the Clojure repo. We don't necessarily need to automate running it, but it would be nice to have it nearby if we want to tweak something later.

[3] https://github.com/ztellman/collection-check/blob/master/src/collection_check.clj
[4] https://github.com/ztellman/cambrian-collections/blob/master/generate/cambrian_collections/vector.clj



 Comments   
Comment by Zach Tellman [ 01/Sep/14 10:13 PM ]

Oh, I forgot to mention that I didn't make a PersistentUnrolledSet, since the existing wrappers can use the unrolled map implementation. However, it would be moderately faster and more memory efficient to have one, so let me know if it seems worthwhile.

Comment by Nicola Mometto [ 02/Sep/14 5:23 AM ]

Zach, the patch you added isn't in the correct format; patches need to be created using `git format-patch`

Comment by Nicola Mometto [ 02/Sep/14 5:31 AM ]

Also, I'm not sure if this is on-scope with the ticket but those patches break with *print-dup*, as it expects a static create(x) method for each inner class.

I'd suggest adding a create(Map x) static method for the inner PersistentUnrolledMap classes and a create(ISeq x) one for the inner PersistentUnrolledVector classes

Comment by Alex Miller [ 02/Sep/14 8:14 AM ]

Re making patches, see: http://dev.clojure.org/display/community/Developing+Patches

Comment by Jozef Wagner [ 02/Sep/14 9:16 AM ]

I wonder what is the overhead of having meta and 2 hash fields in the class. Have you considered a version where the hash is computed on the fly and where you have two sets of collections, one with meta field and one without, using former when the actual metadata is attached to the collection?

Comment by Zach Tellman [ 02/Sep/14 12:13 PM ]

I've attached a patch using the proper method. Somehow I missed the detailed explanation for how to do this, sorry. I know the guidelines say not to delete previous patches, but since the first one isn't useful I've deleted it to minimize confusion.

I did the print-dup friendly create methods, and then realized that once these are properly integrated, 'pr' will just emit these as vectors. I'm fairly sure the create methods aren't necessary, so I've commented them out, but I'm happy to add them back in if they're useful for some reason I can't see.

I haven't given a lot of thought to memory efficiency, but I think caching the hashes is worthwhile. I can see an argument for creating a "with-meta" version of each collection, but since that would double the size of an already enormous patch, I think that should probably wait.

Comment by Zach Tellman [ 03/Sep/14 4:31 PM ]

I found a bug! Like PersistentArrayMap, I have a special code path for comparing keywords, but my generators for collection-check were previously using only integer keys. There was an off-by-one error in the transient map implementation [1], which was not present for non-keyword lookups.

I've taken a close look for other gaps in my test coverage, and can't find any. I don't think this substantively changes the risk of this patch (an updated version of which has been uploaded as 'unrolled-collections-2.diff'), but obviously where there's one bug, there may be others.

[1] https://github.com/ztellman/cambrian-collections/commit/eb7dfe6d12e6774512dbab22a148202052442c6d#diff-4bf78dbf5b453f84ed59795a3bffe5fcR559

Comment by Zach Tellman [ 03/Oct/14 2:34 PM ]

As an additional data point, I swapped out the data structures in the Cheshire JSON library. On the "no keyword-fn decode" benchmark, the current implementation takes 6us, with the unrolled data structures takes 4us, and with no data structures (just lexing the JSON via Jackson) takes 2us. Other benchmarks had similar results. So at least in this scenario, it halves the overhead.

Benchmarks can be run by cloning https://github.com/dakrone/cheshire, unrolled collections can be tested by using the 'unrolled-collections' branch. The pure lexing benchmark can be reproduced by messing around with the cheshire.parse namespace a bit.

Comment by Zach Tellman [ 06/Oct/14 1:31 PM ]

Is there no way to get this into 1.7? It's an awfully big win to push off for another year.

Comment by Alex Miller [ 07/Oct/14 2:08 PM ]

Hey Zach, it's definitely considered important but we have decided to drop almost everything not fully done for 1.7. Timeframe for following release is unknown, but certainly expected to be significantly less than a year.

Comment by John Szakmeister [ 30/Oct/14 2:53 PM ]

You are all free to determine the time table, but I thought I'd point out that Zach is not entirely off-base. Clojure 1.4.0 was released April 5th, 2012. Clojure 1.5.0 was released March 1st, 2013 with 1.6.0 showing up March 25th, 2014. So it appears that the current cadence is around a year.

Comment by Alex Miller [ 30/Oct/14 3:40 PM ]

John, there is no point to comments like this. Let's please keep issue comments focused on the issue.

Comment by Zach Tellman [ 13/Nov/14 12:23 PM ]

I did a small write-up on this patch which should help in the eventual code review: http://blog.factual.com/using-clojure-to-generate-java-to-reimplement-clojure

Comment by Zach Tellman [ 07/Dec/14 10:34 PM ]

Per my conversation with Alex at the Conj, here's a patch that only contains the unrolled vectors, and uses the more efficient constructor for PersistentVector when spilling over.

Comment by Alex Miller [ 08/Dec/14 1:10 PM ]

Zach, I created a new placeholder for the map work at http://dev.clojure.org/jira/browse/CLJ-1610.

Comment by Jean Niklas L'orange [ 09/Dec/14 1:52 PM ]

It should probably be noted that core.rrb-vector will break for small vectors by this patch, as it peeks into the underlying structure. This will also break other libraries which peek into the vector implementation internals, although I'm not aware of any other – certainly not any other contrib library.

Also, two comments on unrolled-vector.patch:

private transient boolean edit = true;
in the Transient class should probably be
private volatile boolean edit = true;
as transient means something entirely different in Java.

conj in the Transient implementation could invalidate itself without any problems (edit = false;) if it is converted into a TransientVector (i.e. spills over) – unless it has a notable overhead. The invalidation can prevent some subtle bugs related to erroneous transient usage.
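Both suggestions can be sketched together. This is a hedged illustration rather than the patch's Transient class: VecBuilder, SmallBuilder and BigBuilder are hypothetical names, and the small builder holds two slots instead of six. In Java the transient modifier only excludes a field from serialization, whereas volatile makes writes to the flag visible across threads.

```java
import java.util.ArrayList;
import java.util.List;

interface VecBuilder {
    VecBuilder conj(Object x);
    int count();
}

// Stand-in for the general transient vector that takes over after spilling.
final class BigBuilder implements VecBuilder {
    private final List<Object> items = new ArrayList<>();
    public VecBuilder conj(Object x) { items.add(x); return this; }
    public int count() { return items.size(); }
}

final class SmallBuilder implements VecBuilder {
    static final int LIMIT = 2;

    // volatile, not transient: the flag guards mutation, it is unrelated to serialization.
    private volatile boolean edit = true;
    private final Object[] slots = new Object[LIMIT];
    private int n = 0;

    private void ensureEditable() {
        if (!edit) {
            throw new IllegalStateException("Transient used after spilling over");
        }
    }

    public VecBuilder conj(Object x) {
        ensureEditable();
        if (n < LIMIT) {
            slots[n++] = x;
            return this;
        }
        // Spill over: hand the contents to the general builder...
        VecBuilder big = new BigBuilder();
        for (int i = 0; i < n; i++) {
            big.conj(slots[i]);
        }
        // ...and invalidate this instance so stale references fail loudly.
        edit = false;
        return big.conj(x);
    }

    public int count() {
        ensureEditable();
        return n;
    }
}

class TransientSpillDemo {
    public static void main(String[] args) {
        VecBuilder small = new SmallBuilder();
        VecBuilder result = small.conj("a").conj("b").conj("c");
        System.out.println(result.count());
    }
}
```

A caller that keeps using the old reference after the spill now gets an exception instead of silently losing elements, which is the class of erroneous transient usage the comment refers to.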

Comment by Alex Miller [ 09/Dec/14 1:58 PM ]

Jean - understanding the scope of the impact will certainly be part of the integration process for this patch. I appreciate the heads-up. While we try to minimize breakage for things like this, it may be unavoidable for libraries that rely on implementation internals.

Comment by Michał Marczyk [ 09/Dec/14 2:03 PM ]

I'll add support for unrolled vectors to core.rrb-vector the moment they land on master. (Probably with some conditional compilation so as not to break compatibility with earlier versions of Clojure – we'll see when the time comes.)

Comment by Michał Marczyk [ 09/Dec/14 2:06 PM ]

I should say that it'd be possible to add generic support for any "vector lookalikes" by pouring them into regular vectors in linear time. At first glance it seems to me that that'd be out of line with the basic promise of the library, but I'll give it some more thought before the changes actually land.

Comment by Zach Tellman [ 09/Dec/14 5:43 PM ]

Somewhat predictably, the day after I cut the previous patch, someone found an issue [1]. In short, my use of the ArrayChunk wrapper applied the offset twice.

This was not caught by collection-check, which has been updated to catch this particular failure. It was, however, uncovered by Michael Blume's attempts to merge the change into Clojure [2], which tripped a bunch of alarms in Clojure's test suite. My own attempt to do the same to "prove" that it worked was before I added in the chunked seq functionality, hence this issue persisting until now.

As always, there may be more issues lurking. I hope we can get as many eyeballs on the code between now and 1.8 as possible.

[1] https://github.com/ztellman/cambrian-collections/commit/2e70bbd14640b312db77590d8224e6ed0f535b43
[2] https://github.com/MichaelBlume/clojure/tree/test-vector

Comment by Zach Tellman [ 10/Jul/15 1:54 PM ]

As a companion to the performance analysis in the unrolled map issue, I've run the benchmarks and posted the results at https://gist.github.com/ztellman/10e8959501fb666dc35e. Some notable results:

Comment by Alex Miller [ 13/Jul/15 9:02 AM ]

Stu: I do not think this patch should be marked "screened" until the actual integration and build work (if the generator is integrated) has been completed.

Comment by Alex Miller [ 14/Jul/15 4:33 PM ]

FYI, we have "reset" all big features for 1.8 for the moment (except the socket repl work). We may still include it - that determination will be made later.

Comment by Zach Tellman [ 14/Jul/15 4:43 PM ]

Okay, any idea when the determination will be made? I was excited that we seemed to be finally moving forward on this.

Comment by Alex Miller [ 14/Jul/15 4:51 PM ]

No, but it is now on my work list.

Comment by Rich Hickey [ 15/Jul/15 8:17 AM ]

I wonder if all of the overriding of APersistentVector yields important benefits - e.g. iterator, hashcode etc.

Comment by Zach Tellman [ 15/Jul/15 11:51 AM ]

In the case of hashcode, definitely: https://gist.github.com/ztellman/10e8959501fb666dc35e#file-gistfile1-txt-L1013-L1076. This was actually one of the original things I wanted to speed up.

In the case of the iterator, probably not. I'd be fine removing that.
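
To make the hashcode point concrete, here is a minimal sketch of what "unrolling" a hash computation for a fixed-size collection can look like. It assumes the java.util.List hashCode contract (start at 1, multiply by 31, add each element's hash); the class and method names are hypothetical and this is not the code from the patch.

```java
import java.util.Arrays;

// Hypothetical illustration; not taken from the unrolled-vector patch.
final class UnrolledHashSketch {
    // Hash of a two-element collection, written out without a loop or iterator.
    // Follows the java.util.List contract: h = 31 * h + elementHash, starting at 1.
    static int hash2(Object a, Object b) {
        int h = 1;
        h = 31 * h + (a == null ? 0 : a.hashCode());
        h = 31 * h + (b == null ? 0 : b.hashCode());
        return h;
    }

    public static void main(String[] args) {
        // The unrolled version agrees with the generic List implementation.
        System.out.println(hash2("x", "y") == Arrays.asList("x", "y").hashCode());
    }
}
```

The unrolled form does the same arithmetic as the generic loop; any speedup would come from avoiding iteration overhead, which is the kind of gain the linked benchmark is measuring.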

Comment by Zach Tellman [ 16/Jul/15 5:17 PM ]

So am I to infer from https://github.com/clojure/clojure/commit/36d665793b43f62cfd22354aced4c6892088abd6 that this issue is defunct? If so, I think there's a lot of improvements being left on the table for no particular reason.

Comment by Rich Hickey [ 16/Jul/15 6:34 PM ]

Yes, that commit covers this functionality. It takes a different approach from the patch in building up from a small core, and maximizing improvements to the bases rather than having a lot of redundant definitions per class. That also allowed for immediate integration without as much concern for correctness, as there is little new code. It also emphasizes the use case for tuples, e.g. small vectors used as values that won't be changed, thus de-emphasizing the 'mutable' functions. I disagree that many necessary improvements are being left out. The patch 'optimized' many things that don't matter. Further, there are not big improvements to the pervasive inlining. In addition, the commit includes the integration work at a fraction of the size of the patch. In all, it would have taken much more back and forth to get the patch to conform with this approach than to just get it all done, but I appreciate the inspiration and instigation - thanks!

Comment by Rich Hickey [ 16/Jul/15 6:46 PM ]

That said, this commit need not be the last word - it can serve as a baseline for further optimization. But I'd rather that be driven by need. Clojure could become 10x as large optimizing things that don't matter.

Comment by Zach Tellman [ 19/Jul/15 1:36 PM ]

What is our reference for "relevant" performance? I (or anyone else) can provide microbenchmarks for calculating hashes or whatever else, but that doesn't prove that it's an important improvement. I previously provided benchmarks for JSON decoding in Cheshire, but that's just one of many possible real-world benchmarks. It might be useful to have an agreed-upon list of benchmarks that we can use when debating what is and isn't useful.

Comment by Mike Anderson [ 19/Jul/15 11:14 PM ]

I was interested in this implementation so created a branch that integrates Zach's unrolled vectors on top of clojure 1.8.0-alpha2. I also simplified some of the code (I don't think the metadata handling or unrolled seqs are worthwhile, for example)

Github branch: https://github.com/mikera/clojure/tree/clj-1517

Then I ran a set of micro-benchmarks created by Peter Taoussanis

Results: https://gist.github.com/mikera/72a739c84dd52fa3b6d6

My findings from this testing:

  • Performance is comparable (within +/- 20%) on the majority of tests
  • Zach's approach is noticeably faster (by 70-150%) for 4 operations (reduce, mapv, into, equality)

My view is that these additional optimisations are worthwhile. In particular, I think that reduce and into are very important operations. I also care about mapv quite a lot for core.matrix (It's fundamental to many numerical operations on arrays implemented using Clojure vectors).

Happy to create a patch along these lines if it would be acceptable.

Comment by Zach Tellman [ 19/Jul/15 11:45 PM ]

The `reduce` improvements are likely due to the unrolled reduce and kvreduce impls, but the others are probably because of the unrolled transient implementation. The extra code required to add these would be pretty modest.

Comment by Mike Anderson [ 20/Jul/15 9:20 PM ]

I actually condensed the code down to a single implementation for `Transient` and `TupleSeq`. I don't think these really need to be fully unrolled for each Tuple type. That helps by making the code even smaller (and probably also helps performance, given JVM inline caching etc.)

Comment by Peter Taoussanis [ 21/Jul/15 11:46 AM ]

Hey folks,

Silly question: is there actually a particular set of reference benchmarks that everyone's currently using to test the work on tuples? It took me a while to notice how bad the variance was with my own set of micro benchmarks.

Bumping all the run counts up till the noise starts ~dying down, I'm actually seeing numbers now that don't seem to agree with others here.

Google Docs link: https://docs.google.com/spreadsheets/d/1QHY3lehVF-aKrlOwDQfyDO5SLkGeb_uaj85NZ7tnuL0/edit?usp=sharing
gist with the benchmarks: https://gist.github.com/ptaoussanis/0a294809bc9075b6b02d

Thanks, cheers!

Comment by Zach Tellman [ 21/Jul/15 6:52 PM ]

Hey Peter, I can't reproduce your results, and some of them are so far off what I'd expect that I have to think there was some data gathering error. For instance, the assoc operation being slower is kind of inexplicable, considering the unrolled version doesn't do any copying, etc. Also, all of your numbers are significantly slower than the ones on my 4 year old laptop, which is also a bit strange.

Just to make sure that we're comparing apples to apples, I've adapted your benchmarks into something that pretty-prints the mean runtime and variance for 1.7, 1.8-alpha2, and Mike's 1517 fork. It can be found at https://github.com/ztellman/tuple-benchmark, and the results of a run at https://gist.github.com/ztellman/3701d965228fb9eda084.

Comment by Mike Anderson [ 22/Jul/15 2:24 AM ]

Hey Zach just looked at your benchmarks and they are definitely more consistent with what I am seeing. The overall nanosecond timings look about right from my experience with similar code (e.g. working with small vectors in vectorz-clj).

Comment by Peter Taoussanis [ 22/Jul/15 2:41 AM ]

Hi Zach, thanks for that!

Have updated the results -
Gist: https://gist.github.com/ptaoussanis/0a294809bc9075b6b02d
Google docs: https://goo.gl/khgT83

Note that I've added an extra sheet/tab to the Google doc for your own numbers at https://gist.github.com/ztellman/3701d965228fb9eda084.

Am still struggling to produce results that show any consistent+significant post-JIT benefit to either of the tuple implementations against the micro benchmarks and one larger small-vec-heavy system I had handy.

It's looking to me like it's maybe possible that the JIT's actually optimising away most of the non-tuple inefficiencies in practice?

Of course it's very possible that my results are off, or my expectations wrong. The numbers have been difficult to pin down.

It definitely helped to have a standardised reference micro benchmark to work against (https://github.com/ztellman/tuple-benchmark). Could I perhaps suggest a similar reference macro benchmark (maybe something from core.matrix, Mike?)

Might also be a good idea to define a worthwhile target performance delta for ops like these that run in the nanosecond range (or for the larger reference macro benchmark)?

Just some thoughts from someone passing through in case they're at all useful; know many of you have been deeply involved in this for some time so please feel free to ignore any input that's not helpful.

Appreciate all the efforts, cheers!

Comment by Rich Hickey [ 22/Jul/15 9:24 AM ]

I think everyone should back off on their enthusiasm for this approach. After much analysis, I am seeing definite negative impacts to tuples, especially the multiple class approach proposed by Zach. What happens in real code is that the many tuple classes cause call sites that see different sized vectors to become megamorphic, and nothing gets adequately optimized. In particular, call sites that will see tuple-sized and large vectors (i.e. a lot of library code) will optimize differently depending upon which they see often first. So, if you run your first tight loop on vector code that sees tuples, that code could later do much worse (50-100%) on large vectors than before the tuples patch was in place. Being much slower on large collections is a poor tradeoff for being slightly faster on small ones.

Certainly different tuple classes for arity 0-6 is a dead idea. You get as good or better optimization (at some cost in size) from a single class e.g. one with four fields, covering sizes 0-4. I have a working version of this in a local branch. It is better in that sites that include pvectors are only bi-morphic, but I am still somewhat skittish given what I've seen.
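
As a rough illustration of the single-class idea described above, the following sketch shows one immutable class with four fields covering sizes 0-4. The name SmallTuple, its methods, and the behaviour when full are all assumptions made for illustration; this is not the code from the local branch mentioned in the comment.

```java
// Hypothetical sketch of "one class with four fields, covering sizes 0-4".
final class SmallTuple {
    private static final SmallTuple EMPTY = new SmallTuple(0, null, null, null, null);

    private final int count;            // number of slots in use, 0 through 4
    private final Object v0, v1, v2, v3;

    private SmallTuple(int count, Object v0, Object v1, Object v2, Object v3) {
        this.count = count;
        this.v0 = v0;
        this.v1 = v1;
        this.v2 = v2;
        this.v3 = v3;
    }

    static SmallTuple empty() {
        return EMPTY;
    }

    int count() {
        return count;
    }

    Object nth(int i) {
        if (i < 0 || i >= count) {
            throw new IndexOutOfBoundsException("index " + i + ", count " + count);
        }
        switch (i) {
            case 0: return v0;
            case 1: return v1;
            case 2: return v2;
            default: return v3;
        }
    }

    // Returns a new tuple with x appended. A real implementation would hand off
    // to a general-purpose vector once all four slots are used; this sketch
    // simply refuses.
    SmallTuple conj(Object x) {
        switch (count) {
            case 0: return new SmallTuple(1, x, null, null, null);
            case 1: return new SmallTuple(2, v0, x, null, null);
            case 2: return new SmallTuple(3, v0, v1, x, null);
            case 3: return new SmallTuple(4, v0, v1, v2, x);
            default: throw new IllegalStateException("tuple is full; promote to a larger vector");
        }
    }

    public static void main(String[] args) {
        SmallTuple t = SmallTuple.empty().conj("a").conj("b");
        System.out.println(t.count() + " " + t.nth(1));
    }
}
```

Because every small size shares one concrete class, a call site that sees both small and large vectors would encounter at most two receiver types, which is the bi-morphic situation the comment refers to. The cost is that every instance carries four slots regardless of how many are used.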

The other takeaway is that the micro benchmarks are nearly worthless for detecting these issues.

Comment by Zach Tellman [ 22/Jul/15 11:07 AM ]

I'm pretty sure that all of my real-world applications of the tuples (via clj-tuple) have been fixed cardinality, and wouldn't have surfaced any such issue. Thanks for putting the idea through its paces.

Comment by Mike Anderson [ 22/Jul/15 10:37 PM ]

Rich these are good insights - do you have a benchmark that you are using as representative of real world code?

I agree that it is great if we can avoid call sites becoming megamorphic, though I also believe the ship has sailed on that one already when you consider the multiple types of IPersistentVector that already exist (MapEntry, PersistentVector, SubVector plus any library-defined IPersistentVector instances such as clojure.core.rrb-vector). As a consequence, the JVM is usually not going to be able to prove that a specific IPersistentVector interface invocation is monomorphic, which is when the really big optimisations happen.

In most of the real world code that I've been working with, the same size/type of vector gets used repeatedly (Examples: iterating over map entries, working with a sequence of size-N vectors), so in such cases we should be able to rely on the polymorphic inline cache doing the right thing.

The idea of a single Tuple class for sizes 0-4 is interesting, though I can't help thinking that a lot of the performance gain from this may stem from the fact that a lot of code does stuff like (reduce conj [] .....) or the transient equivalent which is a particularly bad use case for Tuples, at least from the call site caching perspective. There may be a better way to optimise such cases rather than simply trying to make Tuples faster.... e.g. calling asTransient() on a Tuple0 could perhaps switch straight into the PersistentVector implementation.





[CLJ-1400] Error "Can't refer to qualified var that doesn't exist" should name the bad symbol Created: 09/Apr/14  Updated: 07/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: Release 1.8

Type: Enhancement Priority: Minor
Reporter: Howard Lewis Ship Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: Compiler, errormsgs
Environment:

OS X


Attachments: File clj-1400-2.diff     File clj-1400-3.diff     File clj-1400-4.diff    
Patch: Code and Test
Approval: Incomplete

 Description   

Def of var with a ns that doesn't exist will yield this error:

user> (def foo/bar 1)
CompilerException java.lang.RuntimeException: Can't refer to qualified var that doesn't exist, compiling:(NO_SOURCE_PATH:1:1)

Cause: Compiler.lookupVar() returns null if the ns in a qualified var does not exist yet.

Proposed: The error message would be improved by naming the symbol and throwing a CompilerException with file/line/col info. It's not obvious, but this may be the only case where this error occurs. If so, the error message could be more specific that the ns is the part that doesn't exist.

Patch: clj-1400-4.diff

Screened by: Alex Miller



 Comments   
Comment by Scott Bale [ 25/Jun/14 9:58 AM ]

This looks to me like relatively low hanging fruit unless I'm missing something; assigning to myself.

Comment by Scott Bale [ 26/Jun/14 11:23 PM ]

Patch clj-1400-1.diff to Compiler.java.

With this patch the example would now look like:

user> (def foo/bar 1)
CompilerException java.lang.RuntimeException: Qualified symbol foo/bar refers to nonexistent namespace: foo, compiling:(NO_SOURCE_PATH:1:1)

I'm not sure the if(namesStaticMember(sym)) [see below], and the 2nd branch, is even necessary. Just by inspection I suspect it is not.

[footnote]

public static boolean namesStaticMember(Symbol sym){
	return sym.ns != null && namespaceFor(sym) == null;
}

Comment by Scott Bale [ 26/Jun/14 11:24 PM ]

patch: code and test

Comment by Scott Bale [ 26/Jun/14 11:27 PM ]

I tested on an actual source file, and the exception message included the file/line/col info as desired:

user=> CompilerException java.lang.RuntimeException: Qualified symbol goo/bar refers to nonexistent namespace: goo, compiling:(/home/scott/dev/foo.clj:3:1)

Comment by Andy Fingerhut [ 29/Aug/14 4:46 PM ]

Patch clj-1400-1.diff dated Jun 26 2014 no longer applied cleanly to latest master after some commits were made to Clojure on Aug 29, 2014. It did apply cleanly before that day.

I have not checked how easy or difficult it might be to update this patch. See section "Updating Stale Patches" on this wiki page for some tips on updating patches: http://dev.clojure.org/display/community/Developing+Patches

Comment by Scott Bale [ 31/Aug/14 3:53 PM ]

Attached is an updated patch: "clj-1400-2.diff". I removed the stale patch.

Comment by Alex Miller [ 09/Sep/14 9:29 AM ]

Few comments to address:

  • Compiler diff was using spaces, not tabs, which makes it harder to diff. I attached a -3.diff that fixes this.
  • the call to namesStaticMember seems weird. The name of that method is confusing for this use. Beyond that, I think it's doing more than you need. That method is going to attempt to resolve the qualified name in terms of the current ns, but I think you don't even want to do that. Rather you just want to know if the sym has a ns (sym.ns != null) - isn't that enough?
  • In what case will the other error "Var doesn't exist" occur? In other words, in what case will lookupVar not succeed in creating a new var here? If there is no such case, then remove this case. If there is such a case, then add a test.

Comment by Scott Bale [ 11/Sep/14 11:19 PM ]

Agree with all three of your bullets. Attached is an updated patch, clj-1400-4.diff.

  • I used tabs in Compiler.java
  • After close inspection of call to lookupVar(...), I believe null is returned only in the case of exactly this ticket (the symbol having a non-null namespace which has not been loaded yet). So I've taken out the conditional and the 2nd branch.
  • (Test is unchanged)

Comment by Scott Bale [ 11/Sep/14 11:22 PM ]

(properly named patch)

Comment by Alex Miller [ 11/Sep/14 11:37 PM ]

You could throw a CompilerException with the location of the problem instead (as the ticket description suggests).

Comment by Scott Bale [ 19/Sep/14 2:37 PM ]

Sorry, I should've mentioned because this wasn't obvious to me either (and in fact I forgot until just now): the RuntimeException is already caught and wrapped in a CompilerException.

I'm not sure which try-catch block within Compiler.java this is happening in, there are multiple. But you can see in the output that the exception is a CompilerException and the file|line|col info is there:

In the Repl...

user> (def foo/bar 1)
CompilerException java.lang.RuntimeException: Qualified symbol foo/bar refers to nonexistent namespace: foo, compiling:(NO_SOURCE_PATH:1:1)

...or in a source file

user=> CompilerException java.lang.RuntimeException: Qualified symbol goo/bar refers to nonexistent namespace: goo, compiling:(/home/scott/dev/foo.clj:3:1)

Also, at the point at which the RuntimeException of this patch is being thrown, the source line and col params to CompilerException are not available, or at least not afaict.
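
The wrapping behaviour described in this comment can be modelled with a small, self-contained sketch. All names here (LocatedErrorSketch, LocatedException, lookup, compile) are hypothetical stand-ins and do not correspond to methods in Compiler.java; the sketch only shows how an inner RuntimeException that names the symbol can pick up file/line/col information from an outer handler that knows the location.

```java
import java.util.Collections;
import java.util.Set;

// Toy model of "inner error names the symbol, outer layer adds the location".
final class LocatedErrorSketch {
    // Hypothetical stand-in for an exception that records where compilation failed.
    static final class LocatedException extends RuntimeException {
        LocatedException(String source, int line, int column, Throwable cause) {
            super(cause + ", compiling:(" + source + ":" + line + ":" + column + ")", cause);
        }
    }

    // Inner layer: knows the symbol but not the source location.
    static void lookup(String ns, String name, Set<String> loadedNamespaces) {
        if (!loadedNamespaces.contains(ns)) {
            throw new RuntimeException(
                "Qualified symbol " + ns + "/" + name + " refers to nonexistent namespace: " + ns);
        }
    }

    // Outer layer: knows the location and wraps whatever the inner layer throws.
    static void compile(String ns, String name, Set<String> loadedNamespaces,
                        String source, int line, int column) {
        try {
            lookup(ns, name, loadedNamespaces);
        } catch (RuntimeException e) {
            throw new LocatedException(source, line, column, e);
        }
    }

    public static void main(String[] args) {
        try {
            compile("foo", "bar", Collections.<String>emptySet(), "NO_SOURCE_PATH", 1, 1);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the inner throw site never needs the line and column, which is consistent with the observation that they are not available where the patch throws.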

Comment by Alex Miller [ 07/Oct/14 12:34 PM ]

I'll follow up on this patch later - Rich thought it was making too many assumptions. I think we validated many of those but need to double-check those.





[CLJ-1322] doseq with several bindings causes "ClassFormatError: Invalid Method Code length" Created: 10/Jan/14  Updated: 25/Mar/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: Release 1.8

Type: Defect Priority: Major
Reporter: Miikka Koskinen Assignee: Unassigned
Resolution: Unresolved Votes: 6
Labels: None
Environment:

Clojure 1.5.1, java 1.7.0_25, OpenJDK Runtime Environment (IcedTea 2.3.10) (7u25-2.3.10-1ubuntu0.12.04.2)


Attachments: Text File doseq-bench.txt     Text File doseq.patch     File script.clj    
Patch: Code
Approval: Incomplete

 Description   

Important Perf Note: the new impl is faster for collections that are custom-reducible but not chunked, and is also faster for large numbers of bindings. The original implementation is hand tuned for chunked collections, and wins for larger chunked coll/smaller binding count scenarios, presumably due to the fn call/return tracking overhead of reduce. Details are in the comments.
Screened By:
Patch: doseq.patch

user=> (def a1 (range 10))
#'user/a1
user=> (doseq [x1 a1 x2 a1 x3 a1 x4 a1 x5 a1 x6 a1 x7 a1 x8 a1] (do))
CompilerException java.lang.ClassFormatError: Invalid method Code length 69883 in class file user$eval1032, compiling:(NO_SOURCE_PATH:2:1)

While this example is silly, it's a problem we've hit a couple of times. It's pretty surprising when you have just a couple of lines of code and suddenly you get the code length error.



 Comments   
Comment by Kevin Downey [ 18/Apr/14 12:20 AM ]

reproduces with jdk 1.8.0 and clojure 1.6

Comment by Nicola Mometto [ 22/Apr/14 5:35 PM ]

A potential fix for this is to make doseq generate intermediate fns like `for` does instead of expanding all the code directly.

Comment by Ghadi Shayban [ 25/Jun/14 8:39 PM ]

Existing doseq handles chunked-traversal internally, deciding the
mechanics of traversal for a seq. In addition to possibly conflating
concerns, this is causing a code explosion when more bindings are
added, approx 240 bytes of bytecode per binding (without modifiers).

This approach redefs doseq later in core.clj, after protocol-based
reduce (and other modern conveniences like destructuring.)

It supports the existing :let, :while, and :when modifiers.

New is a stronger assertion that modifiers cannot come before binding
expressions. (Same semantics as let, i.e. left to right)

valid: [x coll :when (foo x)]
invalid: [:when (foo x) x coll]

This implementation does not suffer from the code explosion problem.
About 25 bytes of bytecode + 1 fn per binding.

Implementing this without destructuring was not a party; luckily reduce
is defined later in core.
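
For readers less familiar with the "one fn per binding" shape, here is a loose Java analogue. It is only an illustration of the structure (the collection drives the traversal and each binding level becomes one closure); the names are hypothetical and it says nothing about the bytecode the actual macro emits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical analogue of a two-binding traversal with a :when-style filter.
final class NestedTraversalSketch {
    static List<String> pairs(List<Integer> xs, List<Integer> ys) {
        final List<String> out = new ArrayList<>();
        xs.forEach(x ->                     // closure for the first binding
            ys.forEach(y -> {               // closure for the second binding
                if ((x + y) % 2 == 0) {     // analogue of a :when modifier
                    out.add(x + "," + y);
                }
            }));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pairs(Arrays.asList(1, 2), Arrays.asList(1, 2)));
    }
}
```

Each additional binding adds one more nested closure rather than duplicating the traversal logic, so the code grows by a fixed amount per binding. The trade-off is that the body now runs inside closures instead of inline in the enclosing method.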

Comment by Andy Fingerhut [ 26/Jun/14 12:25 AM ]

For anyone reviewing this patch, note that there are already many tests for correct functionality of doseq in file test/clojure/test_clojure/for.clj. It may not be immediately obvious, but every test for 'for' defined with deftest-both is a test for 'for' and also for 'doseq'.

Regarding the current implementation of doseq: it isn't simply that it is too many bytes per binding, it is that the code size doubles with each additional binding. See these results, which measure size of the macroexpanded form rather than byte code size, but those two things should be fairly linearly related to each other here:

(defn formsize [form]
  (count (with-out-str (print (macroexpand form)))))

user=> (formsize '(doseq [x (range 10)] (print x)))
652
user=> (formsize '(doseq [x (range 10) y (range 10)] (print x y)))
1960
user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10)] (print x y z)))
4584
user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10) w (range 10)] (print x y z w)))
9947
user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10) w (range 10) p (range 10)] (print x y z w p)))
20997

Here are results for the same expressions after Ghadi's patch doseq.patch dated June 25 2014:

user=> (formsize '(doseq [x (range 10)] (print x)))
93
user=> (formsize '(doseq [x (range 10) y (range 10)] (print x y)))
170
user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10)] (print x y z)))
247
user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10) w (range 10)] (print x y z w)))
324
user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10) w (range 10) p (range 10)] (print x y z w p)))
401
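
The growth pattern in the two result sets can be checked with simple arithmetic on the numbers reported above. This small sketch (the class and method names are made up for illustration) computes the increase in form size for each added binding.

```java
import java.util.Arrays;

// Computes per-binding growth from the formsize figures quoted in this comment.
final class FormSizeGrowth {
    // Returns the successive differences sizes[i] - sizes[i - 1].
    static int[] differences(int[] sizes) {
        int[] d = new int[sizes.length - 1];
        for (int i = 1; i < sizes.length; i++) {
            d[i - 1] = sizes[i] - sizes[i - 1];
        }
        return d;
    }

    public static void main(String[] args) {
        int[] before = {652, 1960, 4584, 9947, 20997}; // current doseq, 1 to 5 bindings
        int[] after = {93, 170, 247, 324, 401};        // with doseq.patch, 1 to 5 bindings
        System.out.println(Arrays.toString(differences(before))); // [1308, 2624, 5363, 11050]
        System.out.println(Arrays.toString(differences(after)));  // [77, 77, 77, 77]
    }
}
```

With the current implementation each added binding costs roughly twice as much as the previous one (1308, 2624, 5363, 11050), whereas with the patch each binding adds a constant 77 characters.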

It would be good to see some performance results with and without this patch, too.

Comment by Stuart Halloway [ 28/Jun/14 2:21 PM ]

In the tests below, the new impl is called "doseq2", vs. the original impl "doseq"

(def hund (into [] (range 100)))
(def ten (into [] (range 10)))
(def arr (int-array 100))
(def s "superduper")

;; big seq, few bindings: doseq2 LOSES
(dotimes [_ 5]
  (time (doseq [a (range 100000000)])))
;; 1.2 sec

(dotimes [_ 5]
  (time (doseq2 [a (range 100000000)])))
;; 1.8 sec

;; small unchunked reducible, few bindings: doseq2 wins
(dotimes [_ 5]
  (time (doseq [a s b s c s])))
;; 0.5 sec

(dotimes [_ 5]
  (time (doseq2 [a s b s c s])))
;; 0.2 sec

(dotimes [_ 5]
  (time (doseq [a arr b arr c arr])))
;; 40 msec

(dotimes [_ 5]
  (time (doseq2 [a arr b arr c arr])))
;; 8 msec

;; small chunked reducible, few bindings: doseq2 LOSES
(dotimes [_ 5]
  (time (doseq [a hund b hund c hund])))
;; 2 msec

(dotimes [_ 5]
  (time (doseq2 [a hund b hund c hund])))
;; 8 msec

;; more bindings: doseq2 wins bigger and bigger
(dotimes [_ 5]
  (time (doseq [a ten b ten c ten d ten ])))
;; 2 msec

(dotimes [_ 5]
  (time (doseq2 [a ten b ten c ten d ten ])))
;; 0.4 msec

(dotimes [_ 5]
  (time (doseq [a ten b ten c ten d ten e ten])))
;; 18 msec

(dotimes [_ 5]
  (time (doseq2 [a ten b ten c ten d ten e ten])))
;; 1 msec

Comment by Ghadi Shayban [ 28/Jun/14 6:23 PM ]

Hmm, I cannot reproduce your results.

I'm not sure whether you are testing with lein, on what platform, what jvm opts.

Can we test using this little harness instead directly against clojure.jar? I've attached the harness and two runs of results (one w/ default heap, the other 3GB w/ G1GC)

I added a medium and small (range) too.

Anecdotally, I see doseq2 outperform in all cases except the small range. Using criterium shows a wider performance gap favoring doseq2.

I pasted the results side by side for easier viewing.

core/doseq                          doseq2
"Elapsed time: 1610.865146 msecs"   "Elapsed time: 2315.427573 msecs"
"Elapsed time: 2561.079069 msecs"   "Elapsed time: 2232.479584 msecs"
"Elapsed time: 2446.674237 msecs"   "Elapsed time: 2234.556301 msecs"
"Elapsed time: 2443.129809 msecs"   "Elapsed time: 2224.302855 msecs"
"Elapsed time: 2456.406103 msecs"   "Elapsed time: 2210.383112 msecs"

;; med range, few bindings:
core/doseq                          doseq2
"Elapsed time: 28.383197 msecs"     "Elapsed time: 31.676448 msecs"
"Elapsed time: 13.908323 msecs"     "Elapsed time: 11.136818 msecs"
"Elapsed time: 18.956345 msecs"     "Elapsed time: 11.137122 msecs"
"Elapsed time: 12.367901 msecs"     "Elapsed time: 11.049121 msecs"
"Elapsed time: 13.449006 msecs"     "Elapsed time: 11.141385 msecs"

;; small range, few bindings:
core/doseq                          doseq2
"Elapsed time: 0.386334 msecs"      "Elapsed time: 0.372388 msecs"
"Elapsed time: 0.10521 msecs"       "Elapsed time: 0.203328 msecs"
"Elapsed time: 0.083378 msecs"      "Elapsed time: 0.179116 msecs"
"Elapsed time: 0.097281 msecs"      "Elapsed time: 0.150563 msecs"
"Elapsed time: 0.095649 msecs"      "Elapsed time: 0.167609 msecs"

;; small unchunked reducible, few bindings:
core/doseq                          doseq2
"Elapsed time: 2.351466 msecs"      "Elapsed time: 2.749858 msecs"
"Elapsed time: 0.755616 msecs"      "Elapsed time: 0.80578 msecs"
"Elapsed time: 0.664072 msecs"      "Elapsed time: 0.661074 msecs"
"Elapsed time: 0.549186 msecs"      "Elapsed time: 0.712239 msecs"
"Elapsed time: 0.551442 msecs"      "Elapsed time: 0.518207 msecs"

core/doseq                          doseq2
"Elapsed time: 95.237101 msecs"     "Elapsed time: 55.3067 msecs"
"Elapsed time: 41.030972 msecs"     "Elapsed time: 30.817747 msecs"
"Elapsed time: 42.107288 msecs"     "Elapsed time: 19.535747 msecs"
"Elapsed time: 41.088291 msecs"     "Elapsed time: 4.099174 msecs"
"Elapsed time: 41.03616 msecs"      "Elapsed time: 4.084832 msecs"

;; small chunked reducible, few bindings:
core/doseq                          doseq2
"Elapsed time: 31.793603 msecs"     "Elapsed time: 40.082492 msecs"
"Elapsed time: 17.302798 msecs"     "Elapsed time: 28.286991 msecs"
"Elapsed time: 17.212189 msecs"     "Elapsed time: 14.897374 msecs"
"Elapsed time: 17.266534 msecs"     "Elapsed time: 10.248547 msecs"
"Elapsed time: 17.227381 msecs"     "Elapsed time: 10.022326 msecs"

;; more bindings:
core/doseq                          doseq2
"Elapsed time: 4.418727 msecs"      "Elapsed time: 2.685198 msecs"
"Elapsed time: 2.421063 msecs"      "Elapsed time: 2.384134 msecs"
"Elapsed time: 2.210393 msecs"      "Elapsed time: 2.341696 msecs"
"Elapsed time: 2.450744 msecs"      "Elapsed time: 2.339638 msecs"
"Elapsed time: 2.223919 msecs"      "Elapsed time: 2.372942 msecs"

core/doseq                          doseq2
"Elapsed time: 28.869393 msecs"     "Elapsed time: 2.997713 msecs"
"Elapsed time: 22.414038 msecs"     "Elapsed time: 1.807955 msecs"
"Elapsed time: 21.913959 msecs"     "Elapsed time: 1.870567 msecs"
"Elapsed time: 22.357315 msecs"     "Elapsed time: 1.904163 msecs"
"Elapsed time: 21.138915 msecs"     "Elapsed time: 1.694175 msecs"
Comment by Ghadi Shayban [ 28/Jun/14 6:47 PM ]

It's good that the benchmarks contain empty doseq bodies in order to isolate the overhead of traversal. However, that represents 0% of actual real-world code.

At least for the first benchmark (large chunked seq), adding in some tiny amount of work did not change results significantly, neither for (map str [a])

(range 10000000) =>  (map str [a])
core/doseq
"Elapsed time: 586.822389 msecs"
"Elapsed time: 563.640203 msecs"
"Elapsed time: 369.922975 msecs"
"Elapsed time: 366.164601 msecs"
"Elapsed time: 373.27327 msecs"
doseq2
"Elapsed time: 419.704021 msecs"
"Elapsed time: 371.065783 msecs"
"Elapsed time: 358.779231 msecs"
"Elapsed time: 363.874448 msecs"
"Elapsed time: 368.059586 msecs"

nor for intrinsics like (inc a)

(range 10000000)
core/doseq
"Elapsed time: 317.091849 msecs"
"Elapsed time: 272.360988 msecs"
"Elapsed time: 215.501737 msecs"
"Elapsed time: 206.639181 msecs"
"Elapsed time: 206.883343 msecs"
doseq2
"Elapsed time: 241.475974 msecs"
"Elapsed time: 193.154832 msecs"
"Elapsed time: 198.757873 msecs"
"Elapsed time: 197.803042 msecs"
"Elapsed time: 200.603786 msecs"

I still see reduce-based doseq ahead of the original, except for small seqs.

Comment by Ghadi Shayban [ 04/Aug/14 2:55 PM ]

A form like the following will not work with this patch:

(go (doseq [c chs] (>! c :foo)))

as the go macro doesn't traverse fn boundaries. The only such code I know is core.async/mapcat*, a private fn supporting a fn that is marked deprecated.

Comment by Ghadi Shayban [ 07/Aug/14 2:09 PM ]

I see #'clojure.core/run! was just added, which has a similar limitation

Comment by Rich Hickey [ 29/Aug/14 8:19 AM ]

Please consider Ghadi's feedback, esp re: closures.

Comment by Ghadi Shayban [ 22/Sep/14 4:36 PM ]

The current expansion of a doseq [1] under a go form is less than ideal due to the amount of control flow: 14 states in the state machine vs. 7 with loop/recur.

[1] Comparison of macroexpansion of (go ... doseq) vs (go ... loop/recur)
https://gist.github.com/ghadishayban/639009900ce1933256a1

Comment by Nicola Mometto [ 25/Mar/15 6:07 PM ]

Related: CLJ-77





[CLJ-1161] sources jar has bad versions.properties resource Created: 11/Feb/13  Updated: 21/Jul/15

Status: Reopened
Project: Clojure
Component/s: None
Affects Version/s: Release 1.4, Release 1.5
Fix Version/s: Release 1.8

Type: Defect Priority: Minor
Reporter: Steve Miner Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File 0001-CLJ-1161-Remove-version.properties-from-sources-JAR.patch    
Patch: Code
Approval: Incomplete

 Description   

The "sources" jar (at least since Clojure 1.4 and including 1.5 RC) has a bad version.properties file in it. The resource clojure/version.properties is literally:

version=${version}

The regular Clojure jar has the correct version string in that resource.

I came across a problem when I was experimenting with the sources jar (as used by IDEs). I naively added the sources jar to my classpath, and Clojure died on start up. The bad clojure/version.properties file was found first, which led to a parse error as the clojure version was being set.
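
The failure mode can be reproduced in isolation with a small sketch. It loads the literal resource content with java.util.Properties and checks the value against a version pattern. The pattern and all names here are assumptions chosen for illustration; they are not copied from Clojure's own version-parsing code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical check showing why an unfiltered "${version}" placeholder cannot be parsed.
final class VersionPropertiesSketch {
    // Assumed shape of a version string such as 1.5.0 or 1.5.0-RC15.
    private static final Pattern VERSION = Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(-.+)?");

    // Reads the "version" key from properties-file text.
    static String readVersion(String propertiesText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty("version");
    }

    static boolean looksLikeVersion(String v) {
        return v != null && VERSION.matcher(v).matches();
    }

    public static void main(String[] args) {
        String bad = readVersion("version=${version}\n");
        System.out.println(bad + " -> " + looksLikeVersion(bad));
        String good = readVersion("version=1.5.0-RC15\n");
        System.out.println(good + " -> " + looksLikeVersion(good));
    }
}
```

The placeholder is loaded verbatim as the property value, so any code that expects numeric major, minor, and incremental parts has nothing to parse.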

Solution: patch leaves version.properties file out of sources JAR, where it causes problems for tools.



 Comments   
Comment by Steve Miner [ 11/Feb/13 10:04 AM ]

Notes from the dev mailing list:

The "sources" JAR is generated by another Maven plugin, configured here:
https://github.com/clojure/clojure/blob/clojure-1.5.0-RC15/pom.xml#L169-L181

The simplest solution might be to just exclude the file from the sources jar. It looks like maven-source-plugin has an excludes option which would do the trick:

http://maven.apache.org/plugins/maven-source-plugin/jar-mojo.html#excludes

Comment by Jeff Valk [ 21/Apr/14 8:20 AM ]

This issue is marked closed, but I'm still seeing it: the clojure-1.6.0-sources.jar, clojure-1.5.1-sources.jar, etc on Maven Central still contain the bad version.properties files. More specifically, it looks like the fix has been applied to builds in the SNAPSHOTS repository, but not to RELEASES.

Fix applied: https://oss.sonatype.org/content/repositories/snapshots/org/clojure/clojure/
Not fixed: https://oss.sonatype.org/content/repositories/releases/org/clojure/clojure/

Comment by Alex Miller [ 24/Apr/14 4:15 PM ]

Not sure what's needed here, but marking incomplete.

Comment by Jeff Valk [ 25/Apr/14 11:13 AM ]

Would a fix for this update existing sources jars (1.5.1, 1.6.0, etc) on Central? Or would any fix have to wait on the next Clojure release?

Comment by Alex Miller [ 25/Apr/14 12:37 PM ]

For all the same reasons that mutable state is undesirable, changing an existing release jar in the central Maven repository is also undesirable. While it's not technically impossible, we will not update existing releases and this will need to wait for the next. I've looked at this problem a little and I do not yet know enough to know how to fix it or why it even varies between snapshot and release. Help welcome!

In which tool do you see a resulting problem from this?

Comment by Jeff Valk [ 25/Apr/14 11:56 PM ]

Despite the way I phrased the question, I'd hoped that would be the answer. It's the right policy.

Unfortunately, this issue leaves the released sources jars essentially unusable from a tools standpoint. CIDER now has source code navigation from stacktraces – you can jump into both Clojure and Java function definitions from the error/trace. For the latter, the sources jar (for Clojure or any other Java library) needs to be on the classpath as a dev dependency. There's more host interop support in the works for CIDER too ("embrace the host platform"), but not being able to add a dependency on a stable Clojure sources jar presents a wrinkle.

Are the official Clojure releases built by Hudson? The Hudson build right before the 1.6.0 release (#532) and the one right after (#534) both show the exclusion fix, as does the git clojure-1.6.0 tag, which when I check out and build from source, is fine. The Hudson builds with release tags (e.g. 1.6 = #533, 1.6-RC1 = #512, etc), though, don't show any artifacts other than a pom.xml. This would seem to make it rather hard to audit builds...am I missing something?





[CLJ-1152] PermGen leak in multimethods and protocol fns when evaled Created: 30/Jan/13  Updated: 06/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.4
Fix Version/s: Release 1.8

Type: Defect Priority: Critical
Reporter: Chouser Assignee: Unassigned
Resolution: Unresolved Votes: 8
Labels: memory, protocols

Attachments: File naive-lru-for-multimethods-and-protocols.diff     File protocol_multifn_weak_ref_cache.diff    
Patch: Code
Approval: Incomplete

 Description   

There is a PermGen memory leak that we have tracked down to protocol methods and multimethods called inside an eval, because of the caches these methods use. The problem only arises when the value being cached is an instance of a class (such as a function or reify) that was defined inside the eval. Thus extending IFn or dispatching a multimethod on an IFn are likely triggers.

Reproducing: The easiest way that I have found to test this is to set "-XX:MaxPermSize" to a reasonable value so you don't have to wait too long for the PermGen spaaaaace to fill up, and to use "-XX:+TraceClassLoading" and "-XX:+TraceClassUnloading" to see the classes being loaded and unloaded.

leiningen project.clj
(defproject permgen-scratch "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.5.0-RC1"]]
  :jvm-opts ["-XX:MaxPermSize=32M"
             "-XX:+TraceClassLoading"
             "-XX:+TraceClassUnloading"])

You can use lein swank 45678 and connect with slime in emacs via M-x slime-connect.

To monitor the PermGen usage, you can find the Java process to watch with "jps -lmvV" and then run "jstat -gcold <PROCESS_ID> 1s". According to the jstat docs, the first column (PC) is the "Current permanent space capacity (KB)" and the second column (PU) is the "Permanent space utilization (KB)". VisualVM is also a nice tool for monitoring this.

Multimethod leak

Evaluating the following code will run a loop that eval's (take* (fn foo [])).

multimethod leak
(defmulti take* (fn [a] (type a)))

(defmethod take* clojure.lang.Fn
  [a]
  '())

(def stop (atom false))
(def sleep-duration (atom 1000))

(defn run-loop []
  (when-not @stop
    (eval '(take* (fn foo [])))
    (Thread/sleep @sleep-duration)
    (recur)))

(future (run-loop))

(reset! sleep-duration 0)

In the lein swank session, you will see many lines like below listing the classes being created and loaded.

[Loaded user$eval15802$foo__15803 from __JVM_DefineClass__]
[Loaded user$eval15802 from __JVM_DefineClass__]

These lines will stop once the PermGen space fills up.

In the jstat monitoring, you'll see the amount of used PermGen space (PU) increase to the max and stay there.

     PC       PU        OC          OU       YGC    FGC    FGCT     GCT
 31616.0  31552.7    365952.0         0.0      4     0    0.000    0.129
 32000.0  31914.0    365952.0         0.0      4     0    0.000    0.129
 32768.0  32635.5    365952.0         0.0      4     0    0.000    0.129
 32768.0  32767.6    365952.0      1872.0      5     1    0.000    0.177
 32768.0  32108.2    291008.0     23681.8      6     2    0.827    1.006
 32768.0  32470.4    291008.0     23681.8      6     2    0.827    1.006
 32768.0  32767.2    698880.0     24013.8      8     4    1.073    1.258
 32768.0  32767.2    698880.0     24013.8      8     4    1.073    1.258
 32768.0  32767.2    698880.0     24013.8      8     4    1.073    1.258

A workaround is to run prefer-method before the PermGen space is all used up, e.g.

(prefer-method take* clojure.lang.Fn java.lang.Object)

Then, when the used PermGen space is close to the max, in the lein swank session, you will see the classes created by the eval'ing being unloaded.

[Unloading class user$eval5950$foo__5951]
[Unloading class user$eval3814]
[Unloading class user$eval2902$foo__2903]
[Unloading class user$eval13414]

In the jstat monitoring, there will be a long pause when used PermGen space stays close to the max, and then it will drop down, and start increasing again when more eval'ing occurs.

     PC       PU        OC          OU       YGC    FGC    FGCT     GCT
 32768.0  32767.9    159680.0     24573.4      6     2    0.167    0.391
 32768.0  32767.9    159680.0     24573.4      6     2    0.167    0.391
 32768.0  17891.3    283776.0     17243.9      6     2   50.589   50.813
 32768.0  18254.2    283776.0     17243.9      6     2   50.589   50.813

The defmulti defines a cache that uses the dispatch values as keys. Each eval call in the loop defines a new foo class which is then added to the cache when take* is called, preventing the class from ever being GCed.

The prefer-method workaround works because it calls clojure.lang.MultiFn.preferMethod, which calls the private MultiFn.resetCache method, which completely empties the cache.

Protocol leak

The leak with protocol methods similarly involves a cache. You see essentially the same behavior as the multimethod leak if you run the following code using protocols.

protocol leak
(defprotocol ITake (take* [a]))

(extend-type clojure.lang.Fn
  ITake
  (take* [this] '()))

(def stop (atom false))
(def sleep-duration (atom 1000))

(defn run-loop []
  (when-not @stop
    (eval '(take* (fn foo [])))
    (Thread/sleep @sleep-duration)
    (recur)))

(future (run-loop))

(reset! sleep-duration 0)

Again, the cache is in the take* method itself, using each new foo class as a key.

Workaround: A workaround is to run -reset-methods on the protocol before the PermGen space is all used up, e.g.

(-reset-methods ITake)

This works because -reset-methods replaces the cache with an empty MethodImplCache.

Patch: protocol_multifn_weak_ref_cache.diff

Screened by:



 Comments   
Comment by Chouser [ 30/Jan/13 9:10 AM ]

I think the most obvious solution would be to constrain the size of the cache. Adding an item to the cache is already not the fastest path, so a bit more work could be done to prevent the cache from growing indefinitely large.

That does raise the question of what criteria to use. Keep the first n entries? Keep the n most recently used (which would require bookkeeping in the fast cache-hit path)? Keep the n most recently added?
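To make the "n most recently used" option concrete, here is a minimal Java sketch. It is an illustration only, not code from any attached patch: the class name BoundedLruCache and the size limit are hypothetical, and it relies on the standard LinkedHashMap access-order mode.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a size-bounded LRU cache; not taken from any patch.
class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedLruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Consulted after each insertion; returning true evicts the least recently used entry
        return size() > maxEntries;
    }
}
```

With access ordering, even get() reorders the map, which is the bookkeeping cost in the cache-hit path mentioned above. LinkedHashMap is also not thread-safe, so a real cache would need synchronization or a different structure.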

Comment by Jamie Stephens [ 18/Oct/13 9:35 AM ]

At a minimum, perhaps a switch to disable the caches – with obvious performance impact caveats.

Seems like expensive LRU logic is probably the way to go, but maybe don't have it kick in fully until some threshold is crossed.

Comment by Alex Miller [ 18/Oct/13 4:28 PM ]

A report seeing this in production from mailing list:
https://groups.google.com/forum/#!topic/clojure/_n3HipchjCc

Comment by Adrian Medina [ 10/Dec/13 11:43 AM ]

So this is why we've been running into PermGen space exceptions! This is a fairly critical bug for us - I'm making extensive use of multimethods in our codebase and this exception will creep in at runtime randomly.

Comment by Kevin Downey [ 17/Apr/14 9:52 PM ]

It might be better to split this into two issues. At a very abstract level the two issues are the "same", but concretely they are distinct (protocols don't really share code paths with multimethods), and keeping them together in one issue seems like a recipe for a large, hard-to-read patch.

Comment by Kevin Downey [ 26/Jul/14 5:49 PM ]

naive-lru-method-cache-for-multimethods.diff replaces the methodCache in multimethods with a very naive lru cache built on PersistentHashMap and PersistentQueue

Comment by Kevin Downey [ 28/Jul/14 7:09 PM ]

naive-lru-for-multimethods-and-protocols.diff creates a new class clojure.lang.LRUCache that provides an lru cache built using PHashMap and PQueue behind an IPMap interface.

changes MultiFn to use an LRUCache for its method cache.

changes expand-method-impl-cache to use an LRUCache for MethodImplCache's map case

Comment by Kevin Downey [ 30/Jul/14 3:10 PM ]

I suspect my patch naive-lru-for-multimethods-and-protocols.diff is just wrong: unless MethodImplCache really is being used as a cache, we can't just toss out entries when it gets full.

Looking at the deftype code again, it does look like MethodImplCache is being used as a cache, so maybe the patch is fine.

if I am sure of anything it is that I am unsure so hopefully someone who is sure can chime in

Comment by Nicola Mometto [ 31/Jul/14 11:02 AM ]

I haven't looked at your patch, but I can confirm that the MethodImplCache in the protocol function is just being used as a cache

Comment by dennis zhuang [ 08/Aug/14 6:21 AM ]

I developed a new patch that converts the methodCache in MultiFn to use a WeakReference for the dispatch value, and clears the cache if necessary.

I've tested it with the code in the ticket, and it looks fine. The classes are unloaded when PermGen is almost all used up.

Comment by Alex Miller [ 22/Aug/14 4:55 PM ]

I don't know which to evaluate here. Does multifn_weak_method_cache.diff supersede naive-lru-for-multimethods-and-protocols.diff or are these alternate approaches both under consideration?

Comment by Kevin Downey [ 22/Aug/14 8:26 PM ]

The most straightforward thing, I think, is to consider them as alternatives. I am not a huge fan of weak refs, but without them we have to pick some bounding size for the cache, and the cache holds a strong reference that could prevent a GC, so there are trade-offs. My reason for staying away from weak refs in general is that using them ties the behavior of whatever you are building pretty strongly to the behavior of the GC. That may be considered a matter of personal taste.

Comment by Andy Fingerhut [ 29/Aug/14 4:31 PM ]

All patches dated Aug 8 2014 and earlier no longer applied cleanly to latest master after some commits were made to Clojure on Aug 29, 2014. They did apply cleanly before that day.

I have not checked how easy or difficult it might be to update the patches.

Comment by Kevin Downey [ 29/Aug/14 7:00 PM ]

I've updated naive-lru-for-multimethods-and-protocols.diff to apply to the current master

Comment by Andy Fingerhut [ 29/Aug/14 7:34 PM ]

Thanks, Kevin. While JIRA allows multiple attachments to a ticket with the same filename but different contents, that can be confusing for people looking for a particular patch, and for a program I have that evaluates patches for things like whether they apply and build cleanly. Would you mind removing the older one, or in some other way making all the names unique?

Comment by Kevin Downey [ 29/Aug/14 8:43 PM ]

I deleted all of my attachments except for my latest and greatest.

Comment by dennis zhuang [ 30/Aug/14 9:51 AM ]

I updated multifn_weak_method_cache2.diff patch too.

I think using a weak reference cache is better, because we have to keep one cache per multifn. When you have many multi-functions, there will be many LRU caches in memory, and they will consume too much memory and CPU for evictions. You can't choose a proper threshold for an LRU cache in every environment.
But I don't have any benchmark data to support my opinion.

Comment by Alex Miller [ 10/Sep/14 2:38 PM ]

I'm going to set the LRU cache patch aside. I don't think it's possible to find a "correct" size for it and it seems weird to me to extend APersistentMap to build such a thing anyways.

I think it makes more sense to follow the same strategy used for other caches (such as the Keyword cache) - a combination ConcurrentHashMap with WeakReferences and a ReferenceQueue for clean-up. I don't see any compelling reason not to take the same path as other internal caches.
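The general shape of that strategy (a ConcurrentHashMap whose keys are WeakReferences registered with a ReferenceQueue) might look like the sketch below. It is a simplified assumption-laden illustration, not the Keyword cache code and not either patch: the class WeakKeyCache and its inner Key wrapper are hypothetical names, keys are compared with Java equals() rather than Clojure equiv, and null keys are not supported.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a weak-keyed concurrent cache; not code from Clojure or the patches.
class WeakKeyCache<K, V> {
    // Weak wrapper around a key. The hash is captured up front so a cleared
    // wrapper can still be located in the map and removed.
    private static final class Key<K> extends WeakReference<K> {
        private final int hash;

        Key(K referent, ReferenceQueue<? super K> queue) {
            super(referent, queue); // a null queue means "not registered"
            this.hash = referent.hashCode();
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object o) {
            if (this == o) return true; // lets a cleared wrapper match itself
            if (!(o instanceof Key)) return false;
            Object mine = get();
            Object theirs = ((Key<?>) o).get();
            return mine != null && mine.equals(theirs);
        }
    }

    private final ReferenceQueue<K> queue = new ReferenceQueue<>();
    private final ConcurrentHashMap<Key<K>, V> map = new ConcurrentHashMap<>();

    V get(K key) {
        expungeStale();
        // The lookup wrapper is short-lived and is not registered with the queue.
        return map.get(new Key<K>(key, null));
    }

    void put(K key, V value) {
        expungeStale();
        map.put(new Key<K>(key, queue), value);
    }

    // Drop entries whose keys have been garbage collected.
    private void expungeStale() {
        Object stale;
        while ((stale = queue.poll()) != null) {
            map.remove(stale);
        }
    }
}
```

Because the map only holds the key weakly, an otherwise unreachable dispatch value (and its class) stays collectable. Note that a value that strongly references its own key would defeat this.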

Comment by Alex Miller [ 10/Sep/14 3:44 PM ]

Stepping back a little to think about the problem.... our requirements are:
1) cache map of dispatch value (could be any Object) to multimethod function (IFn)
2) do we want keys to be compared based on equality or identity? identity-based opens up more reference-based caching options and is fine for most common dispatch types (Class, Keyword), but reduces (often eliminates?) cache hits for all other types where values are likely to be equiv but not identical (vector of strings for example)
3) concurrent access to cache
4) cache cannot grow without bound
5) cache cannot retain strong references to dispatch values (the cache keys) because the keys might be instances of classes that were loaded in another classloader which will prevent GC in permgen

multifn_weak_method_cache.diff uses a ConcurrentHashMap (#3) that maps RefWrapper around keys to IFn (#1). The patch uses Util.equals() (#2) for (Java) equality-based comparisons. The RefWrapper wraps the keys in WeakReferences to satisfy #5. Cache clearing based on the ReferenceQueue is used to satisfy #4.

A few things definitely need to be fixed:

  • Util.equals() should be Util.equiv()
  • methodCache and rq should be final
  • Why does RefWrapper have obj and expect rq to possibly be null?
  • RefWrapper fields should all be final
  • Whitespace errors in patch

Another idea entirely - instead of caching dispatch value, cache based on hasheq of dispatch value then equality check on value. Could then use WeakHashMap and no RefWrapper.

This patch does not cover the protocol cache. Is that just waiting for the multimethod case to look good?

Comment by dennis zhuang [ 10/Sep/14 7:18 PM ]

Hi Alex, thanks for your review. Note that the latest patch is multifn_weak_method_cache2.diff. I will update the patch soon based on your review, but I have a few questions and explanations.

1) I will use Util.equiv() instead of Util.equals(). But what's the difference between them?
2) When the RefWrapper is retained as a key in the ConcurrentHashMap, it wraps the obj in a WeakReference. But when trying to find it in the ConcurrentHashMap, it uses obj directly as a strong reference, and is created by passing a null ReferenceQueue. Please look at line 112 of multifn_weak_method_cache2.diff. In short, the patch stores the dispatch value as a weak reference in the cache, but uses a strong reference for cache lookups.

3) If caching the dispatch value based on hasheq, can we avoid hasheq collisions? If two different dispatch values have the same hasheq (or why can't that happen?), we would be in trouble.

Sorry, the patch doesn't cover the protocol cache; I will add it ASAP.

Comment by dennis zhuang [ 11/Sep/14 2:02 AM ]

The new patch 'protocol_multifn_weak_ref_cache.diff' is uploaded.

1) Using Util.equiv() instead of Util.equals()
2) Moved the RefWrapper and its associated methods to Util.java, and refactored the code based on Alex's review.
3) Fixed whitespace errors.
4) Fixed PermGen leak in protocol fns.

Comment by Alex Miller [ 03/Oct/14 10:35 AM ]

I screened this ticket again with Brenton Ashworth and had the following comments:

1) We need to have a performance test to verify that we have not negatively impacted performance of multimethods or protocol invocation.
2) Because there are special cases around null keys in the multimethod cache, please verify that there are existing example tests using null dispatch values in the existing test coverage.
3) In Util$RefWrapper.getObj() - why does this return this.ref at the end? It was not clear to me that the comment was correct or that this was useful in any way.
4) In Util$RefWrapper.clearRefWrapCache() - can k == null in that if check? If not, can we omit that? Also, if you explicitly create the Iterator from the entry set, you can call .remove() on it more efficiently than calling .remove() on the cache itself.
5) In core_deftype / MethodImplCache, it appears that you are modifying a now-mutable field rather than the prior version that was going to great lengths to stay immutable. It's not clear to me what the implications of this change are and that concerns me. Can it use a different collection or code to stay immutable?
6) Please update the description of this ticket to include an approach section that describes the changes we are making.
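For point 4, removing entries through an explicit entry-set iterator could look roughly like this. It is a generic sketch, not the patch's clearRefWrapCache(): the class CacheSweep and method removeCleared are hypothetical names, and the keys are assumed to be java.lang.ref.Reference instances.

```java
import java.lang.ref.Reference;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper; illustrates Iterator.remove() on a ConcurrentHashMap entry set.
class CacheSweep {
    // Removes every entry whose reference key has been cleared; returns how many were removed.
    static <R extends Reference<?>, V> int removeCleared(ConcurrentHashMap<R, V> cache) {
        int removed = 0;
        Iterator<Map.Entry<R, V>> it = cache.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getKey().get() == null) {
                it.remove(); // removes the entry the iterator just returned
                removed++;
            }
        }
        return removed;
    }
}
```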

Thanks!





[CLJ-787] transient blows up when passed a vector created by subvec Created: 03/May/11  Updated: 06/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.8

Type: Defect Priority: Major
Reporter: Alexander Redington Assignee: Unassigned
Resolution: Unresolved Votes: 3
Labels: None

Attachments: Text File CLJ-787-p1.patch    
Patch: Code and Test
Approval: Incomplete

 Description   

Subvectors created with subvec from a PersistentVector cannot be made transient:

user=> (transient (subvec [1 2 3 4] 2))
ClassCastException clojure.lang.APersistentVector$SubVector cannot be cast to clojure.lang.IEditableCollection  clojure.core/transient (core.clj:2864)

Cause: APersistentVector$SubVector does not implement IEditableCollection

Patch: CLJ-787-p1.patch

Approach: Create a TransientSubVector based on an underlying TransientVector.

Two assumptions:

  • It's okay for TransientSubVector to delegate the ensureEditable functionality to the underlying TransientVector (sometimes explicitly, sometimes implicitly) - calling ensureEditable explicitly also requires that the field for the underlying vector be the concrete TransientVector type rather than the ITransientVector interface.
  • When an operation that would throw an exception on a PersistentVector happens from the wrong thread (or after persistent!), we throw that exception rather than the IllegalAccessError that transients throw when accessed inappropriately.


 Comments   
Comment by Stuart Sierra [ 31/May/11 9:28 AM ]

Confirmed. APersistentVector$SubVector does not implement IEditableCollection.

The current implementation of TransientVector depends on implementation details of PersistentVector, so it is not a trivial fix. The simplest fix might be to implement IEditableCollection.asTransient in SubVector by creating a new PersistentVector, but I do not know the performance implications.

Comment by Gary Fredericks [ 25/May/13 8:11 PM ]

We could get the same performance characteristics as SubVector by creating a TransientSubVector based on an underlying TransientVector, right?

Preparing a patch to that effect.

Comment by Gary Fredericks [ 25/May/13 10:58 PM ]

Text from the commit msg:

Made two assumptions:

  • It's okay for TransientSubVector to delegate the ensureEditable
    functionality to the underlying TransientVector (sometimes
    explicitly, sometimes implicitly) – calling ensureEditable
    explicitly also requires that the field for the underlying vector
    be the concrete TransientVector type rather than the
    ITransientVector interface.
  • When an operation that would throw an exception on a
    PersistentVector happens from the wrong thread (or after
    persistent!), we throw that exception rather than the
    IllegalAccessError that transients throw when accessed
    inappropriately.
Comment by Alex Miller [ 11/Oct/13 4:17 PM ]

I think there are some assumptions being made in this patch about the class structure here that do not hold. The structure is, admittedly, quite twisty.

A counter-example that highlights one of a few subtypes of APersistentVector that are not PersistentVector (like MapEntry):

user=> (transient (subvec (first {:a 1}) 0 1))
ClassCastException clojure.lang.MapEntry cannot be cast to clojure.lang.IEditableCollection  clojure.lang.APersistentVector$TransientSubVector.<init> (APersistentVector.java:592)

APersistentVector.SubVector expects to work on anything that implements IPersistentVector. Note that this includes concrete types such as MapEntry and LazilyPersistentVector, but could also be any user-implemented IPersistentVector type. TransientSubVector assumes that the IPersistentVector inside the SubVector in question is also an IEditableCollection (that is, that it can be converted to a transient). Note that while PersistentVector implements IEditableCollection (and supplies TransientVector), APersistentVector does not. To really implement this in tandem with SubVector, I think you would need to guarantee that IPersistentVector extended IEditableCollection, and I don't think that's something we want to do.

I don't see an easy solution. Any time I see all these modifiers (Transient, Sub, etc) being created in different combinations, it is a clear sign that independent kinds of functionality are being remixed into single inheritance OO trees. You can see the same thing in most collection libraries (even Java's - need a ConcurrentIdentitySortedMap? too bad!).

Needs more thought.

Comment by Andy Fingerhut [ 08/Nov/13 10:17 AM ]

Patch CLJ-787-p1.patch no longer applies cleanly to latest master, but it is only because of some new tests added to the transients.clj file since the patch was created, so it is trivial to update in that sense. Not updating it now due to other more significant issues with the patch described above.

Comment by Alex Miller [ 17/Jan/14 10:19 AM ]

No good solution to consider right now, removing from 1.6.





[CLJ-700] contains? broken for transient collections Created: 01/Jan/11  Updated: 31/Jul/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.2
Fix Version/s: Release 1.8

Type: Defect Priority: Critical
Reporter: Herwig Hochleitner Assignee: Rich Hickey
Resolution: Unresolved Votes: 17
Labels: transient

Attachments: Java Source File 0001-Refactor-of-some-of-the-clojure-.java-code-to-fix-CL.patch     File clj-700-7.diff     File clj-700-8.diff     Text File clj-700-9.patch     File clj-700.diff     Text File clj-700-patch4.txt     Text File clj-700-patch6.txt     Text File clj-700-rt.patch    
Patch: Code and Test
Approval: Incomplete

 Description   

Behavior with Clojure 1.6.0:

user=> (contains? (transient {:x "fine"}) :x)
IllegalArgumentException contains? not supported on type: clojure.lang.PersistentArrayMap$TransientArrayMap  clojure.lang.RT.contains (RT.java:724)
;; expected: true

user=> (contains? (transient (hash-map :x "fine")) :x)
IllegalArgumentException contains? not supported on type: clojure.lang.PersistentHashMap$TransientHashMap  clojure.lang.RT.contains (RT.java:724)
;; expected: true

user=> (contains? (transient [1 2 3]) 0)
IllegalArgumentException contains? not supported on type: clojure.lang.PersistentVector$TransientVector  clojure.lang.RT.contains (RT.java:724)
;; expected: true

user=> (contains? (transient #{:x}) :x)
IllegalArgumentException contains? not supported on type: clojure.lang.PersistentHashSet$TransientHashSet  clojure.lang.RT.contains (RT.java:724)
;; expected: true

user=> (:x (transient #{:x}))
nil
;; expected: :x

user=> (get (transient #{:x}) :x)
nil
;; expected: :x

Cause: clojure.lang.RT makes assumptions about the collection types passed to some methods, e.g. contains() and getFrom(). contains() checks whether the instance passed in is Associative (a subinterface of IPersistentCollection) or IPersistentSet; transient collections are neither.

Approach: Expand the types that RT.getFrom(), RT.contains(), and RT.find() can handle to cover the additional transient interfaces.
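The idea of widening a type-dispatching method can be sketched in plain Java as follows. This is not the code in clj-700-9.patch and does not use the real clojure.lang interfaces: PersistentLookup and TransientLookup are hypothetical stand-ins, and probing with a sentinel is just one assumed way to tell a nil value apart from a missing key.

```java
// Hypothetical stand-in for an RT-style dispatch method; names are illustrative only.
class ContainsDispatch {
    // Stand-in for an interface that persistent collections implement.
    interface PersistentLookup { boolean containsKey(Object key); }

    // Stand-in for a lookup-only interface that transient collections implement.
    interface TransientLookup { Object valAt(Object key, Object notFound); }

    private static final Object NOT_FOUND = new Object();

    static boolean contains(Object coll, Object key) {
        if (coll == null) return false;
        if (coll instanceof PersistentLookup)
            return ((PersistentLookup) coll).containsKey(key);
        // Added branch: probe with a private sentinel so that a key mapped to
        // null is still reported as present.
        if (coll instanceof TransientLookup)
            return ((TransientLookup) coll).valAt(key, NOT_FOUND) != NOT_FOUND;
        throw new IllegalArgumentException(
            "contains? not supported on type: " + coll.getClass().getName());
    }
}
```

Each extra instanceof check is another branch on the call path, which is the cost raised in the screening notes.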

Alternative: Other older patches (probably best exemplified by clj-700-8.diff) restructure the collections type hierarchy. That is a much bigger change than the one taken here but is perhaps a better long-term path. That patch refactors several of the Clojure interfaces so that logic independent of immutability is pulled out to a general interface (e.g. ISet, IAssociative), while preserving the contract specified (e.g. Associatives only return Associatives when calling assoc()). With more general interfaces in place, the contains() and getFrom() methods were then altered to conditionally use the general interfaces, which are agnostic of persistence vs. transience.

Screening Notes

  • the extra conditions in RT add branches to some key functions. get already has a getFrom optimization, but there is no similar containsFrom or findFrom. Is it worth measuring the possible impact of these?
  • I believe the interface refactoring approach (not taken here) is worth separate consideration as an enhancement. If this is done, I think leveraging valAt would be simpler, e.g. allowing HashMap and ArrayMap to share code
  • it is not evident (to me anyway) why some API fns consume ILookup and others do not, among e.g. contains?, get, and find. Possible doc enhancement?
  • there is test code already in place (data_structures.clj) that could easily be expanded to cover transients. It would be nice to do this, or better yet get some test.check tests in place

Patch: clj-700-9.patch



 Comments   
Comment by Herwig Hochleitner [ 01/Jan/11 8:01 PM ]

the same is also true for TransientVectors

(contains? (transient [1 2 3]) 0)

false

Comment by Herwig Hochleitner [ 01/Jan/11 8:25 PM ]

As expected, TransientSets have the same issue; plus an additional, probably related one.

(:x (transient #{:x}))

nil

(get (transient #{:x}) :x)

nil

Comment by Alexander Redington [ 07/Jan/11 2:07 PM ]

This is caused by expectations in clojure.lang.RT regarding the type of collections for some methods, e.g. contains() and getFrom(). Checking for contains looks to see if the instance passed in is Associative (a subinterface of IPersistentCollection), or IPersistentSet.

This patch refactors several of the Clojure interfaces so that logic abstract from the issue of immutability is pulled out to a general interface (e.g. ISet, IAssociative), but preserves the contract specified (e.g. Associatives only return Associatives when calling assoc()).

With more general interfaces in place the contains() and getFrom() methods were then altered to conditionally use the general interfaces which are agnostic of persistence vs. transience. Includes tests in transients.clj to verify the changes fix this problem.

Comment by Stuart Halloway [ 28/Jan/11 10:35 AM ]

Rich: Patch doesn't currently apply, but I would like to get your take on approach here. In particular:

  1. this represents working back from the defect to rethinking abstractions (good!). Does it go far enough?
  2. what are good names for the interfaces introduced here?
Comment by Alexander Redington [ 25/Mar/11 7:44 AM ]

Rebased the patch off the latest pull of master as of 3/25/2011, it should apply cleanly now.

Comment by Stuart Sierra [ 17/Feb/12 2:59 PM ]

Latest patch does not apply as of f5bcf647

Comment by Andy Fingerhut [ 17/Feb/12 5:59 PM ]

clj-700-patch2.txt does patch cleanly to latest Clojure head as of a few mins ago. No changes to patch except in context around changed lines.

Comment by Andy Fingerhut [ 07/Mar/12 3:23 AM ]

Sigh. Git patches applied via 'git am' are fragile beasts indeed. Look at them the wrong way and they fail to apply.

clj-700-patch3.txt applies cleanly to latest master as of Mar 7, 2012, but not if you use this command:

git am -s < clj-700-patch3.txt

I am pretty sure this is because of DOS CR/LF line endings in the file src/jvm/clojure/lang/Associative.java. The patch does apply cleanly if you use this command:

git am --keep-cr -s < clj-700-patch3.txt

Comment by Andy Fingerhut [ 23/Mar/12 6:34 PM ]

This ticket was changed to Incomplete and waiting on Rich when Stuart Halloway asked for feedback on the approach on 28/Jan/2011. Stuart Sierra changed it to not waiting on Rich on 17/Feb/2012 when he noted the patch didn't apply cleanly. Latest patch clj-700-patch3.txt does apply cleanly, but doesn't change the approach used since the time Stuart Halloway's concern was raised. Should it be marked as waiting on Rich again? Something else?

Comment by Stuart Halloway [ 08/Jun/12 12:44 PM ]

Patch 4 incorporates patch 3, and brings it up to date on hashing (i.e. uses hasheq).

Comment by Andy Fingerhut [ 08/Jun/12 12:52 PM ]

Removed clj-700-patch3.txt in favor of Stuart Halloway's improved clj-700-patch4.txt dated June 8, 2012.

Comment by Andy Fingerhut [ 18/Jun/12 3:06 PM ]

clj-700-patch5.txt dated June 18, 2012 is the same as Stuart Halloway's clj-700-patch4.txt, except for context lines that have changed in Clojure master since Stuart's patch was created. clj-700-patch4.txt no longer applies cleanly.

Comment by Andy Fingerhut [ 19/Aug/12 4:47 AM ]

Adding clj-700-patch6.txt, which is identical to Stuart Halloway's clj-700-patch4.txt, except that it applies cleanly to latest master as of Aug 19, 2012. Note that as described above, you must use the --keep-cr option to 'git am' when applying this patch for it to succeed. Removing clj-700-patch5.txt, since it no longer applies cleanly.

Comment by Stuart Sierra [ 24/Aug/12 1:08 PM ]

Patch fails as of commit 1c8eb16a14ce5daefef1df68d2f6b1f143003140

Comment by Andy Fingerhut [ 24/Aug/12 1:53 PM ]

Which patch did you try, and what command did you use? I tried applying clj-700-patch6.txt to the same commit, using the following command, and it applied, albeit with the warning messages shown:

% git am --keep-cr -s < clj-700-patch6.txt
Applying: Refactor of some of the clojure .java code to fix CLJ-700.
/Users/jafinger/clj/latest-clj/clojure/.git/rebase-apply/patch:29: trailing whitespace.
public interface Associative extends IPersistentCollection, IAssociative{
warning: 1 line adds whitespace errors.
Applying: more CLJ-700: refresh to use hasheq

Note the --keep-cr option, which is necessary for this patch to succeed. It is recommended in the "Screening Tickets" section of the JIRA workflow wiki page here: http://dev.clojure.org/display/design/JIRA+workflow

Comment by Andy Fingerhut [ 28/Aug/12 5:48 PM ]

Presumptuously changing Approval from Incomplete back to None, since the latest patch does apply cleanly if the --keep-cr option is used. It was in Screened state recently, but I'm not so presumptuous as to change it to Screened.

Comment by Alex Miller [ 19/Aug/13 12:26 PM ]

I think through a series of different hands on this ticket it got knocked way back in the list. Re-marking vetted as it's previously been all the way up through screening. Should also keep an eye on CLJ-787 as it may have some collisions with this one.

Comment by Andy Fingerhut [ 08/Nov/13 10:14 AM ]

clj-700-7.diff is identical to clj-700-patch6.txt, except it applies cleanly to latest master. Only some lines of context in a test file have changed.

When I say "applies cleanly", I mean that there is one warning when using the proper "git am" command from the dev wiki page. This is because one line replaced in Associative.java has a CR/LF at the end of the line, because all lines in that file do.

Comment by Herwig Hochleitner [ 17/Feb/14 9:54 AM ]

Since clojure 1.5, contains? throws an IllegalArgumentException on transients.
In 1.6.0-beta1, transients are no longer marked as alpha.

Does this mean that we won't be able to distinguish between a nil value and no value on a transient?

Comment by Stuart Halloway [ 27/Jun/14 10:20 AM ]

Request for someone to (1) update patch to apply cleanly, and (2) summarize approach so I don't have to read through the comment history.

Comment by Andy Fingerhut [ 27/Jun/14 11:02 AM ]

The latest patch is clj-700-7.diff dated Nov 8, 2013. I believe it is impossible to create a patch that applies any more cleanly using git for source files that have carriage returns in them, which at least one modified source file does. Here is the command I used on latest Clojure master as of today (Jun 27 2014), which is the same as that of March 25 2014:

% git am -s --keep-cr --ignore-whitespace < ~/clj/patches/clj-700-7.diff 
Applying: Refactor of some of the clojure .java code to fix CLJ-700.
/Users/admin/clj/latest-clj/clojure/.git/rebase-apply/patch:29: trailing whitespace.
public interface Associative extends IPersistentCollection, IAssociative{
warning: 1 line adds whitespace errors.
Applying: more CLJ-700: refresh to use hasheq

If you want a patch that doesn't have the 'trailing whitespace' warning in it, I think someone would have to commit a change that removed the carriage returns from file Associative.java. If you want such a patch, let me know and we can remove all of them from every source file and be done with this annoyance.

Comment by Andy Fingerhut [ 27/Jun/14 11:19 AM ]

Updated description to contain a copy of only those comments that seemed 'interesting'. Most comments have simply been "attached an updated patch that applies cleanly", or "changed the state of this ticket for reason X".

Comment by Alex Miller [ 27/Jun/14 1:19 PM ]

Looks like Andy did as requested, moving back to Screenable.

Comment by Andy Fingerhut [ 29/Aug/14 4:27 PM ]

Patch clj-700-7.diff dated Nov 8 2013 no longer applied cleanly to latest master after some commits were made to Clojure on Aug 29, 2014. It did apply cleanly before that day.

I have not checked how easy or difficult it might be to update this patch.

Comment by Andy Fingerhut [ 01/Sep/14 3:59 AM ]

Patch clj-700-8.diff dated Sep 1 2014 is identical to clj-700-7.diff, except that it applies "cleanly" to latest master, by which I mean it applies as cleanly as I think it is possible to apply for a git patch to a file with carriage return/line feed line endings, as one of the modified files still does.

Comment by Alex Miller [ 17/Dec/14 3:12 PM ]

Added new patch with alternate approach that just makes RT know about transients instead of refactoring the class hierarchy.

clj-700-rt.patch

In some ways I think the class hierarchy refactoring is due, but I'm not totally on board with all the changes in those patches and it has impacts on collections outside Clojure itself that are hard to reason about.

Comment by Rich Hickey [ 31/Jul/15 11:35 AM ]

I'd like to look at the type hierarchy myself.





[ASYNC-103] promise-chan Created: 05/Nov/14  Updated: 08/Jul/15

Status: Reopened
Project: core.async
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Stuart Halloway Assignee: Alex Miller
Resolution: Unresolved Votes: 3
Labels: None

Attachments: Text File async-103-2.patch     Text File async-103-3.patch     Text File async-103-4.patch     Text File async-103-5.patch     Text File async-103.patch    
Patch: Code and Test
Approval: Incomplete
Waiting On: Alex Miller

 Description   

A promise-buf is a buffer that can be passed to a channel, giving the channel promise semantics:

  • buf can only take on one value ever
  • buf is never emptied: N consumers can all get the same, one and only value by reading from the channel

In order to make close! sensible with this new kind of buf:

  • all buffers need to be closeable
  • closing a channel closes its buffer
  • all basic buffer types do nothing when closed
  • a promise-buf makes a one-time transition to having no value when closed
  • closing a buffer is an implementation detail not exposed to end users

Approach:

  • Buffer protocol now has a close-buf! function (close! would have overlapped the Channel function close!). close-buf! is invoked on a buffer when the channel is closed, allowing it to update itself if necessary.
  • Existing buffers implement close-buf! and do nothing (buffer still available for taking)
  • New promise-buffer implementation. Makes a one-time transition when a value is supplied or the buffer is closed. The value is held in an unsynchronized-mutable deftype field; updates via add!* or close-buf! always happen under the containing channel mutex.
  • New API functions: promise-chan, which creates a channel with a promise-buffer, and promise-buffer.
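
The buffer behavior described above can be illustrated with a minimal sketch. This is not the code in async-103-5.patch: it uses only clojure.core, and the names SketchBuffer, SketchPromiseBuffer, sketch-promise-buffer and no-val are invented here for illustration. The real core.async Buffer protocol may differ, and the sketch does no locking of its own, on the assumption (stated in the ticket) that the owning channel's mutex guards every call.

```clojure
;; Sentinel meaning "nothing delivered yet" (hypothetical name for this sketch).
(def no-val (Object.))

;; Hypothetical stand-in for the buffer operations named in the ticket.
(defprotocol SketchBuffer
  (add!* [buf itm] "Offer a value; only the first offer is kept.")
  (remove! [buf] "Read the held value without emptying the buffer.")
  (close-buf! [buf] "Called when the owning channel is closed."))

(deftype SketchPromiseBuffer [^:unsynchronized-mutable val]
  SketchBuffer
  (add!* [this itm]
    ;; One-time transition: later offers are silently ignored.
    (when (identical? no-val val)
      (set! val itm))
    this)
  (remove! [_]
    ;; Never emptied: every reader sees the same value.
    val)
  (close-buf! [_]
    ;; Closing before delivery settles the buffer on nil (no value).
    (when (identical? no-val val)
      (set! val nil)))
  clojure.lang.Counted
  (count [_]
    (if (identical? no-val val) 0 1)))

(defn sketch-promise-buffer
  "Creates an undelivered buffer."
  []
  (SketchPromiseBuffer. no-val))

;; Usage: the second add!* is ignored and repeated reads agree.
(def b (sketch-promise-buffer))
(add!* b :first)
(add!* b :second)
(remove! b)
(remove! b)
```

In this sketch, closing after a value has been delivered leaves the value in place, while closing first makes every later read return nil.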

Patch: async-103-5.patch



 Comments   
Comment by Ghadi Shayban [ 07/Nov/14 4:06 PM ]

My initial gut reaction was that this is more related to semantics of the channel, not the buffer, and I'm wary of significant changes to esp. the impl protocols. But after seeing an impl, it looks straightforward and the changes aren't too significant. (Another possibility is to make another simpler implementation of a channel, with just slots for the value, lock and pending takers before the value is delivered. No slots for xfn or putters or buffer handling would be needed.)

Note an atom is not needed in the PromiseBuffer, just a set! on a mutable field to be in line with the other buffer flavors. If the patch continues using an atom, maybe guard val to not swap unnecessarily.

Comment by Alex Miller [ 07/Nov/14 11:09 PM ]

Good call on the atom - backed off to just using existing clojure.lang.Box. If that's too weird, will just deftype it.

Comment by Alex Miller [ 07/Nov/14 11:18 PM ]

Dur, just need the val field itself.

Comment by Fogus [ 09/Jan/15 4:27 PM ]

I'd really love to see the reason for the current impl over a more pointed promise channel (perhaps as described by Ghadi). This is a clear implementation, but with the addition of close-buf! some complexity is added for implementations in the future. Specifically, I'd like to at least see a docstring in the protocol describing the meaning and desired semantics around close-buf! and what it should return and when. Like I said, this is a quality patch, but I'm just concerned that there are unstated assumptions around close-buf! that escape me.

Comment by Alex Miller [ 28/Jan/15 12:43 PM ]

Fogus, I looked into implementing the promise-chan a bit today but it requires replicating a significant portion of the existing chan implementation. I believe that's the primary reason to do it this way, just leveraging the majority of the chan code.

New -5 patch has two changes - I commented the Buffer protocol fns, and I removed promise-buffer from the api, as I think users should only use promise-chan.

Comment by Fogus [ 30/Jan/15 10:06 AM ]

The patch looks good. My only (minor) reservation is that the Buffer docstrings are written under the assumption that people will use instances only in the context of a guarded channel. I understand why of course, so I think I might be too pedantic here. Because of this I see no reason not to +1 this patch.

Comment by Alex Miller [ 23/Feb/15 8:32 AM ]

Applied patch.

Comment by Alex Miller [ 03/Apr/15 3:57 PM ]

Reopening (and likely reverting commit).

This example demonstrates that only one of the waiting blocks will be notified, instead of all, as we'd like:

(let [c (promise-chan)]
  ;; takers
  (dotimes [_ 3]
    (go (println (<! c))))

  ;; put, should get to all takers
  (go (>! c 1)))

Comment by Jozef Wagner [ 30/May/15 10:35 AM ]

ASYNC-124 fixes the issue mentioned by Alex

Comment by Leon Grapenthin [ 11/Jun/15 11:44 AM ]

Quote: "buf is never emptied: N consumers can all get the same, one and only value by reading from the channel"

This is not true with the new closing semantics introduced here, correct? If I deref a Clojure promise, I get the same value every time. But what is the purpose in closing promise-chans after the promise has been delivered?

When is a good time to close a promise-chan? If I close after the put, which is what many core.async processes do, takers must have taken before the close which means it will often be a race condition unless explicit care is taken.

Is there a benefit or am I simply missing something here?

Comment by Herwig Hochleitner [ 08/Jul/15 2:23 PM ]

Here is my cljs implementation of a promise channel: https://github.com/webnf/webnf/blob/master/cljs/src/webnf/promise.cljs
I'm the sole author and you are free to use any of it.





Generated at Mon Aug 03 12:24:52 CDT 2015 using JIRA 4.4#649-r158309.