[CLJ-979] Clojure resolves to wrong deftype classes when AOT compiling or reloading Created: 03/May/12  Updated: 18/Feb/15  Resolved: 10/Jan/15

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.3, Release 1.4, Release 1.5, Release 1.6, Release 1.7
Fix Version/s: Release 1.7

Type: Defect Priority: Critical
Reporter: Edmund Jackson Assignee: Unassigned
Resolution: Completed Votes: 14
Labels: aot, classloader, compiler

Attachments: Text File CLJ-979.patch     Text File clj-979-symptoms.patch     Text File CLJ-979-v2.patch     Text File CLJ-979-v3.patch     Text File CLJ-979-v4.patch     Text File CLJ-979-v5.patch     Text File CLJ-979-v6.patch     Text File CLJ-979-v7.patch    
Patch: Code and Test
Approval: Ok


Compiling a class via `deftype` during AOT compilation gives different results for the different constructors. These hashes should be identical.

user=> (binding [*compile-files* true] (eval '(deftype Abc [])))
user=> (hash Abc)
user=> (hash (class (->Abc)))
31966239 ;; should be 16446700

This also means that whenever there's a stale AOT compiled deftype class in the classpath, that class will be used rather then the JIT compiled one, breaking repl interaction.

Another demonstration of this classloader issue (from CLJ-1495) when reloading deftypes (no AOT) :

user> (defrecord Foo [bar])
user> (= (->Foo 42) #user.Foo{:bar 42}) ;;expect this to evaluate to true
user> (defrecord Foo [bar])
user> (= (->Foo 42) #user.Foo{:bar 42}) ;;expect this to evaluate to true also -- but it doesn't!

This bug also affects AOT compilation of multimethods that dispatch on a class, this affected core.match for years see http://dev.clojure.org/jira/browse/MATCH-86, http://dev.clojure.org/jira/browse/MATCH-98. David had to work-around this issue by using a bunch of protocols instead of multimethods.

Cause of the bug: currently clojure uses Class.forName to resolve a class from a class name, which ignores the class cache from DynamicClassLoader thus reloading deftypes or mixing AOT compilation at the repl with deftypes breaks, resolving to the wrong class.

Approach: the current patch (CLJ-979-v7.patch) addresses this issue in multiple ways:

  • it makes RT.classForName/classForNameNonLoading look in the class cache before delegating to Class/forName if the current classloader is not a DynamicClassLoader (this incidentally addresses also CLJ-1457)
  • it makes clojure use RT.classForName/classForNameNonLoading instead of Class/forName
  • it overrides Classloader/loadClass so that it's class cache aware – this method is used by the jvm to load classes
  • it changes gen-interface to always emit an in-memory interface along with the [optional] on-disk interface so that the in-memory class is always updated.

Patch: CLJ-979-v7.patch

Screened by: Alex Miller

Comment by Scott Lowe [ 12/May/12 9:05 PM ]

I can't reproduce this under Clojure 1.3 or 1.4, and Leiningen 1.7.1 on either Java 1.7.0-jdk7u4-b21 OpenJDK 64-Bit or Java 1.6.0_31 Java HotSpot 64-Bit. OS is Mac OS X 10.7.

Edmund, how are you running this AOT code? I wrapped your code in a main function and built an uberjar from it.

Comment by Edmund Jackson [ 13/May/12 2:20 AM ]

Hi Scott,


I have two use cases
1. AOT compile and call from repl.
My steps: git clone, lein compile, lein repl, (use 'aots.death), (in-ns 'aots.death), (= (class (Dontwork. nil)) (class (map->Dontwork {:a 1}))) => false

2. My original use case, which I've minimised here, is an AOT ns, producing a genclass that is called instantiated from other Java (no main). This produces the same error. I will produce an example of this and post it too.

Comment by Edmund Jackson [ 13/May/12 4:23 AM ]

Hi Scott,

Here is an example of it failing in the interop case: https://github.com/ejackson/aotquestion2
The steps I'm following to compile this all up are

git clone git@github.com:ejackson/aotquestion2.git
cd aotquestion2/cljside/
lein uberjar
lein install
cd ../javaside/
mvn package
java -jar ./target/aotquestion-1.0-SNAPSHOT.jar

and it dies with this:

Exception in thread "main" java.lang.ClassCastException: cljside.core.Dontwork cannot be cast to cljside.core.Dontwork
at cljside.MyClass.makeDontwork(Unknown Source)
at aotquestion.App.main(App.java:8)

The error message is really confusing (to me, anyway), but I think its the same root problem as for the REPL case.

What do you see when you run the above ?

Comment by Scott Lowe [ 13/May/12 8:41 AM ]

Ah, yes, looks like my initial attempt to reproduce was too simplistic. I used your second git repo, and can now confirm that it's failing for me with the same error.

Comment by Scott Lowe [ 13/May/12 10:35 PM ]

I looked into this a little further and the AOT generated code looks correct, in the sense that both code paths appear to be returning the same type.

However, I wonder if this is really a ClassLoader issue, whereby two definitions of the same class are being loaded at different times, because that would cause the x.y.Class cannot be cast to x.y.Class exception that we're seeing here.

Comment by Steve Miner [ 03/Sep/13 9:54 AM ]

This could be related to CLJ-1157 which deals with a ClassLoader issue with AOT compiled code.

Comment by Ambrose Bonnaire-Sergeant [ 29/Mar/14 1:11 PM ]

I've tried this patch attached to CLJ-1157 and it did not solve this issue.

Comment by Ambrose Bonnaire-Sergeant [ 29/Mar/14 2:27 PM ]

This bug seems to be rooted in different behaviour for do/let under compilation. Attached a patch showing these symptoms in the hope it helps people find the cause.

Comment by Peter Taoussanis [ 22/Sep/14 3:12 AM ]

Just a quick note to confirm that this still seems to be around as of Clojure 1.7.0-alpha2. Don't have any useful input on possible solutions, sorry.

Comment by Alex Miller [ 04/Dec/14 1:12 PM ]

Duplicates - CLJ-1495, CLJ-1132

Comment by Nicola Mometto [ 04/Dec/14 1:50 PM ]

The attached patch fixes the classloader issues by routing RT.classForName & variants through the DynamicClassLoader class cache before invoking Class.forName

Comment by Nicola Mometto [ 04/Dec/14 1:59 PM ]

Re-adding triaged status added by Alex Miller that got accidentaly nuked by a race-condition between my edits to the ticket description and Alex's ones

Comment by Nicola Mometto [ 04/Dec/14 2:30 PM ]

0001-CLJ-979-make-clojure-resolve-to-the-correct-Class-in-v2.patch is the same as 0001-CLJ-979-make-clojure-resolve-to-the-correct-Class-in.patch except it unconditionally looks for classes in the class cache of DynamicClassLoader, even if baseLoader() is not a DynamicClassLoader.
This fixes the bug of CLJ-1457 but might just be a workaround

Comment by Michael Blume [ 11/Dec/14 3:29 PM ]

Current patch blows up my Clojure build


Comment by Nicola Mometto [ 11/Dec/14 3:45 PM ]

Michael: the current patch builds clojure fine for me, I'll try to reproduce. Which jvm version are you using?

Comment by Michael Blume [ 11/Dec/14 4:26 PM ]

[14:24][michael.blume@tcc-michael-4:~/workspace/clojure((0fc43db...))]$ java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
[14:24][michael.blume@tcc-michael-4:~/workspace/clojure((0fc43db...))]$ mvn -version
Apache Maven 3.2.3 (33f8c3e1027c3ddde99d3cdebad2656a31e8fdf4; 2014-08-11T13:58:10-07:00)
Maven home: /usr/local/Cellar/maven/3.2.3/libexec
Java version: 1.8.0_25, vendor: Oracle Corporation
Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.10.1", arch: "x86_64", family: "mac"

build was after I applied the patch to the current master branch of the clojure github repo

Comment by Andy Fingerhut [ 11/Dec/14 5:34 PM ]

I am seeing a similar compilation error as Michael Blume, with both JDK 1.7 and 1.8 on Mac OS X 10.9.5.

By accident I found that if I take latest Clojure master and do 'mvn package', then apply the patch CLJ-979.patch dated Dec 11 2014, then do 'mvn package' again without 'mvn clean', it compiles with no errors. If I do 'mvn clean' then 'mvn package' in a patched tree, I get the error every time I've tried.

Comment by Nicola Mometto [ 12/Dec/14 5:50 AM ]

The updated patch fixes the LinkageError Andy and Michael were getting.

Andy, Michael, can you confirm?

Comment by Nicola Mometto [ 12/Dec/14 9:38 AM ]

Added more testcases to new patch

Comment by Nicola Mometto [ 12/Dec/14 10:09 AM ]

Cleaned up the patch from whitespace changes

Comment by Andy Fingerhut [ 12/Dec/14 12:32 PM ]

I tried latest Clojure master plus patch CLJ-979-v4.patch, dated 12 Dec 2014, with Mac OS X 10.9.5 + JDK7, and Ubuntu Linux 14.04 with JDKs 6 through 9, and 'mvn clean' followed by 'mvn package' built and passed tests successfully with all of them.

I did notice that some files were created in the test directory that were not cleaned up by the end of the test, which you can use 'git status .' to see. Not sure if that is considered a bad thing for Clojure tests.

Comment by Nicola Mometto [ 12/Dec/14 1:07 PM ]

Thanks Andy, I've updated the patch and it now should remove all temporary classes created by the test.
It's probably not the best way to do it but I couldn't figure out how to do it another way.

Comment by Michael Blume [ 12/Dec/14 2:34 PM ]

Yep, looks good to me =)

Comment by Alex Miller [ 15/Dec/14 4:01 PM ]

Thanks first to Nicola for all his work so far on this!

Some feedback:
1) While the ticket itself isn't bad, I would really like to focus the title and description on a crisp statement of the real problem now that we understand it more. I'd like help on making sure we capture that correctly - how is this for a title: "Uses of Class.forName() miss classes in DynamicClassLoader cache?" ?

Similarly, the description should focus on the problem and show some examples. The defrecord one is good. The first example works for me before the patch and fails after?

2) The crux of this whole thing is the change in loading order in DCL.loadClass() - changing this is a big deal. We really need broader testing with things likely to be affected - off the top of my head: Immutant, Pomegranate, Leiningen, or anything else that monkeys with classloader stuff. Maybe something with Eclipse/OSGi if there is something testable out there.

3) DynamicClassLoader comments:
a) loadClass(String name) - I believe this is identical to the super impl, so can be removed.
b) findClass(String name) - now that we are hijacking loadClass(), I'm not sure it's even necessary to implement this or to call super.findClass() - if you get to the super.findClass(), I would expect that to always throw CNFE. Potentially, this method could even be removed (but this might do bad things if there are subclasses of DCL out there in the wild).
c) loadClass(String name, ...) - instead of calling findClass() and using the CNFE for control flow, you could just directly call findInMemoryClass(), then use a returned null as the decision factor. Again, this is possibly bad if there are DCL subclasses, so I'm on the fence about it.

4) Is the change in gen-interface something that should be a separate ticket? Seems like it could be separable.

5) I don't like the test changes with respect to set up and cleanup. The build already supports compiling a subset of test ns'es (like clojure.test-clojure.genclass.examples). I'd prefer to use that existing mechanism if at all possible. Check the build.xml for the hard-coded ns list.

6) What are the performance implications? I'm not expecting they are significant but we just made a bunch of changes for compilation performance and I'd hate to undo them all. Could findInMemoryClass be smarter about avoiding checks that can't succeed (avoiding "java.*" for example?).

Comment by Nicola Mometto [ 15/Dec/14 5:43 PM ]

1) It's not really about Class.forName() specifically, it's about DynamicClassLoader not being class cache aware in the loadClass method. The JVM uses the classloader loadClass method for resolving all kind of class usages,
including but not limited to Class.forName() (i.e. when loading some bytecode containing a "new" instruction, that class reference will be resolved via a call to loadClass)
I'll try to make the documentation a bit more clear, the first example is an exhibit of the bugged behaviour, the two calls should output the same hash.

2,4) So, there are 3 approaches to how DynamicClassLoader could go at it:

  • Prefer in-disk classes over in-memory classes, roughly the current approach (sometimes it will pick the in-memory class over the in-disk one causing weird bugs like Foo. and ->Foo constructing different classes), has the
    negative effect of breaking interaction between AOT compilation and JIT loading, which has created all sorts of troubles with redefinig deftypes/defprotocols in repls while having stale classfiles in disk.
  • Always pick the most-updated class, this has the advangate of being always correct but has several disadvantages that make it inpracticable in my opinion: we'd have to keep track of the timestamp in which a dynamic class
    is defined, and make the loadClass implementation such that if there a class is both in-memory and in-disk, it compares the timestamps and select the most updated one. This would complicate the implementation a lot and we'd
    likely have to pay a substantial performance hit.
  • Prefer in-memory classes over in-disk classes, the approach proposed in the current patches. It has the advantage of being almost always correct, make repl interaction & jit/aot mixing work fine and the implementation is
    mostly straightforward. The downside is that in cases like gen-class where an AOT class can actually be the most updated version, the in-memory version will be used. In clojure all the forms that do bytecode emission either
    only do AOT compilation or do AOT compilation on demand and always load the class in memory, except gen-interface that doesn't load the class in memory if it's being AOT compiled. Changing its semantics to behave like the
    other jit/aot compiling forms (deftype/defrecord/reify) is the only way to make this approach work so I don't think this should go in another ticket.

5) I don't like the previous testing strategy either but couldn't figure out a better way. Thanks for the pointer on the already in-place infrastructure, I'll check it out and update the patch

In the meantime I've uploaded a new patch addressing 3 and 6. Specifically:
3) I removed the unnecessary loadClass(String) arity, I've made loadClass(String, boolean) use findInMemoryClass(String) directly rather than relying on findClass(String) since nowhere in the documentation it guarantess that
findClass will be used by loadClass. However I've left the findClass(String) implementation in place in case there's code out there that relies on this.
6) I haven't done any serious testing but I haven't noticed any significant difference in compile times for any of my tools.* contrib libraries with the current patch. Filtering "java.*" class names before the inMemory check
didn't seem to produce any difference so it's not included in the updated patch. However I'll probably include an alternative patch with that filtering to do more performance testings and see if it can actually help a bit.

All this said, I'm afraid that I won't have time to personally do an in-depth benchmarking & cross project testing of this patch. I've been spending almost all the free time I had in the past weeks working through a bunch of tickets (mostly this one) but now because of school and other commitments I can't promise I will be able to do anything more than maintaining the current patch & answering to any questions about the bug. Any help in moving this ticket further would be appreciated, in particular to address points 2 and 6.

Comment by Alex Miller [ 16/Dec/14 8:33 AM ]

Thanks Nicola. I'll certainly take over sheparding the bug and appeal to the greater community for help in broad testing when I think we're ready for that.

Comment by Nicola Mometto [ 16/Dec/14 12:50 PM ]

Updated patch with better tests, addressing Alex Miller's comments.

Comment by Michael Blume [ 06/Jan/15 4:20 PM ]

New in-the-wild case: https://groups.google.com/forum/#!topic/clojure/WH6lJwH1vMg

Comment by Colin Fleming [ 17/Feb/15 10:30 PM ]

I believe I have a case which is now broken because of this patch. I'm not 100% clear on the details so I might be wrong, but I can't see any other explanation for what I'm seeing. The bug I'm looking at is Cursive #748. Cursive calls into Leiningen in-process, and before doing that I create a new DynamicClassLoader which uses an IntelliJ PluginClassLoader as its parent. I believe that what is happening is that PluginClassLoader.loadClass() delegates to various parent classloaders then throws CNFE if it can't find the class in question. Am I correct in thinking that after this patch DCL now delegates to the parent classloader before checking its URL list, whereas previously it did not?

Comment by Nicola Mometto [ 18/Feb/15 4:21 AM ]

Colin, I opened CLJ-1663 with a patch that should fix the issue. Can try it and comment on that ticket if it does?

Generated at Tue May 21 14:21:02 CDT 2019 using JIRA 4.4#649-r158309.