<< Back to previous view

[CLJ-1620] Constants are leaked in case of a reentrant eval Created: 18/Dec/14  Updated: 19/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: None

Type: Defect Priority: Critical
Reporter: Christophe Grand Assignee: Unassigned
Resolution: Unresolved Votes: 3
Labels: aot, compiler

Attachments: Text File 0001-CLJ-1620-avoid-constants-leak-in-static-initalizer.patch     Text File 0001-CLJ-1620-avoid-constants-leak-in-static-initalizer-v2.patch     Text File eval-bindings.patch    
Patch: Code
Approval: Triaged

 Description   

Compiling a function that references a non loaded (or unitialized) class triggers its init static. When the init static loads clojure code, some constants (source code I think) are leaked into the constants pool of the function under compilation.

It prevented CCW from working in some environments (Rational) because the static init of the resulting function was over 64K.

Steps to reproduce:

Load the leak.main ns and run the code in comments: the first function has 15 extra fiels despite being identical to the second one.

(ns leak.main)

(defn first-to-load []
  leak.Klass/foo)

(defn second-to-load []
  leak.Klass/foo)

(comment
=> (map (comp count #(.getFields %) class) [first-to-load second-to-load])
(16 1)
)
package leak;
 
import clojure.lang.IFn;
import clojure.lang.RT;
import clojure.lang.Symbol;
 
public class Klass {
  static {
    RT.var("clojure.core", "require").invoke(Symbol.intern("leak.leaky"));
  }
  public static IFn foo = RT.var("leak.leaky", "foo");
}
(ns leak.leaky)

(defn foo
  "Some doc"
  []
  "hello")

(def unrelated 42)

https://gist.github.com/cgrand/5dcb6fe5b269aecc6a5b#file-main-clj-L10



 Comments   
Comment by Christophe Grand [ 18/Dec/14 3:56 PM ]

Patch from Nicola Mometto

Comment by Nicola Mometto [ 18/Dec/14 4:01 PM ]

Attached the same patch with a more informative better commit message

Comment by Laurent Petit [ 18/Dec/14 4:03 PM ]

I'd like to thank Christophe and Alex for their invaluable help in understanding what was happening, formulating the right hypothesis and then finding a fix.

I would also mention that even if non IBM rational environments where not affected by the bug to the point were CCW would not work, they were still affected. For instance the class for a one-liner function wrapping an interop call weighs 700bytes once the patch is applied, when it weighed 90kbytes with current 1.6 or 1.7.

Comment by Laurent Petit [ 18/Dec/14 5:07 PM ]

In CCW for the initial problematic function, the -v2 patch produces exactly the same bytecode as if the referenced class does not load any namespace in its static initializers.
That is, the patch is valid. I will test it live in the IBM Rational environment ASAP.

Comment by Laurent Petit [ 19/Dec/14 12:10 AM ]

I confirm the patch fixes the issue detected initially in the IBM Rational environment





[CLJ-1610] Unrolled small maps Created: 08/Dec/14  Updated: 08/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: Release 1.8

Type: Enhancement Priority: Critical
Reporter: Alex Miller Assignee: Zach Tellman
Resolution: Unresolved Votes: 1
Labels: collections

Approval: Vetted

 Description   

Placeholder for unrolled small maps enhancement (companion for vectors at CLJ-1517).






[CLJ-1546] Widen vec to take Iterable/IReduce Created: 02/Oct/14  Updated: 16/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Enhancement Priority: Critical
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: PNG File benchmark.png     Text File clj-1546-2.patch     Text File clj-1546-3.patch     Text File clj-1546-4.patch     Text File clj-1546-5.patch     Text File clj-1546.patch    
Patch: Code and Test
Approval: Vetted

 Description   

These examples should work but do not:

Something Iterable but not IReduce:

user> (def i (eduction (map inc) (range 100)))
#'user/i
user> (instance? java.util.Collection i)
false
user> (instance? Iterable i)
true
user> (vec i)
RuntimeException Unable to convert: class clojure.core.Iteration to Object[]

Something IReduceInit but not Iterable:

user=> (vec
  (reify clojure.lang.IReduceInit
    (reduce [_ f start]
      (reduce f start (range 10)))))
RuntimeException Unable to convert: class user$reify__15 to Object[]

Proposal: Add PersistentVector.create(Iterable) and PersistentVector.create(IReduceInit) to efficiently create PVs from those.

For performance, vec has several cases:
1) (vec) if vector?: return new vector w/o meta - this matches prior behavior but has a constant cost of a few ns, rather than linear cost. If not a vector, spill to LazilyPersistentVector.create(Object).

2) (LPV) instanceof IReduceInit: Anything reducible can reduce itself fastest. Right now this has a big benefit for PersistentList. on 1.7.0-alpha4 with list of size 1024, into=28 seconds, vec=18 seconds. After patch, vec=7 seconds. If maps, sets, and range were IReduce later they would also use this path and see noticeable boosts. This is also the branch that will handle the Eduction and IReduceInit cases added in the patch.

3) (LPV) instanceof ISeq: If the coll is a sequence already, best to walk it rather than build an iterator or array from it. This calls into PersistentVector.create(ISeq). That implementation now contains an optimization to build into an array and construct the PersistentVector directly from the array for sequences <= 32 elements (which is most common). Once that threshold is reached, it switches to building with transients. The benchmark shows that the patch makes vec substantially faster for all seqs and even faster than into in some cases.

4) (LPV) instanceof Iterable: For all non-Clojure collections (ArrayList) and current non-IReduce Clojure collections (PHM, PHS), this is fastest path. Iterators are preferred to seqs as they do not cache or hold onto the values as they go by. The PV.create() for Iterable uses transients. Due to slightly more overhead, small maps and sets are slightly slower but this would be fixed by CLJ-1499 and/or making PHM/PHS IReduceInit.

5) (LPV) otherwise RT.toArray(): catches Map, String, Object[], primitive array, etc. The important ones here are the arrays - they are slightly slower on small arrays due to overhead of checking more cases above, but big arrays are significantly faster than they were.

In addition, there was one hard-coded path in the Compiler into PersistentVector.create() and I re-routed that through LazilyPersistentVector instead as that code is now the place to choose the fastest path logic.

Patch: clj-1546-5.patch

Screened by:



 Comments   
Comment by Timothy Baldridge [ 02/Oct/14 9:44 AM ]

Is there a reason the final case for (vec something) can't just be a call to (into [] coll)? It seems a bit odd to do (to-array) on anything thats not a java collection or Iterable, when we have IReduce.

Comment by Rich Hickey [ 02/Oct/14 10:02 AM ]

re: Tim - yes, this needs to support IReduce (and thereby educe) as well

Comment by Alex Miller [ 14/Oct/14 9:56 AM ]

Added new patch that handles Iterable and IReduceInit in vec. It also makes calling with a vector much faster due to the first check. into is still faster for chunked seqs (due to special InternalReduce handling of chunking).

It would be possible to move more of the variant checking into LazilyPersistentVector or PersistentVector so it could be used in more contexts. I'm not sure how much to do with that.

It would also be possible to instead lean on reduce more from the Java side if there was a Java version of reduce (as defined in mikera's branch for http://dev.clojure.org/jira/browse/CLJ-1192 at https://github.com/mikera/clojure/compare/clj-1192-vec-performance. Something like that is the only way I can see of leveraging that same InternalReduce logic that makes into faster than vec.

Comment by Alex Miller [ 13/Nov/14 4:14 PM ]

Prior comments from Stu removed from description: "Open Question: Which branch should come first, Collection or IReduceInit? Collection reaches the fast path for small collections through LazilyPersistentVector, but IReduceInit should be faster for larger things. Related: Shouldn't the item count in LazilyPersistentVector be a bounded count?"

I have attached a new patch that simplifies the impl to do it in LazilyPersistentVector instead of in vec, which was easier due to "and" not being able yet when vec is implemented to do the length check.

I have also done a considerable amount of analysis on the matrix of incoming collections and best path to follow and also collected some data on what collections are commonly passed into vec. The current patch reflects those findings. Some highlights:

  • vec is called with PersistentVector in all projects I tested. The instanceof check takes that case from typically 100s of nanos to ~5 ns. So I do think it is worth doing.
  • vec is overwhelmingly called with small collections - in most cases the incoming collection is <10 elements. In cases where the collection is not a sequence, the path of creating the Vector with an owning array is the fastest option, beating even IReduce and transient building (as that path has some checks involved).
  • PersistentList is the only IReduce likely to be encountered by vec right now and adding that branch is a significant performance boost from prior impl and vs into. If maps and sets were IReduce, they would gain this as well.
  • chunked seqs will be significantly faster with into than vec as into goes through CollReduce and can leverage many optimizations on reducing through chunks that are not available to vec.
  • seqs in general though are now faster with vec than they were due to leveraging transients.
  • eduction results support IReduce and are also faster with vec than into.
  • range is currently slower with vec, but when range is IReduce, it will probably be faster with vec

In summary, some new conventional wisdom (after this patch) on (into []) vs vec:

  • vec is faster if passed a vector, an IReduce, or an array
  • into is faster when working with seqs, but even vec is better than it used to be and may even be faster for things like range in the future
Comment by Michael Blume [ 13/Nov/14 7:24 PM ]

Latest patch won't build for me when applied to master

compile-clojure:
     [java] Exception in thread "main" java.lang.ExceptionInInitializerError
     [java] 	at clojure.lang.Compile.<clinit>(Compile.java:29)
     [java] Caused by: java.lang.NoSuchMethodError: clojure.lang.LazilyPersistentVector.create(Ljava/util/Collection;)Lclojure/lang/IPersistentVector;, compiling:(clojure/core.clj:14:23)
     [java] 	at clojure.lang.Compiler.load(Compiler.java:7206)
     [java] 	at clojure.lang.RT.loadResourceScript(RT.java:370)
     [java] 	at clojure.lang.RT.loadResourceScript(RT.java:361)
     [java] 	at clojure.lang.RT.load(RT.java:440)
     [java] 	at clojure.lang.RT.load(RT.java:411)
     [java] 	at clojure.lang.RT.doInit(RT.java:448)
     [java] 	at clojure.lang.RT.<clinit>(RT.java:329)
     [java] 	... 1 more
     [java] Caused by: java.lang.NoSuchMethodError: clojure.lang.LazilyPersistentVector.create(Ljava/util/Collection;)Lclojure/lang/IPersistentVector;
     [java] 	at clojure.lang.LispReader$VectorReader.invoke(LispReader.java:1073)
     [java] 	at clojure.lang.LispReader.readDelimitedList(LispReader.java:1138)
     [java] 	at clojure.lang.LispReader$ListReader.invoke(LispReader.java:972)
     [java] 	at clojure.lang.LispReader.read(LispReader.java:183)
     [java] 	at clojure.lang.LispReader$WrappingReader.invoke(LispReader.java:535)
     [java] 	at clojure.lang.LispReader.readDelimitedList(LispReader.java:1138)
     [java] 	at clojure.lang.LispReader$MapReader.invoke(LispReader.java:1081)
     [java] 	at clojure.lang.LispReader.read(LispReader.java:183)
     [java] 	at clojure.lang.LispReader$MetaReader.invoke(LispReader.java:716)
     [java] 	at clojure.lang.LispReader.readDelimitedList(LispReader.java:1138)
     [java] 	at clojure.lang.LispReader$ListReader.invoke(LispReader.java:972)
     [java] 	at clojure.lang.LispReader.read(LispReader.java:183)
     [java] 	at clojure.lang.Compiler.load(Compiler.java:7190)
     [java] 	... 7 more
Comment by Alex Miller [ 13/Nov/14 7:28 PM ]

Did you clean first? I replaced that static method call there with a wider version but if you are cleaning fresh it should be fine.

Comment by Michael Blume [ 13/Nov/14 7:31 PM ]

Apologies, maven just wasn't doing a good job of tracking changes, running mvn clean fixes the build.

Comment by Alex Miller [ 25/Nov/14 9:58 AM ]

Added benchmark.png showing times (in ns), tested with criterium, for into and vec on different types and sizes on 1.7.0-alpha4 and then vec again after the patch.





[CLJ-1544] AOT bug involving namespaces loaded before AOT compilation started Created: 01/Oct/14  Updated: 17/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.7

Type: Defect Priority: Critical
Reporter: Allen Rohner Assignee: Unassigned
Resolution: Unresolved Votes: 4
Labels: aot

Attachments: Text File 0001-CLJ-1544-force-reloading-of-namespaces-during-AOT-co.patch     Text File 0001-CLJ-1544-force-reloading-of-namespaces-during-AOT-co-v2.patch    
Patch: Code
Approval: Vetted

 Description   

If namespace "a" that is being AOT compiled requires a namespace "b" that has been loaded but not AOT compiled, the classfile for that namespace will never be emitted on disk, causing errors when compiling uberjars or in other cases.

A minimal reproducible case is described in the following comment: http://dev.clojure.org/jira/browse/CLJ-1544?focusedCommentId=36734&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-36734

Other examples of the bug:
https://github.com/arohner/clj-aot-repro
https://github.com/methylene/class-not-found

A real issue triggered by this bug: https://github.com/cemerick/austin/issues/23

Approach: The approach taken by the attached patch is to force reloading of namespaces during AOT compilation if no matching classfile is found in the compile-path or in the classpath



 Comments   
Comment by Alex Miller [ 04/Dec/14 12:45 PM ]

Possibly related: CLJ-1457

Comment by Nicola Mometto [ 05/Dec/14 4:51 AM ]

Has anyone been able to reproduce this bug from a bare clojure repl? I have been trying to take lein out of the equation for an hour but I don't seem to be able to reproduce it – this makes me think that it's possible that this is a lein/classlojure/nrepl issue rather than a compiler/classloader bug

Comment by Nicola Mometto [ 06/Dec/14 4:20 PM ]

I was actually able to reproduce and understand this bug thanks to a minimal example reduced from a testcase for CLJ-1413.

>cat error.sh
#!/bin/sh

rm -rf target && mkdir target

java -cp src:clojure.jar clojure.main - <<EOF
(require 'myrecord)
(set! *compile-path* "target")
(compile 'core)
EOF

java -cp target:clojure.jar clojure.main -e "(use 'core)"

> cat src/core.clj
(in-ns 'core)
(clojure.core/require 'myrecord)
(clojure.core/import myrecord.somerecord)

>cat src/myrecord.clj
(in-ns 'myrecord)
(clojure.core/defrecord somerecord [])

> ./error.sh
Exception in thread "main" java.lang.ExceptionInInitializerError
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:344)
	at clojure.lang.RT.classForName(RT.java:2113)
	at clojure.lang.RT.classForName(RT.java:2122)
	at clojure.lang.RT.loadClassForName(RT.java:2141)
	at clojure.lang.RT.load(RT.java:430)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5403.invoke(core.clj:5808)
	at clojure.core$load.doInvoke(core.clj:5807)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5613)
	at clojure.core$load_lib$fn__5352.invoke(core.clj:5653)
	at clojure.core$load_lib.doInvoke(core.clj:5652)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$load_libs.doInvoke(core.clj:5691)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:630)
	at clojure.core$use.doInvoke(core.clj:5785)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at user$eval212.invoke(NO_SOURCE_FILE:1)
	at clojure.lang.Compiler.eval(Compiler.java:6767)
	at clojure.lang.Compiler.eval(Compiler.java:6730)
	at clojure.core$eval.invoke(core.clj:3076)
	at clojure.main$eval_opt.invoke(main.clj:288)
	at clojure.main$initialize.invoke(main.clj:307)
	at clojure.main$null_opt.invoke(main.clj:342)
	at clojure.main$main.doInvoke(main.clj:420)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at clojure.lang.Var.invoke(Var.java:383)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.Var.applyTo(Var.java:700)
	at clojure.main.main(main.java:37)
Caused by: java.io.FileNotFoundException: Could not locate myrecord__init.class or myrecord.clj on classpath.
	at clojure.lang.RT.load(RT.java:443)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5403.invoke(core.clj:5808)
	at clojure.core$load.doInvoke(core.clj:5807)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5613)
	at clojure.core$load_lib$fn__5352.invoke(core.clj:5653)
	at clojure.core$load_lib.doInvoke(core.clj:5652)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$load_libs.doInvoke(core.clj:5691)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$require.doInvoke(core.clj:5774)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at core__init.load(Unknown Source)
	at core__init.<clinit>(Unknown Source)
	... 33 more

This bug also has also affected Austin: https://github.com/cemerick/austin/issues/23

Essentially this bug manifests itself when a namespace defining a protocol or a type/record has been JIT loaded and a namespace that needs the protocol/type/record class is being AOT compiled later. Since the namespace defining the class has already been loaded the class is never emitted on disk.

Comment by Nicola Mometto [ 06/Dec/14 6:51 PM ]

I've attached a tentative patch fixing the issue in the only way I found reasonable: forcing the reloading of namespaces during AOT compilation if the compiled classfile is not found in the compile-path or in the classpath

Comment by Nicola Mometto [ 06/Dec/14 7:30 PM ]

Updated patch forces reloading of the namespace even if a classfile exists in the compile-path but the source file is newer, mimicking the logic of clojure.lang.RT/load

Comment by Nicola Mometto [ 06/Dec/14 7:39 PM ]

Further testing demonstrated that this bug is not only scoped to deftypes/defprotocols but can manifest itself in the general case of a namespace "a" requiring a namespace "b" already loaded, and AOT compiling the namespace "a"

Comment by Tassilo Horn [ 08/Dec/14 4:46 AM ]

I'm also affected by this bug. Is there some workaround I can apply in the meantime, e.g., by dictating the order in which namespaces are going to be loaded/compiled in project.clj?

Comment by Nicola Mometto [ 15/Dec/14 10:58 AM ]

Tassilo, if you don't have control over whether or not a namespace that an AOT namespace depends on has already been loaded before compilation starts, requiring those namespaces with :reload-all should be enough to work around this issue

Comment by Tassilo Horn [ 15/Dec/14 11:36 AM ]

Nicola, thanks! But in the meantime I've switched to using clojure.java.api and omit AOT-compilation. That works just fine, too.

Comment by Michael Blume [ 15/Dec/14 5:05 PM ]

Tassilo, that's often a good solution, another is to use a shim clojure class

(ns myproject.main-shim (:gen-class))

(defn -main [& args]
  (require 'myproject.main)
  ((resolve 'myproject.main) args))

then your shim namespace is AOT-compiled but nothing else in your project is.

Comment by Tassilo Horn [ 16/Dec/14 1:07 AM ]

Thanks Michael, that's a very good suggestion. In fact, I've always used AOT only as a means to export some functions to Java-land. Basically, I did as you suggest but required the to-be-exported fn's namespace in the ns-form which then causes AOT-compilation of that namespace and its own deps recursively. So your approach seems to be as convenient from the Java side (no need to clojure.java.require `require` in order to require the namespace with the fn I wanna call ) while still omitting AOT. Awesome!





[CLJ-1517] Unrolled small vectors Created: 01/Sep/14  Updated: 09/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.8

Type: Enhancement Priority: Critical
Reporter: Zach Tellman Assignee: Unassigned
Resolution: Unresolved Votes: 15
Labels: collections, performance

Attachments: File unrolled-collections-2.diff     File unrolled-collections.diff     Text File unrolled-vector-2.patch     Text File unrolled-vector.patch    
Patch: Code
Approval: Vetted

 Description   

As discussed on the mailing list [1], this patch has two unrolled variants of vectors and maps, with special inner classes for each cardinality. Currently both grow to six elements before spilling over into the general versions of the data structures, which is based on rough testing but can be easily changed. At Rich's request, I haven't included any integration into the rest of the code, and there are top-level static create() methods for each.

The sole reason for this patch is performance, both in terms of creating data structures and performing operations on them. This can be seen as a more verbose version of the trick currently played with PersistentArrayMap spilling over into PersistentHashMap. Based on the benchmarks, which can be run by cloning cambrian-collections [2] and running 'lein test :benchmark', this should supplant PersistentArrayMap. Performance is at least on par with PAM, and often much faster. Especially noteworthy is the creation time, which is 5x faster for maps of all sizes (lein test :only cambrian-collections.map-test/benchmark-construction), and on par for 3-vectors, but 20x faster for 5-vectors. There are similar benefits for hash and equality calculations, as well as calls to reduce().

This is a big patch (over 5k lines), and will be kind of a pain to review. My assumption of correctness is based on the use of collection-check, and the fact that the underlying approach is very simple. I'm happy to provide a high-level description of the approach taken, though, if that will help the review process.

I'm hoping to get this into 1.7, so please let me know if there's anything I can do to help accomplish that.

[1] https://groups.google.com/forum/#!topic/clojure-dev/pDhYoELjrcs
[2] https://github.com/ztellman/cambrian-collections



 Comments   
Comment by Zach Tellman [ 01/Sep/14 10:13 PM ]

Oh, I forgot to mention that I didn't make a PersistentUnrolledSet, since the existing wrappers can use the unrolled map implementation. However, it would be moderately faster and more memory efficient to have one, so let me know if it seems worthwhile.

Comment by Nicola Mometto [ 02/Sep/14 5:23 AM ]

Zach, the patch you added isn't in the correct format, they need to be created using `git format-patch`

Comment by Nicola Mometto [ 02/Sep/14 5:31 AM ]

Also, I'm not sure if this is on-scope with the ticket but those patches break with *print-dup*, as it expects a static create(x) method for each inner class.

I'd suggest adding a create(Map x) static method for the inner PersistentUnrolledMap classes and a create(ISeq x) one for the inner PersistentUnrolledVector classes

Comment by Alex Miller [ 02/Sep/14 8:14 AM ]

Re making patches, see: http://dev.clojure.org/display/community/Developing+Patches

Comment by Jozef Wagner [ 02/Sep/14 9:16 AM ]

I wonder what is the overhead of having meta and 2 hash fields in the class. Have you considered a version where the hash is computed on the fly and where you have two sets of collections, one with meta field and one without, using former when the actual metadata is attached to the collection?

Comment by Zach Tellman [ 02/Sep/14 12:13 PM ]

I've attached a patch using the proper method. Somehow I missed the detailed explanation for how to do this, sorry. I know the guidelines say not to delete previous patches, but since the first one isn't useful I've deleted it to minimize confusion.

I did the print-dup friendly create methods, and then realized that once these are properly integrated, 'pr' will just emit these as vectors. I'm fairly sure the create methods aren't necessary, so I've commented them out, but I'm happy to add them back in if they're useful for some reason I can't see.

I haven't given a lot of thought to memory efficiency, but I think caching the hashes are worthwhile. I can see an argument for creating a "with-meta" version of each collection, but since that would double the size of an already enormous patch, I think that should probably wait.

Comment by Zach Tellman [ 03/Sep/14 4:31 PM ]

I found a bug! Like PersistentArrayMap, I have a special code path for comparing keywords, but my generators for collection-check were previously using only integer keys. There was an off-by-one error in the transient map implementation [1], which was not present for non-keyword lookups.

I've taken a close look for other gaps in my test coverage, and can't find any. I don't think this substantively changes the risk of this patch (an updated version of which has been uploaded as 'unrolled-collections-2.diff'), but obviously where there's one bug, there may be others.

[1] https://github.com/ztellman/cambrian-collections/commit/eb7dfe6d12e6774512dbab22a148202052442c6d#diff-4bf78dbf5b453f84ed59795a3bffe5fcR559

Comment by Zach Tellman [ 03/Oct/14 2:34 PM ]

As an additional data point, I swapped out the data structures in the Cheshire JSON library. On the "no keyword-fn decode" benchmark, the current implementation takes 6us, with the unrolled data structures takes 4us, and with no data structures (just lexing the JSON via Jackson) takes 2us. Other benchmarks had similar results. So at least in this scenario, it halves the overhead.

Benchmarks can be run by cloning https://github.com/dakrone/cheshire, unrolled collections can be tested by using the 'unrolled-collections' branch. The pure lexing benchmark can be reproduced by messing around with the cheshire.parse namespace a bit.

Comment by Zach Tellman [ 06/Oct/14 1:31 PM ]

Is there no way to get this into 1.7? It's an awfully big win to push off for another year.

Comment by Alex Miller [ 07/Oct/14 2:08 PM ]

Hey Zach, it's definitely considered important but we have decided to drop almost everything not fully done for 1.7. Timeframe for following release is unknown, but certainly expected to be significantly less than a year.

Comment by John Szakmeister [ 30/Oct/14 2:53 PM ]

You are all free to determine the time table, but I thought I'd point out that Zach is not entirely off-base. Clojure 1.4.0 was released April 5th, 2012. Clojure 1.5.0 was released March 1st, 2013 with 1.6.0 showing up March 25th, 2014. So it appears that the current cadence is around a year.

Comment by Alex Miller [ 30/Oct/14 3:40 PM ]

John, there is no point to comments like this. Let's please keep issue comments focused on the issue.

Comment by Zach Tellman [ 13/Nov/14 12:23 PM ]

I did a small write-up on this patch which should help in the eventual code review: http://blog.factual.com/using-clojure-to-generate-java-to-reimplement-clojure

Comment by Zach Tellman [ 07/Dec/14 10:34 PM ]

Per my conversation with Alex at the Conj, here's a patch that only contains the unrolled vectors, and uses the more efficient constructor for PersistentVector when spilling over.

Comment by Alex Miller [ 08/Dec/14 1:10 PM ]

Zach, I created a new placeholder for the map work at http://dev.clojure.org/jira/browse/CLJ-1610.

Comment by Jean Niklas L'orange [ 09/Dec/14 1:52 PM ]

It should probably be noted that core.rrb-vector will break for small vectors by this patch, as it peeks into the underlying structure. This will also break other libraries which peeks into the vector implementation internals, although I'm not aware of any other – certainly not any other contrib library.

Also, two comments on unrolled-vector.patch:

private transient boolean edit = true;
in the Transient class should probably be
private volatile boolean edit = true;
as transient means something entirely different in Java.

conj in the Transient implementation could invalidate itself without any problems (edit = false;) if it is converted into a TransientVector (i.e. spills over) – unless it has a notable overhead. The invalidation can prevent some subtle bugs related to erroneous transient usage.

Comment by Alex Miller [ 09/Dec/14 1:58 PM ]

Jean - understanding the scope of the impact will certainly be part of the integration process for this patch. I appreciate the heads-up. While we try to minimize breakage for things like this, it may be unavoidable for libraries that rely on implementation internals.

Comment by Michał Marczyk [ 09/Dec/14 2:03 PM ]

I'll add support for unrolled vectors to core.rrb-vector the moment they land on master. (Probably with some conditional compilation so as not to break compatibility with earlier versions of Clojure – we'll see when the time comes.)

Comment by Michał Marczyk [ 09/Dec/14 2:06 PM ]

I should say that it'd be possible to add generic support for any "vector lookalikes" by pouring them into regular vectors in linear time. At first glance it seems to me that that'd be out of line with the basic promise of the library, but I'll give it some more thought before the changes actually land.

Comment by Zach Tellman [ 09/Dec/14 5:43 PM ]

Somewhat predictably, the day after I cut the previous patch, someone found an issue [1]. In short, my use of the ArrayChunk wrapper applied the offset twice.

This was not caught by collection-check, which has been updated to catch this particular failure. It was, however, uncovered by Michael Blume's attempts to merge the change into Clojure, which tripped a bunch of alarms in Clojure's test suite. My own attempt to do the same to "prove" that it worked was before I added in the chunked seq functionality, hence this issue persisting until now.

As always, there may be more issues lurking. I hope we can get as many eyeballs on the code between now and 1.8 as possible.

[1] https://github.com/ztellman/cambrian-collections/commit/2e70bbd14640b312db77590d8224e6ed0f535b43
[2] https://github.com/MichaelBlume/clojure/tree/test-vector





[CLJ-1499] Replace seq-based iterators with direct iterators for all non-seq collections that use SeqIterator Created: 08/Aug/14  Updated: 02/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Enhancement Priority: Critical
Reporter: Rich Hickey Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File clj-1499-all.diff     File clj-1499-v2.diff     File clj-1499-v3.diff     File clj-1499-v6.diff     Text File defrecord-iterator.patch     File defrecord-iterator-v2.diff    
Patch: Code and Test
Approval: Vetted

 Description   

Add support for direct iterators instead of seq-based iterators for non-seq collections that use SeqIterator.

Patch adds support for direct iterators on the following (removing use of SeqIterator):

  • PersistentHashMap - new internal iterator (~20% faster)
  • APersistentSet - use internal map impl iterator (~5% faster)
  • PersistentQueue
  • PersistentStructMap
  • records (in core_deftype.clj)

Patch does not change use of SeqIterator in:

  • LazyTransformer$MultiStepper (not sure if this could be changed)
  • ASeq
  • LazySeq

Patch: clj-1499-v6.diff



 Comments   
Comment by Alex Miller [ 13/Aug/14 1:57 PM ]

The list of non-seqs that uses SeqIterator are:

  • records (in core_deftype.clj)
  • APersistentSet - fallback, maybe is ok?
  • PersistentHashMap
  • PersistentQueue
  • PersistentStructMap

Seqs (that do not need to be changed) are:

  • ASeq
  • LazySeq.java

LazyTransformer$MultiStepper - not sure

Comment by Ghadi Shayban [ 27/Sep/14 2:16 PM ]

attached iterator impl for defrecords. ready to leverage iteration for extmap when PHM iteration lands.

Comment by Alex Miller [ 29/Sep/14 12:52 PM ]

PHM patch

Comment by Alex Miller [ 29/Sep/14 10:26 PM ]

New patch that fixes bugs with PHMs with null keys (and added tests to expose those issues), added support for PHS.

Comment by Ghadi Shayban [ 29/Sep/14 10:45 PM ]

Alex, the defrecord patch already uses the iterator for extmap. It's just made better by the PHM patch.

Comment by Alex Miller [ 29/Sep/14 10:47 PM ]

Comment by Ghadi Shayban [ 30/Sep/14 4:17 PM ]

Heh. Skate to where the puck is going to be – Gretzky

Re: defrecord iterator: As is, it propagates exceptions from reaching the end of the ExtMap's iterator. As noted in CLJ-1453, PersistentArrayMap's iterator improperly returns an ArrayIndexOutOfBoundsException, rather than NoSuchElementException.

Comment by Alex Miller [ 30/Sep/14 6:41 PM ]

Hey Ghadi, rather than rebuilding the case map to pass to the RecordIterator, why don't you just pass the fields in iteration order to it and leverage the case map via .valAt like everything else?

Comment by Ghadi Shayban [ 30/Sep/14 7:30 PM ]

defrecord-iterator-v2.diff reuses valAt and minimizes macrology.

Comment by Alex Miller [ 07/Oct/14 2:04 PM ]

Comments from Stu (found under the couch):

"1. some of the impls (e.g. queue manually concatenate two iters. Would implementing iter-cat and calling that be simpler and more robust?
2. I found this tweak to the generative testing more useful in reporting failure, non-dependent on clojure.test, and capable of expecting failures. Waddya think?

(defn seq-iter-match
  [seqable iterable]
  (let [i (.iterator iterable)]
    (loop [s (seq seqable)
           n 0]
      (if (seq s)
        (do
          (when-not (.hasNext i)
            (throw (ex-info "Iterator exhausted before seq"
                            {:pos n :seqable seqable :iterable iterable})))
          (when-not (= (.next i) (first s))
            (throw (ex-info "Iterator and seq did not match"
                            {:pos n :seqable seqable :iterable iterable})))
          (recur (rest s) (inc n)))
        (when (.hasNext i)
          (throw (ex-info "Seq exhausted before iterator"
                          {:pos n :seqable seqable :iterable iterable})))))))

		(defspec seq-and-iter-match-for-maps
  identity
  [^{:tag clojure.test-clojure.data-structures/gen-map} m]
  (seq-iter-match m m))

3. similar generative approach would be good for the other types (looks like we just do maps)"

Comment by Alex Miller [ 02/Dec/14 3:03 PM ]

Latest patch (clj-1499-v6.diff) makes Stu's suggested change #2 above and adds tests recommended in #3. I looked at #1 but decided in the end that it wasn't going to make anything easier.





[CLJ-1152] PermGen leak in multimethods and protocol fns when evaled Created: 30/Jan/13  Updated: 06/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.4
Fix Version/s: Release 1.8

Type: Defect Priority: Critical
Reporter: Chouser Assignee: Unassigned
Resolution: Unresolved Votes: 8
Labels: memory, protocols

Attachments: File naive-lru-for-multimethods-and-protocols.diff     File protocol_multifn_weak_ref_cache.diff    
Patch: Code
Approval: Incomplete

 Description   

There is a PermGen memory leak that we have tracked down to protocol methods and multimethods called inside an eval, because of the caches these methods use. The problem only arises when the value being cached is an instance of a class (such as a function or reify) that was defined inside the eval. Thus extending IFn or dispatching a multimethod on an IFn are likely triggers.

Reproducing: The easiest way that I have found to test this is to set "-XX:MaxPermSize" to a reasonable value so you don't have to wait too long for the PermGen spaaaaace to fill up, and to use "-XX:+TraceClassLoading" and "-XX:+TraceClassUnloading" to see the classes being loaded and unloaded.

leiningen project.clj
(defproject permgen-scratch "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.5.0-RC1"]]
  :jvm-opts ["-XX:MaxPermSize=32M"
             "-XX:+TraceClassLoading"
             "-XX:+TraceClassUnloading"])

You can use lein swank 45678 and connect with slime in emacs via M-x slime-connect.

To monitor the PermGen usage, you can find the Java process to watch with "jps -lmvV" and then run "jstat -gcold <PROCESS_ID> 1s". According to the jstat docs, the first column (PC) is the "Current permanent space capacity (KB)" and the second column (PU) is the "Permanent space utilization (KB)". VisualVM is also a nice tool for monitoring this.

Multimethod leak

Evaluating the following code will run a loop that eval's (take* (fn foo [])).

multimethod leak
(defmulti take* (fn [a] (type a)))

(defmethod take* clojure.lang.Fn
  [a]
  '())

(def stop (atom false))
(def sleep-duration (atom 1000))

(defn run-loop []
  (when-not @stop
    (eval '(take* (fn foo [])))
    (Thread/sleep @sleep-duration)
    (recur)))

(future (run-loop))

(reset! sleep-duration 0)

In the lein swank session, you will see many lines like below listing the classes being created and loaded.

[Loaded user$eval15802$foo__15803 from __JVM_DefineClass__]
[Loaded user$eval15802 from __JVM_DefineClass__]

These lines will stop once the PermGen space fills up.

In the jstat monitoring, you'll see the amount of used PermGen space (PU) increase to the max and stay there.

-    PC       PU        OC          OU       YGC    FGC    FGCT     GCT
 31616.0  31552.7    365952.0         0.0      4     0    0.000    0.129
 32000.0  31914.0    365952.0         0.0      4     0    0.000    0.129
 32768.0  32635.5    365952.0         0.0      4     0    0.000    0.129
 32768.0  32767.6    365952.0      1872.0      5     1    0.000    0.177
 32768.0  32108.2    291008.0     23681.8      6     2    0.827    1.006
 32768.0  32470.4    291008.0     23681.8      6     2    0.827    1.006
 32768.0  32767.2    698880.0     24013.8      8     4    1.073    1.258
 32768.0  32767.2    698880.0     24013.8      8     4    1.073    1.258
 32768.0  32767.2    698880.0     24013.8      8     4    1.073    1.258

A workaround is to run prefer-method before the PermGen space is all used up, e.g.

(prefer-method take* clojure.lang.Fn java.lang.Object)

Then, when the used PermGen space is close to the max, in the lein swank session, you will see the classes created by the eval'ing being unloaded.

[Unloading class user$eval5950$foo__5951]
[Unloading class user$eval3814]
[Unloading class user$eval2902$foo__2903]
[Unloading class user$eval13414]

In the jstat monitoring, there will be a long pause when used PermGen space stays close to the max, and then it will drop down, and start increasing again when more eval'ing occurs.

-    PC       PU        OC          OU       YGC    FGC    FGCT     GCT
 32768.0  32767.9    159680.0     24573.4      6     2    0.167    0.391
 32768.0  32767.9    159680.0     24573.4      6     2    0.167    0.391
 32768.0  17891.3    283776.0     17243.9      6     2   50.589   50.813
 32768.0  18254.2    283776.0     17243.9      6     2   50.589   50.813

The defmulti defines a cache that uses the dispatch values as keys. Each eval call in the loop defines a new foo class which is then added to the cache when take* is called, preventing the class from ever being GCed.

The prefer-method workaround works because it calls clojure.lang.MultiFn.preferMethod, which calls the private MultiFn.resetCache method, which completely empties the cache.

Protocol leak

The leak with protocol methods similarly involves a cache. You see essentially the same behavior as the multimethod leak if you run the following code using protocols.

protocol leak
(defprotocol ITake (take* [a]))

(extend-type clojure.lang.Fn
  ITake
  (take* [this] '()))

(def stop (atom false))
(def sleep-duration (atom 1000))

(defn run-loop []
  (when-not @stop
    (eval '(take* (fn foo [])))
    (Thread/sleep @sleep-duration)
    (recur)))

(future (run-loop))

(reset! sleep-duration 0)

Again, the cache is in the take* method itself, using each new foo class as a key.

Workaround: A workaround is to run -reset-methods on the protocol before the PermGen space is all used up, e.g.

(-reset-methods ITake)

This works because -reset-methods replaces the cache with an empty MethodImplCache.

Patch: protocol_multifn_weak_ref_cache.diff

Screened by:



 Comments   
Comment by Chouser [ 30/Jan/13 9:10 AM ]

I think the most obvious solution would be to constrain the size of the cache. Adding an item to the cache is already not the fastest path, so a bit more work could be done to prevent the cache from growing indefinitely large.

That does raise the question of what criteria to use. Keep the first n entries? Keep the n most recently used (which would require bookkeeping in the fast cache-hit path)? Keep the n most recently added?

Comment by Jamie Stephens [ 18/Oct/13 9:35 AM ]

At a minimum, perhaps a switch to disable the caches – with obvious performance impact caveats.

Seems like expensive LRU logic is probably the way to go, but maybe don't have it kick in fully until some threshold is crossed.

Comment by Alex Miller [ 18/Oct/13 4:28 PM ]

A report seeing this in production from mailing list:
https://groups.google.com/forum/#!topic/clojure/_n3HipchjCc

Comment by Adrian Medina [ 10/Dec/13 11:43 AM ]

So this is why we've been running into PermGen space exceptions! This is a fairly critical bug for us - I'm making extensive use of multimethods in our codebase and this exception will creep in at runtime randomly.

Comment by Kevin Downey [ 17/Apr/14 9:52 PM ]

it might be better to split this in to two issues, because at a very abstract level the two issues are the "same", but concretely they are distinct (protocols don't really share code paths with multimethods), keeping them together in one issue seems like a recipe for a large hard to read patch

Comment by Kevin Downey [ 26/Jul/14 5:49 PM ]

naive-lru-method-cache-for-multimethods.diff replaces the methodCache in multimethods with a very naive lru cache built on PersistentHashMap and PersistentQueue

Comment by Kevin Downey [ 28/Jul/14 7:09 PM ]

naive-lru-for-multimethods-and-protocols.diff creates a new class clojure.lang.LRUCache that provides an lru cache built using PHashMap and PQueue behind an IPMap interface.

changes MultiFn to use an LRUCache for its method cache.

changes expand-method-impl-cache to use an LRUCache for MethodImplCache's map case

Comment by Kevin Downey [ 30/Jul/14 3:10 PM ]

I suspect my patch naive-lru-for-multimethods-and-protocols.diff is just wrong, unless MethodImplCache really is being used as a cache we can't just toss out entries when it gets full.

looking at the deftype code again, it does look like MethidImplCache is being used as a cache, so maybe the patch is fine

if I am sure of anything it is that I am unsure so hopefully someone who is sure can chime in

Comment by Nicola Mometto [ 31/Jul/14 11:02 AM ]

I haven't looked at your patch, but I can confirm that the MethodImplCache in the protocol function is just being used as a cache

Comment by dennis zhuang [ 08/Aug/14 6:21 AM ]

I developed a new patch that convert the methodCache in MultiFn to use WeakReference for dispatch value,and clear the cache if necessary.

I've test it with the code in ticket,and it looks fine.The classes will be unloaded when perm gen is almost all used up.

Comment by Alex Miller [ 22/Aug/14 4:55 PM ]

I don't know which to evaluate here. Does multifn_weak_method_cache.diff supersede naive-lru-for-multimethods-and-protocols.diff or are these alternate approaches both under consideration?

Comment by Kevin Downey [ 22/Aug/14 8:26 PM ]

the most straight forward thing, I think, is to consider them as alternatives, I am not a huge fan of weakrefs, but of course not using weakrefs we have to pick some bounding size for the cache, and the cache has a strong reference that could prevent a gc, so there are trade offs. My reasons to stay away from weak refs in general are using them ties the behavior of whatever you are building to the behavior of the gc pretty strongly. that may be considered a matter of personal taste

Comment by Andy Fingerhut [ 29/Aug/14 4:31 PM ]

All patches dated Aug 8 2014 and earlier no longer applied cleanly to latest master after some commits were made to Clojure on Aug 29, 2014. They did apply cleanly before that day.

I have not checked how easy or difficult it might be to update the patches.

Comment by Kevin Downey [ 29/Aug/14 7:00 PM ]

I've updated naive-lru-for-multimethods-and-protocols.diff to apply to the current master

Comment by Andy Fingerhut [ 29/Aug/14 7:34 PM ]

Thanks, Kevin. While JIRA allows multiple attachments to a ticket with the same filename but different contents, that can be confusing for people looking for a particular patch, and for a program I have that evaluates patches for things like whether they apply and build cleanly. Would you mind removing the older one, or in some other way making all the names unique?

Comment by Kevin Downey [ 29/Aug/14 8:43 PM ]

I deleted all of my attachments accept for my latest and greatest

Comment by dennis zhuang [ 30/Aug/14 9:51 AM ]

I updated multifn_weak_method_cache2.diff patch too.

I think using weak reference cache is better,because we have to keep one cache per multifn.When you have many multi-functions, there will be many LRU caches in memory,and they will consume too much memory and CPU for evictions. You can't choose a proper threshold for LRU cache in every environment.
But i don't have any benchmark data to support my opinion.

Comment by Alex Miller [ 10/Sep/14 2:38 PM ]

I'm going to set the LRU cache patch aside. I don't think it's possible to find a "correct" size for it and it seems weird to me to extend APersistentMap to build such a thing anyways.

I think it makes more sense to follow the same strategy used for other caches (such as the Keyword cache) - a combination ConcurrentHashMap with WeakReferences and a ReferenceQueue for clean-up. I don't see any compelling reason not to take the same path as other internal caches.

Comment by Alex Miller [ 10/Sep/14 3:44 PM ]

Stepping back a little to think about the problem.... our requirements are:
1) cache map of dispatch value (could be any Object) to multimethod function (IFn)
2) do we want keys to be compared based on equality or identity? identity-based opens up more reference-based caching options and is fine for most common dispatch types (Class, Keyword), but reduces (often eliminates?) cache hits for all other types where values are likely to be equiv but not identical (vector of strings for example)
3) concurrent access to cache
4) cache cannot grow without bound
5) cache cannot retain strong references to dispatch values (the cache keys) because the keys might be instances of classes that were loaded in another classloader which will prevent GC in permgen

multifn_weak_method_cache.diff uses a ConcurrentHashMap (#3) that maps RefWrapper around keys to IFn (#1). The patch uses Util.equals() (#2) for (Java) equality-based comparisons. The RefWrapper wraps them in WeakReferences to avoid #5. Cache clearing based on the ReferenceQueue is used to prevent #4.

A few things definitely need to be fixed:

  • Util.equals() should be Util.equiv()
  • methodCache and rq should be final
  • Why does RefWrapper have obj and expect rq to possibly be null?
  • RefWrapper fields should all be final
  • Whitespace errors in patch

Another idea entirely - instead of caching dispatch value, cache based on hasheq of dispatch value then equality check on value. Could then use WeakHashMap and no RefWrapper.

This patch does not cover the protocol cache. Is that just waiting for the multimethod case to look good?

Comment by dennis zhuang [ 10/Sep/14 7:18 PM ]

Hi, alex, thanks for your review.But the latest patch is multifn_weak_method_cache2.diff. I will update the patch soon by your review, but i have a few questions to be explained.

1) I will use Util.equiv() instead of Util.equals().But what's the difference of them?
2) When the RefWrapper is retained as key in ConcurrentHashMap, it wraps the obj in WeakReference.But when trying to find it in ConcurrentHashMap, it uses obj directly as strong reference, and create it with passing null ReferenceQueue.Please look at the multifn_weak_method_cache2.diff line number 112. It short, the patch stores the dispatch value as weak reference in cache,but uses strong reference for cache getting.

3) If caching dispatch value based on hasheq , can we avoid hasheq value conflicts? If two different dispatch value have a same hasheq( or why it doesn't happen?), we would be in trouble.

Sorry, the patch doesn't cover the protocol cache, i will add it ASAP.

Comment by dennis zhuang [ 11/Sep/14 2:02 AM ]

The new patch 'protocol_multifn_weak_ref_cache.diff' is uploaded.

1) Using Util.equiv() instead of Util.equals()
2) Moved the RefWrapper and it's associated methods to Util.java, and refactor the code based on alex's review.
3) Fixed whitespace errors.
4) Fixed PermGen leak in protocol fns.

Comment by Alex Miller [ 03/Oct/14 10:35 AM ]

I screened this ticket again with Brenton Ashworth and had the following comments:

1) We need to have a performance test to verify that we have not negatively impacted performance of multimethods or protocol invocation.
2) Because there are special cases around null keys in the multimethod cache, please verify that there are existing example tests using null dispatch values in the existing test coverage.
3) In Util$RefWrapper.getObj() - why does this return this.ref at the end? It was not clear to me that the comment was correct or that this was useful in any way.
4) In Util$RefWrapper.clearRefWrapCache() - can k == null in that if check? If not, can we omit that? Also, if you explicitly create the Iterator from the entry set, you can call .remove() on it more efficiently than calling .remove() on the cache itself.
5) In core_deftype / MethodImplCache, it appears that you are modifying a now-mutable field rather than the prior version that was going to great lengths to stay immutable. It's not clear to me what the implications of this change are and that concerns me. Can it use a different collection or code to stay immutable?
6) Please update the description of this ticket to include an approach section that describes the changes we are making.

Thanks!





[CLJ-979] Clojure resolves to wrong deftype classes when AOT compiling or reloading Created: 03/May/12  Updated: 17/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.3, Release 1.4, Release 1.5, Release 1.6, Release 1.7
Fix Version/s: Release 1.7

Type: Defect Priority: Critical
Reporter: Edmund Jackson Assignee: Unassigned
Resolution: Unresolved Votes: 13
Labels: aot, classloader, compiler

Attachments: Text File CLJ-979.patch     Text File clj-979-symptoms.patch     Text File CLJ-979-v2.patch     Text File CLJ-979-v3.patch     Text File CLJ-979-v4.patch     Text File CLJ-979-v5.patch     Text File CLJ-979-v6.patch     Text File CLJ-979-v7.patch    
Patch: Code and Test
Approval: Vetted

 Description   

Compiling a class via `deftype` during AOT compilation gives different results for the different constructors. These hashes should be identical.

user=> (binding [*compile-files* true] (eval '(deftype Abc [])))
user.Abc
user=> (hash Abc)
16446700
user=> (hash (class (->Abc)))
31966239 ;; should be 16446700

This also means that whenever there's a stale AOT compiled deftype class in the classpath, that class will be used rather then the JIT compiled one, breaking repl interaction.

Another demonstration of this classloader issue (from CLJ-1495) when reloading deftypes (no AOT) :

user> (defrecord Foo [bar])
user.Foo
user> (= (->Foo 42) #user.Foo{:bar 42}) ;;expect this to evaluate to true
true
user> (defrecord Foo [bar])
user.Foo
user> (= (->Foo 42) #user.Foo{:bar 42}) ;;expect this to evaluate to true also -- but it doesn't!
false
user>

This bug also affects AOT compilation of multimethods that dispatch on a class, this affected core.match for years see http://dev.clojure.org/jira/browse/MATCH-86, http://dev.clojure.org/jira/browse/MATCH-98. David had to work-around this issue by using a bunch of protocols instead of multimethods.

Cause of the bug: currently clojure uses Class.forName to resolve a class from a class name, which ignores the class cache from DynamicClassLoader thus reloading deftypes or mixing AOT compilation at the repl with deftypes breaks, resolving to the wrong class.

Approach: the current patch (CLJ-979-v7.patch) addresses this issue in multiple ways:

  • it makes RT.classForName/classForNameNonLoading look in the class cache before delegating to Class/forName if the current classloader is not a DynamicClassLoader (this incidentally addresses also CLJ-1457)
  • it makes clojure use RT.classForName/classForNameNonLoading instead of Class/forName
  • it overrides Classloader/loadClass so that it's class cache aware – this method is used by the jvm to load classes
  • it changes gen-interface to always emit an in-memory interface along the [optional] in disk interface so that the in-memory class is always updated.


 Comments   
Comment by Scott Lowe [ 12/May/12 9:05 PM ]

I can't reproduce this under Clojure 1.3 or 1.4, and Leiningen 1.7.1 on either Java 1.7.0-jdk7u4-b21 OpenJDK 64-Bit or Java 1.6.0_31 Java HotSpot 64-Bit. OS is Mac OS X 10.7.

Edmund, how are you running this AOT code? I wrapped your code in a main function and built an uberjar from it.

Comment by Edmund Jackson [ 13/May/12 2:20 AM ]

Hi Scott,

Interesting.

I have two use cases
1. AOT compile and call from repl.
My steps: git clone, lein compile, lein repl, (use 'aots.death), (in-ns 'aots.death), (= (class (Dontwork. nil)) (class (map->Dontwork {:a 1}))) => false

2. My original use case, which I've minimised here, is an AOT ns, producing a genclass that is called instantiated from other Java (no main). This produces the same error. I will produce an example of this and post it too.

Comment by Edmund Jackson [ 13/May/12 4:23 AM ]

Hi Scott,

Here is an example of it failing in the interop case: https://github.com/ejackson/aotquestion2
The steps I'm following to compile this all up are

git clone git@github.com:ejackson/aotquestion2.git
cd aotquestion2/cljside/
lein uberjar
lein install
cd ../javaside/
mvn package
java -jar ./target/aotquestion-1.0-SNAPSHOT.jar

and it dies with this:

Exception in thread "main" java.lang.ClassCastException: cljside.core.Dontwork cannot be cast to cljside.core.Dontwork
at cljside.MyClass.makeDontwork(Unknown Source)
at aotquestion.App.main(App.java:8)

The error message is really confusing (to me, anyway), but I think its the same root problem as for the REPL case.

What do you see when you run the above ?

Comment by Scott Lowe [ 13/May/12 8:41 AM ]

Ah, yes, looks like my initial attempt to reproduce was too simplistic. I used your second git repo, and can now confirm that it's failing for me with the same error.

Comment by Scott Lowe [ 13/May/12 10:35 PM ]

I looked into this a little further and the AOT generated code looks correct, in the sense that both code paths appear to be returning the same type.

However, I wonder if this is really a ClassLoader issue, whereby two definitions of the same class are being loaded at different times, because that would cause the x.y.Class cannot be cast to x.y.Class exception that we're seeing here.

Comment by Steve Miner [ 03/Sep/13 9:54 AM ]

This could be related to CLJ-1157 which deals with a ClassLoader issue with AOT compiled code.

Comment by Ambrose Bonnaire-Sergeant [ 29/Mar/14 1:11 PM ]

I've tried this patch attached to CLJ-1157 and it did not solve this issue.

Comment by Ambrose Bonnaire-Sergeant [ 29/Mar/14 2:27 PM ]

This bug seems to be rooted in different behaviour for do/let under compilation. Attached a patch showing these symptoms in the hope it helps people find the cause.

Comment by Peter Taoussanis [ 22/Sep/14 3:12 AM ]

Just a quick note to confirm that this still seems to be around as of Clojure 1.7.0-alpha2. Don't have any useful input on possible solutions, sorry.

Comment by Alex Miller [ 04/Dec/14 1:12 PM ]

Duplicates - CLJ-1495, CLJ-1132

Comment by Nicola Mometto [ 04/Dec/14 1:50 PM ]

The attached patch fixes the classloader issues by routing RT.classForName & variants through the DynamicClassLoader class cache before invoking Class.forName

Comment by Nicola Mometto [ 04/Dec/14 1:59 PM ]

Re-adding triaged status added by Alex Miller that got accidentaly nuked by a race-condition between my edits to the ticket description and Alex's ones

Comment by Nicola Mometto [ 04/Dec/14 2:30 PM ]

0001-CLJ-979-make-clojure-resolve-to-the-correct-Class-in-v2.patch is the same as 0001-CLJ-979-make-clojure-resolve-to-the-correct-Class-in.patch except it unconditionally looks for classes in the class cache of DynamicClassLoader, even if baseLoader() is not a DynamicClassLoader.
This fixes the bug of CLJ-1457 but might just be a workaround

Comment by Michael Blume [ 11/Dec/14 3:29 PM ]

Current patch blows up my Clojure build

https://gist.github.com/MichaelBlume/aa26fc715cbbdf711290

Comment by Nicola Mometto [ 11/Dec/14 3:45 PM ]

Michael: the current patch builds clojure fine for me, I'll try to reproduce. Which jvm version are you using?

Comment by Michael Blume [ 11/Dec/14 4:26 PM ]

[14:24][michael.blume@tcc-michael-4:~/workspace/clojure((0fc43db...))]$ java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
[14:24][michael.blume@tcc-michael-4:~/workspace/clojure((0fc43db...))]$ mvn -version
Apache Maven 3.2.3 (33f8c3e1027c3ddde99d3cdebad2656a31e8fdf4; 2014-08-11T13:58:10-07:00)
Maven home: /usr/local/Cellar/maven/3.2.3/libexec
Java version: 1.8.0_25, vendor: Oracle Corporation
Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.10.1", arch: "x86_64", family: "mac"

build was after I applied the patch to the current master branch of the clojure github repo

Comment by Andy Fingerhut [ 11/Dec/14 5:34 PM ]

I am seeing a similar compilation error as Michael Blume, with both JDK 1.7 and 1.8 on Mac OS X 10.9.5.

By accident I found that if I take latest Clojure master and do 'mvn package', then apply the patch CLJ-979.patch dated Dec 11 2014, then do 'mvn package' again without 'mvn clean', it compiles with no errors. If I do 'mvn clean' then 'mvn package' in a patched tree, I get the error every time I've tried.

Comment by Nicola Mometto [ 12/Dec/14 5:50 AM ]

The updated patch fixes the LinkageError Andy and Michael were getting.

Andy, Michael, can you confirm?

Comment by Nicola Mometto [ 12/Dec/14 9:38 AM ]

Added more testcases to new patch

Comment by Nicola Mometto [ 12/Dec/14 10:09 AM ]

Cleaned up the patch from whitespace changes

Comment by Andy Fingerhut [ 12/Dec/14 12:32 PM ]

I tried latest Clojure master plus patch CLJ-979-v4.patch, dated 12 Dec 2014, with Mac OS X 10.9.5 + JDK7, and Ubuntu Linux 14.04 with JDKs 6 through 9, and 'mvn clean' followed by 'mvn package' built and passed tests successfully with all of them.

I did notice that some files were created in the test directory that were not cleaned up by the end of the test, which you can use 'git status .' to see. Not sure if that is considered a bad thing for Clojure tests.

Comment by Nicola Mometto [ 12/Dec/14 1:07 PM ]

Thanks Andy, I've updated the patch and it now should remove all temporary classes created by the test.
It's probably not the best way to do it but I couldn't figure out how to do it another way.

Comment by Michael Blume [ 12/Dec/14 2:34 PM ]

Yep, looks good to me =)

Comment by Alex Miller [ 15/Dec/14 4:01 PM ]

Thanks first to Nicola for all his work so far on this!

Some feedback:
1) While the ticket itself isn't bad, I would really like to focus the title and description on a crisp statement of the real problem now that we understand it more. I'd like help on making sure we capture that correctly - how is this for a title: "Uses of Class.forName() miss classes in DynamicClassLoader cache?" ?

Similarly, the description should focus on the problem and show some examples. The defrecord one is good. The first example works for me before the patch and fails after?

2) The crux of this whole thing is the change in loading order in DCL.loadClass() - changing this is a big deal. We really need broader testing with things likely to be affected - off the top of my head: Immutant, Pomegranate, Leiningen, or anything else that monkeys with classloader stuff. Maybe something with Eclipse/OSGi if there is something testable out there.

3) DynamicClassLoader comments:
a) loadClass(String name) - I believe this is identical to the super impl, so can be removed.
b) findClass(String name) - now that we are hijacking loadClass(), I'm not sure it's even necessary to implement this or to call super.findClass() - if you get to the super.findClass(), I would expect that to always throw CNFE. Potentially, this method could even be removed (but this might do bad things if there are subclasses of DCL out there in the wild).
c) loadClass(String name, ...) - instead of calling findClass() and using the CNFE for control flow, you could just directly call findInMemoryClass(), then use a returned null as the decision factor. Again, this is possibly bad if there are DCL subclasses, so I'm on the fence about it.

4) Is the change in gen-interface something that should be a separate ticket? Seems like it could be separable.

5) I don't like the test changes with respect to set up and cleanup. The build already supports compiling a subset of test ns'es (like clojure.test-clojure.genclass.examples). I'd prefer to use that existing mechanism if at all possible. Check the build.xml for the hard-coded ns list.

6) What are the performance implications? I'm not expecting they are significant but we just made a bunch of changes for compilation performance and I'd hate to undo them all. Could findInMemoryClass be smarter about avoiding checks that can't succeed (avoiding "java.*" for example?).

Comment by Nicola Mometto [ 15/Dec/14 5:43 PM ]

1) It's not really about Class.forName() specifically, it's about DynamicClassLoader not being class cache aware in the loadClass method. The JVM uses the classloader loadClass method for resolving all kind of class usages,
including but not limited to Class.forName() (i.e. when loading some bytecode containing a "new" instruction, that class reference will be resolved via a call to loadClass)
I'll try to make the documentation a bit more clear, the first example is an exhibit of the bugged behaviour, the two calls should output the same hash.

2,4) So, there are 3 approaches to how DynamicClassLoader could go at it:

  • Prefer in-disk classes over in-memory classes, roughly the current approach (sometimes it will pick the in-memory class over the in-disk one causing weird bugs like Foo. and ->Foo constructing different classes), has the
    negative effect of breaking interaction between AOT compilation and JIT loading, which has created all sorts of troubles with redefinig deftypes/defprotocols in repls while having stale classfiles in disk.
  • Always pick the most-updated class, this has the advangate of being always correct but has several disadvantages that make it inpracticable in my opinion: we'd have to keep track of the timestamp in which a dynamic class
    is defined, and make the loadClass implementation such that if there a class is both in-memory and in-disk, it compares the timestamps and select the most updated one. This would complicate the implementation a lot and we'd
    likely have to pay a substantial performance hit.
  • Prefer in-memory classes over in-disk classes, the approach proposed in the current patches. It has the advantage of being almost always correct, make repl interaction & jit/aot mixing work fine and the implementation is
    mostly straightforward. The downside is that in cases like gen-class where an AOT class can actually be the most updated version, the in-memory version will be used. In clojure all the forms that do bytecode emission either
    only do AOT compilation or do AOT compilation on demand and always load the class in memory, except gen-interface that doesn't load the class in memory if it's being AOT compiled. Changing its semantics to behave like the
    other jit/aot compiling forms (deftype/defrecord/reify) is the only way to make this approach work so I don't think this should go in another ticket.

5) I don't like the previous testing strategy either but couldn't figure out a better way. Thanks for the pointer on the already in-place infrastructure, I'll check it out and update the patch

In the meantime I've uploaded a new patch addressing 3 and 6. Specifically:
3) I removed the unnecessary loadClass(String) arity, I've made loadClass(String, boolean) use findInMemoryClass(String) directly rather than relying on findClass(String) since nowhere in the documentation it guarantess that
findClass will be used by loadClass. However I've left the findClass(String) implementation in place in case there's code out there that relies on this.
6) I haven't done any serious testing but I haven't noticed any significant difference in compile times for any of my tools.* contrib libraries with the current patch. Filtering "java.*" class names before the inMemory check
didn't seem to produce any difference so it's not included in the updated patch. However I'll probably include an alternative patch with that filtering to do more performance testings and see if it can actually help a bit.

All this said, I'm afraid that I won't have time to personally do an in-depth benchmarking & cross project testing of this patch. I've been spending almost all the free time I had in the past weeks working through a bunch of tickets (mostly this one) but now because of school and other commitments I can't promise I will be able to do anything more than maintaining the current patch & answering to any questions about the bug. Any help in moving this ticket further would be appreciated, in particular to address points 2 and 6.

Comment by Alex Miller [ 16/Dec/14 8:33 AM ]

Thanks Nicola. I'll certainly take over sheparding the bug and appeal to the greater community for help in broad testing when I think we're ready for that.

Comment by Nicola Mometto [ 16/Dec/14 12:50 PM ]

Updated patch with better tests, addressing Alex Miller's comments.





[CLJ-703] Improve writeClassFile performance Created: 04/Jan/11  Updated: 08/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.3
Fix Version/s: None

Type: Enhancement Priority: Critical
Reporter: Jürgen Hötzel Assignee: Unassigned
Resolution: Unresolved Votes: 19
Labels: Compiler, performance

Attachments: Text File 0001-use-File.mkdirs-instead-of-mkdir-every-single-direct.patch     Text File 0002-Ensure-atomic-creation-of-class-files-by-renaming-a-.patch     Text File improve-writeclassfile-perf.patch     Text File remove-flush-and-sync-only.patch     Text File remove-sync-only.patch    
Patch: Code
Approval: Triaged

 Description   

This Discussion about timing issues when writing class files led to the the current implementation of synchronous writes to disk. This leads to bad performance/system-load when compiling Clojure code.

This Discussion questioned the current implentation.

Synchronous writes are not necessary and also do not ensure (crash while calling write) valid classfiles.

These Patches (0001 is just a code cleanup for creating the directory structure) ensures atomic creation of classfiles by using File.renameTo()



 Comments   
Comment by David Powell [ 17/Jan/11 2:16 PM ]

Removing sync makes clojure build much faster. I wonder why it was added in the first place? I guess only Rich knows? I assume that it is not necessary.

If we are removing sync though, I wouldn't bother with the atomic rename stuff. Doing that sort of thing can cause problems on some platforms, eg with search indexers and virus checkers momentarily locking files after they are created.

The patch seems to be assuming that sync is there for some reason, but my initial assumption would be that sync isn't necessary - perhaps it was working around some issue that no longer exists?

Comment by Jürgen Hötzel [ 19/Jan/11 2:05 PM ]

Although its unlikely: there is a possible race condition "loading a paritally written classfile"?:

https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/RT.java#L393

Comment by John Szakmeister [ 25/May/12 4:22 AM ]

The new improve-writeclassfile-perf version of the patch combines the two previous patches into a single patch file and brings them up-to-date with master. I can split the two changes back out into separate patch files if desired, but I figured out current tooling is more geared towards a single patch being applied.

Comment by John Szakmeister [ 25/May/12 4:36 AM ]

FWIW, both fixes look sane. The first one is a nice cleanup. The second one is a little more interesting in that uses a rename operation to put the final file into place. It removes the sync call, which does make things faster. In general, if we're concerned about on-disk consistency, we should really have a combination of the two: write the full contents to a tmp file, sync it, and atomically rename to the destination file name.

Neither the current master, nor the current patch will guarantee on-disk consistency across a machine wide crash. The current master could crash before the sync() occurs, leaving the file in an inconsistent state. With the patch, the OS may not get the file from file cache to disk before an OS level crash occurs, and leave the file in an inconsistent state as well. The benefit of the patch version is that the whole file does atomically come into view at once. It does have a nasty side effect of leaving around a temp file if the compiler crashes just before the rename though.

Perhaps a little more work to catch an exception and clean up is in order? In general, I like the patched version better.

Comment by Ivan Kozik [ 05/Oct/12 7:07 PM ]

File.renameTo returns false on (most?) errors, but the patch doesn't check for failure. Docs say "The return value should always be checked to make sure that the rename operation was successful." Failure might be especially likely on Windows, where files are opened by others without FILE_SHARE_DELETE.

Comment by Dave Della Costa [ 23/Jun/14 11:37 PM ]

We've been wondering why our compilation times on linux were so slow. It became the last straw when we walked away from one project and came back after 15 minutes and it was not done yet.

After some fruitless investigation into our linux configuration and lein java args, we stumbled upon this issue via the associated Clojure group thread. Upon commenting out the flush() and sync() lines (https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Compiler.java#L7171-L7172) and compiling Clojure 1.6 ourselves, our projects all started compiling in under a minute.

Point being, can we at least provide some flag to allow for "unsafe compilation" or something? As it is, this is bad enough that we've manually modified all our local versions of Clojure to work around the issue.

Comment by Tim McCormack [ 30/Sep/14 4:10 PM ]

Additional motivation: This becomes really unpleasant on an encrypted filesystem, since write and read latency become higher.

As a partial workaround, I've been using this script to mount a ramdisk over top of target, which speeds up compilation 2-4x: https://gist.github.com/timmc/6c397644668bcf41041f (but removing flush() and sync() entirely would probably speed things up even more, if safe)

Comment by Ragnar Dahlen [ 06/Oct/14 12:30 PM ]

I'd like to explore this issue further as I also don't think the flush and sync calls add any value, but do have a severe impact on performance.

To resurrect the discussion, I've attached a new patch with the following approach:

  • create a temporary file
  • write class bytecode to temporary file, no flush or sync
  • close temporary file
  • atomic rename of temporary file to class file name

It is different to previous patches in that:

  • it applies cleanly to master
  • it checks return value from File.renameTo
  • it omits proposed File.mkdirs change as the current implementation is actually converting from an "internal name", where forward slashes are assumed (splits on "/"), to a platform specific path using File.separator. I'm not convinced that the previous patch is safe on all versions of Windows, and I think it's separate from the main issue here.

I opted for the atomic rename of a temp file to avoid leaving empty class files with a correct expected class file name in case of failure.

It is my understanding that this patch will guarantee that:

  • when writeClassFile returns successfully, a class file with the expected name will exists, and subsequent reads from that file name will return the expected bytecode (possibly from kernel buffers).
  • when writeClassFile fails, a class file with the expected name will not have been created (or updated if it previously existed).

Anything preventing the operating system from flushing its buffers to disk (power failure etc) might leave a corrupt (empty, partially written) class file. It's my opinion that that is acceptable behaviour, and worth the performance improvement (I'm seeing AOT compilation reduced from 1m20s -> 22s on one of our codebases, would be happy to provide more benchmarks if requested).

Would be grateful for feedback/testing of this patch.

Comment by Ragnar Dahlen [ 08/Oct/14 6:00 AM ]

We're testing this patch on various projects/platforms at my company. So far we've seen:

  • Significantly reduced compilation times on Linux (two typical examples: 30s to 15s, 1m30s to 30s)
  • No significant change in compilation times on Mac OSX.
  • File.renameTo consistently failing on a Windows machine.

My understanding is that the performance difference between Linux and OSX is due to differences in how these platforms implement fsync. OSX by default does not actually tell the drive to flush its buffers (requires fcntl F_FULLSYNC for this, not used by JVMs) [1], whereas Linux does [2].

Our very limited test shows (as was previously pointed out) that File.renameTo is problematic on Windows. I've attached a new patch that doesn't use rename, and only has the the sync call removed (flush is a no-op for FileOutputStreams). We're currently testing this patch.

The drawback of this patch is that it may leave correctly named, but empty class files if the write fails. One option would be to try and delete the file in the catch block. Personally, I wouldn't expect a compilation that failed because of OS/IO reasons to leave my classfiles in a consistent state.

[1]: "Note that while fsync() will flush all data from the host to the drive (i.e. the "permanent storage device"), the drive itself may not physically write the data to the platters for quite some time and it may be written in an out-of-order sequence.": https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/fsync.2.html

[2]: "[...] includes writing through or flushing a disk cache if present."
http://man7.org/linux/man-pages/man2/fsync.2.html





[CLJ-700] contains? broken for transient collections Created: 01/Jan/11  Updated: 17/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.2
Fix Version/s: Release 1.8

Type: Defect Priority: Critical
Reporter: Herwig Hochleitner Assignee: Unassigned
Resolution: Unresolved Votes: 14
Labels: transient

Attachments: Java Source File 0001-Refactor-of-some-of-the-clojure-.java-code-to-fix-CL.patch     File clj-700-7.diff     File clj-700-8.diff     File clj-700.diff     Text File clj-700-patch4.txt     Text File clj-700-patch6.txt     Text File clj-700-rt.patch    
Patch: Code and Test
Approval: Vetted

 Description   

Behavior with Clojure 1.6.0:

user=> (contains? (transient {:x "fine"}) :x)
IllegalArgumentException contains? not supported on type: clojure.lang.PersistentArrayMap$TransientArrayMap  clojure.lang.RT.contains (RT.java:724)
;; expected: true

user=> (contains? (transient (hash-map :x "fine")) :x)
IllegalArgumentException contains? not supported on type: clojure.lang.PersistentHashMap$TransientHashMap  clojure.lang.RT.contains (RT.java:724)
;; expected: true

user=> (contains? (transient [1 2 3]) 0)
IllegalArgumentException contains? not supported on type: clojure.lang.PersistentVector$TransientVector  clojure.lang.RT.contains (RT.java:724)
;; expected: true

user=> (contains? (transient #{:x}) :x)
IllegalArgumentException contains? not supported on type: clojure.lang.PersistentHashSet$TransientHashSet  clojure.lang.RT.contains (RT.java:724)
;; expected: true

user=> (:x (transient #{:x}))
nil
;; expected: :x

user=> (get (transient #{:x}) :x)
nil
;; expected: :x

Behavior with latest Clojure master as of Jun 27 2014 (same as Clojure 1.6.0) plus patch clj-700-7.diff. In all cases it matches the expected results shown in comments above:

user=> (contains? (transient {:x "fine"}) :x)
true
user=> (contains? (transient (hash-map :x "fine")) :x)
true
user=> (contains? (transient [1 2 3]) 0)
true
user=> (contains? (transient #{:x}) :x)
true
user=> (:x (transient #{:x}))
:x
user=> (get (transient #{:x}) :x) 
:x

Analysis by Alexander Redington: This is caused by expectations in clojure.lang.RT regarding the type of collections for some methods, e.g. contains() and getFrom(). Checking for contains looks to see if the instance passed in is Associative (a subinterface of PersistentCollection), or IPersistentSet.

This patch refactors several of the Clojure interfaces so that logic abstract from the issue of immutability is pulled out to a general interface (e.g. ISet, IAssociative), but preserves the contract specified (e.g. Associatives only return Associatives when calling assoc()).

With more general interfaces in place the contains() and getFrom() methods were then altered to conditionally use the general interfaces which are agnostic of persistence vs. transience. Includes tests in transients.clj to verify the changes fix this problem.

Questions on this approach from Stuart Halloway to Rich Hickey:

1. this represents working back from the defect to rethinking abstractions (good!). Does it go far enough?

2. what are good names for the interfaces introduced here?

Alex Miller: Should also keep an eye on CLJ-787 as it may have some collisions with this one.

Patch: clj-700-8.diff

One 'trailing whitespace' warning is perfectly normal when applying this patch to latest Clojure master as of Sep 1 2014, as shown below. This is simply because of carriage returns at the end of lines in file Associative.java. I know of no way to avoid such a warning without removing CRs from all Clojure source files (e.g. CLJ-1026):

% git am -s --keep-cr --ignore-whitespace < ~/clj/patches/clj-700-8.diff
Applying: Refactor of some of the clojure .java code to fix CLJ-700.
/Users/andy/clj/latest-clj/clojure/.git/rebase-apply/patch:29: trailing whitespace.
public interface Associative extends IPersistentCollection, IAssociative{
warning: 1 line adds whitespace errors.
Applying: more CLJ-700: refresh to use hasheq

------

Adding an addendum here for now. Needs more discussion and clean up before screening. I added clj-700-rt.patch which is a completely different approach to solving this issue in a less invasive way - clj-700-rt.patch. - Alex M



 Comments   
Comment by Herwig Hochleitner [ 01/Jan/11 8:01 PM ]

the same is also true for TransientVectors

{{(contains? (transient [1 2 3]) 0)}}

false

Comment by Herwig Hochleitner [ 01/Jan/11 8:25 PM ]

As expected, TransientSets have the same issue; plus an additional, probably related one.

(:x (transient #{:x}))

nil

(get (transient #{:x}) :x)

nil

Comment by Alexander Redington [ 07/Jan/11 2:07 PM ]

This is caused by expectations in clojure.lang.RT regarding the type of collections for some methods, e.g. contains() and getFrom(). Checking for contains looks to see if the instance passed in is Associative (a subinterface of PersistentCollection), or IPersistentSet.

This patch refactors several of the Clojure interfaces so that logic abstract from the issue of immutability is pulled out to a general interface (e.g. ISet, IAssociative), but preserves the contract specified (e.g. Associatives only return Associatives when calling assoc()).

With more general interfaces in place the contains() and getFrom() methods were then altered to conditionally use the general interfaces which are agnostic of persistence vs. transience. Includes tests in transients.clj to verify the changes fix this problem.

Comment by Stuart Halloway [ 28/Jan/11 10:35 AM ]

Rich: Patch doesn't currently apply, but I would like to get your take on approach here. In particular:

  1. this represents working back from the defect to rethinking abstractions (good!). Does it go far enough?
  2. what are good names for the interfaces introduced here?
Comment by Alexander Redington [ 25/Mar/11 7:44 AM ]

Rebased the patch off the latest pull of master as of 3/25/2011, it should apply cleanly now.

Comment by Stuart Sierra [ 17/Feb/12 2:59 PM ]

Latest patch does not apply as of f5bcf647

Comment by Andy Fingerhut [ 17/Feb/12 5:59 PM ]

clj-700-patch2.txt does patch cleanly to latest Clojure head as of a few mins ago. No changes to patch except in context around changed lines.

Comment by Andy Fingerhut [ 07/Mar/12 3:23 AM ]

Sigh. Git patches applied via 'git am' are fragile beasts indeed. Look at them the wrong way and they fail to apply.

clj-700-patch3.txt applies cleanly to latest master as of Mar 7, 2012, but not if you use this command:

git am -s < clj-700-patch3.txt

I am pretty sure this is because of DOS CR/LF line endings in the file src/jvm/clojure/lang/Associative.java. The patch does apply cleanly if you use this command:

git am --keep-cr -s < clj-700-patch3.txt

Comment by Andy Fingerhut [ 23/Mar/12 6:34 PM ]

This ticket was changed to Incomplete and waiting on Rich when Stuart Halloway asked for feedback on the approach on 28/Jan/2011. Stuart Sierra changed it to not waiting on Rich on 17/Feb/2012 when he noted the patch didn't apply cleanly. Latest patch clj-700-patch3.txt does apply cleanly, but doesn't change the approach used since the time Stuart Halloway's concern was raised. Should it be marked as waiting on Rich again? Something else?

Comment by Stuart Halloway [ 08/Jun/12 12:44 PM ]

Patch 4 incorporates patch 3, and brings it up to date on hashing (i.e. uses hasheq).

Comment by Andy Fingerhut [ 08/Jun/12 12:52 PM ]

Removed clj-700-patch3.txt in favor of Stuart Halloway's improved clj-700-patch4.txt dated June 8, 2012.

Comment by Andy Fingerhut [ 18/Jun/12 3:06 PM ]

clj-700-patch5.txt dated June 18, 2012 is the same as Stuart Halloway's clj-700-patch4.txt, except for context lines that have changed in Clojure master since Stuart's patch was created. clj-700-patch4.txt no longer applies cleanly.

Comment by Andy Fingerhut [ 19/Aug/12 4:47 AM ]

Adding clj-700-patch6.txt, which is identical to Stuart Halloway's clj-700-patch4.txt, except that it applies cleanly to latest master as of Aug 19, 2012. Note that as described above, you must use the --keep-cr option to 'git am' when applying this patch for it to succeed. Removing clj-700-patch5.txt, since it no longer applies cleanly.

Comment by Stuart Sierra [ 24/Aug/12 1:08 PM ]

Patch fails as of commit 1c8eb16a14ce5daefef1df68d2f6b1f143003140

Comment by Andy Fingerhut [ 24/Aug/12 1:53 PM ]

Which patch did you try, and what command did you use? I tried applying clj-700-patch6.txt to the same commit, using the following command, and it applied, albeit with the warning messages shown:

% git am --keep-cr -s < clj-700-patch6.txt
Applying: Refactor of some of the clojure .java code to fix CLJ-700.
/Users/jafinger/clj/latest-clj/clojure/.git/rebase-apply/patch:29: trailing whitespace.
public interface Associative extends IPersistentCollection, IAssociative{
warning: 1 line adds whitespace errors.
Applying: more CLJ-700: refresh to use hasheq

Note the --keep-cr option, which is necessary for this patch to succeed. It is recommended in the "Screening Tickets" section of the JIRA workflow wiki page here: http://dev.clojure.org/display/design/JIRA+workflow

Comment by Andy Fingerhut [ 28/Aug/12 5:48 PM ]

Presumptuously changing Approval from Incomplete back to None, since the latest patch does apply cleanly if the --keep-cr option is used. It was in Screened state recently, but I'm not so presumptuous as to change it to Screened

Comment by Alex Miller [ 19/Aug/13 12:26 PM ]

I think through a series of different hands on this ticket it got knocked way back in the list. Re-marking vetted as it's previously been all the way up through screening. Should also keep an eye on CLJ-787 as it may have some collisions with this one.

Comment by Andy Fingerhut [ 08/Nov/13 10:14 AM ]

clj-700-7.diff is identical to clj-700-patch6.txt, except it applies cleanly to latest master. Only some lines of context in a test file have changed.

When I say "applies cleanly", I mean that there is one warning when using the proper "git am" command from the dev wiki page. This is because one line replaced in Associative.java has a CR/LF at the end of the line, because all lines in that file do.

Comment by Herwig Hochleitner [ 17/Feb/14 9:54 AM ]

Since clojure 1.5, contains? throws an IllegalArgumentException on transients.
In 1.6.0-beta1, transients are no longer marked as alpha.

Does this mean, that we won't be able to distinguish between a nil value and no value on a transient?

Comment by Stuart Halloway [ 27/Jun/14 10:20 AM ]

Request for someone to (1) update patch to apply cleanly, and (2) summarize approach so I don't have to read through the comment history.

Comment by Andy Fingerhut [ 27/Jun/14 11:02 AM ]

The latest patch is clj-700-7.diff dated Nov 8, 2013. I believe it is impossible to create a patch that applies any more cleanly using git for source files that have carriage returns in them, which at least one modified source file does. Here is the command I used on latest Clojure master as of today (Jun 27 2014), which is the same as that of March 25 2014:

% git am -s --keep-cr --ignore-whitespace < ~/clj/patches/clj-700-7.diff 
Applying: Refactor of some of the clojure .java code to fix CLJ-700.
/Users/admin/clj/latest-clj/clojure/.git/rebase-apply/patch:29: trailing whitespace.
public interface Associative extends IPersistentCollection, IAssociative{
warning: 1 line adds whitespace errors.
Applying: more CLJ-700: refresh to use hasheq

If you want a patch that doesn't have the 'trailing whitespace' warning in it, I think someone would have to commit a change that removed the carriage returns from file Associative.java. If you want such a patch, let me know and we can remove all of them from every source file and be done with this annoyance.

Comment by Andy Fingerhut [ 27/Jun/14 11:19 AM ]

Updated description to contain a copy of only those comments that seemed 'interesting'. Most comments have simply been "attached an updated patch that applies cleanly", or "changed the state of this ticket for reason X".

Comment by Alex Miller [ 27/Jun/14 1:19 PM ]

Looks like Andy did as requested, moving back to Screenable.

Comment by Andy Fingerhut [ 29/Aug/14 4:27 PM ]

Patch clj-700-7.diff dated Nov 8 2013 no longer applied cleanly to latest master after some commits were made to Clojure on Aug 29, 2014. It did apply cleanly before that day.

I have not checked how easy or difficult it might be to update this patch.

Comment by Andy Fingerhut [ 01/Sep/14 3:59 AM ]

Patch clj-700-8.diff dated Sep 1 2014 is identical to clj-700-7.diff, except that it applies "cleanly" to latest master, by which I mean it applies as cleanly as I think it is possible to apply for a git patch to a file with carriage return/line feed line endings, as one of the modified files still does.

Comment by Alex Miller [ 17/Dec/14 3:12 PM ]

Added new patch with alternate approach that just makes RT know about transients instead of refactoring the class hierarchy.

clj-700-rt.patch

In some ways I think the class hierarchy refactoring is due, but I'm not totally on board with all the changes in those patches and it has impacts on collections outside Clojure itself that are hard to reason about.





[CLJ-1619] PersistentVector implements IReduce but the no init arity throws Created: 17/Dec/14  Updated: 18/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Defect Priority: Major
Reporter: Nicola Mometto Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File 0001-Implement-no-init-arity-of-reduce-for-PersistentVect.patch    
Patch: Code
Approval: Screened

 Description   

The reduce arity of IReduce in PersistentVector is implemented as: "throw new UnsupportedOperationException()".

After the CLJ-1572 patch is applied the following code will throw:

(reduce + [1 2])

Approach taken: Implement reduce(f) in PersistentVector.

Alternative: An alternate would be to change PersistentVector from IReduce to IReduceInit and remove the reduce without init function. In this case, reducing a vector would fall back to seqs.

Patch: 0001-Implement-no-init-arity-of-reduce-for-PersistentVect.patch

Screened by: Alex Miller



 Comments   
Comment by Alex Miller [ 18/Dec/14 10:59 AM ]

Is that return null there right? In the case of no elements, you should invoke f with no args right?

Comment by Nicola Mometto [ 18/Dec/14 11:04 AM ]

you're right, I didn't know that detail about the behaviour of reduce. Updated the patch to invoke (f) rather than returning nil when the coll is empty





[CLJ-1618] Widen set to take Iterable/IReduceInit Created: 17/Dec/14  Updated: 17/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File clj-1618.patch    
Patch: Code
Approval: Vetted

 Description   

Similar to CLJ-1546 (same thing on vec), set should work on IReducibleInit or Iterable. Currently eduction will work via Iterable but through SeqIterator. set on an IReduceInit will throw an error.

user=> (set (eduction (map inc) (range 100)))  ;; works, but slower path
user=> (set (reify clojure.lang.IReduceInit  
       (reduce [_ f start]
         (reduce f start (range 10)))))
IllegalArgumentException Don't know how to create ISeq from: user$eval1198$reify__1199  clojure.lang.RT.seqFrom (RT.java:506)

Approach: Check for and use IReduceInit path if available, otherwise fallback to seq. Additionally, the patch adds a modification to return a set without it's meta (same approach as CLJ-1546) if a set is passed, which is fast constant time with no change in effective behavior.

Performance: (using Criterium quick-bench)

Timings done with either (count (set coll)) or (count (into #{} coll)):

coll 1.6.0 into 1.6.0 set 1.7.0-alpha4 set 1.7.0-alpha4+patch set
(set (range 100)) 15.4 µs 17.0 µs 11.4 µs 0.0 µs
(vec (range 1000000)) 360.7 ms 702.5 ms 391.1 ms 358.6 ms
(doall (range 1000000)) 363.6 ms 736.9 ms 387.5 ms 371.0 ms
(doall (range 5)) 404.9 ns 612.3 ns 481.9 ns 445.9 ns
(eduction (map identity) (vec (range 100))) n/a n/a 11.3 µs 8.7 µs

See also: CLJ-1546, CLJ-1384

Patch: clj-1618.patch

Screened by:






[CLJ-1613] :or defaults should refer to enclosing scope in map destructuring Created: 12/Dec/14  Updated: 12/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Michał Marczyk Assignee: Michał Marczyk
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File 0001-CLJ-1613-evaluate-or-defaults-in-enclosing-scope-in-.patch    

 Description   

Michael Blume noticed that :or defaults can depend on the values of other keys, see https://groups.google.com/d/msg/clojure/6kOhpPOpHWM/ITjWwQFS_VQJ

Michael's Gist https://gist.github.com/MichaelBlume/4891dafdd31f0dcbc727 displays a case where an associative form involving :keys and :or compiles or not depending on the order of symbols in :keys. By tweaking that case one can arrive at expressions which always compile, but produce different values depending on :keys:

(let [foo 1
       bar 2
       {:keys [bar foo]
        :or {foo 3 bar (inc foo)}} {}]
  {:foo foo :bar bar})
;= {:foo 3, :bar 4}

(let [foo 1
      bar 2
      {:keys [foo bar]
       :or {foo 3 bar (inc foo)}} {}]
  {:foo foo :bar bar})
;= {:foo 3, :bar 2}

I believe that the most natural solution is to demand that :or defaults be evaluated in an enclosing scope where none of the destructuring-introduced locals are present. This approach is taken by the 0001 patch.



 Comments   
Comment by Michael Blume [ 12/Dec/14 2:27 AM ]

I suspect that this is the right thing to do but I think it's important to note that this will break existing code https://github.com/ngrunwald/ring-middleware-format/blob/master/src/ring/middleware/format_params.clj#L214





[CLJ-1612] clojure.core.reducers/mapcat can call f1 with undefined arity of 0 arguments? Created: 10/Dec/14  Updated: 10/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6, Release 1.7
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Andy Fingerhut Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: reducers


 Description   

I have not run across this with running code, so perhaps it is impossible for reasons I have not understood. Also not sure whether fixing issues with reducers is of any importance, given transducers. This was found while testing the Eastwood lint tool on some Clojure namespaces, including clojure.core.reducers.

(defcurried mapcat
  "Applies f to every value in the reduction of coll, concatenating the result
  colls of (f val). Foldable."
  {:added "1.5"}
  [f coll]
  (folder coll
   (fn [f1]
     (let [f1 (fn
                ([ret v]
                  (let [x (f1 ret v)] (if (reduced? x) (reduced x) x)))
                ([ret k v]
                  (let [x (f1 ret k v)] (if (reduced? x) (reduced x) x))))]
       (rfn [f1 k]
            ([ret k v]
               (reduce f1 ret (f k v))))))))

The definition of macro rfn expands to a (fn ...) that can call f1 with no arguments, which is not a defined arity for f1.






[CLJ-1611] clojure.java.io/pushback-reader Created: 08/Dec/14  Updated: 08/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Phill Wolf Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


 Description   

Whereas

  • clojure.core/read and clojure.edn/read require a PushbackReader;
  • clojure.java.io/reader produces a BufferedReader, which isn't compatible;
  • the hazard has tripped folks up for years[1];
  • clojure.java.io is pure sugar anyway (and would not be damaged by the addition of a little bit more);
  • clojure.java.io's very existence suggests suitability and fitness for use (wherein by the absence of a read-compatible pushback-reader it falls short);

i.e., in the total absence of clojure.java.io it would not seem "hard" to use clojure.edn, but in the presence of clojure.java.io and its "reader" function, amidst so much else in the API that does fit together, one keeps thinking one is doing it wrong;

and

  • revising the "read" functions to make their own Pushback was considered but rejected [2];

Therefore let it be suggested to add clojure.java.io/pushback-reader, returning something consumable by clojure.core/read and clojure.edn/read.

[1] The matter was discussed on Google Groups:

(2014, "clojure.edn won't accept clojure.java.io/reader?") https://groups.google.com/forum/#!topic/clojure/3HSoA12v5nc

with a reference to an earlier thread

(2009, "Reading... from a reader") https://groups.google.com/forum/#!topic/clojure/_tuypjr2M_A

[2] CLJ-82 and the 2009 message thread






[CLJ-1606] Transducing an eduction finishes twice Created: 27/Nov/14  Updated: 15/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Defect Priority: Major
Reporter: Herwig Hochleitner Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: transducers
Environment:

1.7.0-alpha4


Attachments: Text File CLJ-1606-2.patch     Text File CLJ-1606-2.patch     Text File CLJ-1606-3.patch     Text File CLJ-1606-4.patch     Text File CLJ-1606.patch    
Patch: Code and Test
Approval: Screened

 Description   
> (transduce (map identity)
             (fn
               ([s] (println "Finishing") s)
               ([s i] s))
             nil
             (eduction (map identity) []))
Finishing
Finishing
nil

Cause: transduce passes (xf f) into .reduce of Eduction, which calls transduce, causing completing xf to be called more than once.

Proposed: Eduction reduce should use (completing f) instead of f to isolate completion of inner xf from outer xf.

Patch: CLJ-1606-4.patch

Screened by: Alex Miller



 Comments   
Comment by Alex Miller [ 27/Nov/14 11:01 PM ]

identity is not a valid xf - changed to (map identity)

Comment by Ghadi Shayban [ 27/Nov/14 11:34 PM ]

identity is a valid though nonsensical transducer. fix & test added.

Comment by Ghadi Shayban [ 28/Nov/14 12:06 AM ]

Simple reproduction similar to into:

(transduce (map dec)
           (completing conj! persistent!)
           (transient [])
           (eduction (map inc) (range 6)))

;; ClassCastException clojure.lang.PersistentVector cannot be cast to clojure.lang.ITransientCollection

into doesn't use completing, and conj! has an arity that hides the problem.

Comment by Alex Miller [ 28/Nov/14 8:54 AM ]

I removed trailing whitespace in the patch so it applies cleanly.

Comment by Ghadi Shayban [ 14/Dec/14 11:16 PM ]

This patch is a little more subtle than I thought. Completion of the eduction's rfn needs to be handled separately from the "outer" transduce's xform. Patch coming.

Comment by Ghadi Shayban [ 14/Dec/14 11:32 PM ]

New patch with tests that completes the inner xform without completing the passed in rfn

Comment by Ghadi Shayban [ 15/Dec/14 1:19 AM ]

both -3 and -2 are equivalent. -3 is probably better stylistically.

Comment by Alex Miller [ 15/Dec/14 8:37 AM ]

Added CLJ-1606-4.patch - identical to -3, just fixed whitespace error.





[CLJ-1604] AOT'ed code that defs a var with clojure.core symbol name causes IllegalStateException Created: 25/Nov/14  Updated: 02/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Defect Priority: Major
Reporter: Nicola Mometto Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: aot, compiler

Attachments: Text File 0001-fix-AOT-bug-preventing-overriding-of-clojure.core-fu.patch     Text File 0001-fix-AOT-bug-preventing-overriding-of-clojure.core-fu-v2.patch    
Patch: Code
Approval: Screened

 Description   

AOT'ed code that defs a var that is also a symbol in clojure.core results in an exception at runtime. This problem can be avoided with (:refer-clojure :exclude ...) but this requires a library author to update and release a new version. AOT'ed applications must then wait for all transitive dependencies to update before they can update to a new Clojure version. For some users, this problem prevents them from trying or adopting new releases.

For example, the contrib library data.int-map defines an update function. clojure.core will also have a new update function as of 1.7.0. If this library is AOT'ed, then users of the clojure.data.int-map/update function will see the exception below. This situation can commonly occur when an application uses lein uberjar to compile all of the project+libs. In this case, applications or libraries that use data.int-map (either directly or indirectly) are affected.

java.lang.IllegalStateException: Attempting to call unbound fn: #'clojure.data.int-map/update
 at clojure.lang.Var$Unbound.throwArity (Var.java:43)
    clojure.lang.AFn.invoke (AFn.java:40)
    compiler_update_not_referenced_bug.core$foo.invoke (core.clj:5)

Reproduce with this sample project: https://github.com/yeller/compiler_update_not_referenced_bug

Cause: When AOT compiling a namespace, the def forms are hoisted into the ns__init class (in the example here, clojure.data.int_map__init). The static initializer in this class creates each var in the ns via a call to RT.var(ns, name). For data.int-map the static initializer will properly create the var for clojure.data.int-map/update. But when the ns is loaded (via the clojure.data.int_map.load() method), (refer-clojure) will be called, which will remap clojure.data.int-map/update to point to clojure.core/update.

This problem does not affect non-AOT loading (which doesn't use the ns__init class) and does not affect collisions from any other namespace. Only collisions from clojure.core create this possibility.

Proposed: The proposed patch explicitly refers the Var during ns__init.load() (after Clojure symbols are referred) rather than implicitly during ns__init static {}. This change only happens in the specific case where a core symbol is being shadowed.

Patch: 0001-fix-AOT-bug-preventing-overriding-of-clojure.core-fu-v2.patch

Screened by: Alex Miller



 Comments   
Comment by Andy Fingerhut [ 25/Nov/14 11:28 PM ]

When I try latest Clojure master plus patch CLJ-1604-only-core.patch with the small test project created by Tom Crayford to demonstrate this issue: https://github.com/yeller/compiler_update_not_referenced_bug

In that project, I get the same exception thrown when attempting 'lein do clean, uberjar, test' using this patch, as without it. It is because int-map/update in namespace compiler-update-not-referenced-bug.core is an unbound var.

Comment by Nicola Mometto [ 26/Nov/14 4:25 AM ]

Andy, you're right. For some reason I attached the wrong patch to the ticket, this is the correct one

Comment by Nicola Mometto [ 26/Nov/14 5:21 AM ]

I wasn't able to write a test for this, so here's a repl session using the clojure jar demonstrating this issue:

[˷/test]> ls
classes  clojure.jar  test.clj
[˷/test]> cat test.clj
(in-ns 'test)
(clojure.core/refer 'clojure.core)
(def foo "bar")
(def update "foo")
[˷/test]> java -cp classes:clojure.jar:. clojure.main
Clojure 1.7.0-master-SNAPSHOT
user=> (binding [*compile-files* true] (load "test"))
WARNING: update already refers to: #'clojure.core/update in namespace: test, being replaced by: #'test/update
nil
user=> test/foo
"bar"
user=> test/update
"foo"
user=>
[˷/test]> java -cp classes:clojure.jar:. clojure.main
Clojure 1.7.0-master-SNAPSHOT
user=> (load "test")
nil
user=> test/foo
"bar"
user=> test/update
CompilerException java.lang.RuntimeException: No such var: test/update, compiling: (NO_SOURCE_PATH:0:0)
user=>
Comment by Andy Fingerhut [ 26/Nov/14 10:39 AM ]

Thanks. I have not tried to assess the details of the change, other than to say that patch 0001-fix-AOT-bug-preventing-overriding-of-clojure.core-fu.patch dated 26 Nov 2014, when applied to latest Clojure master as of today, enables both 'lein do clean, test' and 'lein do clean, uberjar, test' to work as expected with Tom Crayford's test project, linked above, whereas 'lein do clean, uberjar, test' fails without this patch, due to a var being unbound that should have a value.

Comment by Andy Fingerhut [ 27/Nov/14 10:53 AM ]

Copying a comment here from CLJ-1591, since it is more appropriate here. It is responding to Tom Crayford's posting of his example project to demonstrate the issue: https://github.com/yeller/compiler_update_not_referenced_bug

Tom, looked at your project. Thanks for that. It appears not to have anything like (def inc inc) in it. It throws exception during test step of 'lein do clean, uberjar, test' consistently for me, too, but compiles with only warnings and passes tests with 'lein do clean, test'. I have more test results showing in which Clojure versions these results change. To summarize, the changes to Clojure that appear to make the biggest difference in the results are below (these should be added to the new ticket you create – you are welcome to do so):

Clojure 1.6.0, 1.7.0-alpha1, and later changes up through the commit with description "CLJ-1378: Allows FnExpr to override its reported class with a type hint": No errors or warnings for either lein command above.

Next commit with description "Add clojure.core/update, like update-in but takes a single key" that adds clojure.core/update: 'lein do clean, test' is fine, but 'lein do clean, uberjar' throws exception during compilation, probably due to CLJ-1241.

Next commit with description "fix CLJ-1241": 'lein do clean, test' and 'lein do clean, uberjar' give warnings about clojure.core/update, but no errors or exceptions. 'lein do clean, uberjar, test' throws exception during test step that is same as the one I see with Clojure 1.7.0-alpha4. Debug prints of values of clojure.core/update and int-map/update (in data.int-map and in Tom's namespace compiler-update-not-referenced-bug.core) show things look fine when printed inside data.int-map, and in Tom's namespace when not doing the uberjar, but when doing the uberjar, test, int-map/update is unbound in Tom's namespace.

In case it makes a difference, my testing was done with Mac OS X 10.9.5, Leiningen 2.5.0 on Java 1.7.0_45 Java HotSpot(TM) 64-Bit Server VM

Comment by Nicola Mometto [ 02/Dec/14 9:04 AM ]

The updated patch only emits the interning bytecode when necessary, avoiding the emission when a clojure.core var with the same name exists but is not mapped to the current namespace





[CLJ-1603] cycle, iterate, repeat return vals should IReduceInit Created: 25/Nov/14  Updated: 11/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.7

Type: Enhancement Priority: Major
Reporter: Stuart Halloway Assignee: Alex Miller
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File clj-1603-2.patch     Text File clj-1603.patch    
Patch: Code and Test
Approval: Vetted

 Description   
  • with generative tests
  • with perf examples

Alternatives:

There were a number of possible approaches for these enhancements:
1) Straight Java impl (chosen, see below)
2) Clojure deftype - there are several issues with this - complexity of implementing all necessary interfaces, entanglements with deftype load order, inability to create transient hash codes, etc - see Ghadi's comment.
3) Add Iterable or IReduceInit directly to LazySeq. Conceptually, this does not make sense for general lazy seqs. Seqs materialize and cache each value once and doing this along with the ability to iterate/reduce introduces issues with caching (might as well use seqs for that) and synchronization. I also considered optionally allowing this but then it is tricky when in a reduce to determine which path to go down.

In the end, #1 seemed to be the most straightforward implementation by extending ASeq and providing custom seq and reduce logic. The perf #s below demonstrate the benefits of using a customized seq impl vs the generic lazy seq versions.

Approach:

A few things to note:

  • Added repeat to title and implementation (seemed natural along with cycle)
  • Added some example tests for iterate (cycle and repeat were covered). Did not add generative tests. Not clear to me what these would be that would actually be valuable. All of these functions are pretty simple and the examples cover the special cases.
  • Because the former repeat, cycle, and iterate produced lazyseqs that participated in the IReduce form of reduce (via the seq paths in CollReduce), I extended these classes all to IReduce instead of IReduceInit. An alternative would have been to re-route the no-init form of reduce for these classes to seq-reduce.
  • Because Repeat, Cycle, and Iterate are IReduce but also extend ASeq, I provided explicit extensions for them in CollReduce to ensure they got called via reduce path rather than seq path.

Performance:

Some example timing, all in µs:

Expression 1.6.0 1.7.0-alpha4 1.7.0-alpha4 + patch
(into [] (repeat 1000 1)) 107 97 5
(reduce + 0 (repeat 1000 1)) 112 112 17
(into [] (take 1000) (repeat 1)) n/a 75 33
(doall (take 1000 (cycle [1 2 3]))) 110 115 81
(into [] (take 1000) (cycle [1 2 3])) n/a 66 33
(doall (take 1000 (iterate inc 0))) 98 96 75
(into [] (take 1000) (iterate inc 0)) n/a 79 28

Patch: clj-1603-2.patch

Screened by:



 Comments   
Comment by Ghadi Shayban [ 25/Nov/14 11:01 AM ]

Stu, do you intend these to be in Java or Clojure? It could be trickier to implement in Clojure directly, as loading would have to be deferred until core_deftype loads. It's certainly tractable without breaking any backwards compatibility, and I've explored this while experimenting with Range as a deftype https://github.com/ghadishayban/clojure/commit/906cd6ed4d7c624dc4e553373dabfd57550eeff2

A macro to help with Seq&List participation could be certainly useful, as efficiently being both a Seq/List and IReduceInit isn't a party.

May be useful to list requirements for protocol/iface participation.

It seems like 'repeatedly' is another missing link in the IReduceInit story.

Rich mentioned the future integration of reduce-kv at the conj, it would also be useful to know how that could fit in.

---- Other concerns and ops that may belong better on the mailing list ----

In experimenting with more reducible sources, I put out a tiny repo (github.com/ghadishayban/reducers) a couple weeks ago that includes some sources and operations. The sources were CollReduce and not ISeq.

Relatedly, caching the hashcode as a Java `transient` field is not supported when implementing a collection using deftype (patch w/ test in CLJ-1573).

Sources:
Iterate was one of them https://github.com/ghadishayban/reducers/blob/master/src/ghadi/reducers.clj#L43-L51
Repeatedly https://github.com/ghadishayban/reducers/blob/master/src/ghadi/reducers.clj#L43-L51

Reduce/transduce-based Operations that accept transducers:
some, any, yield-first https://github.com/ghadishayban/reducers/blob/master/src/ghadi/reducers.clj#L52-L80
(any could use a better name, equiv to (first (filter...)))
some and any have a symmetry like filter/remove.

Novelty maybe for 1.8:
A transducible context for Iterables similar to LazyTransformer:
https://github.com/ghadishayban/reducers/blob/master/src/ghadi/reducers.clj#L157-L161

The unless-reduced macro was very useful in implementing the collections:
https://github.com/ghadishayban/reducers/blob/master/src/ghadi/reducers.clj#L7-L15
It is different than the ensure-reduced and unreduced functions in core.

Comment by Alex Miller [ 25/Nov/14 12:01 PM ]

When we discussed this in the past, it was in the vein of reusing some of the range work (in Java) to implement cycle and iterate (per CLJ-1515).

Comment by Ghadi Shayban [ 25/Nov/14 9:20 PM ]

Never mind about 'repeatedly'. Being both ISeq and IReduceInit for repeatedly doesn't make sense for something that relies on side-effects. Current users of repeatedly can reduce over it many times and only realize the elements once.

Comment by Alex Miller [ 05/Dec/14 11:17 PM ]

attached wip Java impl and posted some example timings

Comment by Ghadi Shayban [ 11/Dec/14 4:35 PM ]

NB iterate in this patch does not cache the realized ISeq, but recalcs it at every call to realize the tail. This is not a change in the promised behavior (docstring says "f must be side-effect free") but an implementation change, as worth noting in the changelog.





[CLJ-1602] vals and keys return values should implement IReduceInit Created: 25/Nov/14  Updated: 04/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.7

Type: Enhancement Priority: Major
Reporter: Stuart Halloway Assignee: Alex Miller
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File clj-1602-2.diff     File clj-1602-3.diff     File clj-1602.diff    
Patch: Code and Test
Approval: Vetted

 Description   
  • with generative tests
  • with perf demos

Background: clojure.core/keys calls RT.keys(Object) calls APersistentMap.KeySeq.create(ISeq). RT.keys() creates a sequence (of Map.Entry objects) and KeySeq just wraps it, calling .keyValue(). There is an equivalent vals -> RT.vals() -> ValSeq path. Both of these seq impls extend ASeq and provide Iterable implementations via SeqIterator (iterator wrapped over the seq).

Approach: The important thing here is to avoid creating the sequence and instead directly iterate/reduce over the map. Noting that CLJ-1499 provides support for making PHM directly Iterable and that KeySeq/ValSeq already implement Iterable, I chose to focus on making the instances returned from keys and vals support Iterable directly on the underlying map instead via the seq.

RT.keys()/vals() created the seq and passed it to KeySeq/ValSeq which made it too late to directly cover the original map iterator. There are a few places that rely on passing a seq of Map.Entry to keys/vals (not just a map instance), so I check for IPersistentMap and in that case pass it directly to a new KeySeq factory method that remembers both the Iterable and the ISeq.

Questions/notes:

  • Could potentially check for Map or Iterable instead of IPersistentMap in RT.keys()/vals(). Not sure how common it is to pass normal Java maps to keys/vals.
  • The direct Iterable support vanishes once you move off the head of the keys or vals seq. So (rest (keys map)) does not have Iterable support. This is not really possible unless you hold an Iterator and advance it along with the seq, but that seemed to introduce all sorts of possibilities for badness. Since maps are unordered, it seems weird to rely on any ordering or processing only parts of any map, so I suspect doing this would be quite rare.
  • CLJ-1499 makes Iterators fast on persistent maps, so it's really required to see the benefits below.

Performance: I tested perf using criterium to benchmark as follows:

(use 'criterium.core)
(def m (zipmap (range 1000) (range 1000)))
(bench (reduce + (keys m)))
(bench (reduce + (vals m)))
version (reduce + (keys m)) (reduce + (vals m))
1.7.0-alpha4 74 µs 64 µs
clj-1602-3.diff 60 µs 67 µs
clj-1602-3.diff + clj-1499-v6.diff 51 µs 50 µs

I can't explain why vals is faster than keys in alpha4 or slower with just the patch - they both do essentially the same work. The stddev in those tests is <1 µs and no outliers were found in the data. Retests showed the differences to be repeatable. With the 1499 patch, they are using a totally different iterator impl and no sequences are involved.

Tests:

  • The included tests in data_structures depend on the generators and properties created in CLJ-1499 testing.
  • I added some basic tests for subseq and rsubseq as those both rely on the somewhat special behavior of keys accepting a seqable of Map.Entry objects (not just a map itself). There were no other tests for subseq or rsubseq already present.

Patch: clj-1602-3.diff - requires CLJ-1499 patch first (but only to build on the tests in data_structures.clj so these could be separated if necessary)



 Comments   
Comment by Alex Miller [ 25/Nov/14 11:53 AM ]

Could leverage CLJ-1499 for the bulk of this, may pull that back from 1.8 into 1.7. Waiting on further work till that's answered.

Comment by Alex Miller [ 03/Dec/14 11:24 PM ]

I also have a patch that extends the CLJ-1499 iterators to support providing both key and val iterators that do not require creating and unpacking a Map.Entry. Unfortunately I only saw times that were ~48 µs on the perf benchmark in the description, so it's not a huge benefit (short-lived object allocation is cheap).





[CLJ-1601] transducer arities for map-indexed, distinct, and interpose Created: 25/Nov/14  Updated: 17/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.7

Type: Enhancement Priority: Major
Reporter: Stuart Halloway Assignee: Alex Miller
Resolution: Unresolved Votes: 0
Labels: transducers

Attachments: Text File clj-1601-2.patch     Text File clj-1601-3.patch     Text File clj-1601.patch    
Patch: Code and Test
Approval: Vetted

 Description   
  • with generative tests
  • with examples demonstrating performance

Performance: Details in comments, summary:

(def v (vec (concat (range 1000) (range 1000))))
(into [] (distinct v))            ;; 821.3 µs
(into [] (distinct) v)            ;; 388.2 µs
(into [] (interpose nil v))       ;; 316.0 µs
(into [] (interpose nil) v)       ;; 35.5 µs
(into [] (map-indexed vector v))  ;; 76.8 µs
(into [] (map-indexed vector) v)  ;; 49.4 µs

Patch: clj-1601-3.patch

Screened by:



 Comments   
Comment by Alex Miller [ 25/Nov/14 11:54 AM ]

working on this

Comment by Alex Miller [ 25/Nov/14 4:22 PM ]

Initial patch with impls. Tests and perf still to do.

Comment by Alex Miller [ 27/Nov/14 7:09 AM ]

Perf tests, summarized in description:

user=> (use 'criterium.core)
nil
user=> (def v (vec (concat (range 1000) (range 1000))))
#'user/v
user=> (quick-bench (into [] (distinct v)))
WARNING: Final GC required 10.433088780213309 % of runtime
Evaluation count : 744 in 6 samples of 124 calls.
             Execution time mean : 821.339608 µs
    Execution time std-deviation : 11.351053 µs
   Execution time lower quantile : 811.901435 µs ( 2.5%)
   Execution time upper quantile : 837.972000 µs (97.5%)
                   Overhead used : 1.794010 ns
nil
user=> (quick-bench (into [] (distinct) v))
WARNING: Final GC required 10.78492057474076 % of runtime
Evaluation count : 14028 in 6 samples of 2338 calls.
             Execution time mean : 43.630656 µs
    Execution time std-deviation : 170.185825 ns
   Execution time lower quantile : 43.433193 µs ( 2.5%)
   Execution time upper quantile : 43.853959 µs (97.5%)
                   Overhead used : 1.794010 ns
				   
user=> (quick-bench (into [] (interpose nil v)))
WARNING: Final GC required 10.79555726490133 % of runtime
Evaluation count : 1914 in 6 samples of 319 calls.
             Execution time mean : 316.024853 µs
    Execution time std-deviation : 9.077484 µs
   Execution time lower quantile : 310.139273 µs ( 2.5%)
   Execution time upper quantile : 330.917486 µs (97.5%)
                   Overhead used : 1.794010 ns

Found 1 outliers in 6 samples (16.6667 %)
	low-severe	 1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil
user=> (quick-bench (into [] (interpose nil) v))
WARNING: Final GC required 10.70401297525592 % of runtime
Evaluation count : 17022 in 6 samples of 2837 calls.
             Execution time mean : 35.592672 µs
    Execution time std-deviation : 560.066138 ns
   Execution time lower quantile : 35.252348 µs ( 2.5%)
   Execution time upper quantile : 36.553414 µs (97.5%)
                   Overhead used : 1.794010 ns

Found 1 outliers in 6 samples (16.6667 %)
	low-severe	 1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil

user=> (quick-bench (into [] (map-indexed vector v)))
WARNING: Final GC required 12.45755646853723 % of runtime
Evaluation count : 7338 in 6 samples of 1223 calls.
             Execution time mean : 76.807691 µs
    Execution time std-deviation : 381.019170 ns
   Execution time lower quantile : 76.433202 µs ( 2.5%)
   Execution time upper quantile : 77.170733 µs (97.5%)
                   Overhead used : 1.794010 ns
nil
user=> (quick-bench (into [] (map-indexed vector) v))
WARNING: Final GC required 11.38700971837483 % of runtime
Evaluation count : 12474 in 6 samples of 2079 calls.
             Execution time mean : 49.458043 µs
    Execution time std-deviation : 620.716737 ns
   Execution time lower quantile : 48.995801 µs ( 2.5%)
   Execution time upper quantile : 50.229507 µs (97.5%)
                   Overhead used : 1.794010 ns
Comment by Alex Miller [ 17/Dec/14 1:50 PM ]

Updated based on comment from Christophe Grand that java.util.HashSet used in distinct impl had different hash/equality semantics than the set used with sequences.





[CLJ-1600] calling hashCode on clojure.lang.LazyTransformer causes a StackOverflowError Created: 24/Nov/14  Updated: 25/Nov/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Defect Priority: Major
Reporter: Sam Ritchie Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: transducers
Environment:

OS X 10.10.1, Macbook Pro,, Java 1.8.0_11, Clojure 1.7.0-alpha4


Attachments: Text File CLJ-1600-2.patch     Text File CLJ-1600.patch    
Patch: Code and Test
Approval: Screened

 Description   

Calling .hashCode on a an instance of clojure.lang.LazyTransformer causes a StackOverflowError:

user> (.hashCode (sequence (map identity) ["s"]))
StackOverflowError   clojure.lang.Util.hash (Util.java:161)

The trace is

Util.java:  161  clojure.lang.Util/hash
          LazyTransformer.java:  216  clojure.lang.LazyTransformer/hashCode
                     Util.java:  161  clojure.lang.Util/hash
          LazyTransformer.java:  216  clojure.lang.LazyTransformer/hashCode
                     Util.java:  161  clojure.lang.Util/hash
          LazyTransformer.java:  216  clojure.lang.LazyTransformer/hashCode
                     Util.java:  161  clojure.lang.Util/hash
          LazyTransformer.java:  216  clojure.lang.LazyTransformer/hashCode

Relevant lines:

https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/LazyTransformer.java#L212
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Util.java#L161

Cause: Looks like "seq" returns "this":

https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/LazyTransformer.java#L55

This does NOT occur on an empty sequence, as clojure.core/sequence short-circuits.

Proposal: compute and cache hash and hasheq using same algorithm used in other seqs

Patch: CLJ-1600-2.patch

Screened by: Alex Miller



 Comments   
Comment by Ghadi Shayban [ 25/Nov/14 1:18 AM ]

Patch with hashcode calculation and caching similar to ASeq. Might be worthwhile hoisting that into its own hashSeq method.

Comment by Alex Miller [ 25/Nov/14 10:13 AM ]

What's here looks good. Can we hook into existing tests that verify equals/hashcode and equiv/hasheq equivalence?

Comment by Ghadi Shayban [ 25/Nov/14 1:24 PM ]

Test case added. Verified case was failing with SO prior to patch.





[CLJ-1591] Symbol not being bound in namespace when name clashes with clojure.core Created: 14/Nov/14  Updated: 27/Nov/14

Status: Reopened
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6, Release 1.7
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Mike Anderson Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: None

Approval: Triaged

 Description   

The following code fails (both in 1.6 and latest 1.7-alpha4):

user=> (ns foo)
nil
foo=>  (def inc inc)
WARNING: inc already refers to: #'clojure.core/inc in namespace: foo, being replaced by: #'foo/inc
#'foo/inc

;; Note inc is unbound at this point, which causes the exception below
foo=> inc
#<Unbound Unbound: #'foo/inc>
foo=> (ns bar)
nil
bar=> (require ['foo :refer ['inc]])
WARNING: inc already refers to: #'clojure.core/inc in namespace: bar, being replaced by: #'foo/inc
nil
bar=> (inc 8)

IllegalStateException Attempting to call unbound fn: #'foo/inc  clojure.lang.Var$Unbound.throwArity (Var.java:43)

Further investigation shows that foo/inc is unbound:

foo/inc
=> #<Unbound Unbound: #'foo/inc>

Further investigation also shows that replacing the (def inc inc) with almost anything else, e.g. (def inc dec), (def inc clojure.core/inc), or (def inc (fn [n] (+ n 1))), causes no exception (but the warnings remain).

I would expect:
a) foo/inc should be bound and have the same value as clojure.core/inc
b) No error when requiring foo/inc
c) bar/inc should be bound to foo/inc



 Comments   
Comment by Nicola Mometto [ 14/Nov/14 10:04 PM ]

The second error should be expected, the right syntax should be (require ['foo :refer ['inc]]) (note the leading quote before inc)

Comment by Mike Anderson [ 14/Nov/14 10:20 PM ]

Thanks for the catch Nicola - I've edited the description. Still get the same error however (just with a slightly different message)

Comment by Alex Miller [ 14/Nov/14 10:22 PM ]

See comment...

Comment by Mike Anderson [ 14/Nov/14 10:24 PM ]

@Alex what comment? Note that the error still occurs even with the right syntax....

Comment by Mike Anderson [ 14/Nov/14 10:26 PM ]

Appears to have been closed prematurely

Comment by Alex Miller [ 14/Nov/14 10:39 PM ]

I can't reproduce with the correct syntax:

Clojure 1.7.0-master-SNAPSHOT
user=> (ns foo)
nil
foo=> (def inc inc)
WARNING: inc already refers to: #'clojure.core/inc in namespace: foo, being replaced by: #'foo/inc
#'foo/inc
foo=> (ns bar)
nil
bar=> (require ['foo :refer ['inc]])
WARNING: inc already refers to: #'clojure.core/inc in namespace: bar, being replaced by: #'foo/inc
nil
Comment by Mike Anderson [ 14/Nov/14 10:55 PM ]

The problem is that the var is still unbound and causes e.g. the following error:

=> (foo/inc 8)
IllegalStateException Attempting to call unbound fn: #'foo/inc clojure.lang.Var$Unbound.throwArity (Var.java:43)

I don't think that should be expected - or am I missing something?

Comment by Alex Miller [ 14/Nov/14 10:57 PM ]

Ah, will take a look. But not right now.

Comment by Andy Fingerhut [ 15/Nov/14 1:09 PM ]

Updated the description with a few more details. The exception goes away if you do (def inc (fn [n] (+ n 1))) instead of (def inc inc), for example. The warnings remain.

Comment by Tom Crayford [ 20/Nov/14 11:07 AM ]

Unsure if this is the same issue (I think it might be?), but I reproduced the exact same error message with AOT compilation involved:

reproduced in this git repository: https://github.com/yeller/compiler_update_not_referenced_bug

clone it, run `lein do clean, uberjar, test`, and that error message will show up every time for me

Comment by Andy Fingerhut [ 20/Nov/14 5:43 PM ]

Mike, I think replacing (def inc inc) in your example with (def inc clojure.core/inc) should be considered as a reasonable workaround for this issue, unless you have some use case where you need to def inc to something that is not in clojure.core (and if so, why?)

The reason (def inc inc) behaves this way is, if not absolutely necessary, at least commonly used in Clojure programs to define recursive functions, e.g. (defn fib [n] (if (<= n 1) 1 (+ (fib (dec n)) (fib (- n 2))))), so that the occurrences of fib in the body are resolved to the fib being defined.

Comment by Alex Miller [ 22/Nov/14 9:05 AM ]

Moving to 1.7 until I can look at this more deeply.

Comment by Mike Anderson [ 23/Nov/14 6:08 PM ]

Andy - yes the workaround is fine for me right now.

I don't think this is an urgent issue but it may be exposing a subtle complexity regarding assumptions about the state of the namespace at different times. Perhaps the semantics should be something like:

  • The def statement itself should be run before the var is interned. e.g. (def inc (inc 5)) should result in (def inc 6)
  • Anything complied / deferred to run after completion of the def statement should use the new var (i.e. the new var should be referenced by fns, lazy sequences etc.)
Comment by Andy Fingerhut [ 23/Nov/14 6:36 PM ]

I'm not sure what your proposal means in a case like this:

(def inc (fn [x] (inc x)))

Is the second inc to be interpreted/resolved before or after the new inc is created? Because it is (fn ...) it should be the after-behavior? What else besides fn should cause the after-behavior, rather than the before-behavior?

Even more fun (not saying that people often write code like this, but the compiler can handle it today):

(def inc (if (> (inc y) 5)
           (fn [x] (inc x))
           (fn [x] (dec x))))

I think the current compiler behavior of 'in the body of a def, the def'd symbol always refers to the new var, not any earlier def'd vars' is fairly straightforward to explain.

Comment by Tom Crayford [ 23/Nov/14 9:15 PM ]

Should I file the AOT issue reproduced in that thing as a new issue?

Comment by Andy Fingerhut [ 24/Nov/14 5:16 PM ]

Tom: Alex Miller or another screener would be best to say whether the AOT issue should be a separate ticket, but my best guess would be "go for it". I tried to look at the link you gave but it seems not to point to anything. Could you double-check that link?

Comment by Tom Crayford [ 24/Nov/14 6:48 PM ]

Andy,

Great. I'll write one up tomorrow sometime. I accidentally left that repo as private, should be visible now.

Comment by Andy Fingerhut [ 24/Nov/14 8:11 PM ]

This comment is really most relevant for ticket CLJ-1604, where it has been copied:

Tom, looked at your project. Thanks for that. It appears not to have anything like (def inc inc) in it. It throws exception during test step of 'lein do clean, uberjar, test' consistently for me, too, but compiles with only warnings and passes tests with 'lein do clean, test'. I have more test results showing in which Clojure versions these results change. To summarize, the changes to Clojure that appear to make the biggest difference in the results are below (these should be added to the new ticket you create – you are welcome to do so):

Clojure 1.6.0, 1.7.0-alpha1, and later changes up through the commit with description "CLJ-1378: Allows FnExpr to override its reported class with a type hint": No errors or warnings for either lein command above.

Next commit with description "Add clojure.core/update, like update-in but takes a single key" that adds clojure.core/update: 'lein do clean, test' is fine, but 'lein do clean, uberjar' throws exception during compilation, probably due to CLJ-1241.

Next commit with description "fix CLJ-1241": 'lein do clean, test' and 'lein do clean, uberjar' give warnings about clojure.core/update, but no errors or exceptions. 'lein do clean, uberjar, test' throws exception during test step that is same as the one I see with Clojure 1.7.0-alpha4. Debug prints of values of clojure.core/update and int-map/update (in data.int-map and in Tom's namespace compiler-update-not-referenced-bug.core) show things look fine when printed inside data.int-map, and in Tom's namespace when not doing the uberjar, but when doing the uberjar, test, int-map/update is unbound in Tom's namespace.

In case it makes a difference, my testing was done with Mac OS X 10.9.5, Leiningen 2.5.0 on Java 1.7.0_45 Java HotSpot(TM) 64-Bit Server VM

Comment by Nicola Mometto [ 25/Nov/14 3:44 PM ]

Tom, I've opened a ticket with a patch fixing the AOT issue: http://dev.clojure.org/jira/browse/CLJ-1604





[CLJ-1590] Some IReduce/IReduceInit implementors don't respect reduced Created: 14/Nov/14  Updated: 10/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Defect Priority: Major
Reporter: Nicola Mometto Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File 0001-ensure-IReduce-IReduceInit-implementors-respect-redu.patch     Text File clj-1537-gvec-ArraySeq.patch    
Patch: Code and Test
Approval: Screened

 Description   

Several reduce implementations don't properly respect reduced:

  • clojure.core.ArrayChunk's implementation of IChunk/reduce
  • VecSeq's impl of InternalReduce/reduce
  • APersistentVector's reduce with init doesn't unwrap reduced on last value
  • seqs of primitive arrays don't unwrap reduced on last value
  • PersistentList doesn't unwrap reduced on last value

Some examples:

user=> (transduce (take 1) conj (seq (long-array [1 2 3 4])))
#<Reduced@38f774f8: [1]>
user=> (.reduce (list 1 2 3 4 5) (fn [_ a] (if (= a 5) (reduced "foo"))) 1)
#<Reduced@753d01cc: "foo">

Patch: 0001-ensure-IReduce-IReduceInit-implementors-respect-redu.patch
Screened by: Alex Miller
See also: http://dev.clojure.org/jira/browse/CLJ-1537



 Comments   
Comment by Alex Miller [ 17/Nov/14 12:35 PM ]

The patch should only be considering the result of calling the reducing function, not checking the init value (this matches what we do elsewhere).

Also, needs at least some simple example tests.

Comment by Nicola Mometto [ 17/Nov/14 1:36 PM ]

While reworking my patch to address your comment, I discovered that PersistentList and APersistentList's IReduceInit/reduce implementation aren't handling correctly reduced when the reducing function returns one on the last iteration.

The attached patch fixes those too and contains testcases demonstrating the issues.

Comment by Nicola Mometto [ 17/Nov/14 1:39 PM ]

I haven't fixed the IReduce/IReduceInit implementations for range as that's in scope for http://dev.clojure.org/jira/browse/CLJ-1515

Comment by Nicola Mometto [ 17/Nov/14 1:59 PM ]

As Ghadi Shayban noticed, while reduce doesn't use IReduceInit's reduce impl for PersistentList, transduce does so this might be cause of serious bugs even from clojure code, not only when using `.reduce` calls

Comment by Alex Miller [ 17/Nov/14 2:47 PM ]

reduce will use IReduceInit's reduce impl for PersistentList, after CLJ-1572.





[CLJ-1589] Cleanup internal-reduce implementation Created: 14/Nov/14  Updated: 15/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Enhancement Priority: Major
Reporter: Nicola Mometto Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File 0001-cleanup-internal-reduce-impl.patch    
Patch: Code
Approval: Screened

 Description   

Currently internal-reduce provides an implementation for ArraySeq and the ArraySeq_* prim classes.
Since those classes implement IReduce the current patch makes instances of those classes fallback on coll-reduce's IReduce impl (that simply invokes .reduce)

This change is desiderable because it removes unnecessary duplicated code, reducing the implementation surface and making it easier to follow reduce's code path. In addition to ArraySeq there will be (based on other tickets) more seq impls that also IReduce, so it would be good to re-route back through coll-reduce when we get combinations of potentially reducible sub-seqs.

Patch: 0001-cleanup-internal-reduce-impl.patch

  • This patch depends on the patch for CLJ-1590 since the current IReduce impl for those ArraySeq classes doesn't properly handle Reduced

Screened by: Alex Miller



 Comments   
Comment by Alex Miller [ 14/Nov/14 10:28 PM ]

I'm not sure whether this should be in 1.7 or not, but I'm adding it there so we can have a discussion on it regardless.





[CLJ-1586] Compiler doesn't preserve metadata for LazySeq literals Created: 12/Nov/14  Updated: 12/Nov/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Nicola Mometto Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: compiler, metadata, typehints

Attachments: Text File 0001-Compiler-doesn-t-preserve-metadata-for-lazyseq-liter.patch    
Patch: Code
Approval: Triaged

 Description   

The analyzer in Compiler.java forces evaluation of lazyseq literals, but loses the compile time original metadata of that form, meaning that a type hint will be lost.

Example demonstrating this issue:

user=> (set! *warn-on-reflection* true)
true
user=> (list '.hashCode (with-meta (concat '(identity) '("foo")) {:tag 'String}))
(.hashCode (identity "foo"))
user=> (eval (list '.hashCode (with-meta (concat '(identity) '("foo")) {:tag 'String})))
Reflection warning, NO_SOURCE_PATH:1:1 - reference to field hashCode can't be resolved.
101574

Forcing the concat call to an ASeq rather than a LazySeq fixes this issue:

user=> (eval (list '.hashCode (with-meta (seq (concat '(identity) '("foo"))) {:tag 'String})))
101574

This ticket blocks http://dev.clojure.org/jira/browse/CLJ-1444 since clojure.core/sequence might return a lazyseq.

This bug affected both tools.analyzer and tools.reader and forced me to commit a fix in tools.reader to work around this issue, see: http://dev.clojure.org/jira/browse/TANAL-99

The proposed patch trivially preserves the form metadata after realizing the lazyseq






[CLJ-1585] Report boxed math warning on function that boxes primitive return value Created: 11/Nov/14  Updated: 12/Nov/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: errormsgs, math

Attachments: Text File clj-1585.patch    
Patch: Code
Approval: Triaged

 Description   

With the new :warn-on-boxed (CLJ-1325), these examples do not report a boxed math warning although they each do boxing:

user=> (defn f1 [^long x] (inc x))
f1
user=> (defn f2 [x] (aget (long-array [1 2]) 0))
f2
user=> (defn f3 [x] (aget (int-array [1 2]) 0))
f3
user=> (defn f4 [^String s] (.indexOf s "a"))

Cause: emitBoxReturn has a hard-coded call to box a prim return value.

Solution: If *unchecked-math* is set to :warn-on-boxed, emit a warning on boxing of primitive numeric return types.

Patch:



 Comments   
Comment by Alex Miller [ 12/Nov/14 12:39 AM ]

Attached patch does the job, but from trying it out on some real code, it finds both problematic cases and lots of cases that could safely be ignored and/or where there is no obvious way to fix the warning. I think it may need some more tuning to reduce the rate of unfixable things a bit.





[CLJ-1583] Apply forces the evaluation of one element more than necessary Created: 07/Nov/14  Updated: 09/Nov/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Nicola Mometto Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File 0001-make-RT.boundedLength-lazier.patch    
Patch: Code

 Description   

Given a function with one fixed argument and a vararg, it should be sufficient to force evaluation of 2 elements for apply to know which arity it should select, however it currently forces 3:

user=> (defn x ([a & b]))
#'user/x
user=> (apply x (map println (iterate inc 0)))
0
1
2
nil

This makes lazy functions that use apply (for example mapcat) less lazy than they could be.
The proposed patch makes RT.boundedLength short-circuit immediately after the seq count is greater than the max fixed arity:

user=> (defn x ([a & b]))
#'user/x
user=> (apply x (map println (iterate inc 0)))
0
1
nil


 Comments   
Comment by Nicola Mometto [ 09/Nov/14 3:37 PM ]

The patch in this ticket slightly improves the issue reported at http://dev.clojure.org/jira/browse/CLJ-1218





[CLJ-1582] Overriding in-ns and ns is problematic Created: 07/Nov/14  Updated: 07/Nov/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Nicola Mometto Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: None

Attachments: Text File 0001-Allow-overriding-of-clojure.core-in-ns-and-clojure.c.patch    
Patch: Code

 Description   

Currently it is possible to override clojure.core/in-ns and clojure.core/ns, but it is not possible to refer to the namespace-specific vars without fully qualifying them:

user=> (ns foo (:refer-clojure :exclude [in-ns]))
nil
foo=> (def in-ns 1)
#'foo/in-ns
foo=> in-ns
#<clojure.lang.RT$1@76b5e4c5>

After this patch, overriding in-ns and ns works like for every other clojure.core var:

user=> (ns foo (:refer-clojure :exclude [in-ns]))
nil
foo=> (def in-ns 1)
#'foo/in-ns
foo=> in-ns
1


 Comments   
Comment by Reid McKenzie [ 07/Nov/14 11:46 AM ]

This is motivated by https://github.com/jonase/eastwood/issues/100





[CLJ-1580] Transient collections should guarantee thread visibility Created: 05/Nov/14  Updated: 17/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Defect Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: transient

Attachments: Text File clj-1580.patch    
Patch: Code
Approval: Vetted

 Description   

With changes from CLJ-1498, transients are still thread isolated but may move between threads during their lifetime which introduces new concurrency concerns, namely visibility of changes across threads.

Approach: Make all transient collection fields either final or volatile to ensure visibility across threads.

Patch: clj-1580.patch

Screened by:






[CLJ-1576] clojure.pprint should print vars as pr does Created: 29/Oct/14  Updated: 13/Nov/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Kevin Downey Assignee: Unassigned
Resolution: Unresolved Votes: 5
Labels: print

Attachments: Text File pprint-vars-as-prn-does-0.patch    
Patch: Code
Approval: Triaged

 Description   

clojure.pprint/pprint currently by default lets vars fall through to its IDeref printing which prints them as something like:

#<Var@107e78: #<core$inc clojure.core$inc@f278dd>>

which is not a super representation of a var. vars have names.

generally when I pprint a data structure containing vars it is because at some point in writing the code that constructed that data structure I decided I wanted a history of the functions called, and since vars are invokable as functions and have a name, I can just use those as the history. the history then turns in to a big structure so I pretty print it, which then doesn't print the vars.

it is possible to work/around change the behaviour of the pretty printer by using its customizing options, but it is not a simple change to make, and means that for a small program a large percentage of it is spent making the pretty printer print something useful for vars.



 Comments   
Comment by Chris Blom [ 13/Nov/14 4:25 AM ]

I've added a patch which adds a method for clojure.lang.Var to
simple-dispatch in src/clojure/pprint/dispatch.clj:

(use-method simple-dispatch clojure.lang.Var pr)

The patch includes a simple test.

Comment by Aspasia Beneti [ 13/Nov/14 4:37 AM ]

Related bug http://dev.clojure.org/jira/browse/CLJ-1565





[CLJ-1572] into does not work with IReduceInit Created: 24/Oct/14  Updated: 18/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Defect Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: transducers

Attachments: Text File clj-1572-2.patch     Text File clj-1572-3.patch     Text File clj-1572-4.patch     Text File CLJ-1572-alternative-POC.patch     Text File clj-1572.patch    
Patch: Code and Test
Approval: Vetted

 Description   

This should work:

(into []
  (reify clojure.lang.IReduceInit
    (reduce [_ f start]
      (reduce f start (range 10)))))
IllegalArgumentException Don't know how to create ISeq from: user$eval5$reify__6
	clojure.lang.RT.seqFrom (RT.java:506)
	clojure.lang.RT.seq (RT.java:487)
	clojure.core/seq--seq--4091 (core.clj:135)
	clojure.core.protocols/seq-reduce (protocols.clj:30)
	clojure.core.protocols/fn--6422 (protocols.clj:42)
	clojure.core.protocols/fn--6369/f--6255--auto----G--6364--6382 (protocols.clj:13)
	clojure.core/reduce (core.clj:6469)
	clojure.core/into (core.clj:6550)

Cause: CollReduce only supports IReduce, not IReduceInit so when reduce calls into it, it falls back to trying to obtain a seq representation which fails.

Proposed: Extend CollReduce to IReduceInit and in the non-init arity, cast to IReduce. Also, now that CollReduce supports both IReduceInit and Iterable, a coll that implements both makes the path through CollReduce nondeterministic. transduce does an explicit check that prefers IReduceInit - the patch copies that approach to reduce as well.

Another consequence of this change is that since PersistentVector implements IReduce but throws on the non-init path, there are some test breakages. To address this, CLJ-1619 (which implements the non-init reduce) must be applied first.

Patch: clj-1572-4.patch
Depends on: CLJ-1619 being applied first



 Comments   
Comment by Alex Miller [ 24/Oct/14 10:40 AM ]

into calls reduce which calls into CollReduce. CollReduce extends to IReduce, but not to IReduceInit. If CollReduce were extended to IReduceInit for the arity it can support, into work as expected in the given example. Patch clj-1572.patch does this.

Comment by Ghadi Shayban [ 08/Nov/14 4:34 PM ]

It is also possible that core/reduce needs the same special casing of IReduceInit that transduce has to allow for a deterministic dispatch when transduce is called with (mapcat f), as mapcat calls reduce.

Comment by Stuart Halloway [ 10/Nov/14 11:02 AM ]

Can someone please expand on Ghadi's comment with an example of the problem?

Comment by Ghadi Shayban [ 10/Nov/14 11:14 AM ]

Example of something that is Iterable & ReduceInit:
https://github.com/ghadishayban/reducers/blob/master/src/ghadi/reducers.clj#L122-L128

Let's call that r/range in this example:
(transduce (mapcat r/range) + 0 [5 5 5 5 5])

The when the mapcat transducer encounters r/range, the inner reduce call will dispatch through CollReduce upon Iterable, rather than IReduceInit.

the inner call to reduce within cat:
https://github.com/clojure/clojure/blob/master/src/clj/clojure/core.clj#L7243

Comment by Alex Miller [ 12/Nov/14 12:55 PM ]

To restate the issue from Ghadi for my own sake:

The CollReduce protocol extends to IReduce, IReduceInit and Iterable. Because these are all interfaces, its possible for a custom coll to implement two or more of them. In that case, Clojure will arbitrarily pick which protocol impl is called - this can result in the Iterable version being called instead of IReduce/IReduceInit (which should be preferred).

transduce avoids this by explicitly checking for IReduceInit and preferring it over CollReduce.

Ghadi is suggesting that reduce should also make this preference (currently it does not).

Comment by Nicola Mometto [ 17/Nov/14 3:06 PM ]

If CollReduce could be direcly backed by the IReduce interface, this would remove the need for explicit IReduceInit checking at the callsite.

It's already possible to (defprotocol CollReduce :on-interface clojure.lang.IReduce ..), I'm proposing adding the ability to map the "reduce" method to the coll-reduce protocol-fn aswell and go with this solution

Comment by Alex Miller [ 17/Nov/14 3:21 PM ]

CollReduce extends to two interfaces (IReduceInit and Iterable) and for some impls this is ambiguous under the CollReduce protocol. The check in reduce and transduce is to force the choice of IReduceInit so it is not ambiguous. I think your suggestion re-introduces that issue? Or maybe I'm just not understanding what you mean.

Comment by Nicola Mometto [ 17/Nov/14 3:46 PM ]

Turns out defprotocol already has that capability via :on metadata field.

The attached patch is a proof of concept of my proposal, if there's interest in this approach I can fix the deftype/record/reify method parser to automatically pick the var name rather than having to specify the method name.

Comment by Nicola Mometto [ 17/Nov/14 3:52 PM ]

Ah, I see now the issue. Disregard my patch then.

Comment by Ghadi Shayban [ 14/Dec/14 11:58 AM ]

Note that unless this patch is applied, a plain reduce over an Eduction goes through the seq/iterator path of CollReduce, and not eduction's native IReduceInit path.

Comment by Ghadi Shayban [ 17/Dec/14 5:03 PM ]

with this patch + CLJ-1546

(reduce + [1 2 3]) doesn't work anymore, breaking a few tests.

Comment by Ghadi Shayban [ 17/Dec/14 5:16 PM ]

Should have left a bit more detail.
https://github.com/clojure/clojure/commit/ad7d9c46992cac0e812ce3dd47584c9bb2fda11f

This might not have anything to do with CLJ-1546, just happened to have them both applied. Seems like vectors are both IReduce+IReduceInit, but throw on the IReduce impl.

Vectors were made IReduce before IReduce was split into IReduceInit.

Comment by Nicola Mometto [ 17/Dec/14 5:19 PM ]

I've opened CLJ-1619 with a patch implementing the no-init arity of reduce for PersistentVector

Comment by Nicola Mometto [ 17/Dec/14 5:20 PM ]

An alternative fix would be to just make PersistentVectors IReduceInit rather than IReduce but I don't see the point in doing that since the implementation is trivial.





[CLJ-1568] Incorrect error locations reported in the stacktrace Created: 19/Oct/14  Updated: 22/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Bozhidar Batsov Assignee: Unassigned
Resolution: Unresolved Votes: 18
Labels: errormsgs, ft

Attachments: Text File 0001-CLJ-1568-fix-incorrect-error-locations.patch    
Patch: Code
Approval: Triaged

 Description   

The following code produces an incorrect stacktrace:

(ns clojure-demo.core)

(defn foo
  "I don't do a whole lot."
  [x]
  (println x "Hello, World!"))

(/ 1 0)
Exception in thread "main" java.lang.ArithmeticException: Divide by zero, compiling:(clojure_demo/core.clj:6:31)

The problem is actually on the 8th line. As a matter of fact - there's nothing at location 6:31.
This is a pretty serious problem as many tools parse stacktraces for error locations.
Here's a related discussion in cider's issue tracker.

Patch: 0001-CLJ-1568-fix-incorrect-error-locations.patch
Screened by: Alex Miller



 Comments   
Comment by Alex Miller [ 19/Oct/14 1:39 PM ]

Maybe a dupe of CLJ-1561 ?

Comment by Andy Fingerhut [ 19/Oct/14 4:16 PM ]

I tried out the example given in the description, with the latest Clojure master as of today plus the patch for CLJ-1561 called 0002-Mark-line-number-after-emitting-children.patch, dated Oct 10 2014.

The line:column number 6:31 is the same for that patched version as it is in the ticket description, which is for Clojure 1.6.0.

The issue of misleading line:column numbers is common between the two tickets, but at least the proposed improvement in CLJ-1561's patch is not effective for improving this issue.

Comment by Bozhidar Batsov [ 20/Oct/14 1:36 AM ]

I know that the issue list for 1.7 is pretty much finalised, but I think that this issue and and CLJ-1561 should be fixed as soon as possible.
Correct error reporting is extremely important IMO.

Comment by Nicola Mometto [ 20/Oct/14 8:28 AM ]

Attached a patch that fixes the issue by consuming all the whitespaces before retrieving line/column info for the next form.

Comment by Alex Miller [ 20/Oct/14 8:39 AM ]

Are there possible downsides to more eagerly consuming whitespace as done in the patch?

Comment by Nicola Mometto [ 20/Oct/14 8:44 AM ]

I can't think of any

Comment by Paul Stadig [ 22/Oct/14 2:59 PM ]

The defect on master does not have effect when using compile:

user=> (require 'clojure-demo.core)

CompilerException java.lang.ArithmeticException: Divide by zero, compiling:(clojure_demo/core.clj:6:31) 
user=> (load "/clojure_demo/core")

CompilerException java.lang.ArithmeticException: Divide by zero, compiling:(clojure_demo/core.clj:6:31) 
user=> (compile "clojure_demo/core")

CompilerException java.lang.ArithmeticException: Divide by zero, compiling:(core.clj:8:1) 

With the patch applied all the line numbers are the same in all cases:

user=> (require 'clojure-demo.core)

CompilerException java.lang.ArithmeticException: Divide by zero, compiling:(clojure_demo/core.clj:8:1) 
user=> (load "/clojure_demo/core")

CompilerException java.lang.ArithmeticException: Divide by zero, compiling:(clojure_demo/core.clj:8:1) 
user=> (compile "clojure_demo/core")

CompilerException java.lang.ArithmeticException: Divide by zero, compiling:(core.clj:8:1) 

Agreed that this seems to be orthogonal to CLJ-1561.





[CLJ-1561] Incorrect line numbers are emitted Created: 10/Oct/14  Updated: 22/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Paul Stadig Assignee: Unassigned
Resolution: Unresolved Votes: 22
Labels: errormsgs

Attachments: Text File 0001-Mark-line-number-after-emitting-children.patch     Text File 0002-Mark-line-number-after-emitting-children.patch    
Patch: Code
Approval: Triaged

 Description   

The Clojure JVM compiler marks the line number for a form before emitting the children for that form. Marking the line number before emitting children leads to incorrect line numbers when a runtime error occurs. For example, when

 (foo bar
      baz)

is emitted the compiler will visit the line number for the expression, then emit the children expressions ('bar' and 'baz') which will mark their own line numbers, then come back and emit the invoke bytecode for 'foo', but since the last line number to be marked was that of 'baz', if 'foo' throws an exception the line number of 'baz' will be reported instead of the line number for the expression as a whole.

This same issue was being manifested with special forms and inlined functions, and was especially bad in the case of the threading macro '->', because it is usually spread across several lines, and the line number reported could end up being very different than the line actually causing an exception.

A demonstration of the incorrect line numbers (and how the fix affects line numbers) can be seen here https://github.com/pjstadig/clojure-line-numbers



 Comments   
Comment by Jozef Wagner [ 10/Oct/14 1:57 PM ]

additions in your patch mixes tabs and spaces. Could you please update the patch so that your added lines indent only with tab characters? Not everyone has tab set at 4 spaces...

Comment by Paul Stadig [ 10/Oct/14 2:42 PM ]

There's already a mixture of just tabs, just spaces, and tabs & spaces in Compiler.java. I'm not sure what the "standard" is, but I've changed the patch to match the surrounding lines.

Comment by Paul Stadig [ 10/Oct/14 2:42 PM ]

Patch with whitespace changes.

Comment by Alex Miller [ 20/Oct/14 8:38 AM ]

These changes will affect the line number tables for a variety of Clojure constructs when compiled. It would be very helpful to me to have a set of examples that covered each case touched in the patch so that I could compile them and look at the bytecode vs the source. This would greatly accelerate the screening process.

Comment by Paul Stadig [ 20/Oct/14 2:29 PM ]

Alex,
I have created a repo on github that has a sample file demonstrating the line number changes.

https://github.com/pjstadig/clojure-line-numbers

Hope that helps!

BTW, I'd be glad to do a skype call or hangout, if you have questions.

Comment by Alex Miller [ 20/Oct/14 2:34 PM ]

This is very helpful, thanks!!

Comment by Alex Miller [ 22/Oct/14 11:35 AM ]

In the hunk at 3191 in KeywordInvokeExpr, a call to visitLineNumber was added, but the prior call 4 lines earlier was not removed. Should it be?

Comment by Paul Stadig [ 22/Oct/14 12:05 PM ]

I left that in thinking that if something goes wrong with the getstatic instruction (null pointer exception? class cast exception?) it should report the line number of the KeywordInvokeExpr. It may be that there isn't a realistic possibility that anything could actually happen with that getstatic instruction, but that was the thought process.

My general rule of thumb was if an emit method emits any instructions before it calls the emit method on another expr, then it should mark its line number before and after the recursive emit call (assuming that the recursive emit call would mark its own line number). In cases where an emit method immediately calls another emit method, then I don't bother to mark a line number until afterwards.





[CLJ-1560] Forbid closing over mutable fields completely Created: 10/Oct/14  Updated: 10/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Jozef Wagner Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


 Description   

Closing over mutable fields should be forbidden completely (and generate compiler exception), not just when trying to set! them. As the change of the mutable field does not propagate into closed over ones, this leads to surprising bugs:

(defprotocol P 
  (-set [this]) 
  (-get [this]) 
  (-get-fn [this]))

(deftype T [^:unsynchronized-mutable val] 
  P 
  (-set [this] (set! val :bar)) 
  (-get [this] val) 
  (-get-fn [this] (fn [] val)))

(def x (->T :foo))

(def xf (-get-fn x))

user> (-set x)
:bar
user> (-get x)
:bar
user> (xf)
:foo ;; should be :bar !!!


 Comments   
Comment by Jozef Wagner [ 10/Oct/14 1:42 PM ]

related issue CLJ-274





[CLJ-1553] Parallel transduce Created: 07/Oct/14  Updated: 07/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.8

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 4
Labels: transducers

Approval: Vetted

 Description   

Consider how to create a parallel path for transducers, similar to reducers fold.






[CLJ-1552] Consider kv support for transducers (similar to reducers fold) Created: 07/Oct/14  Updated: 16/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.8

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: transducers

Approval: Vetted

 Description   

In reducers, fold over a map has special support for kv. Consider whether/how to add this for transducers.



 Comments   
Comment by Marshall T. Vandegrift [ 16/Dec/14 11:13 AM ]

We don't have a JIRA "unvote" feature, but I'd like to register my vote against this proposed enhancement. As a heavy user of clojure.core.reducers, I consider the switch to k-v semantics when reducing a map to be a significant mis-feature. As only an initial transformation function applied directly to a map is able to receive the k-v semantics (a limitation I can’t see how would not carry over to transducers), this behavior crops up most frequently when re-ordering operations and discovering that an intermediate map has now caused an airity error somewhere in the middle of a chain of threaded transformations. I’ve never found cause to invoke it intentionally.





[CLJ-1551] Consider transducer support for primitives Created: 07/Oct/14  Updated: 07/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.8

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: transducers

Approval: Vetted

 Description   

Need to consider how we can support primitives for transducers. In particular it may be that IFn needs overloading for L/D in addition to O.






[CLJ-1550] Classes generated by deftype and defrecord don't play nice with .getPackage Created: 07/Oct/14  Updated: 07/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Bozhidar Batsov Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: bug


 Description   
(.getPackage String)
;; => #<Package package java.lang, Java Platform API Specification, version 1.7>
(deftype T [])
(.getPackage T)
;; => nil

This seems like a bug to me as it's not obvious why the class generated by deftype should exhibit different behaviour.



 Comments   
Comment by Alex Miller [ 07/Oct/14 8:54 AM ]

According to http://docs.oracle.com/javase/7/docs/api/java/lang/Class.html#getPackage() this method returns the package information found by the class loader or null if there is none. Its not clear to me that the current behavior is wrong per the spec. I would need to experiment more to see if this is unusual or not.

Comment by Bozhidar Batsov [ 07/Oct/14 9:05 AM ]

A bit of background for the issue. I'm no expert on the topic, but being able to procure all the class information except its package definitely looks strange to me.

Comment by Kevin Downey [ 07/Oct/14 11:46 AM ]

if you AOT compile(generate a class file on disk for a deftype), getPackage works fine, which suggests to me it is a jvm issue

Comment by Kevin Downey [ 07/Oct/14 11:49 AM ]

actually, it must just be that dynamicclassloader doesn't define a package for classes it loads

Comment by Alex Miller [ 07/Oct/14 12:13 PM ]

Yep, I believe that's correct.





[CLJ-1545] Add unchecked-divide, unchecked-remainder Created: 02/Oct/14  Updated: 06/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Andy Fingerhut Assignee: Colin Taylor
Resolution: Unresolved Votes: 0
Labels: math, newbie

Attachments: File CLJ-1545-2.diff     File CLJ-1545.diff    
Patch: Code and Test
Approval: Triaged

 Description   

This appears like it might be an oversight that these are missing. There are unchecked-divide-int and unchecked-remainder-int functions, but not equivalents for longs, even though there are equivalents for longs for every other unchecked operation. The JVM has bytecodes for long division and remainder.

The Clojure documentation in the section "Support for Java Primitives" on page http://clojure.org/java_interop has links for unchecked-divide and unchecked-remainder, but since they don't exist in Clojure, the API link targets don't exist.

It seems like a good idea to either add these to Clojure, or remove them from the documentation.



 Comments   
Comment by Colin Taylor [ 03/Oct/14 6:17 PM ]

Having a go at this.

Comment by Colin Taylor [ 04/Oct/14 6:02 AM ]
  • Added tests for unchecked-divide-int and unchecked-remainder-int too.
  • Unchecked fns only support binary arity and will throw CompilerException(ArityException)s where checked will not.
  • Is there any value to (int,long) (long,int) overrides for java interop cases e.g. using java collections from Clojure in high perf code?
Comment by Alex Miller [ 04/Oct/14 9:13 AM ]

Thanks for taking this on Colin!

1) When I apply the patch (git apply CLJ-1545.diff), I get a bunch of whitespace errors which will need to be cleaned up but also the patch seems to fail to apply at all on the changes in test/clojure/test_clojure/numbers.clj. It looks like perhaps the diff is just not the right diff format. You might want to check out the instructions at http://dev.clojure.org/display/community/Developing+Patches about using git format-patch.

2) If you could put a more useful git commit message, that would be helpful. Something like "CLJ-1545 Adds missing unchecked-divide and unchecked-remainder for primitive longs."

Thanks!

Comment by Colin Taylor [ 04/Oct/14 4:47 PM ]

Uggh, sorry Alex.

New patch with better commit message.

Comment by Alex Miller [ 04/Oct/14 7:24 PM ]

The patch format looks better. Pulling out farther to the ticket itself, afaict Clojure will already use the right byteocode for checked or unchecked so this may not even be needed?

If I compile (without the patch):

(defn foo-div ^long [^long a ^long b]
  (quot a b))

then the bytecode for that fn is:

public final long invokePrim(long, long);
    Code:
       0: lload_1
       1: lload_3
       2: ldiv
       3: lreturn

similarly, quot of two longs yields the same code but with lrem. I think patch has no net effect on the resulting bytecode?

Comment by Andy Fingerhut [ 04/Oct/14 7:42 PM ]

Alex, did you do the testing in your previous comment with *unchecked-math* true or false? If false, then I would think that if CLJ-1254 is judged a bug, then the behavior you saw is a bug, too, that misses the same corner case.

Comment by Alex Miller [ 04/Oct/14 10:19 PM ]

The current results are the same with either unchecked-math setting, but I see your point.

Refreshing my memory of the (/ Long/MIN_VALUE -1) case, I think you're right. The (new) unchecked-divide / remainder should do what the current (checked) forms do and the regular division and remainder cases should be making the overflow check. I think CLJ-1254 should cover the quot changes.

Comment by Colin Taylor [ 04/Oct/14 10:19 PM ]

user=> (dotimes [_ 6] (time (dotimes [_ 50000000] (unchecked-divide 4 (System/currentTimeMillis)))))
"Elapsed time: 1806.942 msecs"
"Elapsed time: 1808.747 msecs"
"Elapsed time: 1865.074 msecs"
"Elapsed time: 1802.777 msecs"
"Elapsed time: 1839.468 msecs"
"Elapsed time: 1830.61 msecs"
nil
user=> (dotimes [_ 6] (time (dotimes [_ 50000000] (/ 4 (System/currentTimeMillis)))))
"Elapsed time: 5003.598 msecs"
"Elapsed time: 4998.182 msecs"
"Elapsed time: 4941.237 msecs"
"Elapsed time: 5036.517 msecs"
"Elapsed time: 4965.867 msecs"
"Elapsed time: 4982.693 msecs"





[CLJ-1536] Remove usage of sun.misc.Signal (which may not be available in Java 9) Created: 26/Sep/14  Updated: 26/Sep/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


 Description   

It looks like Java 9 will not continue to provide access to "internal" classes like sun.misc.Signal. Clojure currently uses this in the REPL to trap ctrl-c (SIGINT) and cancel current evaluation instead of process shutdown.

There is a page of alternatives here:
https://wiki.openjdk.java.net/display/JDK8/Java+Dependency+Analysis+Tool

But there is no suggested alternative for sun.misc.Signal and I'm not aware of a portable solution to it.






[CLJ-1534] Adding condp-> and condp->> macros to core library Created: 24/Sep/14  Updated: 01/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Kuldeep Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: enhancement, macro

Attachments: File condp-threading-macros-25sept2014.diff    
Patch: Code

 Description   

After introduction of cond-> and cond->> macros in 1.5. It makes sense to have condp-> and condp->> macros in the core library.

(condp-> {}
(complement :a) (assoc :a 1)
:a (assoc :b 2)) ;=> {:b 2, :a 1}

In the above example the result of each expr which was evaluated is being passed to the next predicate.



 Comments   
Comment by Andy Fingerhut [ 01/Oct/14 6:37 PM ]

Kuldeep, I cannot comment on whether this change is of interest to the Clojure developers, because I do not know.

I can say that the patch you have attached is not in the expected format. See the page below for instructions on creating a patch in the expected format:

http://dev.clojure.org/display/community/Developing+Patches





[CLJ-1530] Make foo/bar/baz unreadable Created: 22/Sep/14  Updated: 28/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Nicola Mometto Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File 0001-fix-LispReader-and-EdnReader-so-that-foo-bar-baz-is-.patch    
Patch: Code and Test

 Description   

Currently keywords and symbols containing more than one slash are disallowed by the spec, but allowed by the readers.
This trivial patch makes them unreadable by the readers too.

Pre:

user=> :foo/bar/baz
:foo/bar/baz

Post:

user=> :foo/bar/baz
RuntimeException Invalid token: :foo/bar/baz  clojure.lang.Util.runtimeException (Util.java:221)


 Comments   
Comment by Andy Fingerhut [ 22/Sep/14 12:14 PM ]

Perhaps overlap with CLJ-1527 ?

Comment by Thomas Engelschmidt [ 28/Oct/14 4:36 AM ]

Please notice that keywords with more than one slash has a different hashcode across clojure version 1.5 and 1.6

This creates a problem when using a datomic version that works with clojure 1.5 under clojure 1.6 and the schema have one or more keys with more than one slash.





[CLJ-1520] assoc-in with empty key path assoc-es to nil Created: 05/Sep/14  Updated: 05/Sep/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Francis Avila Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


 Description   
(assoc-in {} [] 1) ;=> {nil 1}

This should probably throw an exception.

CLJ-373 has a patch (CLJ-373-nested-ops.patch) which fixes this (by throwing an exception on empty key paths), the related broken behavior of update-in, and documents empty key path behavior in get-in et al. I can pull just the assoc-in stuff out of that into a separate patch, but I am really hoping that all the issues in the patch addresses are resolved at once, I.e.:

(get-in {} [] :notfound) ;=> {} ; ok
(get-in {nil 1} [] :notfound) ;=> {nil 1} ; ok
(assoc-in {} [] 1) ;=> {nil 1} ; wat?
(assoc-in {nil 0} [] 1) ;=> {nil 1} ; wat?
(update-in {} [] identity) ;=> {nil nil} ; wat?
(update-in {nil 0} [] inc) ;=> {nil 1} ; wat?





[CLJ-1519] Added extra arity to clojure.core/ns-* fns Created: 04/Sep/14  Updated: 10/Sep/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Alex Baranosky Assignee: Unassigned
Resolution: Unresolved Votes: 3
Labels: enhancement, patch

Attachments: Text File new-ns-arity.patch    
Patch: Code and Test

 Description   

Hello,

Adds another arity where the "ns" parameter is set to a default value of *ns* in these fns:

ns-unmap, ns-resolve, ns-name, ns-map, ns-publics, ns-imports, ns-interns, ns-refers, ns-aliases, ns-unalias

I find I very often use ns-unalias and ns-unmap from the repl, and passing the *ns* arg gets a little tedious.






[CLJ-1515] Reify the result of range and add IReduceInit Created: 29/Aug/14  Updated: 12/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Enhancement Priority: Major
Reporter: Timothy Baldridge Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: None

Attachments: Text File clj-1515-2.patch     Text File clj-1515-3.patch     Text File clj-1515-4.patch     Text File clj-1515-5.patch     Text File clj-1515-6.patch     Text File clj-1515-7.patch     Text File clj-1515-8.patch     Text File clj-1515-9.patch     Text File clj-1515.patch     File patch.diff     File range-patch3.diff     File reified-range4.diff    
Patch: Code and Test
Approval: Vetted

 Description   

Currently range returns a lazy chunked seq. If the return value of range were reified into a type we could optimize common cases and add IReduce support.

Approach: this patch revives the unused (but previously existing) clojure.lang.Range class. This class acts as a lazy seq and implements several other appropriate interfaces such as Counted and Indexed. This type is implemented in Java since range is needed fairly on in core.clj before deftype is defined. The attached patch provides two Range impls sharing some common code in AbstractRange. Range uses Numbers.* methods for all math due to the input types to range being unknown. LongRange handles the specific (but very common) case of a long start/end/step for higher performance. The special case of (range) is just handled with (iterate inc' 0) (which is further optimized for reduce in CLJ-1603).

Note: The patch also includes a tiny tweak in filter that has nothing to do with this patch other than being found while testing. It is a perf boost for all filter operations by avoiding calling .nth twice for every element in every chunk. Notice the filter seq example below gets an extra improvement in perf. If desired, this change could be split out.

Performance:
timings done via criterium quick-bench

expr 1.6.0 1.7.0-alpha4 +patch
(count (filter odd? (take (* 1024 1024) (range)))) 183 ms 173 ms 170 ms
(transduce (take (* 1024 1024)) + (range)) n/a 67 ms 81 ms (w/CLJ-1603: 41 ms)
(count (range (* 1024 1024))) 75 ms 69 ms 0 ms
(reduce + (map inc (range (* 1024 1024)))) 71 ms 68 ms 46 ms
(reduce + (map inc (map inc (range (* 1024 1024))))) 89 ms 91 ms 69 ms
(count (filter odd? (range (* 1024 1024)))) 69 ms 65 ms 43 ms
(transduce (comp (map inc) (map inc)) + (range (* 1024 1024))) n/a 67 ms 36 ms
(doall (range 0 31)) 1.41 µs 1.51 µs 3.02 µs
(into [] (map inc (range 31))) 1.76 µs 1.77 µs 1.43 µs
(into [] (map inc) (range 31)) n/a 1.60 µs 0.63 µs
(doall (range 1/2 1000 1/3)) 1.58 ms 1.53 ms 1.66 ms
(into [] (range 1/2 1000 1/3)) 1.52 ms 1.51 ms 1.38 ms
(doall (range 0.5 1000 0.33)) 0.15 ms 0.14 ms 0.35 ms
(into [] (range 0.5 1000 0.33)) 0.13 ms 0.12 ms 0.08 ms

These results are a bit mixed but in general I think they make the most common and important things faster while some less important things are slightly slower. In general the "doall" examples are slower as this is kind of the worst case wrt overhead and values are retrieved via seq/next looping (the slowest option). Stacked sequence ops happen via the the chunked seq impl (which is a little faster), and the transduce/into will use the reduce impl (which is much faster).

Patch: clj-1515-9.patch

Screened by:

Screener question: (range) and range on non-longs both support auto-promotion towards infinity in this patch, which seems to be implied by the doc string but was not actually implemented or tested correctly afaict.



 Comments   
Comment by Alex Miller [ 29/Aug/14 3:19 PM ]

1) Not sure about losing chunked seqs - that would make older usage slower, which seems undesirable.
2) RangeIterator.next() needs to throw NoSuchElementException when walking off the end
3) I think Range should implement IReduce instead of relying on support for CollReduce via Iterable.
4) Should let _hash and _hasheq auto-initialize to 0 not set to -1. As is, I think _hasheq always would be -1?
5) _hash and _hasheq should be transient.
6) count could be cached (like hash and hasheq). Not sure if it's worth doing that but seems like a win any time it's called more than once.
7) Why the change in test/clojure/test_clojure/serialization.clj ?
8) Can you squash into a single commit?

Comment by Timothy Baldridge [ 29/Aug/14 3:40 PM ]

1) I agree, adding chunked seqs to this will dramatically increase complexity, are we sure we want this?
2) exception added
3) I can add IReduce, but it'll pretty much just duplicate the code in protocols.clj. If we're sure we want that I'll add it too.
4) fixed hash init values, defaults to -1 like ASeq
5) hash fields are now transient
6) at the cost of about 4 bytes we can cache the cost of a multiplication and an addition, doesn't seem worth it?
7) the tests in serialization.clj assert that the type of the collection roundtrips. This is no longer the case for range which starts as Range and ends as a list. The change I made converts range into a list so that it properly roundtrips. My assumption is that we shouldn't rely on all implementations of ISeq to properly roundtrip through EDN.
8) squashed.

Comment by Alex Miller [ 29/Aug/14 3:49 PM ]

6) might be useful if you're walking through it with nth, which hits count everytime, but doubt that's common
7) yep, reasonable

Comment by Andy Fingerhut [ 18/Sep/14 6:52 AM ]

I have already pointed out to Edipo in personal email the guidelines on what labels to use for Clojure JIRA tickets here: http://dev.clojure.org/display/community/Creating+Tickets

Comment by Timothy Baldridge [ 19/Sep/14 10:02 AM ]

New patch with IReduce directly on Range instead of relying on iterators

Comment by Alex Miller [ 01/Oct/14 2:00 PM ]

The new patch looks good. Could you do a test to determine the perf difference from walking the old chunked seq vs the new version? If the perf diff is negligible, I think we can leave as is.

Another idea: would it make sense to have a specialized RangeLong for the (very common) case where start, end, and step could all be primitive longs? Seems like this could help noticeably.

Comment by Timothy Baldridge [ 03/Oct/14 10:00 AM ]

Looks like chunked seqs do make lazy seq code about 5x faster in these tests.

Comment by Ghadi Shayban [ 03/Oct/14 10:22 AM ]

I think penalizing existing code possibly 5x is a hard cost to stomach. Is there another approach where a protocolized range can live outside of core? CLJ-993 has a patch that makes it a reducible source in clojure.core.reducers, but it's coll-reduce not IReduce, and doesn't contain an Iterator. Otherwise we might have to take the chunked seq challenge.

Alex: Re long/float. Old reified Ranged.java in clojure.lang blindly assumes ints, it would be nice to have a long vs. float version, though I believe the contract of reduce boxes numbers. (Unboxed math can be implemented very nicely as in Prismatic's Hiphip array manipulation library, which takes the long vs float specialization to the extreme with different namespaces)

Comment by Timothy Baldridge [ 03/Oct/14 10:38 AM ]

I don't think anyone is suggesting we push unboxed math all the way down through transducers. Instead, this patch contains a lot of calls to Numbers.*, if we were to assume that the start end and step params of range are all Longs, then we could remove all of these calls and only box when returning an Object (in .first) or when calling IFn.invoke (inside .reduce)

Comment by Alex Miller [ 03/Oct/14 10:46 AM ]

I agree that 5x slowdown is too much - I don't think we can give up chunked seqs if that's the penalty.

On the long case, I was suggesting what Tim is talking about, in the case of all longs, create a Range that stores long prims and does prim math, but still return boxed objects as necessary. I think the only case worth optimizing is all longs - the permutation of other options gets out of hand quickly.

Comment by Ghadi Shayban [ 03/Oct/14 11:00 AM ]

Tim, I'm not suggesting unboxed math, but the singular fast-path of all-Longs that you and Alex describe. I mistakenly lower-cased Long/Float.

Comment by Timothy Baldridge [ 31/Oct/14 11:30 AM ]

Here's the latest work on this, a few tests fail. If someone wants to take a look at this patch feel free, otherwise I'll continue to work on it as I have time/energy.

Comment by Nicola Mometto [ 14/Nov/14 12:51 PM ]

As discussed with Tim in #clojure, the current patch should not change ArrayChunk's reduce impl, that's an error.

Comment by Alex Miller [ 09/Dec/14 2:40 AM ]

Still a work in progress...

Comment by Nicola Mometto [ 09/Dec/14 8:44 AM ]

Alex, while this is still a work in progress, I see that the change on ArrayChunk#reduce from previous WIP patches not only has not been reverted but has been extended. I don't think the current approach makes sense as ArrayChunk#reduce is not part of the IReduce/IReduceInit contract but of the IChunk contract and changing the behaviour to be IReduce-like in its handling of reduced introduces the burden of having to use preserve-reduced on the reducing function to no apparent benefit.

Given that the preserve-reduced is done on the clojure side, it seems to me like directly invoking .reduce rather than routing through internal-reduce should be broken but I haven't tested it.

Comment by Alex Miller [ 09/Dec/14 9:49 AM ]

That's the work in progress part - I haven't looked at yet. I have not extended or done any work re ArrayChunk, just carried through what was on the prior patch. I'll be working on it again tomorrow.

Comment by Ghadi Shayban [ 10/Dec/14 11:14 PM ]

I am impressed and have learned a ton through this exercise.

quick review of clj-1515-2
1) withMeta gives the newly formed object the wrong meta.
2) LongRange/create() is the new 0-arity constructor for range, which sets the 'end' to Double/POSITIVE_INFINITY cast as a long. Current core uses Double/POSITIVE_INFINITY directly. Not sure how many programs rely upon iterating that far, or how they would break.
3) Relatedly, depending on the previous point: Because only all-long arguments receive chunking, the very common case of (range) with no args would be unchunked. Doesn't seem like too much of a stretch to add chunking to the other impl.
4) Though the commented invariants say that Range is never empty, the implementation uses a magic value of _count == 0 to mean not cached, which is surprising to me. hashcodes have the magic value of -1
5) s/instanceof Reduced/RT.isReduced
6) is the overflow behavior of "int count()" correct?

Comment by Alex Miller [ 11/Dec/14 12:06 AM ]

1) agreed!
2) Good point. I am definitely changing behavior on this (max of 9223372036854775807). I will look at whether this can be handled without affecting perf. Really, handling an infinite end point is not compatible with several things in LongRange.
3) I actually did implement chunking for the general Range and found it was slower (the original Clojure chunking is faster). LongRange is making up for that difference with improved primitive numerics.
4) Since empty is invalid, 0 and -1 are equally invalid. But I agree -1 conveys the intent better.
5) agreed
6) probably not. ties into 2/3.

Thanks for this, will address.

Comment by Alex Miller [ 11/Dec/14 12:11 AM ]

Added -4 patch that addresses 1,4,5 but not the (range) stuff.

Comment by Alex Miller [ 11/Dec/14 12:51 PM ]

Latest -7 patch addresses all feedback and perf #s updated.





[CLJ-1504] Add :inline to most core predicates Created: 15/Aug/14  Updated: 15/Aug/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Nicola Mometto Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File 0001-add-inline-to-some-core-predicates.patch    
Patch: Code

 Description   

This will allow instance? predicates calls to be emitted using the instanceof JVM bytecode and will also allow tools like core.typed or tools.analyzer.jvm to infer the type of a var/local on a per branch basis without having to special-case all the core predicates.



 Comments   
Comment by Jozef Wagner [ 15/Aug/14 1:32 PM ]

Related ticket CLJ-1227 and related quote from Alex:

definline is considered to be an experimental feature and Rich would like to discourage its use as the hope is to remove it in the future. The desired replacement is something like common lisp compiler macros that could allow the compiler to detect special situations and optimize the result but leave behind a function invocation for the case where no special behavior is available.

Comment by Nicola Mometto [ 15/Aug/14 1:42 PM ]

This patch uses "manual" :inline metadata on functions, it's used by many other core functions (like +,- et), not definline so Rich's comment doesn't apply.





[CLJ-1493] Fast keyword intern Created: 06/Aug/14  Updated: 11/Sep/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: dennis zhuang Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: keywords, performance
Environment:

Mac OS X 10.9.4 / 2.6 GHz Intel Core i5 / 8 GB 1600 MHz DDR3
java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)


Attachments: File fast_keyword_intern.diff    
Patch: Code
Approval: Triaged

 Description   

Keyword's intern(Symbol) method uses recursive invocation to get a valid keyword instance.I think it can be rewrite into a 'for loop'
to reduce method invocation cost.
So i developed this patch, and make some simple benchmark.Run the following command line three times after 'ant jar':

java -Xms64m -Xmx64m -cp test:clojure.jar clojure.main -e "(time (dotimes [n 10000000] (keyword (str n))))"

Before patched:

"Elapsed time: 27343.827 msecs"
"Elapsed time: 26172.653 msecs"
"Elapsed time: 25673.764 msecs"

After patched:

"Elapsed time: 24884.142 msecs"
"Elapsed time: 23933.423 msecs"
"Elapsed time: 25382.783 msecs"

It looks the patch make keyword's intern a little more fast.

The patch is attached and test.

Thanks.

P.S. I've signed the contributor agreement, and my email is killme2008@gmail.com .



 Comments   
Comment by Alex Miller [ 07/Aug/14 9:01 AM ]

Looks intriguing (and would be a nice change imo). I ran this on a json parsing benchmark I used for the keyword changes and saw ~3% improvement.

Comment by dennis zhuang [ 07/Aug/14 9:54 PM ]

Updated the patch, remove the 'k == null' clause in for loop,it's not necessary.

Comment by Andy Fingerhut [ 11/Aug/14 1:29 AM ]

Dennis, while JIRA can handle multiple patches with the same name, it can be confusing for people discussing the patches, and for some scripts I have to evaluate them. Please consider giving the patches different names (e.g. with version numbers in them), or removing older ones if they are obsolete.

Comment by dennis zhuang [ 11/Aug/14 9:19 AM ]

Hi,andy

Thank you for reminding me.I deleted the old patch.

Comment by dennis zhuang [ 11/Sep/14 10:34 AM ]

I am glad to see it is helpful.I benchmark the patch with current master branch,it's fine too.





[CLJ-1473] Bad pre/post conditions silently passed Created: 24/Jul/14  Updated: 11/Sep/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Brandon Bloom Assignee: Unassigned
Resolution: Unresolved Votes: 3
Labels: errormsgs

Attachments: Text File 0001-Validate-that-pre-and-post-conditions-are-vectors.patch    
Patch: Code and Test
Approval: Triaged

 Description   

Before:

user=> ((fn [x] {:pre (pos? x)} x) -5) ; ouch!
-5
user=> ((fn [x] {:pre [(pos? x)]} x) -5) ; meant this
AssertionError Assert failed: (pos? x)  user/eval4075/fn--4076 (form-init5464179453862723045.clj:1)

After:

user=> ((fn [x] {:pre (pos? x)} x) -5)
CompilerException java.lang.IllegalArgumentException: Pre and post conditions should be vectors, compiling:(NO_SOURCE_PATH:1:2) 
user=> ((fn [x] {:pre [(pos? x)]} x) -5)                                  
AssertionError Assert failed: (pos? x)  user/eval2/fn--3 (NO_SOURCE_FILE:2)
user=> ((fn [x] {:post (pos? x)} x) -5)
CompilerException java.lang.IllegalArgumentException: Pre and post conditions should be vectors, compiling:(NO_SOURCE_PATH:3:2) 
user=> ((fn [x] {:post [(pos? x)]} x) -5)              
AssertionError Assert failed: (pos? x)  user/eval7/fn--8 (NO_SOURCE_FILE:4)





[CLJ-1472] The locking macro fails bytecode verification on ART runtime Created: 23/Jul/14  Updated: 11/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Adam Clements Assignee: Unassigned
Resolution: Unresolved Votes: 3
Labels: None
Environment:

Android ART runtime


Attachments: Text File 0001-CLJ-1472-Locking-macro-without-explicit-monitor-ente.patch    
Patch: Code
Approval: Triaged

 Description   

Android ART runs compile time verification on bytecode and was failing on any usage of the locking macro. Examination of the bytecode as compared to a java synchronized block shows up a number of differences:
https://gist.github.com/AdamClements/2ae6c4919964b71eb470

Having the monitor-enter inside the try block seems wrong to me, as surely if the lock fails to be acquired, it shouldn't be released with monitor-exit. Moving the monitor enter outside the try block seems to have resolved the issue and android no longer complains about usages of locking and all clojure tests still pass.

Java's generated code goes further and catches any exceptions generated by the monitor-exit itself and retries indefinitely (I believe the logic is that then at least your deadlock is in the right place, and not next time something else attempts to acquire a lock on the same object). I don't think that this can be replicated in clojure without getting down to the bytecode emitting level though and it doesn't seem to be an issue for the ART verifier.



 Comments   
Comment by Adam Clements [ 24/Jul/14 11:17 AM ]

After using this a little more, I've found that moving this outside the try block breaks nREPL.

Looking at the bytecode, the monitorenter for the locking in clojure.tools.nrepl.middleware.session/session-out and in a few other places ends up in an entirely different method definition and we now get a JVM IllegalMonitorStateException as well as an ART verification error for this function.

Comment by Andy Fingerhut [ 01/Aug/14 9:08 PM ]

Adam, I cannot comment on whether your patch is of interest or not, but it is true that no patch will be committed to Clojure if the author has not signed a Contributor Agreement, which can now be done on-line at http://clojure.org/contributing

Comment by Adam Clements [ 04/Aug/14 4:24 PM ]

Uploaded a new patch (and signed the contributor agreement). This passes both the JVM and ART bytecode verification, The extra try/catch around the monitor exit is optional (verification passes with or without it) but where the java version retries monitor-exit indefinitely and shows the deadlock at the right time, without catching errors in the monitor-exit an undetermined monitor-enter in the future might fail, not showing up the actual bug.

It's not very pretty, but without finer grained control of the generated bytecode, this is the best I could do.

Comment by Adam Clements [ 25/Nov/14 8:31 AM ]

Have just tested with Lollipop, and this patch might no longer be sufficient.

Getting in touch with the ART guys to see if they can shed a little more light and verify whether it will work on the current master branch of AOSP

Comment by Adam Clements [ 25/Nov/14 9:49 AM ]

Bug filed with AOSP project, hopefully they can shed some light on whether it is our problem and if so how we can fix it.

https://code.google.com/p/android/issues/detail?id=80823

Comment by Adam Clements [ 28/Nov/14 11:03 AM ]

I have uploaded an alternative implementation of the locking macro (0001-CLJ-1472-Locking-macro-without-explicit-monitor-ente.patch) which cheats a little - the synchronized block is actually implemented in Java and so guarantees compatibility. This is at the cost of a little extra indirection and the naming/location could probably be better.

But it does fix the bug and work on all versions of android, android + art and the jvm. Would this approach be acceptable?

Comment by Kevin Downey [ 08/Dec/14 1:12 PM ]

I have yet to see any evidence that the bytecode clojure is generating in some way violates the jvm spec, so I suspect the issue is clojure requires a jvm to run, and android doesn't provide a jvm, just something that looks like one if you don't tread outside the beaten path.

Comment by Kevin Downey [ 08/Dec/14 1:27 PM ]

given the structured locking verbiage in https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html#jvms-2.11.10, `(locking nil)` may generate bytecode whose runtime behavior violates structured locking. the first patch on this issue can cause the compiler to emit monitorenter/exit instructions in different methods, which definitely violates structured locking

Comment by Adam Clements [ 09/Dec/14 10:45 AM ]

Yes, the first patch was definitely wrong, I left it for some context to the conversation, but it's probably best to just remove it for clarity.

For anyone following this conversation who doesn't want to decompile and observe the bytecode, here's a gist with the difference between a java synchronized block and clojure locking https://gist.github.com/AdamClements/2ae6c4919964b71eb470

I'm finding it hard to work out where the deviation from the spec occurs too, though I can see the difference with the Java version, if anything, the Clojure version looks closer to what's described in the spec than the Java one!

If someone with more knowledge than me on the subject could engage on the AOSP bug https://code.google.com/p/android/issues/detail?id=80823 then perhaps we could settle this as an android bug which is too focussed on the java implementation rather than the JVM spec, or perhaps they'll find something that's wrong with the Clojure implementation. I have uploaded the original clojure behaviour and asked them for some more explanation on why it is failing.

Comment by Adam Clements [ 09/Dec/14 11:09 AM ]

The response from the ART guys about what they think we're violating is:

The section on "Structured locking" contains the following:

"[...] implementations [...] are permitted but not required to enforce
both of the following two rules guaranteeing structured locking. [...]"

ART currently enforces both rules at verification time, including

"At no point during a method invocation may the number of monitor exits
performed by T on M since the method invocation exceed the number of
monitor entries performed by T on M since the method invocation."

Comment by Adam Clements [ 09/Dec/14 11:32 AM ]

If for example instruction https://gist.github.com/AdamClements/2ae6c4919964b71eb470#file-test_locks-class-L24 or the monitor-enter itself on the next line were to fail, couldn't it could end up in the finally clause and attempt to release the lock even though it has never been captured?

I think this violates the structured locking rules in the jvm spec you linked to.

Comment by Kevin Downey [ 09/Dec/14 1:15 PM ]

an interesting question about structured locking, is does the specification refer to the static bytecode or the runtime behavior of the bytecode. given the bytecode linked(https://gist.github.com/AdamClements/2ae6c4919964b71eb470#file-test_locks-class-L24), the static bytecode has the same number of entries and exits, but the dynamic behavior may be different. I wonder which one the art guys claim to be enforcing at verification time (it seems like it would have to be the static bytecode, not the dynamic properties, but then they shouldn't be failing to verify this). looking at the google code issue, the comment https://code.google.com/p/android/issues/detail?id=80823#c6 was made by the same dev as https://code.google.com/p/android/issues/detail?id=80823#c3, so I sort of suspect there is some miscommunication going on. It is not clear in what context the dev is replying in, since in the previous comment you mention splitting monitor-enter and exit across methods. I think things would be much clearer if all patches, specialized clojure android builds, etc, were gotten rid of, then with a vanilla clojure jar you get a javap dump of what fails to verify, then just take that over to the android issue tracker and ask "hey, this fails to verify, why?"

Comment by Adam Clements [ 11/Dec/14 9:15 AM ]

Yeah, I shouldn't have confused it with the patched versions. The gist and the currently uploaded version use the vanilla clojure version of the locking macro now though.

I think the issue comes from the exception table and the instructions that covers. If line 24 can throw for example, you would end up at runtime with a monitor-exit, having never encountered a monitor-enter.





[CLJ-1467] Implement Comparable in PersistentList Created: 17/Jul/14  Updated: 14/Nov/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Pascal Germroth Assignee: Unassigned
Resolution: Unresolved Votes: 4
Labels: collections

Attachments: Text File 0001-first-try-for-adding-compare.patch    

 Description   

PersistentVector implements Comparable already.



 Comments   
Comment by Bart Kastermans [ 13/Nov/14 11:17 AM ]

Patch for this issue; done with Jeroen van Dijk and Razvan Petruescu at a clojure meetup. Any feedback welcome; the learning for me here is not the fix, but learning how to deal with ant and jira etc.

Comment by Andy Fingerhut [ 13/Nov/14 12:31 PM ]

Looks like you have navigated the steps for creating a patch in the desired format, and attaching it to a JIRA ticket, just fine. I see your name on the list of contributors, which is a precondition before a patch can be committed to Clojure or a contrib library.

You've gotten past what are actually the easier parts. There is still the issue of whether this ticket is even considered by the Clojure core team to be an enhancement worth making a change to Clojure. Take a look at the JIRA workflow here if you haven't seen it already and are curious: http://dev.clojure.org/display/community/JIRA+workflow

If you like Pascal think that this is a change you really want to see in Clojure, you may vote on this or any other JIRA ticket (except ones you create yourself – the creator is effectively the 0th voter for a ticket). Log in and click on the Vote link near the top right, and/or Watch to get email updates of changes.

Comment by Bart Kastermans [ 14/Nov/14 3:12 AM ]

Andy, thanks for the info. I was not aware of the JIRA workflow.





[CLJ-1459] records should support transient Created: 05/Jul/14  Updated: 06/Jul/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Yongqian Li Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: defrecord


 Description   

user=> (defrecord R [a])
user.R
user=> (transient (->R nil))
ClassCastException user.R cannot be cast to clojure.lang.IEditableCollection clojure.core/transient (core.clj:3060)






[CLJ-1456] The compiler ignores too few or too many arguments to throw Created: 30/Jun/14  Updated: 03/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Alf Kristian Støyle Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: compiler, ft

Attachments: Text File 0001-CLJ-1456-counting-forms-to-catch-malformed-throw-for.patch     Text File 0001-CLJ-1456-counting-forms-to-catch-malformed-throw-for.patch     Text File v3_0001-CLJ-1456-counting-forms-to-catch-malformed-throw-for.patch    
Patch: Code and Test
Approval: Triaged

 Description   

The compiler does not fail on "malformed" throw forms:

user=> (defn foo [] (throw))
#'user/foo

user=> (foo)
NullPointerException   user/foo (NO_SOURCE_FILE:1)

user=> (defn bar [] (throw Exception baz))
#'user/bar

user=> (bar)
ClassCastException java.lang.Class cannot be cast to java.lang.Throwable  user/bar (NO_SOURCE_FILE:1)

; This one works, but ignored-symbol, should probably not be ignored
user=> (defn quux [] (throw (Exception. "Works!") ignored-symbol))
#'user/quux

user=> (quux)
Exception Works!  user/quux (NO_SOURCE_FILE:1)

The compiler can easily avoid these by counting forms.

Patch: v3_0001-CLJ-1456-counting-forms-to-catch-malformed-throw-for.patch

Screened by: Alex Miller



 Comments   
Comment by Alf Kristian Støyle [ 30/Jun/14 11:56 AM ]

Not sure how to create a test for the attached patch. Will happily do so if anyone has a suggestion.

Comment by Alex Miller [ 30/Jun/14 12:23 PM ]

Re testing, I think the examples you give are good - you should add tests to test/clojure/test_clojure/compilation.clj that eval the form and expect compilation errors. I'm sure you can find similar examples.

Comment by Alf Kristian Støyle [ 30/Jun/14 2:01 PM ]

Newest patch also contains a few tests.

Comment by Andy Fingerhut [ 29/Aug/14 4:54 PM ]

All patches dated Jun 30 2014 and earlier no longer applied cleanly to latest master after some commits were made to Clojure on Aug 29, 2014. They did apply cleanly before that day.

I have not checked how easy or difficult it might be to update this patch. See section "Updating Stale Patches" on this wiki page for some tips on updating patches: http://dev.clojure.org/display/community/Developing+Patches

Alf, it can help avoid confusion if different patches have different file names. JIRA lets you create multiple attachments with the same name, but I wouldn't recommend it.

Comment by Alf Kristian Støyle [ 30/Aug/14 2:18 AM ]

It was easy to fix the patch. Uploaded the new patch v3_0001-CLJ-1456-counting-forms-to-catch-malformed-throw-for.patch, which applies cleanly to the current master.





[CLJ-1454] Companion to swap! which returns the old value Created: 28/Jun/14  Updated: 10/Sep/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Philip Potter Assignee: Unassigned
Resolution: Unresolved Votes: 6
Labels: atom

Approval: Triaged

 Description   

Sometimes, when mutating an atom, it's desirable to know what the value before the swap happened. The existing swap! function returns the new value, so is unsuitable for this use case. Currently, the only option is to roll your own using a loop and compare-and-set!

An example of this would be where the atom contains a PersistentQueue and you want to atomically remove the head of the queue and process it: if you run (swap! a pop), you have lost the reference to the old head of the list so you can't process it.

It would be good to have a new function swap-returning-old! which returned the old value instead of the new.



 Comments   
Comment by Philip Potter [ 28/Jun/14 4:00 PM ]

Overtone already defines functions like this in overtone.helpers.ref, which get used by overtone.libs.event. These return both the old and the new value, although in all existing use cases only the old value gets used.

flatland/useful defines a trade! fn which returns the old value, although the implementation is less clean than a compare-and-set! based solution would be.

Comment by Philip Potter [ 29/Jun/14 6:23 AM ]

Chris Ford suggested "swap-out!" as a name for this function. I definitely think "swap-returning-old!" isn't the ideal name.

Comment by Jozef Wagner [ 30/Jun/14 1:33 AM ]

I propose a switch! name. The verb switch is defined as "substitute (two items) for each other; exchange.", and as you get the old value back, it evokes slightly the exchange of items.

Comment by Philip Potter [ 30/Jun/14 3:03 AM ]

Medley also has a deref-swap! which does the same thing.

Comment by Alex Miller [ 30/Jun/14 8:20 AM ]

I think deref-swap! seems like a morally equivalent name to Java's AtomicReference.getAndSet() which is the same idea.

Comment by Philip Potter [ 30/Jun/14 1:19 PM ]

Funny you say that Alex, because prismatic/plumbing defines a get-and-set! (also defined by other projects), equivalent to deref-reset! in medley. Plumbing also defines swap-pair! which returns both old and new values, like the overtone fn, although once again the only usage I can find only uses the old value.

Comment by Alex Miller [ 30/Jun/14 3:37 PM ]

I think it's important to retain the notion that you are not switching/exchanging values but applying the update model of applying a function to the old value to produce the new value. I don't even particularly like "swap!" as I think that aspect is lost in the name (alter and alter-var-root are better). I like that "deref-swap!" combines two words with existing connotations and orders them appropriately.

Comment by Timothy Baldridge [ 30/Jun/14 3:43 PM ]

except that that naming doesn't fit well compared to functions like nfirst which are defined as (comp next first). This function is not (comp deref swap!).





[CLJ-1453] Most Iterator implementations do not correctly implement next failing to throw the required NoSuchElementException Created: 24/Jun/14  Updated: 07/Aug/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Meikel Brandmeyer Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: interop

Attachments: Text File 0001-Fix-iterator-implementations-to-throw-NSEE-when-exha.patch     Text File 0001-Throw-NSEE-in-gvec-iterator.patch    
Patch: Code
Approval: Triaged

 Description   

Most implementations of Iterators for Clojure's collections do not implement the next method correctly. In case the iterator is exhausted the methods fail with some case dependent error, but not with the NoSuchElementException as required by the official Java contract for that method. This causes problems when working with Java libraries relying on that behaviour.

Issue encountered in real world code using http://pipes.tinkerpop.com.

To reproduce:

(-> [] .iterator .next)

This throws a NPE instead of NSEE.

(doto (.iterator [1 2]) .next .next .next)

This throws an ArrayIndexOutOfBoundsException instead of NSEE.

The attached patch fixes the methods by adding a check for hasNext before actually trying to provide the next element. If there is no next element the correct exception is thrown.

Up-to-date patch file: 0001-Fix-iterator-implementations-to-throw-NSEE-when-exha.patch



 Comments   
Comment by Tim McCormack [ 15/Jul/14 9:56 PM ]

To establish a baseline, this piece of code checks all the iterators I could find within Clojure 1.6.0 and makes sure they throw an appropriate exception:

iterator-checker.clj
(defn next-failure
  []
  (let [ok (atom [])]
    (doseq [[tp v]
            (sorted-map
             :vec-0 []
             :vec-n [1 2 3]
             :vec-start (subvec [1 2 3 4] 0 1)
             :vec-end (subvec [1 2 3 4] 3 4)
             :vec-ls-0 (.listIterator [])
             :vec-ls-n (.listIterator [1 2 3])
             :vec-start-ls (.listIterator (subvec [1 2 3 4] 0 1))
             :vec-end-ls (.listIterator (subvec [1 2 3 4] 3 4))
             :seq ()
             :list-n '(1 2 3)
             :set-hash-0 (hash-set)
             :set-hash-n (hash-set 1 2 3)
             :set-sor-0 (sorted-set)
             :set-sor-n (sorted-set 1 2 3)
             :map-arr-0 (array-map)
             :map-arr-n (array-map 1 2, 3 4)
             :map-hash-0 (hash-map)
             :map-hash-n (hash-map 1 2, 3 4)
             :map-sor-n (sorted-map)
             :map-sor-n (sorted-map 1 2, 3 4)
             :map-sor-ks-0 (.keys (sorted-map))
             :map-sor-ks-n (.keys (sorted-map 1 2, 3 4))
             :map-sor-vs-0 (.vals (sorted-map))
             :map-sor-vs-n (.vals (sorted-map 1 2, 3 4))
             :map-sor-rev-0 (.reverseIterator (sorted-map))
             :map-sor-rev-n (.reverseIterator (sorted-map 1 2, 3 4))
             :map-ks-0 (.keySet {})
             :map-ks-n (.keySet {1 2, 3 4})
             :map-vs-0 (.values {})
             :map-vs-n (.values {1 2, 3 4})
             :gvec-int-0 (vector-of :long)
             :gvec-int-n (vector-of :long 1 2 3))]
      (let [it (if (instance? java.util.Iterator v)
                 v
                 (.iterator v))]
        (when-not it
          (println "Null iterator:" tp))
        (try (dotimes [_ 50]
               (.next it))
             (catch java.util.NoSuchElementException nsee
               (swap! ok conj tp))
             (catch Throwable t
               (println tp "threw" (class t))))))
    (println "OK:" @ok)))

The output as of current Clojure master (201a0dd970) is:

:gvec-int-0 threw java.lang.IndexOutOfBoundsException
:gvec-int-n threw java.lang.IndexOutOfBoundsException
:map-arr-0 threw java.lang.ArrayIndexOutOfBoundsException
:map-arr-n threw java.lang.ArrayIndexOutOfBoundsException
:map-hash-0 threw java.lang.ArrayIndexOutOfBoundsException
:map-ks-0 threw java.lang.ArrayIndexOutOfBoundsException
:map-ks-n threw java.lang.ArrayIndexOutOfBoundsException
:map-sor-ks-0 threw java.util.EmptyStackException
:map-sor-ks-n threw java.util.EmptyStackException
:map-sor-n threw java.util.EmptyStackException
:map-sor-rev-0 threw java.util.EmptyStackException
:map-sor-rev-n threw java.util.EmptyStackException
:map-sor-vs-0 threw java.util.EmptyStackException
:map-sor-vs-n threw java.util.EmptyStackException
:map-vs-0 threw java.lang.ArrayIndexOutOfBoundsException
:map-vs-n threw java.lang.ArrayIndexOutOfBoundsException
:vec-0 threw java.lang.NullPointerException
:vec-end threw java.lang.ArrayIndexOutOfBoundsException
:vec-end-ls threw java.lang.IndexOutOfBoundsException
:vec-ls-0 threw java.lang.IndexOutOfBoundsException
:vec-ls-n threw java.lang.IndexOutOfBoundsException
:vec-n threw java.lang.ArrayIndexOutOfBoundsException
:vec-start threw java.lang.ArrayIndexOutOfBoundsException
:vec-start-ls threw java.lang.IndexOutOfBoundsException
OK: [:list-n :map-hash-n :seq :set-hash-0 :set-hash-n :set-sor-0 :set-sor-n]
Comment by Tim McCormack [ 15/Jul/14 9:57 PM ]

0001-Fix-iterator-implementations-to-throw-NSEE-when-exha.patch missed one thing: clojure.gvec. With the patch in place, my checker outputs the following:

:gvec-int-0 threw java.lang.IndexOutOfBoundsException
:gvec-int-n threw java.lang.IndexOutOfBoundsException
OK: [:list-n :map-arr-0 :map-arr-n :map-hash-0 :map-hash-n :map-ks-0 :map-ks-n :map-sor-ks-0 :map-sor-ks-n :map-sor-n :map-sor-rev-0 :map-sor-rev-n :map-sor-vs-0 :map-sor-vs-n :map-vs-0 :map-vs-n :seq :set-hash-0 :set-hash-n :set-sor-0 :set-sor-n :vec-0 :vec-end :vec-end-ls :vec-ls-0 :vec-ls-n :vec-n :vec-start :vec-start-ls]

That should be a quick fix.

Comment by Michał Marczyk [ 15/Jul/14 10:01 PM ]

CLJ-1416 includes a fix for gvec's iterator impls (and some other improvements to interop).

Comment by Tim McCormack [ 15/Jul/14 10:17 PM ]

Attaching a fix for the gvec iterator. Combined with the existing patch, this fixes all broken iterators that I could find.

Comment by Andy Fingerhut [ 07/Aug/14 10:25 AM ]

I believe this Clojure commit: https://github.com/clojure/clojure/commit/e7215ce82215bda33f4f0887cb88570776d558a0

introduces more implementations of the java.util.Iterator interface where next() returns null instead of throwing a NoSuchElementException





[CLJ-1446] (def v) with no init supplied destroys #'v metadata Created: 13/Jun/14  Updated: 13/Jun/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Nahuel Greco Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


 Description   

(def a) destroys #'a metadata, check this:

(def ^:mykey a 1)

(meta #'a)              ;; ok, :mykey is present

(let [v (def a)]
   [(meta v)            ;; NO :mykey present, metadata destroyed
    (identical? v #'a)  ;; true, we are talking of the same var
   ])

(meta #'a)              ;; NO :mykey present

If this is not a bug but a "feature", then we have at least two problems:

1- The def special form documentation doesn't state this behaviour at all, it needs to be clarified. With the current documentation it seems as doing a def with no init supplied will not make any side-effect at all, and this is not true for the var metadata.

2- defmulti uses this form to lookup the var and check if it already binds to a MultiFn, if that is the case then defmulti does nothing... but it really does something, defmulti will destroy the original var metadata in the (supposedly non-destructive) check. This is the involved defmulti fragment:

(let [v# (def ~mm-name)]
  (when-not (and (.hasRoot v#) (instance? clojure.lang.MultiFn (deref v#)))
   ...


 Comments   
Comment by Alex Miller [ 13/Jun/14 4:14 PM ]

I think this is mostly a dupe of CLJ-1148 but I'll leave it as it states the specific problem more precisely.

Comment by Nahuel Greco [ 13/Jun/14 7:35 PM ]

Alex Miller: It seems CLJ-1148 is an special case where this problem shows, but the patches in CLJ-1148 only fixes the issues for defonce, not generally for def, not for defmulti and not clarifies this behaviour in the def special form documentation.





[CLJ-1435] 'numerator and 'denominator fail to handle integral values (i.e. N/1) Created: 30/May/14  Updated: 30/May/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Aaron Brooks Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: None


 Description   

Because ratio values reduce to lowest terms and, for integral values where the lowest term is N/1, are auto-converted to BigInts (and formerly Longs), the current behavior of clojure.core/numerator and clojure.core/denominator yield unexpected results.

user=> (numerator 1/3)
1
user=> (numerator (+ 1/3 2/3))

ClassCastException clojure.lang.BigInt cannot be cast to clojure.lang.Ratio  clojure.core/numerator (core.clj:3306)
user=> (denominator 1/3)
3
user=> (denominator (+ 1/3 2/3))

ClassCastException clojure.lang.BigInt cannot be cast to clojure.lang.Ratio  clojure.core/denominator (core.clj:3314)
user=>

The auto-conversion to Longs is not really the problem in my mind. I'd like to see numerator return the original value when presented with a BigInt and denominator always return 1 when presented with a BigInt. It seems reasonable to request the same for Longs.

If desired, I'd be happy to produce a patch.



 Comments   
Comment by Andy Fingerhut [ 30/May/14 6:35 PM ]

I don't know the official stance on this ticket, but will add some notes.

Aaron, numerator and denominator are pretty clearly documented to work on Ratio types only.

It is pretty easy to write my-numerator and my-denominator that work exactly as you wish, checking for the type of arg and using numerator, denominator for Ratio types, and doing whatever you think is correct for other numeric types.

Comment by Aaron Brooks [ 30/May/14 7:44 PM ]

I'm aware that they are documented as such. Part of my point is that you can be working entirely with Ratio types and, via arithmetic operations between them, sometimes wind up with a non-Ratio number unexpectedly.

Also consider:

user=> (numerator 2/1)
ClassCastException java.lang.Long cannot be cast to clojure.lang.Ratio  clojure.core/numerator (core.clj:3238)

You're then left either implementing a try/catch correction or always checking the type before using numerator or denominator which is a loss in performance.

The patch I have in mind is creating a protocol, extended to Ratio, BigInt and Long which calls the appropriate method (Ratios) or returns either the given number or 1 (numerator/denominator) for the integral types. I expect this to maintain the current level of performance in the cases where it works and behave properly in the cases currently not handled.





[CLJ-1431] Switch from MurmurHash3 to SipHash to prevent DoS collision attack (hash flooding) Created: 25/May/14  Updated: 26/May/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.4, Release 1.5, Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: James Thornton Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: security


 Description   

Clojure is using Murmur3 throughout:
https://github.com/clojure/clojure/commit/dff9600387b962f16fc78e6477e10e34651fd366

DJB, Jean-Philippe Aumasson, and Martin Boßlet have shown that Murmur3 is not resilient against hash collision attacks:
http://www.ocert.org/advisories/ocert-2012-001.html
https://131002.net/siphash/

"Hash-flooding DoS reloaded: attacks and defenses" talk by DJB, Jean-Philippe Aumasson, and Martin Boßlet
http://media.ccc.de/browse/congress/2012/29c3-5152-en-hashflooding_dos_reloaded_h264.html

"Breaking Murmur: Hash-flooding DoS Reloaded"
http://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/

Python, Ruby, JRuby, Haskell, Rust, Perl, Redis... have all switched to SipHash
https://en.wikipedia.org/wiki/SipHash

Last year Google dropped CityHash from Guava and replaced it with SipHash
https://code.google.com/p/guava-libraries/issues/detail?id=1232

SipHash Guava Implementation
https://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/hash/SipHashFunction.java

SipHash Java reference implementation
https://github.com/emboss/siphash-java/blob/master/src/main/java/com/github/emboss/siphash/SipHash.java



 Comments   
Comment by Alex Miller [ 26/May/14 12:56 AM ]

Thanks, we've talked about this issue and some possible things we could do, but didn't have a ticket for it yet.

Comment by Alex Miller [ 26/May/14 1:08 AM ]

While the Java 7 approach relied on (attempting) to properly seed hash maps with string hash codes, that was all dropped in Java 8, which addressed DoS collision hash attacks by instead improving the data structure to switch from linear collisions to a red/black tree (log-time) for collisions. It's possible a similar approach could work in Clojure as well.

One workaround that could be used now is to wrap map keys in a custom type that implements IHashEq and implements an alternate hash function.





[CLJ-1424] Feature Expressions Created: 15/May/14  Updated: 07/Nov/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.7

Type: Enhancement Priority: Major
Reporter: Ghadi Shayban Assignee: Alex Miller
Resolution: Unresolved Votes: 1
Labels: reader

Attachments: File CLJ-1424-2.diff     File clj-1424-3.diff     File clojure-feature-expressions.diff    
Patch: Code
Approval: Incomplete

 Description   

Feature expressions based directly on Common Lisp. See Clojure design docs, which includes discussion and links to Common Lisp documentation for feature expressions here: http://dev.clojure.org/display/design/Feature+Expressions

#+ #- and or not
are supported. Unreadable tagged literals are suppressed through the *suppress-read* dynamic var. For example, with *features* being #{:clj}, which is the default, the following should read :foo

#+cljs #js {:one :two} :foo

The initial *features* set can be augmented (clj will always be included) with the clojure.features System property:

-Dclojure.features=production,embedded

Patch: clj-1424-3.diff

Questions: Should *suppress-read* override *read-eval*?

Related: CLJS-27, TRDR-14



 Comments   
Comment by Jozef Wagner [ 16/May/14 2:19 AM ]

Has there been a decision that CL syntax is going to be used? Related discussion can be found at design page, google groups discussion and another discussion.

Comment by Alex Miller [ 16/May/14 8:34 AM ]

No, no decisions on anything yet.

Comment by Ghadi Shayban [ 19/May/14 7:25 PM ]

Just to echo a comment from TRDR-14:

This is WIP and just one approach for feature expressions. There seem to be at least two couple diverging approaches emerging from the various discussion (Brandon Bloom's idea of read-time splicing being the other.)

In any case having all Clojure platforms be ready for the change is probably essential. Also backwards compatibility of feature expr code to Clojure 1.6 and below is also not trivial.

Comment by Kevin Downey [ 04/Aug/14 1:39 PM ]

if you have ever tried to do tooling for a language where the "parser" tossed out information or did some partial evaluation, it is a pain. this is basically what the #+cljs style feature expressions and bbloom's read time splicing both do with clojure's reader. I think resolving this at read time instead of having the compiler do it before macro expansion is a huge mistake and makes the reader much less useful for reading code.

Comment by Ghadi Shayban [ 04/Aug/14 2:00 PM ]

Kevin, what kind of tooling use case are you alluding to?

Comment by Kevin Downey [ 04/Aug/14 3:24 PM ]

any use case that involves reading code and not immediately handing it off to the compiler. if I wanted to write a little snippet to read in a function, add an unused argument to every arity then pprint it back, reader resolved feature expressions would not round trip.

if I want to write snippet of code to generate all the methods for a deftype (not a macro, just at the repl write a `for` expression) I can generate a clojure data structure, call pprint on it, then paste it in as code, reader feature expressions don't have a representation as data so I cannot do that, I would have to generate strings directly.

Comment by Alex Miller [ 22/Aug/14 9:10 AM ]

Changing Patch setting so this is not in Screenable yet (as it's still a wip).

Comment by Alex Miller [ 07/Nov/14 4:39 PM ]

Latest patch brings up to par with related patches in CLJS-27 and TRDR-14 and importantly adds support for loading .cljc files as Clojure files.

Comment by Andy Fingerhut [ 07/Nov/14 5:55 PM ]

Maybe undesirable behavior demonstrated below with latest Clojure master plus patch clj-1424-3.diff, due to the #+cljs skipping the comment, but not the (dec a). I thought it could be fixed simply by moving RT.suppressRead() check after (ret == r) check in read(), but that isn't correct.

user=> (read-string "(defn foo [a] #+clj (inc a) #+cljs ; foo\n (dec a))")
(defn foo [a] (inc a) (dec a))




[CLJ-1406] Libs are blindly added into loaded-libs even if an error occurs during loading Created: 17/Apr/14  Updated: 17/Apr/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5, Release 1.6
Fix Version/s: None

Type: Defect Priority: Major
Reporter: OHTA Shogo Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File 0001-modify-clojure.core-load-lib-so-that-it-removes-the-.patch    
Patch: Code

 Description   

Suppose you have a lib that causes some errors during loading, like the following:

(ns broken-lib)

(} ; this line will cause a reader error

And then, if you require the lib, it would be added into loaded-libs in spite of the reader error, which makes require succeed silently after that.

user=> (contains? (loaded-libs) 'broken-lib)
false
user=> (require 'broken-lib)
CompilerException java.lang.RuntimeException: Unmatched delimiter: }, compiling:(broken_lib.clj:3:3) 
user=> (contains? (loaded-libs) 'broken-lib)
true
user=> (require 'broken-lib)
nil
user=>

Cause:
The patch for CLJ-1116 made the ns macro blindly add the lib being defined into loaded-libs even if an error occurs during loading.

Approach:
Modify clojure.core/load-lib so that it removes the lib from loaded-libs on error.



 Comments   
Comment by Alex Miller [ 17/Apr/14 9:07 AM ]

This patch seems somewhat removed from the cause - is there some way to instead prevent the lib from being added to loaded-libs in the first place?

Comment by OHTA Shogo [ 17/Apr/14 9:21 AM ]

To do so, I think we need to revert CLJ-1116.





[CLJ-1399] missing field munging when recreating deftypes serialized in to byte code Created: 02/Apr/14  Updated: 02/Apr/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Kevin Downey Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: None

Attachments: File clj-1399.diff    
Patch: Code

 Description   

to embed deftypes in the bytecode the compiler emits the value of each field, then emits a call to the deftypes underlying class's constructor.

to get a list of fields the compiler calls .getBasis.

the getBasis fields are the "clojure" level field names of the deftype, which the actual "jvm" level field names have been munged (replacing - with _, etc), so the compiler tries to generate code to set values on non-existent fields

https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Compiler.java#L4579

https://www.refheap.com/70731

you can work around this by using field names that don't require munging. a solution might be just calling munge in the emission of field sets of ITypes



 Comments   
Comment by Kevin Downey [ 02/Apr/14 4:26 PM ]

reproducing case

$ rlwrap java -server -Xmx1G -Xms1G -jar /Users/hiredman/src/clojure/target/clojure-1.6.0-master-SNAPSHOT.jar
Clojure 1.6.0-master-SNAPSHOT
user=> (deftype Foo [hello-world])
user.Foo
user=> (alter-var-root #'default-data-readers assoc 'foo (fn [x] (Foo. x)))
{foo #<user$eval6$fn__7 user$eval6$fn__7@2f953efd>, inst #'clojure.instant/read-instant-date, uuid #'clojure.uuid/default-uuid-reader}
user=> #foo "1"
CompilerException java.lang.IllegalArgumentException: No matching field found: hello-world for class user.Foo, compiling:(NO_SOURCE_PATH:0:0)
user=>
Comment by Kevin Downey [ 02/Apr/14 4:39 PM ]

this patch fixes the issue on the latest master for me

Comment by Chas Emerick [ 02/Apr/14 4:57 PM ]

FWIW, this was precipitated by real experience (I think I created the refheap paste). The workaround is easy (don't use dashes in field names of deftypes you want to return from data reader functions), but I wouldn't expect anyone to guess that that wasn't already oversensitized to munging edge cases.





[CLJ-1381] Improve support for extending protocols to primitive arrays Created: 13/Mar/14  Updated: 18/Sep/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5, Release 1.6
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: protocols


 Description   

It is possible to extend protocols to primitive arrays but specifying the class for the type is a little tricky:

(defprotocol P (p [_]))
(extend-protocol P (Class/forName "[B") (p [_] "bytes"))
(p (byte-array 0))   ;; => "bytes"

However, things go bad if you try to do more than one of these:

(extend-protocol P 
  (Class/forName "[B") (p [_] "bytes")
  (Class/forName "[I") (p [_] "ints"))
CompilerException java.lang.UnsupportedOperationException: nth not supported on this type: Character, compiling:(NO_SOURCE_PATH:1:1)
	clojure.lang.Compiler.analyze (Compiler.java:6380)
	clojure.lang.Compiler.analyze (Compiler.java:6322)
	clojure.lang.Compiler$MapExpr.parse (Compiler.java:2879)
	clojure.lang.Compiler.analyze (Compiler.java:6369)
	clojure.lang.Compiler.analyze (Compiler.java:6322)
	clojure.lang.Compiler$InvokeExpr.parse (Compiler.java:3624)
	clojure.lang.Compiler.analyzeSeq (Compiler.java:6562)
	clojure.lang.Compiler.analyze (Compiler.java:6361)
	clojure.lang.Compiler.analyze (Compiler.java:6322)
	clojure.lang.Compiler$BodyExpr$Parser.parse (Compiler.java:5708)
	clojure.lang.Compiler$FnMethod.parse (Compiler.java:5139)
	clojure.lang.Compiler$FnExpr.parse (Compiler.java:3751)
Caused by:
UnsupportedOperationException nth not supported on this type: Character
	clojure.lang.RT.nthFrom (RT.java:857)
	clojure.lang.RT.nth (RT.java:807)
	clojure.core/emit-hinted-impl/hint--5951/fn--5953 (core_deftype.clj:758)
	clojure.core/map/fn--4207 (core.clj:2487)
	clojure.lang.LazySeq.sval (LazySeq.java:42)
	clojure.lang.LazySeq.seq (LazySeq.java:60)
	clojure.lang.RT.seq (RT.java:484)
	clojure.lang.RT.countFrom (RT.java:537)
	clojure.lang.RT.count (RT.java:530)
	clojure.lang.Cons.count (Cons.java:49)
	clojure.lang.Compiler.analyze (Compiler.java:6352)
	clojure.lang.Compiler.analyze (Compiler.java:6322)

The code in {parse-impls} is seeing the second {(Class/forName "[I")} as a function, not as a new type. One workaround for this is to only extend the protocol to one type at a time.

It would be even better (moving into enhancement area) if there was a syntax here to specify primitive array types - we already have the syntax of {bytes, ints, longs}, etc for type hints and those seem perfectly good to me.



 Comments   
Comment by Nahuel Greco [ 18/Sep/14 6:08 PM ]

It also breaks when extending only one array type:

(extend-protocol P
  String               (p [_] "string")
  (Class/forName "[B") (p [_] "ints") 
  )

;=> CompilerException java.lang.UnsupportedOperationException: nth not supported on this type ...

But changing the declaration order fixes it:

(extend-protocol P
  (Class/forName "[B") (p [_] "ints") 
  String               (p [_] "string")
  )

;=> OK




[CLJ-1376] Initialize internal maps to more efficient version Created: 11/Mar/14  Updated: 11/Mar/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: performance


 Description   

In reviewing some hashing stuff, I noticed that there are many places internal to Clojure that use maps initialized with PersistentHashMap.EMPTY. Many of these maps are likely to have a small number of entries such that a PersistentArrayMap might be more efficient.

These are the candidates:

src/jvm/clojure/lang/ARef.java
19:private volatile IPersistentMap watches = PersistentHashMap.EMPTY;

src/jvm/clojure/lang/Compiler.java
3009:				IPersistentMap m = PersistentHashMap.EMPTY;
3819:					       KEYWORDS, PersistentHashMap.EMPTY,
3820:					       VARS, PersistentHashMap.EMPTY,
3964:	IPersistentMap closes = PersistentHashMap.EMPTY;
3977:	IPersistentMap keywords = PersistentHashMap.EMPTY;
3978:	IPersistentMap vars = PersistentHashMap.EMPTY;
5121:                            ,CLEAR_SITES, PersistentHashMap.EMPTY
7259:			       KEYWORDS, PersistentHashMap.EMPTY,
7260:			       VARS, PersistentHashMap.EMPTY
7418:			IPersistentMap opts = PersistentHashMap.EMPTY;
7475:			IPersistentMap fmap = PersistentHashMap.EMPTY;
7522:					       KEYWORDS, PersistentHashMap.EMPTY,
7523:					       VARS, PersistentHashMap.EMPTY,
7912:                            ,CLEAR_SITES, PersistentHashMap.EMPTY

src/jvm/clojure/lang/LispReader.java
755:					RT.map(GENSYM_ENV, PersistentHashMap.EMPTY));

src/jvm/clojure/lang/MultiFn.java
39:	this.methodTable = PersistentHashMap.EMPTY;
41:	this.preferTable = PersistentHashMap.EMPTY;
49:		methodTable = methodCache = preferTable = PersistentHashMap.EMPTY;

src/jvm/clojure/lang/Var.java
48:	final static Frame TOP = new Frame(PersistentHashMap.EMPTY, null);
175:	setMeta(PersistentHashMap.EMPTY);
341:	IPersistentMap ret = PersistentHashMap.EMPTY;

Approach: Two possible approaches - initialize to PersistentArrayMap.EMPTY or call RT.map(). The latter requires function invocation so is slightly slower, but has the benefit of localizing map construction into a single place.






[CLJ-1373] LazySeq should utilize cached hash from its underlying seq. Created: 09/Mar/14  Updated: 20/Mar/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Jozef Wagner Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: collections, performance
Environment:

1.6.0 master SNAPSHOT


Attachments: File clj-1373.diff    
Patch: Code
Approval: Triaged

 Description   

Even if underlying seq contains a cached hash, LazySeq computes it every time.

user=> (def a (range 100000))
#'user/a
user=> (time (hash a))
"Elapsed time: 46.904273 msecs"
375952610
user=> (time (hash a)) ;; RECOMPUTE
"Elapsed time: 10.879098 msecs"
375952610
user=> (def b (seq a))
#'user/b
user=> (time (hash b))
"Elapsed time: 10.572522 msecs"
375952610
user=> (time (hash b)) ;; CACHED HASH
"Elapsed time: 0.024927 msecs"
375952610
user=> (def c (lazy-seq b))
#'user/c
user=> (time (hash c))
"Elapsed time: 12.207651 msecs"
375952610
user=> (time (hash c))
"Elapsed time: 10.995798 msecs"
375952610


 Comments   
Comment by Jozef Wagner [ 09/Mar/14 9:20 AM ]

Added patch which checks if underlying seq implements IHashEq and if yes, uses that hash instead of recomputing.





[CLJ-1372] Inconsistent hash with java collections Created: 09/Mar/14  Updated: 12/Nov/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Jozef Wagner Assignee: Unassigned
Resolution: Unresolved Votes: 7
Labels: collections, interop
Environment:

1.6.0 master


Attachments: Text File 0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M-alternative.patch     Text File 0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M.patch     Text File 0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M-substring.patch     Text File 0005-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M.patch     Text File 0006-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M.patch     Text File 0007-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M.patch     File clj-1372-2.diff     File clj-1372.diff    
Patch: Code and Test
Approval: Triaged

 Description   

c.c/hash always use hashCode for java collections, which is incompatible when comparing with Clojure collections, which use Murmur3.

user=> (== (hash (java.util.ArrayList. [1 2 3])) (hash [1 2 3]))
false
user=> (= (java.util.ArrayList. [1 2 3]) [1 2 3])
true

One way to fix it is to add a special case in Util/hasheq for java.util.Collections, as it is now for Strings.

Link to a discussion of this topic in the Clojure group: https://groups.google.com/forum/#!topic/clojure/dQhdwZsyIEw



 Comments   
Comment by Jozef Wagner [ 09/Mar/14 8:02 AM ]

Same problem for maps, so hasheq should have a special case for java.util.Map too.

Comment by Jozef Wagner [ 09/Mar/14 9:21 AM ]

Added patch with fix for j.u. Map, Set and List.

Comment by Andy Fingerhut [ 09/Mar/14 6:02 PM ]

Add patch clj-1372-2.diff that is identical to Jozef Wagner's clj-1372.diff, except it also adds some new tests that fail without his changes, and pass with them.

Comment by Alex Miller [ 10/Mar/14 9:31 AM ]

I think the contract on equiv/hasheq is more narrowly scoped than this and only applies if both collections are IPersistentCollection. In other words, I don't think this is wanted or required.

Note that the Java .equals/.hashCode contract is maintained here - these collections will compare as .equals() and do have the same .hashCode().

Comment by Jozef Wagner [ 10/Mar/14 9:38 AM ]

Without the patch the following statement is not valid: "If two objects are equal with c.c/=, than their hash returned by c.c/hash is the same number". We can say that this is valid only iff both objects are 'clojure' objects, but this goes against clojures interop principles (interop is easy, fast, no surprises).

Comment by Jozef Wagner [ 10/Mar/14 9:54 AM ]

Manifestation of this bug

user=> (assoc (hash-map [1 2 3] :foo) (java.util.ArrayList. [1 2 3]) :bar)
{[1 2 3] :bar, [1 2 3] :foo}
user=> (get (hash-map [1 2 3] :foo) (java.util.ArrayList. [1 2 3]))
nil
Comment by Alex Miller [ 10/Mar/14 10:58 AM ]

I agree that would be a nice thing to say without qualification.

There is a real cost to adding more branches in hasheq - adding those collection checks affects every hasheq. Running a full Clojure build, I see the following set of classes with >100 occurences where this happens (note that exactly 0 of these are the Java collections - this case doesn't exist in the Clojure build itself):

clojure.lang.Var 107001502
java.lang.Class 2651389
java.lang.Character 2076322 
java.util.UUID 435235 
java.util.Date 430956
clojure.lang.Compiler$LocalBinding 116830
java.lang.Boolean 112361
java.util.regex.Pattern 325

We'd be adding 4 more instanceof checks in the path of every one of those hasheqs. This would also likely blow any JVM inlining.

Rich says "all bets should be off for hasheq/equiv of non-values" where Java collections obviously fall into the class of "non-values".

Comment by Andy Fingerhut [ 10/Mar/14 11:04 AM ]

Would a doc patch be considered? Say one that modified the doc of clojure.core/hash to include a phrase indicating that it is only promised to be consistent with clojure.core/= for immutable values? It could even perhaps mention that Floats are out, too: see CLJ-1036

Comment by Alex Miller [ 10/Mar/14 12:00 PM ]

I think it would be preferred to do any detailed docs about hash at http://clojure.org/data_structures rather than in the docstring. Although the docstring on hash probably could use an update and a pointer to the web site after the latest changes.

Comment by Jozef Wagner [ 10/Mar/14 12:14 PM ]

Neverthless it is a breaking change from 1.5, and it should be mentioned in changelog. What still bugs me is that c.c/= is supported in such cases but the c.c/hash is not. If supporting c.c/hash is expensive, isn't it better to drop support for c.c/= in such cases? It will eliminate surprises such as:

user=> (apply distinct? (hash-set [1 2 3] (java.util.Collections/unmodifiableList [1 2 3])))
false
Comment by Alex Miller [ 10/Mar/14 2:05 PM ]

I'm not sure it's a "breaking" change if something not considered to be guaranteed changes. But I take your point.

I don't think it's feasible to drop = support for Clojure and Java collections - that seems important and useful. And if it were free to do so, I would like to be able to say without qualification that if equiv=true, then hasheq is the same.

It's unclear to me that the examples listed on this ticket are actually real problems people are likely to encounter. The main users of hasheq are hash map and hash set. So to manifest, you would need to be putting a mixture of Clojure and Java collections into one of those, in particular a mixture of collections that compare as equal.

Still thinking about it.

Comment by Jozef Wagner [ 10/Mar/14 3:27 PM ]

Sorry for spamming but there may be another option, to not fallback into hashCode in hasheq, but to instead throw in cases where hasheq is requested for non-values. This will lead to a cleaner separation of hash types. Of course it will prevent putting non-values into hash-set.

Comment by Alex Miller [ 10/Mar/14 3:34 PM ]

There is no simple check for "valueness" though?

Comment by Andy Fingerhut [ 10/Mar/14 3:37 PM ]

An idea, for what it might be worth: Add one test for instance of java.util.Collection in Util.hasheq method instead of 3 separate tests for Set, List, and Map. It doesn't cover Map.Entry.

Comment by Alex Miller [ 10/Mar/14 3:38 PM ]

Map doesn't extend Collection either.

Comment by Stuart Halloway [ 11/Mar/14 10:44 AM ]

I think this needs more consideration and should not hold up 1.6.

Comment by Andy Fingerhut [ 20/Mar/14 2:01 PM ]

Both patches clj-1372.diff and clj-1372-2.diff fail to apply cleanly as of latest Clojure master on Mar 20 2014. They did apply cleanly before the Mar 19 2014 commit, I believe, and the only issue appears to be a changed line of diff context. Given the discussion about whether such a change is desired, it sounds like more thought is needed before deciding what change should be made, if any.

Comment by Mike Anderson [ 11/May/14 2:31 PM ]

This is a pretty bad defect. It absolutely needs to be fixed. It's not really about whether using a mix of Clojure and Java collections is a likely use case or not (it probably isn't...), it's about providing consistent guarantees that people can rely upon.

For example, now I'm really unsure about whether some of the library functions I have that use sets or maps are broken or not. I'd be particularly worried about anything that implements object caches / memoisation / interning based on hashed values. Such code may now have some really nasty subtle defects.

Since they are library functions, I can't guarantee what kind of objects are passed in so the code has to work with all possible inputs (either that or I need to write a clear docstring and throw an exception if the input is not supported).

Comment by Michał Marczyk [ 12/May/14 11:29 PM ]

This patch (0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M.patch) makes hasheq consistent with = for java.util.{List,Map,Map.Entry,Set}. Additionally it extends the special treatment of String (return hasheq of hashCode) to all types not otherwise handled (see below for a comment on this).

It is also available here:

https://github.com/michalmarczyk/clojure/tree/alien-hasheq-2

An earlier version is available here:

https://github.com/michalmarczyk/clojure/tree/alien-hasheq

If I understand correctly, what needs to be benchmarked is primarily the "dispatch time" for clojure.lang.Util/hasheq given a Clojure type. So, I ran a Criterium benchmark repeatedly hashing the same persistent hash map, on the theory that this will measure just the dispatch time on IHashEq instances. I then ran a separate benchmark hashing a PHM, a string and a long and adding up the results with unchecked-add. Hopefully this is a good start; I've no doubt additional benchmarks would be useful.

The results are somewhat surprising to me: hasheq on PHM is actually slightly faster in this benchmark on my build than on 1.6.0; the "add three hasheqs" benchmark is slightly faster on 1.6.0.

;;; 1.6.0

;;; NB. j.u.HM benchmark irrelevant
user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (clojure.lang.Util/hasheq phm)) (c/bench (clojure.lang.Util/hasheq juhm)))
WARNING: Final GC required 1.24405836928592 % of runtime
Evaluation count : 5549560980 in 60 samples of 92492683 calls.
             Execution time mean : 9.229881 ns
    Execution time std-deviation : 0.156716 ns
   Execution time lower quantile : 8.985994 ns ( 2.5%)
   Execution time upper quantile : 9.574039 ns (97.5%)
                   Overhead used : 1.741068 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 6.2652 % Variance is slightly inflated by outliers
Evaluation count : 35647680 in 60 samples of 594128 calls.
             Execution time mean : 1.695145 µs
    Execution time std-deviation : 20.186554 ns
   Execution time lower quantile : 1.670049 µs ( 2.5%)
   Execution time upper quantile : 1.740329 µs (97.5%)
                   Overhead used : 1.741068 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.028614538339401 % of runtime
Evaluation count : 1029948300 in 60 samples of 17165805 calls.
             Execution time mean : 56.797488 ns
    Execution time std-deviation : 0.732221 ns
   Execution time lower quantile : 55.856731 ns ( 2.5%)
   Execution time upper quantile : 58.469940 ns (97.5%)
                   Overhead used : 1.836671 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

;;; patch applied

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] (assert (= (hash phm) (hash juhm))) (c/bench (clojure.lang.Util/hasheq phm)) (c/bench (clojure.lang.Util/hasheq juhm)))
Evaluation count : 5537698680 in 60 samples of 92294978 calls.
             Execution time mean : 8.973200 ns
    Execution time std-deviation : 0.157079 ns
   Execution time lower quantile : 8.733544 ns ( 2.5%)
   Execution time upper quantile : 9.289350 ns (97.5%)
                   Overhead used : 1.744772 ns
Evaluation count : 2481600 in 60 samples of 41360 calls.
             Execution time mean : 24.287800 µs
    Execution time std-deviation : 288.124326 ns
   Execution time lower quantile : 23.856445 µs ( 2.5%)
   Execution time upper quantile : 24.774097 µs (97.5%)
                   Overhead used : 1.744772 ns
nil

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.298136122909759 % of runtime
Evaluation count : 954751500 in 60 samples of 15912525 calls.
             Execution time mean : 61.681794 ns
    Execution time std-deviation : 0.712110 ns
   Execution time lower quantile : 60.622003 ns ( 2.5%)
   Execution time upper quantile : 62.904801 ns (97.5%)
                   Overhead used : 1.744772 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

As a side note, the earlier version of the patch available on the other branch doesn't have a separate branch for String. This made hasheq faster for objects implementing IHashEq, but slowed down the "three hashes" benchmark roughly by a factor of 2.

Comment by Alex Miller [ 12/May/14 11:39 PM ]

Just for clarity, please refer to patches attached here by name so as time goes on we don't have to correlate attachment time with comment time.

I'm not particularly worried about the cost of things that implement IHashEq as they should be unaffected other than potential inlining issues. I am curious about the cost of hasheq for objects that fall through to the end of the cases and pay the cost for all of the checks. The list farther up in the comments is a good place to start - things like Class, Character, and Var (which could possibly be addressed in Var).

Comment by Michał Marczyk [ 12/May/14 11:47 PM ]

Good point, I've edited the above comment to include the patch name.

Thanks for the benchmarking suggestions – I'll post some new results in ~6 minutes.

Comment by Michał Marczyk [ 13/May/14 12:18 AM ]

First, for completeness, here's a new patch (0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M-alternative.patch) which doesn't do the extra murmuring for types not otherwise handled. It's slower for the single PHM case; see below for details. Also, here's the branch on GitHub:

https://github.com/michalmarczyk/clojure/tree/alien-hasheq-3

As for the new results, the perf hit is quite large, I'm afraid:

;;; with patch (murmur hashCode for default version)
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
WARNING: Final GC required 1.409118084170768 % of runtime
Evaluation count : 655363680 in 60 samples of 10922728 calls.
             Execution time mean : 96.459888 ns
    Execution time std-deviation : 1.019817 ns
   Execution time lower quantile : 95.079086 ns ( 2.5%)
   Execution time upper quantile : 98.684168 ns (97.5%)
                   Overhead used : 1.708347 ns
Evaluation count : 675919140 in 60 samples of 11265319 calls.
             Execution time mean : 88.965959 ns
    Execution time std-deviation : 0.825226 ns
   Execution time lower quantile : 87.817159 ns ( 2.5%)
   Execution time upper quantile : 90.755688 ns (97.5%)
                   Overhead used : 1.708347 ns
Evaluation count : 574987680 in 60 samples of 9583128 calls.
             Execution time mean : 103.881498 ns
    Execution time std-deviation : 1.103615 ns
   Execution time lower quantile : 102.257474 ns ( 2.5%)
   Execution time upper quantile : 106.071144 ns (97.5%)
                   Overhead used : 1.708347 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

;;; 1.6.0
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
WARNING: Final GC required 1.3353133083866688 % of runtime
Evaluation count : 1829305260 in 60 samples of 30488421 calls.
             Execution time mean : 34.205701 ns
    Execution time std-deviation : 0.379106 ns
   Execution time lower quantile : 33.680636 ns ( 2.5%)
   Execution time upper quantile : 34.990138 ns (97.5%)
                   Overhead used : 1.718257 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 1 (1.6667 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 1858100340 in 60 samples of 30968339 calls.
             Execution time mean : 30.401309 ns
    Execution time std-deviation : 0.213878 ns
   Execution time lower quantile : 30.095976 ns ( 2.5%)
   Execution time upper quantile : 30.871497 ns (97.5%)
                   Overhead used : 1.718257 ns
Evaluation count : 1592932200 in 60 samples of 26548870 calls.
             Execution time mean : 36.292934 ns
    Execution time std-deviation : 0.333512 ns
   Execution time lower quantile : 35.795063 ns ( 2.5%)
   Execution time upper quantile : 36.918183 ns (97.5%)
                   Overhead used : 1.718257 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

One PHM and Class/Character/Var results with the new patch (no extra murmur step in the default case):

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.258952964663877 % of runtime
Evaluation count : 1007768460 in 60 samples of 16796141 calls.
             Execution time mean : 58.195608 ns
    Execution time std-deviation : 0.482804 ns
   Execution time lower quantile : 57.655857 ns ( 2.5%)
   Execution time upper quantile : 59.154655 ns (97.5%)
                   Overhead used : 1.567532 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
Evaluation count : 647944080 in 60 samples of 10799068 calls.
             Execution time mean : 91.275863 ns
    Execution time std-deviation : 0.659943 ns
   Execution time lower quantile : 90.330980 ns ( 2.5%)
   Execution time upper quantile : 92.711120 ns (97.5%)
                   Overhead used : 1.567532 ns
Evaluation count : 699506160 in 60 samples of 11658436 calls.
             Execution time mean : 84.564131 ns
    Execution time std-deviation : 0.517071 ns
   Execution time lower quantile : 83.765607 ns ( 2.5%)
   Execution time upper quantile : 85.569206 ns (97.5%)
                   Overhead used : 1.567532 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 594919980 in 60 samples of 9915333 calls.
             Execution time mean : 100.336792 ns
    Execution time std-deviation : 0.811312 ns
   Execution time lower quantile : 99.313490 ns ( 2.5%)
   Execution time upper quantile : 102.167675 ns (97.5%)
                   Overhead used : 1.567532 ns

Found 3 outliers in 60 samples (5.0000 %)
	low-severe	 3 (5.0000 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
Comment by Michał Marczyk [ 13/May/14 1:05 AM ]

Here's a new patch (0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M-substring.patch) that takes the outrageous approach of replacing the Iterable/Map/Entry test with a .startsWith("java.util.") on the class name. (I experimented with .getClass().getPackage(), but the performance of that was terrible.) The branch is here:

https://github.com/michalmarczyk/clojure/tree/alien-hasheq-4

Hash perf on the "fall-through" cases with this patch seems to be very good:

user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
WARNING: Final GC required 1.31690036780011 % of runtime
Evaluation count : 1661453640 in 60 samples of 27690894 calls.
             Execution time mean : 35.099750 ns
    Execution time std-deviation : 0.422800 ns
   Execution time lower quantile : 34.454839 ns ( 2.5%)
   Execution time upper quantile : 35.953584 ns (97.5%)
                   Overhead used : 1.556642 ns
Evaluation count : 1630167600 in 60 samples of 27169460 calls.
             Execution time mean : 35.487409 ns
    Execution time std-deviation : 0.309872 ns
   Execution time lower quantile : 35.083030 ns ( 2.5%)
   Execution time upper quantile : 36.190015 ns (97.5%)
                   Overhead used : 1.556642 ns

Found 4 outliers in 60 samples (6.6667 %)
	low-severe	 3 (5.0000 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 1440434700 in 60 samples of 24007245 calls.
             Execution time mean : 40.894457 ns
    Execution time std-deviation : 0.529510 ns
   Execution time lower quantile : 40.055991 ns ( 2.5%)
   Execution time upper quantile : 41.990985 ns (97.5%)
                   Overhead used : 1.556642 ns
nil
Comment by Michał Marczyk [ 13/May/14 1:28 AM ]

The new patch (...-substring.patch) returns hashCode for java.util.** classes other than List, Map, Map.Entry and Set, of course, so no behaviour change there.

Here are the benchmarks for repeated PHM lookups (slightly slower than 1.6.0 apparently, though within 1 ns) and the "add three hasheqs" benchmark (66 ns with patch vs. 57 ns without):

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] (assert (= (hash phm) (hash juhm))) (c/bench (clojure.lang.Util/hasheq phm)) (c/bench (clojure.lang.Util/hasheq juhm)))
Evaluation count : 5183841240 in 60 samples of 86397354 calls.
             Execution time mean : 10.076893 ns
    Execution time std-deviation : 0.182592 ns
   Execution time lower quantile : 9.838456 ns ( 2.5%)
   Execution time upper quantile : 10.481086 ns (97.5%)
                   Overhead used : 1.565749 ns
Evaluation count : 3090420 in 60 samples of 51507 calls.
             Execution time mean : 19.596627 µs
    Execution time std-deviation : 224.380257 ns
   Execution time lower quantile : 19.288347 µs ( 2.5%)
   Execution time upper quantile : 20.085620 µs (97.5%)
                   Overhead used : 1.565749 ns
nil

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.418253438197936 % of runtime
Evaluation count : 879210900 in 60 samples of 14653515 calls.
             Execution time mean : 66.939309 ns
    Execution time std-deviation : 0.747984 ns
   Execution time lower quantile : 65.667310 ns ( 2.5%)
   Execution time upper quantile : 68.155046 ns (97.5%)
                   Overhead used : 1.724002 ns
nil

It is important to note that I have obtained the no-patch result for the "three hasheqs" benchmarks on a fresh JVM when benchmarking 1.6.0, so that's also how I repeated the benchmark with the patch applied. Hashing many different types changes the results noticeably – presumably HotSpot backs off from some optimizations after seeing several different types passed in to hasheq?

Comment by Michał Marczyk [ 13/May/14 8:04 AM ]

Here's a new patch (0005-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M.patch) that introduces a new isAlien static method that checks for instanceof Map/Map.Entry/Iterable and uses this method to test for "alien collection".

Initial benchmarking results are promising:

;;; "fall-through" benchmark
user=> (let [class-instance java.lang.String character-instance \a var-instance #'hash] (c/bench (clojure.lang.Util/hasheq class-instance)) (c/bench (clojure.lang.Util/hasheq character-instance)) (c/bench (clojure.lang.Util/hasheq var-instance)))
WARNING: Final GC required 1.258979068087473 % of runtime
Evaluation count : 1598432100 in 60 samples of 26640535 calls.
             Execution time mean : 36.358882 ns
    Execution time std-deviation : 0.566925 ns
   Execution time lower quantile : 35.718889 ns ( 2.5%)
   Execution time upper quantile : 37.414722 ns (97.5%)
                   Overhead used : 1.823120 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 1626362460 in 60 samples of 27106041 calls.
             Execution time mean : 35.426993 ns
    Execution time std-deviation : 0.294517 ns
   Execution time lower quantile : 35.047064 ns ( 2.5%)
   Execution time upper quantile : 36.058667 ns (97.5%)
                   Overhead used : 1.823120 ns

Found 1 outliers in 60 samples (1.6667 %)
	low-severe	 1 (1.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
Evaluation count : 1461423180 in 60 samples of 24357053 calls.
             Execution time mean : 39.541873 ns
    Execution time std-deviation : 0.423707 ns
   Execution time lower quantile : 38.943560 ns ( 2.5%)
   Execution time upper quantile : 40.499433 ns (97.5%)
                   Overhead used : 1.823120 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

;;; "three hasheqs" benchmark
user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] #_(assert (= (hash phm) (hash juhm))) (c/bench (unchecked-add (clojure.lang.Util/hasheq phm) (unchecked-add (clojure.lang.Util/hasheq "foo") (clojure.lang.Util/hasheq 123)))))
WARNING: Final GC required 1.5536755331464491 % of runtime
Evaluation count : 820376460 in 60 samples of 13672941 calls.
             Execution time mean : 71.999365 ns
    Execution time std-deviation : 0.746588 ns
   Execution time lower quantile : 70.869739 ns ( 2.5%)
   Execution time upper quantile : 73.565908 ns (97.5%)
                   Overhead used : 1.738155 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
Comment by Michał Marczyk [ 13/May/14 8:28 AM ]

Ah, I left out the repeated phm hasheq lookup + hasheq of a java.util.HashMap instance pair of benchmarks from the above – here it is for completeness (no surprises though):

user=> (let [phm (apply hash-map (interleave (range 128) (range 128))) juhm (java.util.HashMap. phm)] (assert (= (hash phm) (hash juhm))) (c/bench (clojure.lang.Util/hasheq phm)) (c/bench (clojure.lang.Util/hasheq juhm)))
WARNING: Final GC required 1.260853406580491 % of runtime
Evaluation count : 5369135760 in 60 samples of 89485596 calls.
             Execution time mean : 10.380464 ns
    Execution time std-deviation : 3.407284 ns
   Execution time lower quantile : 9.510624 ns ( 2.5%)
   Execution time upper quantile : 11.461485 ns (97.5%)
                   Overhead used : 1.566301 ns

Found 5 outliers in 60 samples (8.3333 %)
	low-severe	 3 (5.0000 %)
	low-mild	 2 (3.3333 %)
 Variance from outliers : 96.4408 % Variance is severely inflated by outliers
Evaluation count : 3078180 in 60 samples of 51303 calls.
             Execution time mean : 19.717981 µs
    Execution time std-deviation : 209.896848 ns
   Execution time lower quantile : 19.401811 µs ( 2.5%)
   Execution time upper quantile : 20.180163 µs (97.5%)
                   Overhead used : 1.566301 ns

Found 2 outliers in 60 samples (3.3333 %)
	low-severe	 2 (3.3333 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
Comment by Alex Miller [ 13/May/14 9:17 AM ]

Please don't submit any patches that change hashcode for anything other than making Java collections match Clojure collections - any other change is out of scope of this ticket.

In general, I would prefer just the execution time mean report for the moment rather than everything - the full criterium output makes these comments much harder to read and compare.

Comment by Alex Miller [ 13/May/14 9:33 AM ]

Could I get a summary of approaches, and a timing of 1.6.0 vs each patch for a consistent set of tests - say time of hash for Long, PHM, juHM, Class, and the "three hasheqs" test?

Comment by Rich Hickey [ 13/May/14 9:47 AM ]

"Hashing many different types changes the results noticeably – presumably HotSpot backs off from some optimizations after seeing several different types passed in to hasheq?"

Right - if your benchmarks do not treat this site as megamorphic you will get all sorts of distorted results.

Comment by Michał Marczyk [ 14/May/14 3:15 AM ]

Ok, I have what I think is an improved microbenchmark for this: xor of hasheqs for a long, a double, a string, a class, a character and a PHM (single instance, so it'll be a hash lookup). The results are not very encouraging.

Single form including the require to make it convenient to run; also bundled is a j.u.HashMap (128 entries) hasheq benchmark:

(do
  (require '[criterium.core :as c])
  (let [l    41235125123
        d    123.456
        s    "asdf;lkjh"
        k    BigInteger
        c    \S
        phm  (apply hash-map (interleave (range 128) (range 128)))
        juhm (java.util.HashMap. phm)
        f    (fn f []
               (-> (clojure.lang.Util/hasheq l)
                   (bit-xor (clojure.lang.Util/hasheq d))
                   (bit-xor (clojure.lang.Util/hasheq s))
                   (bit-xor (clojure.lang.Util/hasheq k))
                   (bit-xor (clojure.lang.Util/hasheq c))
                   (bit-xor (clojure.lang.Util/hasheq phm))))]
    (c/bench (f))
    (c/bench (hash juhm))))

Mean execution time as reported by Criterium:

version xor (ns) j.u.HM (µs)
unpatched 1.6.0 148.128748 1.701640
0005 patch 272.039667 21.201178
original patch 268.670316 21.169436
-alternative patch 271.747043 20.755397

The substring patch is broken (see below), so I skipped it. The patch I'm describing as the "original" one is attached as 0001-CLJ-1372-consistent-hasheq-for-java.util.-List-Map-M.patch.

Decisions common to all the patches:

1. One extra if statement in hasheq just above the default return with a three-way instanceof check.

2. The types tested for are j.u.Iterable, j.u.Map.Entry and j.u.Map.

3. Murmur3.hashOrdered takes Iterable, so that's why it's on the list. Map does not extend Iterable, so it's listed separately. Map.Entry is on the list, because ultimately the way to hash maps is to iterate over and hash their entries.

4. The actual hashing of the "alien" / host types is done by a separate static method – clojure.lang.Util.doalienhasheq – on the theory that this will permit hasheq to be inlined more aggressively and limit the worst of the perf hit to alien collections.

5. doalienhasheq checks for Map, Map.Entry, Set and List; entries are converted to lists for hashing, maps are hashed through entry sets and lists and sets are passed directly to Murmur3.

6. There is also a default case for other Iterable types – we must return hashCode or the result of composing some other function with hashCode for these, since we use equals to test them for equivalence.

The 0005 patch has hasheq call a separate private static method to perform the three-way type check, whereas the others put the check directly in the actual if test. The -alternative patch and the 0005 patch return hashCode in the default case, whereas the original patch composes Murmur3.hashInt with hashCode.

The substring patch only works for java.util.** classes and so doesn't solve the problem (it wouldn't correctly hash Guava collections, for example).

All of the patches change c.l.Util.hasheq and add one or two new static methods to clojure.lang.Util that act as helpers for hasheq. None of them changes anything else. Murmuring hashCode was a performance experiment that appeared to have a slight positive impact on some of the "fast cases" (in fact it's still the best performer among the current three patches in the microbenchmark presented above, although the margin of victory is of course extremely tiny). Thus I think all the current patches are in fact limited in scope to changes directly relevant to the ticket; the -alternative patch and the 0005 patch certainly are.

Comment by Michał Marczyk [ 14/May/14 3:29 AM ]

For completeness, branching on Map, Set etc. directly in hasheq, as with Jozef's original patch, results in the following timings in the microbenchmark introduced in my previous comment:

xor 315.866626 ns
juhm 18.520133 µs
Comment by Michał Marczyk [ 14/May/14 4:01 AM ]

New patch (0006) that leaves out the Map.Entry check; instead, two methods are introduced in the Murmur3 class to handle j.u.maps.

Java map entries aren't really integrated into Clojure – you can't use them like vectors, can't call seq on them etc. – so I don't think they need to match Clojure map entries in hasheq as long as j.u.maps do.

Timings:

xor 233.341689 ns
juhm 9.104637 µs
Comment by Michał Marczyk [ 14/May/14 4:17 AM ]

Checking for Map/Iterable in-line doesn't seem to affect xor benchmark results very much, but makes juhm hashing quicker. This is rather surprising to me. In any case, here's a new patch (0007) and the timings:

xor 233.062337 ns
juhm 8.629149 µs
Comment by Alex Miller [ 14/May/14 7:17 AM ]

What are equivalent timings without the patch?

Comment by Michał Marczyk [ 14/May/14 7:43 AM ]

They're listed in the table in the comment introducing the benchmark – 148.128748 ns for xor, 1.701640 µs for juhm.

Comment by Alex Miller [ 14/May/14 8:02 AM ]

What if we override hasheq for different types instead of using instanceof?

Comment by Michał Marczyk [ 14/May/14 12:50 PM ]

Overloaded methods are resolved statically, so there's no avoiding testing for type in the Object overload.

A more specific overload could be used to speed up hashing for its parameter type given a type hint or for literals, since the compiler would emit calls to that overload given appropriate compile-time information. There wouldn't be any speed-up in "implicit" hashing during hash map / set ops, however.

Comment by Daniel Compton [ 12/Nov/14 9:19 PM ]

This hit me when upgrading Factual/skuld from 1.5.1 to 1.6. clojure.data.fressian serialises c.l.PersistentHashSet sets into java.util.HashSet. This breaks equality checking in https://github.com/Factual/skuld/blob/b720feb142e6d274e85be208dc1d6d8634801719/test/skuld/net_test.clj#L8-L29 as we are comparing a set of maps where the original set contains a PersistentSet and the serialised and deserialised one contains a HashSet.

Comment by Daniel Compton [ 12/Nov/14 11:54 PM ]

This has come up again for me, details are in http://dev.clojure.org/jira/browse/DFRS-7





[CLJ-1364] Primitive VecSeq does not implement equals or hashing methods Created: 19/Feb/14  Updated: 19/Feb/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: collections


 Description   

VecSeq (as produced by (seq (vector-of :int 1 2 3))) does not implements equals, hashCode, or hasheq and does not play with any other Clojure collections or sequences appropriately in this regard.

user=> (def rs (range 3))
user=> (def vs (seq (vector-of :int 0 1 2)))
user=> rs
(0 1 2)
user=> vs
(0 1 2)
user=> (.equals rs vs)
true
user=> (.equals vs rs)    ;; expect: true
false
user=> (.equiv rs vs)
true
user=> (.equiv vs rs)
true
user=> (.hashCode rs)
29824
user=> (.hashCode vs)     ;; expect to match (.hashCode rs)
2081327893
user=> (System/identityHashCode vs)  ;; show that we're just getting Object hashCode
2081327893
user=> (.hasheq rs)
29824
user=> (.hasheq vs)       ;; expect same as (.hasheq rs) but not implemented at all
IllegalArgumentException No matching field found: hasheq for class clojure.core.VecSeq  clojure.lang.Reflector.getInstanceField (Reflector.java:271)





[CLJ-1360] clojure.string/split strips trailing delimiters Created: 18/Feb/14  Updated: 18/Feb/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Tim McCormack Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: None


 Description   

clojure.string/split and clojure.string/split-lines inherit the bizarre default behavior of java.lang.String#split(String,int) in stripping trailing consecutive delimiters:

(clojure.string/split "banana" #"an")
⇒ ["b" "" "a"]
(clojure.string/split "banana" #"na")
⇒ ["ba"]
(clojure.string/split "nanabanana" #"na")
⇒ ["" "" "ba"]

In the case of split-lines, processing a file line by line and rejoining results in truncation of trailing newlines in the file. In both cases, the behavior is surprising and cannot be inferred from the docstrings.

This behavior should either be fixed or documented.



 Comments   
Comment by Andy Fingerhut [ 18/Feb/14 10:51 AM ]

Probably documenting would be safer than changing the behavior at this point, given that some people may actually rely on the current behavior after testing, deploying, etc.

I don't currently have a suggestion for a modified doc string, but note that there are examples of this behavior and how one can use an extra "-1" limit argument at the end to get all split strings: http://clojuredocs.org/clojure_core/clojure.string/split





[CLJ-1322] doseq with several bindings causes "ClassFormatError: Invalid Method Code length" Created: 10/Jan/14  Updated: 06/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: Release 1.8

Type: Defect Priority: Major
Reporter: Miikka Koskinen Assignee: Unassigned
Resolution: Unresolved Votes: 3
Labels: None
Environment:

Clojure 1.5.1, java 1.7.0_25, OpenJDK Runtime Environment (IcedTea 2.3.10) (7u25-2.3.10-1ubuntu0.12.04.2)


Attachments: Text File doseq-bench.txt     Text File doseq.patch     File script.clj    
Patch: Code
Approval: Incomplete

 Description   

Important Perf Note the new impl is faster for collections that are custom-reducible but not chunked, and is also faster for large numbers of bindings. The original implementation is hand tuned for chunked collections, and wins for larger chunked coll/smaller binding count scenarios, presumably due to the fn call/return tracking overhead of reduce. Details are in the comments.
Screened By
Patch doseq.patch

user=> (def a1 (range 10))
#'user/a1
user=> (doseq [x1 a1 x2 a1 x3 a1 x4 a1 x5 a1 x6 a1 x7 a1 x8 a1] (do))
CompilerException java.lang.ClassFormatError: Invalid method Code length 69883 in class file user$eval1032, compiling:(NO_SOURCE_PATH:2:1)

While this example is silly, it's a problem we've hit a couple of times. It's pretty surprising when you have just a couple of lines of code and suddenly you get the code length error.



 Comments   
Comment by Kevin Downey [ 18/Apr/14 12:20 AM ]

reproduces with jdk 1.8.0 and clojure 1.6

Comment by Nicola Mometto [ 22/Apr/14 5:35 PM ]

A potential fix for this is to make doseq generate intermediate fns like `for` does instead of expanding all the code directly.

Comment by Ghadi Shayban [ 25/Jun/14 8:39 PM ]

Existing doseq handles chunked-traversal internally, deciding the
mechanics of traversal for a seq. In addition to possibly conflating
concerns, this is causing a code explosion blowup when more bindings are
added, approx 240 bytes of bytecode per binding (without modifiers).

This approach redefs doseq later in core.clj, after protocol-based
reduce (and other modern conveniences like destructuring.)

It supports the existing :let, :while, and :when modifiers.

New is a stronger assertion that modifiers cannot come before binding
expressions. (Same semantics as let, i.e. left to right)

valid: [x coll :when (foo x)]
invalid: [:when (foo x) x coll]

This implementation does not suffer from the code explosion problem.
About 25 bytes of bytecode + 1 fn per binding.

Implementing this without destructuring was not a party, luckily reduce
is defined later in core.

Comment by Andy Fingerhut [ 26/Jun/14 12:25 AM ]

For anyone reviewing this patch, note that there are already many tests for correct functionality of doseq in file test/clojure/test_clojure/for.clj. It may not be immediately obvious, but every test for 'for' defined with deftest-both is a test for 'for' and also for 'doseq'.

Regarding the current implementation of doseq: it in't simply that it is too many bytes per binding, it is that the code size doubles with each additional binding. See these results, which measures size of the macroexpanded form rather than byte code size, but those two things should be fairly linearly related to each other here:

(defn formsize [form]
  (count (with-out-str (print (macroexpand form)))))

user=> (formsize '(doseq [x (range 10)] (print x)))
652
user=> (formsize '(doseq [x (range 10) y (range 10)] (print x y)))
1960
user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10)] (print x y z)))
4584
user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10) w (range 10)] (print x y z w)))
9947
user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10) w (range 10) p (range 10)] (print x y z w p)))
20997

Here are results for the same expressions after Ghadi's patch doseq.patch dated June 25 2014:

user=> (formsize '(doseq [x (range 10)] (print x)))
93
user=> (formsize '(doseq [x (range 10) y (range 10)] (print x y)))
170
user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10)] (print x y z)))
247
user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10) w (range 10)] (print x y z w)))
324
user=> (formsize '(doseq [x (range 10) y (range 10) z (range 10) w (range 10) p (range 10)] (print x y z w p)))
401

It would be good to see some performance results with and without this patch, too.

Comment by Stuart Halloway [ 28/Jun/14 2:21 PM ]

In the tests below, the new impl is called "doseq2", vs. the original impl "doseq"

(def hund (into [] (range 100)))
(def ten (into [] (range 10)))
(def arr (int-array 100))
(def s "superduper")

;; big seq, few bindings: doseq2 LOSES
(dotimes [_ 5]
  (time (doseq [a (range 100000000)])))
;; 1.2 sec

(dotimes [_ 5]
  (time (doseq2 [a (range 100000000)])))
;; 1.8 sec

;; small unchunked reducible, few bindings: doseq2 wins
(dotimes [_ 5]
  (time (doseq [a s b s c s])))
;; 0.5 sec

(dotimes [_ 5]
  (time (doseq2 [a s b s c s])))
;; 0.2 sec

(dotimes [_ 5]
  (time (doseq [a arr b arr c arr])))
;; 40 msec

(dotimes [_ 5]
  (time (doseq2 [a arr b arr c arr])))
;; 8 msec

;; small chunked reducible, few bindings: doseq2 LOSES
(dotimes [_ 5]
  (time (doseq [a hund b hund c hund])))
;; 2 msec

(dotimes [_ 5]
  (time (doseq2 [a hund b hund c hund])))
;; 8 msec

;; more bindings: doseq2 wins bigger and bigger
(dotimes [_ 5]
  (time (doseq [a ten b ten c ten d ten ])))
;; 2 msec

(dotimes [_ 5]
  (time (doseq2 [a ten b ten c ten d ten ])))
;; 0.4 msec

(dotimes [_ 5]
  (time (doseq [a ten b ten c ten d ten e ten])))
;; 18 msec

(dotimes [_ 5]
  (time (doseq2 [a ten b ten c ten d ten e ten])))
;; 1 msec
Comment by Ghadi Shayban [ 28/Jun/14 6:23 PM ]

Hmm, I cannot reproduce your results.

I'm not sure whether you are testing with lein, on what platform, what jvm opts.

Can we test using this little harness instead directly against clojure.jar? I've attached a the harness and two runs of results (one w/ default heap, the other 3GB w/ G1GC)

I added a medium and small (range) too.

Anecdotally, I see doseq2 outperform in all cases except the small range. Using criterium shows a wider performance gap favoring doseq2.

I pasted the results side by side for easier viewing.

core/doseq                          doseq2
"Elapsed time: 1610.865146 msecs"   "Elapsed time: 2315.427573 msecs"
"Elapsed time: 2561.079069 msecs"   "Elapsed time: 2232.479584 msecs"
"Elapsed time: 2446.674237 msecs"   "Elapsed time: 2234.556301 msecs"
"Elapsed time: 2443.129809 msecs"   "Elapsed time: 2224.302855 msecs"
"Elapsed time: 2456.406103 msecs"   "Elapsed time: 2210.383112 msecs"

;; med range, few bindings:
core/doseq                          doseq2
"Elapsed time: 28.383197 msecs"     "Elapsed time: 31.676448 msecs"
"Elapsed time: 13.908323 msecs"     "Elapsed time: 11.136818 msecs"
"Elapsed time: 18.956345 msecs"     "Elapsed time: 11.137122 msecs"
"Elapsed time: 12.367901 msecs"     "Elapsed time: 11.049121 msecs"
"Elapsed time: 13.449006 msecs"     "Elapsed time: 11.141385 msecs"

;; small range, few bindings:
core/doseq                          doseq2
"Elapsed time: 0.386334 msecs"      "Elapsed time: 0.372388 msecs"
"Elapsed time: 0.10521 msecs"       "Elapsed time: 0.203328 msecs"
"Elapsed time: 0.083378 msecs"      "Elapsed time: 0.179116 msecs"
"Elapsed time: 0.097281 msecs"      "Elapsed time: 0.150563 msecs"
"Elapsed time: 0.095649 msecs"      "Elapsed time: 0.167609 msecs"

;; small unchunked reducible, few bindings:
core/doseq                          doseq2
"Elapsed time: 2.351466 msecs"      "Elapsed time: 2.749858 msecs"
"Elapsed time: 0.755616 msecs"      "Elapsed time: 0.80578 msecs"
"Elapsed time: 0.664072 msecs"      "Elapsed time: 0.661074 msecs"
"Elapsed time: 0.549186 msecs"      "Elapsed time: 0.712239 msecs"
"Elapsed time: 0.551442 msecs"      "Elapsed time: 0.518207 msecs"

core/doseq                          doseq2
"Elapsed time: 95.237101 msecs"     "Elapsed time: 55.3067 msecs"
"Elapsed time: 41.030972 msecs"     "Elapsed time: 30.817747 msecs"
"Elapsed time: 42.107288 msecs"     "Elapsed time: 19.535747 msecs"
"Elapsed time: 41.088291 msecs"     "Elapsed time: 4.099174 msecs"
"Elapsed time: 41.03616 msecs"      "Elapsed time: 4.084832 msecs"

;; small chunked reducible, few bindings:
core/doseq                          doseq2
"Elapsed time: 31.793603 msecs"     "Elapsed time: 40.082492 msecs"
"Elapsed time: 17.302798 msecs"     "Elapsed time: 28.286991 msecs"
"Elapsed time: 17.212189 msecs"     "Elapsed time: 14.897374 msecs"
"Elapsed time: 17.266534 msecs"     "Elapsed time: 10.248547 msecs"
"Elapsed time: 17.227381 msecs"     "Elapsed time: 10.022326 msecs"

;; more bindings:
core/doseq                          doseq2
"Elapsed time: 4.418727 msecs"      "Elapsed time: 2.685198 msecs"
"Elapsed time: 2.421063 msecs"      "Elapsed time: 2.384134 msecs"
"Elapsed time: 2.210393 msecs"      "Elapsed time: 2.341696 msecs"
"Elapsed time: 2.450744 msecs"      "Elapsed time: 2.339638 msecs"
"Elapsed time: 2.223919 msecs"      "Elapsed time: 2.372942 msecs"

core/doseq                          doseq2
"Elapsed time: 28.869393 msecs"     "Elapsed time: 2.997713 msecs"
"Elapsed time: 22.414038 msecs"     "Elapsed time: 1.807955 msecs"
"Elapsed time: 21.913959 msecs"     "Elapsed time: 1.870567 msecs"
"Elapsed time: 22.357315 msecs"     "Elapsed time: 1.904163 msecs"
"Elapsed time: 21.138915 msecs"     "Elapsed time: 1.694175 msecs"
Comment by Ghadi Shayban [ 28/Jun/14 6:47 PM ]

It's good that the benchmarks contain empty doseq bodies in order to isolate the overhead of traversal. However, that represents 0% of actual real-world code.

At least for the first benchmark (large chunked seq), adding in some tiny amount of work did not change results signifantly. Neither for (map str [a])

(range 10000000) =>  (map str [a])
core/doseq
"Elapsed time: 586.822389 msecs"
"Elapsed time: 563.640203 msecs"
"Elapsed time: 369.922975 msecs"
"Elapsed time: 366.164601 msecs"
"Elapsed time: 373.27327 msecs"
doseq2
"Elapsed time: 419.704021 msecs"
"Elapsed time: 371.065783 msecs"
"Elapsed time: 358.779231 msecs"
"Elapsed time: 363.874448 msecs"
"Elapsed time: 368.059586 msecs"

nor for intrisics like (inc a)

(range 10000000)
core/doseq
"Elapsed time: 317.091849 msecs"
"Elapsed time: 272.360988 msecs"
"Elapsed time: 215.501737 msecs"
"Elapsed time: 206.639181 msecs"
"Elapsed time: 206.883343 msecs"
doseq2
"Elapsed time: 241.475974 msecs"
"Elapsed time: 193.154832 msecs"
"Elapsed time: 198.757873 msecs"
"Elapsed time: 197.803042 msecs"
"Elapsed time: 200.603786 msecs"

I still see reduce-based doseq ahead of the original, except for small seqs

Comment by Ghadi Shayban [ 04/Aug/14 2:55 PM ]

A form like the following will not work with this patch:

(go (doseq [c chs] (>! c :foo)))

as the go macro doesn't traverse fn boundaries. The only such code I know is core.async/mapcat*, a private fn supporting a fn that is marked deprecated.

Comment by Ghadi Shayban [ 07/Aug/14 2:09 PM ]

I see #'clojure.core/run! was just added, which has a similar limitation

Comment by Rich Hickey [ 29/Aug/14 8:19 AM ]

Please consider Ghadi's feedback, esp re: closures.

Comment by Ghadi Shayban [ 22/Sep/14 4:36 PM ]

The current expansion of a doseq [1] under a go form is less than ideal due to the amount of control flow. 14 states in the state machine vs. 7 with loop/recur

[1] Comparison of macroexpansion of (go ... doseq) vs (go ... loop/recur)
https://gist.github.com/ghadishayban/639009900ce1933256a1





[CLJ-1295] Speed up dissoc on array-maps Created: 15/Nov/13  Updated: 08/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Andy Fingerhut Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: performance

Attachments: File clj-1295-1.diff    
Patch: Code
Approval: Triaged

 Description   

In latest Clojure master as of Nov 15 2013, the method without() in PersistentArrayMap.java first searches for a matching key using indexOf(key) and saves the result in i.

If a matching key was found, the code then copies the old array to the new smaller one, but unnecessarily repeats the comparison of every key in the map to the key being removed, even though its location is already stored in i.



 Comments   
Comment by Andy Fingerhut [ 15/Nov/13 7:05 PM ]

The patch clj-1295-1.diff changes PersistentArrayMap's without() to use System.arraycopy to copy only the necessary parts from the current array to newArray, similar to PersistentHashMap's method removePair().

Benchmark 1 has strings for keys, which are relatively slow to compare to each other.

(def m1 (array-map "abcdef" 1 "abcdeg" 2 "abcdeh" 3 "abcdei" 4))
(time (dotimes [i 100000000] (dissoc m1 "abcdei")))

1.6.0-alpha2 with no changes:
"Elapsed time: 29663.443 msecs"
"Elapsed time: 29490.225 msecs"
"Elapsed time: 29600.138 msecs"
"Elapsed time: 29627.948 msecs"

1.6.0-alpha2 with patch clj-1295-1.diff:
"Elapsed time: 6362.006 msecs"
"Elapsed time: 6121.006 msecs"
"Elapsed time: 6163.377 msecs"
"Elapsed time: 6155.299 msecs"
"Elapsed time: 6395.224 msecs"

Averages about 21% of the run time before the change.

Benchmark 2 has keywords for keys, which are compared via Java ==, so as fast as comparison can get.

(def m2 (array-map :abcdef 1 :abcdeg 2 :abcdeh 3 :abcdei 4))
(time (dotimes [i 100000000] (dissoc m2 :abcdei)))

1.6.0-alpha2 with no changes:
"Elapsed time: 5033.863 msecs"
"Elapsed time: 5028.327 msecs"
"Elapsed time: 5045.019 msecs"
"Elapsed time: 5004.751 msecs"
"Elapsed time: 5039.143 msecs"

1.6.0-alpha2 with patch clj-1295-1.diff:
"Elapsed time: 2874.748 msecs"
"Elapsed time: 2862.878 msecs"
"Elapsed time: 2887.778 msecs"
"Elapsed time: 2874.196 msecs"
"Elapsed time: 2861.807 msecs"

Averages about 57% of the run time before the change.

Comment by Ghadi Shayban [ 08/Dec/14 9:47 AM ]

A nice boost, but probably obsoleted by CLJ-1517

Comment by Andy Fingerhut [ 08/Dec/14 11:10 AM ]

Always happy to be obsoleted by something even better

Comment by Ghadi Shayban [ 08/Dec/14 11:43 AM ]

I dunno, seems like CLJ-1517 phase 1 is just vectors not maps, so this is still easy low-hanging fruit.

Comment by Alex Miller [ 08/Dec/14 1:07 PM ]

There will be a second ticket for maps, both will be targeted for 1.8.





[CLJ-1289] aset-* and aget perform poorly on multi-dimensional arrays even with type hints. Created: 01/Nov/13  Updated: 14/Feb/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Michael O. Church Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: arrays, performance
Environment:

Clojure 1.5.1.

Dependencies: criterium


Attachments: Text File CLJ-1289-p1.patch    
Patch: Code
Approval: Triaged

 Description   

Here's a transcript of the behavior. I don't know for sure that reflection is being done, but the performance penalty (about 1300x) suggests it.

user=> (use 'criterium.core)
nil
user=> (def b (make-array Double/TYPE 1000 1000))
#'user/b
user=> (quick-bench (aget ^"[[D" b 304 175))
WARNING: Final GC required 3.5198021166354323 % of runtime
WARNING: Final GC required 29.172288684474303 % of runtime
Evaluation count : 63558 in 6 samples of 10593 calls.
             Execution time mean : 9.457308 µs
    Execution time std-deviation : 126.220954 ns
   Execution time lower quantile : 9.344450 µs ( 2.5%)
   Execution time upper quantile : 9.629202 µs (97.5%)
                   Overhead used : 2.477107 ns

One workaround is to use multiple agets.

user=> (quick-bench (aget ^"[D" (aget ^"[[D" b 304) 175))
WARNING: Final GC required 40.59820310542545 % of runtime
Evaluation count : 62135436 in 6 samples of 10355906 calls.
             Execution time mean : 6.999273 ns
    Execution time std-deviation : 0.112703 ns
   Execution time lower quantile : 6.817782 ns ( 2.5%)
   Execution time upper quantile : 7.113845 ns (97.5%)
                   Overhead used : 2.477107 ns

Cause: The inlined version only applies to arity 2, and otherwise it reflects.



 Comments   
Comment by Gary Fredericks [ 08/Dec/13 9:28 PM ]

A glance at the source makes it obvious that the hypothesis is correct – the inlined version only applies to arity 2, and otherwise it reflects.

I thought this would be as simple as converting the inline function to be variadic (using reduce), but after trying it I realized this is tricky as you have to generate the correct type hints for each step. E.g., given ^"[[D" the inline function needs to type-hint the intermediate result with ^"[D". This isn't difficult if we're just dealing with strings that begin with square brackets, but I don't know for sure that those are the only possibilities.

Comment by Yaron Peleg [ 13/Feb/14 4:44 AM ]

Bump. I just got bitten bad by this.

There are two seperate issues here:
1) (aget 2d-array-doubles 0 0 ) doesn't emit a reflection warning.
2) It seems like the compiler has enough information to avoid the reflective call here.

Note this gets exp. worse as number of dimensions grows, i.e (get doubles3d 0 0 0)
will be 1M slower, etc' Not true, unless you iterate over all elements. it's
simply n_dims*1000x per lookup.

Nasty surprise, especially considering you often go to primitive arrays for speed,
and a common use case is an inner loop(s) that iterate over arrays.

Comment by Gary Fredericks [ 13/Feb/14 7:08 AM ]

I can probably take a stab at this.

Comment by Gary Fredericks [ 13/Feb/14 8:34 PM ]

I think the reflection warning problem is pretty much impossible to solve without changing code elsewhere in the compiler, because the reflection done in aget is a different kind than normal clojure reflection – it's explicitly in the function body rather than emitted by the compiler. Since the compiler isn't emitting it, it doesn't reasonably know it's even there. So even if aget is fixed for other arities, you still won't get the warning when it's not inlined.

I can imagine some sort of metadata that you could put on a function telling the compiler that it will reflect if not inlined. Or maybe a more generic not-inlined warning?

The global scope of adding another compiler flag seems about balanced by the seriousness of array functions not being able to warn on reflection.

Comment by Gary Fredericks [ 13/Feb/14 8:52 PM ]

Attached CLJ-1289-p1.patch which simply inlines variadic calls to aget. It assumes that if it sees a :tag on the array arg that is a string beginning with [, it can assume that the return value from one call to aget can be tagged with the same string with the leading [ stripped off.

I'm not a jvm expert, but having read through the spec a little bit I think this is a reasonable assumption.

Comment by Alex Miller [ 14/Feb/14 3:34 PM ]

I think this probably is actually true, but a more official way to ask that question would be to get the array class and ask for Class.getComponentType() (and less janky than string munging).

Comment by Gary Fredericks [ 14/Feb/14 3:40 PM ]

How would you get the array class based on the :tag type hint?

Comment by Gary Fredericks [ 14/Feb/14 7:05 PM ]

I see (-> s (Class/forName) (.getComponentType) (.getName)) does the same thing – is that route preferred, or is there another one?





[CLJ-1282] The quote special form should throw an exception if passed more than one form to quote Created: 23/Oct/13  Updated: 10/Sep/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Howard Lewis Ship Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: errormsgs, reader

Attachments: Text File CLJ-1282-p1.patch     Text File CLJ-1282-p2.patch    
Patch: Code and Test
Approval: Triaged

 Description   

Quote currently ignores all but the first argument. In the case of being called accidentally with multiple values, it should throw an exception specifying the error.

user> (quote 1 2 3)
1

------- Original: --------

Every once in a while, you can just go down the rabbit hole.

I had an errant expression in my code:

(-> message get-message-values 'DESTINATION_MERCHANT_ID)

One would think this would work; it certainly would if the key was a keyword and not a symbol.

One would expect this to expand to:

('DESTINATION_MERCHANT_ID (get-message-values message))

however, the reader is involved, so it is as if the source were:

(-> message get-message-values (quote DESTINATION_MERCHANT_ID))

which expands to:

(quote (-> message get-message-values) DESTINATION_MERCHANT_ID))

... hilarity ensues! Because quote currently ignores extra parameters, my code gets the quoted value '(clojure.core/-> message get-message-values) rather than the expected string from the map; this shifts us from the "there's a bug in my code" to "the nature of reality is broken".

The correct expression is:

(-> message get-message-values (get 'DESTINATION_MERCHANT_ID))

This took quite a while to track down; if the

special form checked that it was passed exactly one form to quote and threw an exception otherwise, I think I would have caught this much earlier. It could even identify the expression it is quoting, which would provide a lot better understanding of where I went wrong.



 Comments   
Comment by Howard Lewis Ship [ 23/Oct/13 2:02 PM ]

Sorry, can't edit the description now to correct the formatting errors.

Comment by Howard Lewis Ship [ 24/Oct/13 6:09 PM ]

I just wanted to point out that your description is shorter, but makes it appear that such a use is unlikely and therefore unimportant; the detail of my description is to point out a reasonable situation where something explicable, but completely counterintuitive and confusing, does occur.

Comment by Alex Miller [ 24/Oct/13 8:28 PM ]

That's why I left the original in there too.

Comment by Gary Fredericks [ 09/Dec/13 7:07 AM ]

(quote) currently returns nil. Do we have an opinion about that?

Comment by Gary Fredericks [ 09/Dec/13 9:32 AM ]

Attached p1, which throws an IllegalArgumentException (wrapped in a CompilerException of course) for anything but 1 arg, and includes the number of args that were passed.

I can't think of any reason why (quote) would be useful, so I decided to throw on that too. Very easy to change of course.

Also added a test that (eval '(quote 1 2 3)) throws.

Comment by Stuart Halloway [ 31/Jan/14 12:46 PM ]

I recommend the following changes:

  • throw an ex-info that includes the offending form in its map {:form ...}
  • check only for the map data, not exception type or message, in the tests
Comment by Andy Fingerhut [ 31/Jan/14 6:31 PM ]

Patch CLJ-1282-p1.patch no longer applies cleanly after commits made to Clojure master on Jan 31 2014, probably due to the patch committed for CLJ-1318, and probably only because some lines of context changed in the test file. That would be trivial to update, but Stu's comments above suggest that more significant changes need to be made.

Comment by Gary Fredericks [ 01/Feb/14 9:19 AM ]

Throwing an ex-info is easy enough. I don't know how to avoid at least incidentally checking for the exception type, since the ExceptionInfo is wrapped in a CompilerException. I'll make a patch that keeps the class name in the test but doesn't do any checks on the cause aside from the ex-data. Let me know if I should do anything different.

Comment by Gary Fredericks [ 01/Feb/14 9:58 AM ]

Attached CLJ-1282-p2.patch which is off of the current master and addresses Stu's points.

Comment by Alex Miller [ 04/Feb/14 11:23 PM ]

Moving back to Triaged for more looks.

Comment by Nicola Mometto [ 16/Feb/14 12:12 PM ]

Currently (quote) returns nil, is it intended that this patch makes that an error or was this by accident?

Comment by Gary Fredericks [ 16/Feb/14 12:23 PM ]

I consciously chose to make (quote) an error – I made a comment about that earlier and didn't get any feedback, so I unilaterally decided to make it an error due to the fact that I couldn't think of any possible use for (quote).

It's an easy switch if somebody thinks differently.

Comment by Nicola Mometto [ 16/Feb/14 1:13 PM ]

I'm sorry I did not notice your previois comment.
I'm asking because I need to know whether I should throw on (quote) for tools.analyzer, currently it is allowed but I too think that (quote) should be an error.





[CLJ-1280] Create reusable exception that can carry file/line/col info Created: 18/Oct/13  Updated: 18/Apr/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Alex Miller Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: None


 Description   

This concept already exists in multiple places in Clojure - Compiler$CompilerException and the Exception classes buried in EdnReader and LispReader. It would also be useful in other places where IllegalArgument or other other exceptions are thrown.

For example, this protocol exception throws an IllegalArgumentException and could transmit the file, line, and column info at the location of the error but it seems weird to use any of the existing exceptions for this purpose.

(defprotocol Bar (m [this]) (m [this arg]))


 Comments   
Comment by Kevin Downey [ 18/Apr/14 12:09 AM ]

seems like ExceptionInfo can do this





[CLJ-1279] Fix confusing macroexpand1 ArityException handling Created: 16/Oct/13  Updated: 05/Feb/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Alex Coventry Assignee: Unassigned
Resolution: Unresolved Votes: 3
Labels: Compiler, errormsgs, macro

Attachments: Text File 0001-Edit-macro-ArityException-in-AFn.patch     Text File 0001-Fix-macroexpand1-s-handling-of-ArityException.patch    
Patch: Code and Test

 Description   

macros can give very confusing error messages when they execute a form which generates an ArityException. clojure.lang.Compiler.macroexpand1 assumes that any ArityException comes from the call to the macro itself, which need not be the case. For instance:

user> (do (defmacro f [] (assoc)) (f))
ArityException Wrong number of args (-2) passed to: core$assoc clojure.lang.Compiler.macroexpand1 (Compiler.java:6488)
user> (use 'clojure.repl) (pst)
nil
ArityException Wrong number of args (-2) passed to: core$assoc
clojure.lang.Compiler.macroexpand1 (Compiler.java:6488)
clojure.lang.Compiler.macroexpand (Compiler.java:6544)
clojure.lang.Compiler.eval (Compiler.java:6618)
clojure.lang.Compiler.eval (Compiler.java:6624)
clojure.lang.Compiler.eval (Compiler.java:6597)
clojure.core/eval (core.clj:2864)
clojure.main/repl/read-eval-print-6596/fn-6599 (main.clj:260)
clojure.main/repl/read-eval-print--6596 (main.clj:260)
clojure.main/repl/fn--6605 (main.clj:278)
clojure.main/repl (main.clj:278)
clojure.tools.nrepl.middleware.interruptible-eval/evaluate/fn--1251 (interruptible_eval.clj:56)
clojure.core/apply (core.clj:617)
nil

Easy enough to see the source of the problem in this case, but because both the number of arguments actually passed is off by two, and the stacktrace element for the call to assoc has been dropped, this shortcut by macroexpand1 can get super confusing.

The attached patch corrects this behavior. E.g.

user=> (do (defmacro f [] (assoc)) (f))
ArityException Wrong number of args (0) passed to: core$assoc clojure.lang.AFn.throwArity (AFn.java:437)
user=> (use 'clojure.repl) (pst)
nil
ArityException Wrong number of args (0) passed to: core$assoc
user/f (NO_SOURCE_FILE:1)
clojure.lang.Var.invoke (Var.java:419)
clojure.lang.Var.applyTo (Var.java:532)
clojure.lang.Compiler.macroexpand1 (Compiler.java:6507)
clojure.lang.Compiler.macroexpand (Compiler.java:6580)
clojure.lang.Compiler.eval (Compiler.java:6654)
clojure.lang.Compiler.eval (Compiler.java:6660)
clojure.lang.Compiler.eval (Compiler.java:6633)
clojure.core/eval (core.clj:2864)
clojure.main/repl/read-eval-print-6594/fn-6597 (main.clj:260)
clojure.main/repl/read-eval-print--6594 (main.clj:260)
clojure.main/repl/fn--6603 (main.clj:278)
nil



 Comments   
Comment by Alex Coventry [ 17/Oct/13 11:01 AM ]

Patch with test

Comment by Alex Coventry [ 23/Oct/13 11:42 PM ]

Amended patch to deal more gracefully with unexpected stack trace structure.

Comment by Alex Miller [ 24/Oct/13 12:09 AM ]

Also see CLJ-397 and CLJ-383.

Comment by Alex Coventry [ 24/Oct/13 2:46 PM ]

Thanks, Alex. It would be easy enough to move most of the logic into ArityException, which would be a compromise between Stu's[1] options 1 and 2. Is that worth doing?

Amending clojure.lang.AFn.throwArity to check whether "this" is a macro and adjust the arg count there accordingly might be the simplest way. I can see why Rich prefers all the logic to go into ArityException, but since ArityExceptions are used for things other than macros, I don't see a way to make an honest error message there without groveling the stack trace.

[1] http://dev.clojure.org/jira/browse/CLJ-397?focusedCommentId=24090&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-24090

Comment by Alex Miller [ 24/Oct/13 2:57 PM ]

I would have to take more time than I have to make an informed opinion but I can say that from a general point of view inspecting StackTraceElements does not seem like the right solution to (almost) any problem.

Comment by Alex Coventry [ 24/Oct/13 10:26 PM ]

This patch causes Var.setMacro to set instance attribute AFn.macrop to true, so that AFn.throwArity can reduce the number of arguments reported.

I'm not used to negotiating java class hierarchies, so it's possible there's a cleaner way. Since Var.fn() returns an IFn, I added macrop handling methods IFn.setMacro and IFn.isMacro. These then needed to be implemented in Ref and Keyword, as well as AFn (where I wanted them) because they implement the IFn interface but don't inherit from AFn.

The real drawback I see with this approach is the duplicated state, though: ^{:macro true} vs AFn.macrop==true.

Comment by Andy Fingerhut [ 25/Oct/13 6:33 PM ]

I have not investigated the reason yet, but neither patch applies cleanly after the latest commits to Clojure master on Oct 25 2013. Given that what kinds of solution methods would be acceptable for this issue, it sounds like more thinking and code changes are probably needed anyway before worrying too much about that.





[CLJ-1277] Speed up printing of time instants by adding type hints Created: 10/Oct/13  Updated: 03/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Andy Fingerhut Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: ft, performance

Attachments: Text File clj-1277-1.txt    
Patch: Code
Approval: Triaged

 Description   

There are several occurrences of reflection in instant.clj that slow down the printing of time instants.

Clojure Google group conversation link: https://mail.google.com/mail/u/0/?shva=1#label/clojure/1419e1e6f6cc5b3d

The addition of a few type hints is enough to speed the printing of time instants by a factor of about 3 to 4.5, in a few small benchmarks.

Patch: clj-1277-1.txt
Screened by: Alex Miller



 Comments   
Comment by Andy Fingerhut [ 10/Oct/13 12:07 AM ]

Patch clj-1277-1.txt adds 4 type hints that eliminate all reflection occurrences in source file instant.clj. Benchmarks show that it speeds up printing of java.util.Date and java.sql.Timestamp objects by a factor of about 3 to 4.5.

Latest Clojure master as of Oct 9 2013:

user=> (time (let [d (java.util.Date.)] (dotimes [i 3000000] (pr-str d))))
"Elapsed time: 24094.282 msecs"
user=> (import 'java.sql.Timestamp)
user=> (time (let [d (java.sql.Timestamp. 1300000000000)] (dotimes [i 2000000] (pr-str d))))
"Elapsed time: 20856.957 msecs"

That version of Clojure plus the patch clj-1277-1.txt:

user=> (time (let [d (java.util.Date.)] (dotimes [i 3000000] (pr-str d))))
"Elapsed time: 5085.847 msecs"
user=> (time (let [d (java.sql.Timestamp. 1300000000000)] (dotimes [i 2000000] (pr-str d))))
"Elapsed time: 7582.233 msecs"

Comment by Alexander Kiel [ 10/Oct/13 4:54 AM ]

Thanks for the patch Andy. But I don't like the (set! warn-on-reflection true). I think its better to use it only in the dev profile in leiningen. Not in real production code.

Comment by Andy Fingerhut [ 10/Oct/13 4:06 PM ]

Alexander, Leiningen is not used for building Clojure itself. The two supported choices are Maven and ant. Several Clojure source files, e.g. core/protocols.clj and core/reducers.clj, set warn-on-reflection to true, I believe so that if code is changed in such a way as to introduce a warning, it will be caught more quickly.

If the screeners or Rich think it is inappropriate, it is easy enough to remove.

Comment by Alex Miller [ 10/Oct/13 4:48 PM ]

Setting that is not uncommon in core clojure code and seems fine to me here.





[CLJ-1272] Agent thread executors do not use the global uncaught exception handler Created: 01/Oct/13  Updated: 05/Feb/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: David Greenberg Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: agents


 Description   

If you use Thread.setDefaultUncaughtExceptionHandler to catch all application exceptions, and then throw an exception in a future, that exception will get swallowed up in deployment environments that don't watch stdout. It seems that the agent's executors ought to delegate to the global handler.

This issue bit us, in that we deploy and monitor our system only through its logs and metrics, and never actually saw that exceptions were being thrown in futures.






[CLJ-1263] Allow static compilation of function invocations Created: 14/Sep/13  Updated: 07/Nov/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Mike Anderson Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: compiler


 Description   

This proposal is to allow metadata on functions to prevent a fully dynamic var deref to be used whenever the function is called.

When the function is invoked, JVM "invokevirtual" instruction will be used, which is faster than the current implementation (var deref + IFn cast + invokinterface) and has less restrictions (no need to predefine interfaces to match the function parameters). The JVM is generally able to compile such invokevirtual instructions into extremely efficient code - effectively as fast as pure Java.

This is intended to pave the way to better support for statically compiled, high performance code. In particular, it allow:

  • Supporting arbitrary JVM primitives (float, int, byte, char etc.) as well as just double/long.
  • Supporting typed return values e.g. "String". This could eliminate many casts and type checks.
  • Supporting typed reference arguments (e.g. String).

Suggested usage:

(defn ^:static foo ^int [^String a ^String b]
(+ (count a) (Integer/parseInt b)))

Existing code / semantics should not be affected



 Comments   
Comment by Alex Fowler [ 18/Sep/13 5:08 AM ]

Very nice! That is what would really improve experience with certain tasks. I think it will also make possible to work with primitive arrays without the conversions?

Comment by Mike Anderson [ 19/Sep/13 5:44 PM ]

Hi Alex - which aspect of "work with primitive arrays" are you referring to? This feature would certainly help with passing primitive arguments to/from functions that use primitive arrays. It would also potentially help avoid some casts of primitive array arguments themselves. I don't think it helps in any other way - perhaps a separate issue would be appropriate if there is another thing you are trying to do?

Comment by Kevin Downey [ 29/Oct/13 11:50 AM ]

this issue is confusing, because there was/is a :static feature in clojure(which seems to be disabled in the compiler at the moment) and this proposal doesn't mention the existing work at all.

I also think this proposal is begging the question, there is no discussion of other possible solutions to the performance problem (whatever that is) that this is trying to solve.

the (var.deref()(IFn)).invoke(...) is pretty fundamental to the feel of clojure, in fact the existing :static keyword seems to be disabled in the compiler exactly because it complicates those semantics. so we should have a very clear proposal (not a wishlist) if we want to change that with some very clear wins.

maybe an optimizing clojure compiler would be a better approach.

Comment by Mike Anderson [ 30/Oct/13 11:01 PM ]

Hi Kevin,

This is partly in response to this discussion on Clojure-Dev, where we discussed there are quite a lot of performance issues around the way that Clojure passes arguments currently:
https://groups.google.com/d/topic/clojure/H5P25eYKBj4/discussion

Also I believe it reinstates the original intention of "^:static": I can't find where this is/was officially documented, but Arthur's answer in this SO question suggests that this was the case:
http://stackoverflow.com/questions/7552632/what-does-static-do-in-clojure

I think the proposal is relatively clear: it's probably the minimal change required to get static/direct (i.e. not via an indirect var reference / IFn) function invocations without affecting any of the semantics of current code.

This is sufficiently important for me that it's preventing me from shifting some performance-critical code to Clojure (even with primitive type hints). e.g. here's a simple case with a small primitive function:

(defn ^long foo [^long x]
(inc x))

(c/quick-bench (dotimes [i 100] (foo i))) ;; c = criterium
=> Execution time mean : 194.334293 ns

(c/quick-bench (dotimes [i 100] (inc i)))
=> Execution time mean : 71.539048 ns

i.e. the indirect function invocation is costing us nearly 170% overhead. In Java the equivalent functions perform identically: the overhead is zero because with static function invocation the JVM JIT is able to eliminate all the function call overhead.

In the long term, I agree that a proper optimising compiler would be the best way forward (perhaps Clojure 2.0/CinC can give us this?) but in the meantime I think this is a pragmatic way to improve performance with minimal impact on existing code. Even with an optimising compiler, I think we' would need some way to specifiy the "optimised" semantics rather than the indirect var deref behaviour, and "^:static" seems like a reasonable way to do so (unless anyone has a better idea?)

Comment by Kevin Downey [ 04/Nov/13 3:58 PM ]

have you looked at the definition of int and how it uses :inline/definline to avoid the call overhead?

Comment by Mike Anderson [ 05/Nov/13 4:27 AM ]

Good point Kevin - :inline and definline seem like a good approach in many cases (although it's marked as "experimental" - does that mean we can't rely on it to work in future releases?).

This proposal is still somewhat different: the inline solutions and its variants are effectively doing macro expansion to generate code without a function call on the Clojure side. The approach in this proposal would still emit a function call in bytecode, but do so in a way that the JVM can subsequently inline and optimise much more efficiently. Both have their uses, I think?

Commented edited Nov 7 2013 by Andy Fingerhut: Regarding definline marked as experimental, it has been so marked since Clojure 1.0's release, and the plan is to keep it marked that way in the pending Clojure 1.6 release. See discussion thread on CLJ-1281. No plans to remove it that I am aware of.

Comment by Kevin Downey [ 06/Nov/13 2:06 PM ]

my point is your benchmark above is not a comparison of clojure's current deref + cast + invoke vs. invokevirtual, inc is being inlined in to a static method call there

Comment by Kevin Downey [ 06/Nov/13 2:32 PM ]

I've been noodling around this, and it is entirely possible to generate and invoke code in clojure right now without paying the extra deref() cost:

 (defn ^long fib [^long n]
   (case n
     0 0
     1 1
     (+ (fib (dec n))
        (fib  (dec (dec n))))))

can be written as

(declare TheR1798)

(definterface I1797
  (^long fib_Invoke_1 [^long n]))

(deftype R1798 []
  I1797
  (^long fib_Invoke_1
    [this1799 ^long n]
    (case n
      0 0
      1 1
      (+ (.fib_Invoke_1 this1799 (dec n))
         (.fib_Invoke_1 this1799 (dec (dec n)))))))

(def TheR1798 (new R1798))

(defn ^long fib [^long n]
  (.fib_Invoke_1 TheR1798  n)))

now the recursive calls are invokeinterfaces, and the resulting function seems to have mean execution time about 5 times smaller using criterium to bench mark

it is entirely possible to write a macro that translates one in to other, and the weird names in the above are because I have a little proof of concept that does that.

the body of the bytecode for the regular fib function first shown looks something like:

  public final java.lang.Object invokePrim(long);
    Code:
       0: lload_1       
       1: lstore_3      
       2: lload_3       
       3: l2i           
       4: tableswitch   { // 0 to 1
                     0: 28
                     1: 40
               default: 52
          }
      28: lconst_0      
      29: lload_3       
      30: lcmp          
      31: ifne          52
      34: getstatic     #33                 // Field const__1:Ljava/lang/Object;
      37: goto          94
      40: lconst_1      
      41: lload_3       
      42: lcmp          
      43: ifne          52
      46: getstatic     #37                 // Field const__3:Ljava/lang/Object;
      49: goto          94
      52: getstatic     #57                 // Field const__5:Lclojure/lang/Var;
      55: invokevirtual #70                 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
      58: checkcast     #6                  // class clojure/lang/IFn$LO
      61: lload_1       
      62: invokestatic  #75                 // Method clojure/lang/Numbers.dec:(J)J
      65: invokeinterface #77,  3           // InterfaceMethod clojure/lang/IFn$LO.invokePrim:(J)Ljava/lang/Object;
      70: getstatic     #57                 // Field const__5:Lclojure/lang/Var;
      73: invokevirtual #70                 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
      76: checkcast     #6                  // class clojure/lang/IFn$LO
      79: lload_1       
      80: invokestatic  #75                 // Method clojure/lang/Numbers.dec:(J)J
      83: invokestatic  #75                 // Method clojure/lang/Numbers.dec:(J)J
      86: invokeinterface #77,  3           // InterfaceMethod clojure/lang/IFn$LO.invokePrim:(J)Ljava/lang/Object;
      91: invokestatic  #81                 // Method clojure/lang/Numbers.add:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Number;
      94: areturn       
    LineNumberTable:
      line 243: 0
      line 244: 2
      line 247: 52
      line 247: 52
      line 247: 61
      line 248: 70
      line 248: 79
      line 248: 79
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
             2      92     3 G__301   J
             0      94     0  this   Ljava/lang/Object;
             0      94     1     n   J

  public java.lang.Object invoke(java.lang.Object);
    Code:
       0: aload_0       
       1: aload_1       
       2: checkcast     #89                 // class java/lang/Number
       5: invokestatic  #93                 // Method clojure/lang/RT.longCast:(Ljava/lang/Object;)J
       8: invokeinterface #77,  3           // InterfaceMethod clojure/lang/IFn$LO.invokePrim:(J)Ljava/lang/Object;
      13: areturn       

the body of the "optimized" version looks like:

  public long fib_Invoke_1(long);
    Code:
       0: lload_1       
       1: lstore_3      
       2: lload_3       
       3: l2i           
       4: tableswitch   { // 0 to 1
                     0: 28
                     1: 38
               default: 48
          }
      28: lconst_0      
      29: lload_3       
      30: lcmp          
      31: ifne          48
      34: lconst_0      
      35: goto          80
      38: lconst_1      
      39: lload_3       
      40: lcmp          
      41: ifne          48
      44: lconst_1      
      45: goto          80
      48: aload_0       
      49: checkcast     #6                  // class com/thelastcitadel/kernel/I2364
      52: lload_1       
      53: invokestatic  #53                 // Method clojure/lang/Numbers.dec:(J)J
      56: invokeinterface #55,  3           // InterfaceMethod com/thelastcitadel/kernel/I2364.fib_Invoke_1:(J)J
      61: aload_0       
      62: checkcast     #6                  // class com/thelastcitadel/kernel/I2364
      65: lload_1       
      66: invokestatic  #53                 // Method clojure/lang/Numbers.dec:(J)J
      69: invokestatic  #53                 // Method clojure/lang/Numbers.dec:(J)J
      72: invokeinterface #55,  3           // InterfaceMethod com/thelastcitadel/kernel/I2364.fib_Invoke_1:(J)J
      77: invokestatic  #59                 // Method clojure/lang/Numbers.add:(JJ)J
      80: lreturn       
    LineNumberTable:
      line 251: 0
      line 251: 2
      line 251: 48
      line 251: 48
      line 251: 52
      line 251: 61
      line 251: 65
      line 251: 65
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
             2      78     3 G__2363   J
             0      80     0  this   Lcom/thelastcitadel/kernel/R2365;
             0      80     1     n   J

so the calls are not invokevirtual (due to the way clojure compiles stuff, you cannot type anything inside a record as being that record's type), but the interface is unique and only has one instance, so I think the jvm's class hierarchy analysis makes short work of that.

if I have time I may try and complete my macro and release it as a library, but given tools.analyzer.jvm someone should be able to do better than my little proof of concept very quickly.

Comment by Andy Fingerhut [ 07/Nov/13 12:48 PM ]

I don't know if my editing of Mike Anderson's Nov 5 2013 comment is notified to people watching this ticket, so adding a new comment so those interested in definline's experimental status can know to go back and re-read it.





[CLJ-1259] Speed up pprint Created: 09/Sep/13  Updated: 03/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Andy Fingerhut Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: ft, performance, print

Attachments: Text File clj-1259-1.txt    
Patch: Code
Approval: Triaged

 Description   

There are many occurrences of reflection in the pprint implementation.

By eliminating all of them, I ran one benchmark of pprint'ing a Clojure map that resulted in a 300 Kbyte output. After eliminating reflection, the elapsed time to pprint was reduced by 18% (about 14.0 sec down to about 11.5 sec) on a recent model MacBook Pro.

Patch: clj-1277-1.txt
Screened by: Alex Miller



 Comments   
Comment by Andy Fingerhut [ 09/Sep/13 11:36 PM ]

Patch clj-1259-1.txt eliminates all occurrences of reflection in pprint, and all files loaded from pprint.clj. It also sets warn-on-reflection to true for those files, in hopes of making it more obvious if a new use of reflection is added there.





[CLJ-1256] Support type-hinted overrides of function parameters Created: 06/Sep/13  Updated: 09/Sep/13

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Mike Anderson Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: compiler, interop, typehints


 Description   

Problem: in many cases, the Clojure compiler has enough information about the type of a function argument to statically emit maximally efficient code on the JVM (i.e. without instance? checks, type casts or other forms of dynamic polymorphic dispatch). We are currently unable to do so in Clojure, which pushes developers with strong performance requirements to use some unidiomatic or convoluted workarounds.

Proposal is simply to allow functions to take type-hinted overloads of function arguments, e.g.

(defn foo
([^double x] (Math/floor x))
([^float x] (Math/floor (double x)))
([^String s] (count s)))

An "Object" version of the code with the correct arity will always be emitted, which will maintain compatibility with the IFn interface and ensure that the function can still be used in dynamic / interactive contexts. If the "Object" version is not explicitly provided, then it will be generated to use instance? checks that subsequently delegate to the appropriate typed version of the function (or throw an InvalidArgumentException if no match is found).

Matching rules would be the same as Java.

This will be backwards compatible with all existing uses of defn. In particular, it should extend / enhance / supercede the existing handling of primitive functions.

In the future, this technique might be used alongside core.typed to ensure that the most efficient function version is chosen based on type inference.






[CLJ-1255] Support Abstract Base Classes with Java-only variant of "reify" Created: 06/Sep/13  Updated: 01/Jul/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Mike Anderson Assignee: Unassigned
Resolution: Unresolved Votes: 4
Labels: interop

Approval: Triaged

 Description   

Problem:

  • Various Java APIs depend on extension of abstract base classes rather than interfaces
  • "proxy" has limitations (no access to protected fields or super)
  • "proxy" has performance overhead because of an extra layer of functions / parameter boxing etc.
  • "gen-class" is complex and is complected with compilation / bytecode generation

In summary: Clojure does not currently have a good / convenient way to extend a Java abstract base class dynamically.

The proposal is to create a variant of "reify" that allows the extension of a single abstract base class (optionally also with interfaces/protocols). Code generation would occur as if the abstract base class had been directly extended in Java (i.e. with full access to protected members and with fully type-hinted fields).

Since this is a JVM-only construct, it should not affect the portable extension methods in Clojure (deftype etc.). We propose that it is placed in an separate namespace that could become the home for other JVM-specific interop functionality, e.g. "clojure.java.interop"



 Comments   
Comment by Alex Miller [ 30/Jun/14 8:18 AM ]

From Rich: we do not want to support abstract classes in a portable construct (reify, deftype). However, this would be considered as a new Java-only construct (extend-class or reify-class). If you could modify the ticket appropriately, will move back to Triaged.





[CLJ-1250] Reducer (and folder) instances hold onto the head of seqs Created: 03/Sep/13  Updated: 06/Oct/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: Release 1.8

Type: Defect Priority: Major
Reporter: Christophe Grand Assignee: Unassigned
Resolution: Unresolved Votes: 9
Labels: memory, reducers

Attachments: Text File after-change.txt     Text File before-change.txt     Text File CLJ-1250-08-29.patch     Text File CLJ-1250-08-29-ws.patch     Text File CLJ-1250-20131211.patch     Text File CLJ-1250-AllInvokeSites-20140204.patch     Text File CLJ-1250-AllInvokeSites-20140320.patch    
Patch: Code and Test
Approval: Vetted

 Description   

Problem Statement
A shared function #'clojure.core.reducers/reducer holds on to the head of a reducible collection, causing it to blow up when the collection is a lazy sequence.

Reproduction steps:
Compare the following calls:

(time (reduce + 0 (map identity (range 1e8))))
(time (reduce + 0 (r/map identity (range 1e8))))

The second call should fail on a normal or small heap.

(If reducers are faster than seqs, increase the range.)

Cause: #'reducer closes over a collection when in order to reify CollReduce, and the closed-over is never cleared. When code attempts to reduce over this anonymous transformed collection, it will realize the tail while the head is stored in the closed-over.

Patch
CLJ-1250-08-29.patch

Approach:

Clear the reference to 'this' on the stack just before a tail call occurs

Removes calls to emitClearLocals(), which is a no-op.

When the context is RETURN (indicating a tail call), and the operation
is an InvokeExpr, StaticMethodExpr, or InstanceMethodExpr, clear the
reference to 'this' which is in slot 0 of the locals.

Edge-case: Inside the body of a try block, we cannot clear 'this' at the tail
position as we might need to keep refs around for use in the catch or finally
clauses. Introduces another truthy dynamic binding var to track position being
inside a try block.

Adds two helpers to emitClearThis and inTailCall.

Advantages: Fixes this case with no user code changes. Enables GC to do reclaim closed-overs references earlier.
Disadvantages: A compiler change.

Alternate Approaches:

1) Reimplement the #'reducer (and #'folder) transformation fns similar to the manner that Christophe proposes here:

(defrecord Reducer [coll xf])

(extend-protocol 
  clojure.core.protocols/CollReduce
  Reducer
      (coll-reduce [r f1]
                   (clojure.core.protocols/coll-reduce r f1 (f1)))
      (coll-reduce [r f1 init]
                   (clojure.core.protocols/coll-reduce (:coll r) ((:xf r) f1) init)))

(def rreducer ->Reducer) 

(defn rmap [f coll]
  (rreducer coll (fn [g] 
                   (fn [acc x]
                     (g acc (f x))))))

Advantages: Relatively non-invasive change.
Disadvantages: Not evident code. Additional protocol dispatch, though only incurred once

2) Alternate approach

from Christophe Grand:
Another way would be to enhance the local clearing mechanism to also clear "this" but it's complex since it may be needed to keep a reference to "this" for caches long after obvious references to "this" are needed.

Advantages: Fine-grained
Disadvantages: Complex, invasive, and the compiler is hard to hack on.

Mitigations
Avoid reducing on lazy-seqs and instead operate on vectors / maps, or custom reifiers of CollReduce or CollFold. This could be easier with some implementations of common collection functions being available (like iterate and partition).

See https://groups.google.com/d/msg/clojure-dev/t6NhGnYNH1A/2lXghJS5HywJ for previous discussion.



 Comments   
Comment by Gary Fredericks [ 03/Sep/13 8:53 AM ]

Fixed indentation in description.

Comment by Ghadi Shayban [ 11/Dec/13 11:08 PM ]

Adding a patch that clears "this" before tail calls. Verified that Christophe's repro case is fixed.

Will upload a diff of the bytecode soon.

Any reason this juicy bug was taken off 1.6?

Comment by Ghadi Shayban [ 11/Dec/13 11:17 PM ]

Here's the bytecode for the clojure.core.reducers/reducer reify before and after the change... Of course a straight diff isn't useful because all the line numbers changed. Kudos to Gary Trakhman for the no.disassemble lein plugin.

Comment by Christophe Grand [ 12/Dec/13 6:58 AM ]

Ghadi, I'm a bit surprised by this part of the patch: was the local clearing always a no-op here?

-		if(context == C.RETURN)
+		if(shouldClear)
 			{
-			ObjMethod method = (ObjMethod) METHOD.deref();
-			method.emitClearLocals(gen);
+                            gen.visitInsn(Opcodes.ACONST_NULL);
+                            gen.visitVarInsn(Opcodes.ASTORE, 0);
 			}

The problem with this approach (clear this on tail call) is that it adds yet another special case. To me the complexity stem from having to keep this around even if the user code doesn't refer to it.

Comment by Ghadi Shayban [ 12/Dec/13 7:19 AM ]

Thank you - I failed to mention this in the commit message: it appears that emitClearLocals() belonging to both ObjMethod and FnMethod (its child) are empty no-ops. I believe the actual local clearing is on line 4855.

I agree re: another special case in the compiler.

Comment by Alex Miller [ 12/Dec/13 8:56 AM ]

Ghadi re 1.6 - this ticket was never in the 1.6 list, it has not yet been vetted by Rich but is ready to do so when we open up again after 1.6.

Comment by Ghadi Shayban [ 12/Dec/13 8:59 AM ]

Sorry I confused the critical list with the Rel1.6 list.

Comment by Ghadi Shayban [ 14/Dec/13 11:16 AM ]

New patch 20131214 that handles all tail invoke sites (InvokeExpr + StaticMethodExpr + InstanceMethodExpr). 'StaticInvokeExpr' seems like an old remnant that had no active code path, so that was left as-is.

The approach taken is still the same as the original small patch that addressed only InvokeExpr, except that it is now using a couple small helpers. The commit message has more details.

Also a 'try' block with no catch or finally clause now becomes a BodyExpr. Arguably a user error, historically accepted, and still accepted, but now they are a regular BodyExpr, instead of being wrapped by a the no-op try/catch mechanism. This second commit can be optionally discarded.

With this patch on my machine (4/8 core/thread Ivy Bridge) running on bare clojure.main:
Christophe's test cases both run i 3060ms on a artificially constrained 100M max heap, indicating a dominant GC overhead. (But they now both work!)

When max heap is at a comfortable 2G the reducers version outpaces the lazyseq at 2100ms vs 2600ms!

Comment by Ghadi Shayban [ 13/Jan/14 10:48 AM ]

Updating stale patch after latest changes to master. Latest is CLJ-1250-AllInvokeSites-20140113

Comment by Ghadi Shayban [ 04/Feb/14 3:50 PM ]

Updating patch after murmur changes

Comment by Tassilo Horn [ 13