<< Back to previous view

[CLJ-1476] map-invert should use (empty m) instead of {} Created: 26/Jul/14  Updated: 26/Jul/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Gregory Schlomoff Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


 Description   

clojure.set/map-invert should reduce with (empty m) instead of {} so that it returns a map of the same type as its argument.

This is a trivial change and I'm willing to submit a patch if nobody opposes.



 Comments   
Comment by Alex Miller [ 26/Jul/14 8:43 AM ]

I don't think that always makes sense. Say you had a map of string to integers with a custom comparator created by sorted-map-by. If you use empty, you'd still have a map with a custom comparator which you would pour integer keys into and would likely throw a ClassCastException.

What is the use case that led you to this ticket?

Comment by Gregory Schlomoff [ 26/Jul/14 9:14 AM ]

Hello Alex, thanks for commenting.

My use case is that I have a custom type that implements IPersistentMap. If I use map-invert over it, I get a regular map back, which is problematic because regular maps don't allow multiple values for the same key, unlike my multimap implementation, so I loose information.

(map-invert (my-multimap :a 1, :b 1))
=> {1 :b} ; lost the (1 :a) entry because regular maps don't allow duplicate keys

Maybe a solution would be to make a version of map-invert that takes a map to insert the inverted entries into?

I'm not adamant over this, if you think there is no elegant solution for this issue we can close it.





[CLJ-1475] :post condition causes compiler error with recur Created: 25/Jul/14  Updated: 26/Jul/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Steve Miner Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Approval: Triaged

 Description   

Michael O'Keefe <michael.p.okeefe@gmail.com> posted on the mailing list an example of code that causes a compiler error only if a :post condition is added. Here's my slightly modified version:

(defn g
  [xs acc]
  {:pre [(or (nil? xs) (sequential? xs))]
   :post [(number? %)]}
  (if (seq xs)
     (recur (next xs) (+ (first xs) acc))
     acc))

CompilerException java.lang.UnsupportedOperationException: Can only recur from tail position

The work-around is to wrap the body in a loop that simply rebinds the original args.



 Comments   
Comment by Steve Miner [ 25/Jul/14 9:53 AM ]

A macro expansion shows that body is placed in a let form to capture the result for later testing with the post condition, but the recur no longer has a proper target. The work-around of using a loop form is easy once you understand what's happening but it's a surprising limitation.

Comment by Steve Miner [ 25/Jul/14 9:55 AM ]

Use a local fn* around the body and call it with the original args so that the recur has a proper target. Update: not good enough for handling destructuring. Patch withdrawn.

Comment by Michael Patrick O'Keefe [ 25/Jul/14 10:37 AM ]

Link to the original topic discussion: https://groups.google.com/d/topic/clojure/Wb1Nub6wVUw/discussion

Comment by Steve Miner [ 25/Jul/14 1:42 PM ]

Patch withdrawn because it breaks on destructured args.

Comment by Steve Miner [ 25/Jul/14 5:27 PM ]

While working on a patch, I came up against a related issue: Should the :pre conditions apply to every recur "call". Originally, I thought the :pre conditions should be checked just once on the initial function call and never during a recur. People on the mailing list pointed out that the recur is semantically like calling the function again so the :pre checks are part of the contract. But no one seemed to want the :post check on every recursion, so the :post would happen only at the end.

That means automatically wrapping a loop (or nested fn* call) around the body is not going to work for the :pre conditions. A fix would have to bring the :pre conditions inside the loop.

Comment by Steve Miner [ 26/Jul/14 8:54 AM ]

I'm giving up on this bug. My approach was adding too much complexity to handle an edge case. I recommend the "loop" work-around to anyone who runs into this problem.

(defn g2
  [xs acc]
  {:pre [(or (nil? xs) (sequential? xs))]
   :post [(number? %)]}
  (loop [xs xs acc acc]
    (if (seq xs)
       (recur (next xs) (+ (first xs) acc))
       acc)))




[CLJ-701] Compiler loses 'loop's return type in some cases Created: 03/Jan/11  Updated: 26/Jul/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Backlog
Fix Version/s: Release 1.7

Type: Defect Priority: Major
Reporter: Chouser Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Clojure commit 9052ca1854b7b6202dba21fe2a45183a4534c501, version 1.3.0-master-SNAPSHOT


Attachments: File hoistedmethod-pass-1.diff     File hoistedmethod-pass-2.diff     File hoistedmethod-pass-3.diff    
Patch: Code
Approval: Vetted

 Description   
(set! *warn-on-reflection* true)
(fn [] (loop [b 0] (recur (loop [a 1] a))))

Generates the following warnings:

recur arg for primitive local: b is not matching primitive, had: Object, needed: long
Auto-boxing loop arg: b

This is interesting for several reasons. For one, if the arg to recur is a let form, there is no warning:

(fn [] (loop [b 0] (recur (let [a 1] a))))

Also, the compiler appears to understand the return type of loop forms just fine:

(use '[clojure.contrib.repl-utils :only [expression-info]])
(expression-info '(loop [a 1] a))
;=> {:class long, :primitive? true}

The problem can of course be worked around using an explicit cast on the loop form:

(fn [] (loop [b 0] (recur (long (loop [a 1] a)))))

Reported by leafw in IRC: http://clojure-log.n01se.net/date/2011-01-03.html#10:31



 Comments   
Comment by a_strange_guy [ 03/Jan/11 4:36 PM ]

The problem is that a 'loop form gets converted into an anonymous fn that gets called immediately, when the loop is in a expression context (eg. its return value is needed, but not as the return value of a method/fn).

so

(fn [] (loop [b 0] (recur (loop [a 1] a))))

gets converted into

(fn [] (loop [b 0] (recur ((fn [] (loop [a 1] a))))))

see the code in the compiler:
http://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Compiler.java#L5572

this conversion already bites you if you have mutable fields in a deftype and want to 'set! them in a loop

http://dev.clojure.org/jira/browse/CLJ-274

Comment by Christophe Grand [ 23/Nov/12 2:28 AM ]

loops in expression context are lifted into fns because else Hotspot doesn't optimize them.
This causes several problems:

  • type inference doesn't propagate outside of the loop[1]
  • the return value is never a primitive
  • mutable fields are inaccessible
  • surprise allocation of one closure objects each time the loop is entered.

Adressing all those problems isn't easy.
One can compute the type of the loop and emit a type hint but it works only with reference types. To make it works with primitive, primitie fns aren't enough since they return only long/double: you have to add explicit casts.
So solving the first two points can be done in a rather lccal way.
The two other points require more impacting changes, the goal would be to emit a method rather than a fn. So it means at the very least changing ObjExpr and adding a new subclassof ObjMethod.

[1] beware of CLJ-1111 when testing.

Comment by Alex Miller [ 21/Oct/13 10:28 PM ]

I don't think this is going to make it into 1.6, so removing the 1.6 tag.

Comment by Kevin Downey [ 21/Jul/14 7:14 PM ]

an immediate solution to this might be to hoist loops out in to distinct non-ifn types generated by the compiler with an invoke method that is typed to return the getJavaClass() type of the expression, that would give us the simplifying benefits of hoisting the code out and free use from the Object semantics of ifn

Comment by Kevin Downey [ 22/Jul/14 8:39 PM ]

I have attached a 3 part patch as hoistedmethod-pass-1.diff

3ed6fed8 adds a new ObjMethod type to represent expressions hoisted out in to their own methods on the enclosing class

9c39cac1 uses HoistedMethod to compile loops not in the return context

901e4505 hoists out try expressions and makes it possible for try to return a primitive expression (this might belong on http://dev.clojure.org/jira/browse/CLJ-1422)

Comment by Kevin Downey [ 22/Jul/14 8:54 PM ]

with hoistedmethod-pass-1.diff the example code generates bytecode like this

user=> (println (no.disassemble/disassemble (fn [] (loop [b 0] (recur (loop [a 1] a))))))
// Compiled from form-init1272682692522767658.clj (version 1.5 : 49.0, super bit)
public final class user$eval1675$fn__1676 extends clojure.lang.AFunction {
  
  // Field descriptor #7 Ljava/lang/Object;
  public static final java.lang.Object const__0;
  
  // Field descriptor #7 Ljava/lang/Object;
  public static final java.lang.Object const__1;
  
  // Method descriptor #10 ()V
  // Stack: 2, Locals: 0
  public static {};
     0  lconst_0
     1  invokestatic java.lang.Long.valueOf(long) : java.lang.Long [16]
     4  putstatic user$eval1675$fn__1676.const__0 : java.lang.Object [18]
     7  lconst_1
     8  invokestatic java.lang.Long.valueOf(long) : java.lang.Long [16]
    11  putstatic user$eval1675$fn__1676.const__1 : java.lang.Object [20]
    14  return
      Line numbers:
        [pc: 0, line: 1]

 // Method descriptor #10 ()V
  // Stack: 1, Locals: 1
  public user$eval1675$fn__1676();
    0  aload_0 [this]
    1  invokespecial clojure.lang.AFunction() [23]
    4  return
      Line numbers:
        [pc: 0, line: 1]
  
  // Method descriptor #25 ()Ljava/lang/Object;
  // Stack: 3, Locals: 3
  public java.lang.Object invoke();
     0  lconst_0
     1  lstore_1 [b]
     2  aload_0 [this]
     3  lload_1 [b]
     4  invokevirtual user$eval1675$fn__1676.__hoisted1677(long) : long [29]
     7  lstore_1 [b]
     8  goto 2
    11  areturn
      Line numbers:
        [pc: 0, line: 1]
      Local variable table:
        [pc: 2, pc: 11] local: b index: 1 type: long
        [pc: 0, pc: 11] local: this index: 0 type: java.lang.Object

 // Method descriptor #27 (J)J
  // Stack: 2, Locals: 5
  public long __hoisted1677(long b);
    0  lconst_1
    1  lstore_3 [a]
    2  lload_3
    3  lreturn
      Line numbers:
        [pc: 0, line: 1]
      Local variable table:
        [pc: 2, pc: 3] local: a index: 3 type: long
        [pc: 0, pc: 3] local: this index: 0 type: java.lang.Object
        [pc: 0, pc: 3] local: b index: 1 type: java.lang.Object

}
nil
user=> 
  

the body of the method __hoisted1677 is the inner loop

for reference the part of the bytecode from the same function compiled with 1.6.0 is pasted here https://gist.github.com/hiredman/f178a690718bde773ba0 the inner loop body is missing because it is implemented as its own IFn class that is instantiated and immediately executed. it closes over a boxed version of the numbers and returns an boxed version

Comment by Kevin Downey [ 23/Jul/14 12:43 AM ]

hoistedmethod-pass-2.diff replaces 901e4505 with f0a405e3 which fixes the implementation of MaybePrimitiveExpr for TryExpr

with hoistedmethod-pass-2.diff the largest clojure project I have quick access to (53kloc) compiles and all the tests pass

Comment by Alex Miller [ 23/Jul/14 12:03 PM ]

Thanks for the work on this!

Comment by Kevin Downey [ 23/Jul/14 2:05 PM ]

I have been working through running the tests for all the contribs projects with hoistedmethod-pass-2.diff, there are some bytecode verification errors compiling data.json and other errors elsewhere, so there is still work to do

Comment by Kevin Downey [ 25/Jul/14 7:08 PM ]

hoistedmethod-pass-3.diff

49782161 * add HoistedMethod to the compiler for hoisting expresssions out well typed methods
e60e6907 * hoist out loops if required
547ba069 * make TryExpr MaybePrimitive and hoist tries out as required

all contribs whose tests pass with master pass with this patch.

the change from hoistedmethod-pass-2.diff in this patch is the addition of some bookkeeping for arguments that take up more than one slot

Comment by Nicola Mometto [ 26/Jul/14 1:37 AM ]

Kevin there's still a bug regarding long/doubles handling:
On commit 49782161, line 101 of the diff, you're emitting gen.pop() if the expression is in STATEMENT position, you need to emit gen.pop2() instead if e.getReturnType is long.class or double.class

Test case:

user=> (fn [] (try 1 (finally)) 2)
VerifyError (class: user$eval1$fn__2, method: invoke signature: ()Ljava/lang/Object;) Attempt to split long or double on the stack  user/eval1 (NO_SOURCE_FILE:1)
Comment by Kevin Downey [ 26/Jul/14 1:46 AM ]

bah, all that work to figure out the thing I couldn't get right and of course I overlooked the thing I knew at the beginning. I want to get rid of some of the code duplication between emit and emitUnboxed for TryExpr, so when I get that done I'll fix the pop too





[CLJ-1474] `reduced` docstring should be more explicit Created: 25/Jul/14  Updated: 25/Jul/14  Resolved: 25/Jul/14

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Jean Niklas L'orange Assignee: Unassigned
Resolution: Declined Votes: 0
Labels: docstring


 Description   

The documentation for reduced is as follows:

Wraps x in a way such that a reduce will terminate with the value x

From what I gather, this does not specify whether the init value of a reduce could be a reduced value or not. As shown, the fact that the init value is a reduced value is ignored:

(reduce list (reduced 1) [2])
=> (#<Reduced@518a6aa: 1> 2)

The documentation should explicitly mention that a reduce call will not check if the initial value is reduced.



 Comments   
Comment by Alex Miller [ 25/Jul/14 9:09 AM ]

reduced creates a value that has special meaning as the output of invocation of the reducing function. Your example is about an input to that function. I don't see that this makes sense or needs documenting.

You can of course invent a situation where a (reduced 1) input is also the output but again, that seems like a pretty weird use case.

(reduce (fn [a v] a) (reduced 1) [2])
;; 1
Comment by Jean Niklas L'orange [ 25/Jul/14 12:10 PM ]

Right, that's my point. Nowhere in the documentation does it state that this does not apply to the initial value given to reduce. While you and I know this, I don't see how one can conclude this based on the current documentation.

Put differently, someone might wrongly assume that reduce is implemented as an optimised version of this:

(defn reduce [f init coll]
  (cond (reduced? init) (unreduced init)
        (empty? coll)    init
        :else           (recur f (f init (first coll))
                                 (rest coll))))

However, that's not the case, which I think is worth pointing out.

Comment by Alex Miller [ 25/Jul/14 5:01 PM ]

But it might apply to the initial value (as in my example where a reduced value is respected - note that doesn't return (reduced 1), just 1). Your suggested documentation change is talking about input values, but in my mind that leads to incorrect conclusions.

The only change that would make sense to me is clarifying where a "reduced" value is checked (on the result of applying the function passed to reduce). I think that's already implicit in the existing doc string myself. Since we have multiple implementations of "reduce", we have to tread carefully not to refer to explicitly to a particular one.

This use of a reduced initial value does not even make sense; why we would we confuse the docstring to warn about it?





[CLJ-1192] vec function is substantially slower than into function Created: 06/Apr/13  Updated: 25/Jul/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: Release 1.7

Type: Enhancement Priority: Major
Reporter: Luke VanderHart Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance

Approval: Vetted

 Description   

(vec coll) and (into [] coll) do exactly the same thing. However, due to into using transients, it is substantially faster. On my machine:

(time (dotimes [_ 100] (vec (range 100000))))
"Elapsed time: 732.56 msecs"

(time (dotimes [_ 100] (into [] (range 100000))))
"Elapsed time: 491.411 msecs"

This is consistently repeatable.

Since vec's sole purpose is to transform collections into vectors, it should do so at the maximum speed available.



 Comments   
Comment by Andy Fingerhut [ 07/Apr/13 5:50 PM ]

I am pretty sure that Clojure 1.5.1 also uses transient vectors for (vec (range n)) (probably also some earlier versions of Clojure, too).

Look at vec in core.clj. It checks whether its arg is a java.util.Collection, which lazy seqs are, so calls (clojure.lang.LazilyPersistentVector/create coll).

LazilyPersistentVector's create method checks whether its argument is an ISeq, which lazy seqs are, so it calls PersistentVector.create(RT.seq(coll)).

All 3 of PersistentVector's create() methods use transient vectors to build up the result.

I suspect the difference in run times are not because of transients or not, but because of the way into uses reduce, and perhaps may also have something to do with the perhaps-unnecessary call to RT.seq in LazilyPersistentVector's create method (in this case, at least – it is likely needed for other types of arguments).

Comment by Alan Malloy [ 14/Jun/13 2:17 PM ]

I'm pretty sure the difference is that into uses reduce: since reducers were added in 1.5, chunked sequences know how to reduce themselves without creating unnecessary cons cells. PersistentVector/create doesn't use reduce, so it has to allocate a cons cell for each item in the sequence.

Comment by Gary Fredericks [ 08/Sep/13 1:55 PM ]

Is there any downside to (defn vec [coll] (into [] coll)) (or the inlined equivalent)?

Comment by Ghadi Shayban [ 11/Apr/14 5:13 PM ]

While I agree that there are improvements and possibly low-hanging fruit, FWIW https://github.com/clojure/tools.analyzer/commit/cf7dda81a22f4c9c1fe64c699ca17e7deed61db4#commitcomment-5989545

showed a 5% slowdown from a few callsites in tools.analyzer.

This ticket's benchmark is incomplete in that it covers a single type of argument (chunked range), and flawed as it timing the expense of realizing the range. (That could be a legit benchmark case, but it shouldn't be the only one).

Sorry to rain on a parade. I promise like speed too!

Comment by Greg Chapman [ 25/Apr/14 5:23 PM ]

One thing to note is that vec has a subtle difference from into when the collection is an Object array of length <= 32. In that case, vec aliases the supplied array, rather than copying it (this is noted in the warning here: http://clojuredocs.org/clojure_core/clojure.core/vec). I believe I read some place that this behavior is intentional, but I can't find the citation.

Comment by Andy Fingerhut [ 25/Apr/14 10:18 PM ]

Greg, CLJ-893 might be what you remember. That is the ticket that was closed by a patch updating the documentation of vec.

Comment by Mike Anderson [ 18/May/14 7:41 AM ]

I think there are quite a few performance improvements that can be made to vec in general. For example, if given a List it should use PersistentVector.create(List) rather than producing an unnecessary seq, which appears to be the case at the moment. Also it should probably return the same object if passed an existing IPersistentVector.

Basically there are a number of cases that we could be handling more efficiently....

I'm taking a look at this now.... will propose a quick patch if it seems there is a good solution.

Comment by Mike Anderson [ 24/Jul/14 4:01 AM ]

I've looked at this issue and it is quite complex. There are multiple types that need to potentially be converted into vectors, and doing so efficiently will often require making use of reduce-style operations on the source collections.

Doing this efficiently will probably in turn require making use of the IReduce interface, which doesn't yet seem to be fully utilised across the Clojure code based. If we do this, lots of operations (not just vec!) can be made faster but it will be quite a major change.

I have a branch that implements some of this but would appreciate feedback if this is the right direction before I take it any further:
https://github.com/mikera/clojure/tree/clj-1192-vec-performance

Comment by Alex Miller [ 24/Jul/14 9:45 AM ]

Thanks Mike! It may take a few days before I can get back to you about this.

Comment by Mike Anderson [ 25/Jul/14 3:44 AM ]

Basically the approach I am proposing is:

  • Make various collections implement IReduce efficiently (if they don't already). Especially applied to chunked seqs etc.
  • Have RT.reduce(...) methods that implement reduce on the Java side
  • Make the Clojure side use IReduce where relevant (should be as simple as extending the existing protocols)
  • Implement vec (and other similar operations) in terms of IReduce - which will solve this specific issue

If we really care about pushing vector performance even further, we can also consider:

  • Create specialised small vector types where appropriate - e.g. a specialised SmallPersistentVector class for <32 elements. This should outperform the more generic PersistentVector which is better suited for large vectors.
  • Some dedicated construction functions that know how to efficiently exploit knowledge about the data source (e.g. creating a vec from a segment of a big Object array can be done with a bunch of arraycopys into 32-element chunks and then constructing a PersistentVector around these)

This should give us a decent speedup overall (of course it would need benchmarking... but I'd hope to see some sort of measurable improvement on a macro benchmark like building and testing Clojure).





[CLJ-1473] Bad pre/post conditions silently passed Created: 24/Jul/14  Updated: 24/Jul/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Brandon Bloom Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: errormsgs

Attachments: Text File 0001-Validate-that-pre-and-post-conditions-are-vectors.patch    
Patch: Code and Test
Approval: Triaged

 Description   

Before:

user=> ((fn [x] {:pre (pos? x)} x) -5) ; ouch!
-5
user=> ((fn [x] {:pre [(pos? x)]} x) -5) ; meant this
AssertionError Assert failed: (pos? x)  user/eval4075/fn--4076 (form-init5464179453862723045.clj:1)

After:

user=> ((fn [x] {:pre (pos? x)} x) -5)
CompilerException java.lang.IllegalArgumentException: Pre and post conditions should be vectors, compiling:(NO_SOURCE_PATH:1:2) 
user=> ((fn [x] {:pre [(pos? x)]} x) -5)                                  
AssertionError Assert failed: (pos? x)  user/eval2/fn--3 (NO_SOURCE_FILE:2)
user=> ((fn [x] {:post (pos? x)} x) -5)
CompilerException java.lang.IllegalArgumentException: Pre and post conditions should be vectors, compiling:(NO_SOURCE_PATH:3:2) 
user=> ((fn [x] {:post [(pos? x)]} x) -5)              
AssertionError Assert failed: (pos? x)  user/eval7/fn--8 (NO_SOURCE_FILE:4)





[CLJ-1472] The locking macro fails bytecode verification on ART runtime Created: 23/Jul/14  Updated: 24/Jul/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Adam Clements Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: None
Environment:

Android ART runtime


Attachments: Text File 0001-Move-monitor-enter-outside-try-block.patch    
Patch: Code
Approval: Triaged

 Description   

Android ART runs compile time verification on bytecode and was failing on any usage of the locking macro. Examination of the bytecode as compared to a java synchronized block shows up a number of differences:
https://gist.github.com/AdamClements/2ae6c4919964b71eb470

Having the monitor-enter inside the try block seems wrong to me, as surely if the lock fails to be acquired, it shouldn't be released with monitor-exit. Moving the monitor enter outside the try block seems to have resolved the issue and android no longer complains about usages of locking and all clojure tests still pass.

Java's generated code goes further and catches any exceptions generated by the monitor-exit itself and retries indefinitely (I believe the logic is that then at least your deadlock is in the right place, and not next time something else attempts to acquire a lock on the same object). I don't think that this can be replicated in clojure without getting down to the bytecode emitting level though and it doesn't seem to be an issue for the ART verifier.



 Comments   
Comment by Adam Clements [ 24/Jul/14 11:17 AM ]

After using this a little more, I've found that moving this outside the try block breaks nREPL.

Looking at the bytecode, the monitorenter for the locking in clojure.tools.nrepl.middleware.session/session-out and in a few other places ends up in an entirely different method definition and we now get a JVM IllegalMonitorStateException as well as an ART verification error for this function.





[CLJ-1415] Keyword cache cleanup incurs linear scan of cache Created: 06/May/14  Updated: 23/Jul/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: Release 1.7

Type: Enhancement Priority: Minor
Reporter: Kyle Kingsbury Assignee: Alex Miller
Resolution: Unresolved Votes: 6
Labels: keywords, performance

Attachments: File faster-keywords.diff     File keyword-cache.diff     Text File kw-clean-future.patch     File unified-kw-patch.diff    
Patch: Code
Approval: Vetted

 Description   

If the GC reclaims a keyword, any subsequent attempt to create a keyword requires an O(n) scan over the entire keyword table via Util.clearCache. This is a significant performance cost in keyword-heavy operations; e.g. JSON parsing.

Patch: keyword-cache.diff - patch to defer cleaning till portion of the table is dead and avoid multiple threads cleaning simultaneously.

Patch: kw-clean-future.patch - patch to spin cache cleaning into a future. Found that 1) this introduces some ordering constraints and circularity between Agent and Keyword (fixable) and 2) using the future pool for this means shutdown-agents would always need to be called (in the patch I avoided this by changing agent's soloExecutor to use daemon threads.

Patch: unified-kw-patch.diff - Alternative to keyword-cache and clean-future.patch. Combines all keyword-perf changes, including the EDN kw parser improvement, improved table lookup performance, and has threads cooperate to empty the table refqueue with a minimum of contention.



 Comments   
Comment by Alex Miller [ 06/May/14 5:53 PM ]

Any perf-related ticket will need some clear before/after timings (with good methodology and how to repro) and also a consideration of cases where the change may introduce any perf degradation in normal usage.

Comment by Kyle Kingsbury [ 07/May/14 9:54 PM ]

I've experimented with a patch reducing the cache clearing cost and removing the need for String.intern. Preliminary results are good, but I want to try a few alternative approaches for cache keys. For instance, could we use pure strings like "foo" and "clojure.core/foo" as the cache keys, removing a level of memory indirection? If we're being really sneaky, we could share those same strings with the Symbol _str field to halve our memory use, assuming it's OK to reach in and mutate it.

https://gist.github.com/aphyr/f72e72992dade4578232
http://imgur.com/a/YSgUa#2

Comment by Alex Miller [ 08/May/14 12:29 PM ]

Great start on this - having the perf data is hugely important. One thing I don't see you've covered yet is what the corresponding memory increase you're incurring with CacheKey to get the benefit - we need to quantify both sides of the tradeoff here (latency/throughput vs memory) to fully judge.

Questions/comments on your patch...

1) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L101 - do we need the (o instanceof CacheKey) check? If the usage of this is constrained then we might be able to skip it (and it will blow up on the next line if something is wrong).

2) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L110 - shouldn't we precompute and save the hash code!? The only thing we're making this for is fast hash comparisons. That hash computation is string length dependent - it ain't cheap.

3) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L126 - have you tested with other values here? Should have some justification for this.

4) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L126 - have you tested with other values here? Should have some justification that this is a reasonable number.

5) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L169 - there is a race here (actually more than one if you include getting the tableSize):

Th1: orphansCount = orphans.get()
Th2: orphansCount = orphans.get()
Th2: orphansNew = orphans.getAndSet(0)
Th2: orphansNew > orphansCount -> start cleaning
<huge gc, 10 zillion orphans are created>
Th1: orphansNew = orphans.getAndSet(0)
Th1: orphansNew > orphansCount -> start cleaning

but I guess this is "safe"; we just have multiple threads cleaning in that case.

6) In general it seems pretty sloppy (I don't mean that pejoratively) how the orphans, rq, and cleaning thread are working together and may be out of sync. I get it and I even believe it will work (usually) but I worry about timing issues that I am too dumb to think of.

Is there a simpler strategy? What if every Nth call to intern() cleaned M entries (or up to M% of table)?

7) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L177 - if you made the iterator explicit in this loop, it would be safe to call iterator.remove() instead of table.remove() and I believe that would be faster on CHM.

8) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L198 - could have two versions of this with/without the symbol. Not sure if that would be faster but they might both inline better into their callers then?

9) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L242 - what's the use case for finding an external CacheKey? Fast re-lookup for specialized use? Do we want to commit to this in the API?

Comment by Kyle Kingsbury [ 08/May/14 2:41 PM ]

1) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L101 - do we need the (o instanceof CacheKey) check? If the usage of this is constrained then we might be able to skip it (and it will blow up on the next line if something is wrong).

I'm usually wary of violating equality/hashCode contracts, and this doesn't even appear as a blip on the radar in profiling. I think we could drop it

2) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L110 - shouldn't we precompute and save the hash code!? The only thing we're making this for is fast hash comparisons. That hash computation is string length dependent - it ain't cheap.

It's less memoizable than you might think; each CacheKey is only indexed a few times, and only at query time; it also doesn't help us for equality checks, since those only occur after hashing. I can add a memoizing field for it at the cost of another 32 bits/kw; we'll see how it impacts performance.

3) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L126 - have you tested with other values here? Should have some justification for this.

I experimented with several values on the Clojure test suite, benchmarks, and some real-world hadoop code. Diminishing returns, as you'd expect. 0.1 and 0.5 are essentially identical in runtime tradeoff. We could drop to 0.01 if desired; it'll only make a difference in large (10-100K) extant keyword benchmarks, as far as I can tell.

4) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L126 - have you tested with other values here? Should have some justification that this is a reasonable number.

Same question as #3?

5) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L169 - there is a race here (actually more than one if you include getting the tableSize):

Th1: orphansCount = orphans.get()
Th2: orphansCount = orphans.get()
Th2: orphansNew = orphans.getAndSet(0)
Th2: orphansNew > orphansCount -> start cleaning
<huge gc, 10 zillion orphans are created>
Th1: orphansNew = orphans.getAndSet(0)
Th1: orphansNew > orphansCount -> start cleaning

but I guess this is "safe"; we just have multiple threads cleaning in that case.

Yep. This check is only there as an optimization--and note that if a huge GC occurs, it's likely we want to schedule a followup traversal of the table anyway, because the thread that's already cleaned some of the table has probably missed some subsequently GC'ed elements. The number of concurrently cleaning threads is bounded by the rate of GC churn, and in the most pathological case (sadly, I haven't been able to produce this race experimentally), this degenerates to the old Clojure behavior of every thread doing a full scan.

6) In general it seems pretty sloppy (I don't mean that pejoratively) how the orphans, rq, and cleaning thread are working together and may be out of sync. I get it and I even believe it will work (usually) but I worry about timing issues that I am too dumb to think of.

Is there a simpler strategy? What if every Nth call to intern() cleaned M entries (or up to M% of table)?

Every nth call is just fine, but it degrades more poorly for large tables. In general, I try to lean towards scale-invariant solutions, which is why I aimed to reclaim roughly a tenth of the entries in the map every time. Maybe more, maybe less, depending on CAS retries, delayed threads resetting the counter to zero, etc.

Doing M entries or M% is more tricky; gotta figure out which threads collect what fraction when, how you efficiently access that subsection of the hash, make sure elements don't fall through the cracks, etc.

7) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L177 - if you made the iterator explicit in this loop, it would be safe to call iterator.remove() instead of table.remove() and I believe that would be faster on CHM.

I agree. I figured Rich had a good reason for doing it this way, but if you concur I'll change it.

8) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L198 - could have two versions of this with/without the symbol. Not sure if that would be faster but they might both inline better into their callers then?

I agree. We can do that dispatch statically and cut down on branch misprediction, too.

9) https://gist.github.com/aphyr/f72e72992dade4578232#file-gistfile1-diff-L242 - what's the use case for finding an external CacheKey? Fast re-lookup for specialized use? Do we want to commit to this in the API?

Keep forgetting Java's obsession with encapsulation. I'll privatize.

Comment by Alex Miller [ 08/May/14 10:14 PM ]

On several of these - 2, 7, 8 - I think those are worth a test. If faster, we should consider.

On 9, I thought maybe you were opening it up so it would be possible to save off a CacheKey and reuse it or something else. If it's not needed externally, then might be good to private-ize CacheKey itself so we can change it later.

Comment by Kyle Kingsbury [ 09/May/14 6:04 PM ]

http://imgur.com/a/1bv3P#0
https://gist.github.com/aphyr/f72e72992dade4578232

These charts show the performance impact of several changes. In order, they are:

1.7                 baseline
kw                  initial patch
kw-static-paths     Separate codepaths for interning symbols vs strings. Iterator
                    .remove for cache cleaning. Fix a bug for null comparisons
                    in CacheKey namespaces. Internal functions now protected, not
                    public. Not much performance impact.
kw-memo-hash        Memoize hashcodes for CacheKeys. Performance is a wash.
kw-string-cachekeys Observing that String.indexOf('/') consumed a significant 
                    fraction of interning time, use a combined "ns/name" string for
                    Cachekeys instead of separate strings. Significant performance 
                    boost in all tests; 40% reduction in median latencies in 1000-
                    kw allocation test, for instance.
kw-string-keys      Use raw strings for CacheKeys. Improves performance by removing
                    a level of memory indirection, even without cached hashcodes.
kw-interned-keys    Intern those strings to reduce memory consumption, sharing
                    them with the underlying symbol's strings. Slightly slower.

Performance is even better now. Creating 1000 keywords median latency changed from 900 to 200 micros; .999s lower, throughput from 4000 to 20,000/second. JSON parsing median latency fell from 170 micros to 100 micros; throughput doubled from 17500 docs/sec to 36,000 docs/sec.

We're still suffering from poor dispersal in ConcurrentHashmap's use of the string hashCode on JDK7/8, but maybe that's a subject for a different patch.

Memory impact is now minimal. We intern every key string in the table, and those strings are interned by the symbols anyway, so they're essentially the same object. For namespaced symbols, we pay a slightly higher cost--forcing the interning of the "ns/name" string instead of deferring it to Symbol.toString() time. For non-namespaced symbols, these strings are interned as a part of the symbol creation process so there's no memory overhead.

At the repl, I tested by allocating and retaining a million keywords:

(def x (mapv keyword (map (partial str "test-kw-") (range 1e6))))

Retained size (bytes)             1.7   string-kw
----------------------------------------------------
Total retained heap        221.    MB  221.    MB
clojure.lang.Symbols       104.820 MB   32.900 MB
clojure.lang.Keywords       24.021 MB   56.049 MB
java.lang.Strings           89.537 MB   81.786 MB
clojure.lang.Keyword class  72.447 MB   72.451 MB

Total memory use is unchanged, but note that clojure.lang.Symbol retains less, since its strings are now shared by the Keyword table. Keywords, by contrast, retains more. Strings and the keyword table are essentially unchanged.

Comment by Kyle Kingsbury [ 09/May/14 6:08 PM ]

I can't figure out how to edit the ticket description, but I updated the same gist with the cumulative changes. Comments welcome!

Comment by Alex Miller [ 09/May/14 9:51 PM ]

Excellent, thanks for the data. I added a group to your auth so I think you should be able to edit descriptions now. If not, let me know. I'll re-review the patch next week. It would be good either at this point or after that to turn this into an actual patch file instead of a gist.

Comment by Kyle Kingsbury [ 12/May/14 4:24 PM ]

I've attached a cumulative patch. It's comprised of 8 commits; one for each stage we've discussed. I can rebase into a single commit if you'd like.

Comment by Alex Miller [ 13/May/14 7:31 AM ]

I would like a single cumulative rebased patch. I hope to have some time to look at it today.

Comment by Alex Miller [ 13/May/14 12:39 PM ]

On another look, I think it would be useful to separate this ticket into two parts - the first is about optimizing keyword creation and lookup to avoid unnecessary work (avoiding symbol creation and interning, using Strings as keys in the cache). The second part is really about optimizing cache clearing. Do you think these can be separated into two tickets?

Regarding the cache clearing part, have you tested a simpler strategy of just counting calls to clearCache() and running the clean scan every N calls? If that was almost as good, I'd be in favor of that over what is in the patch.

The kw-static paths version did not seem to be an improvement - perhaps you should try pulling them back together to simplify the code? Only worth it if there is a real improvement from it.

On the various find methods - if you could retain their ordering and minimize noise in the diffs that would really help make it easier to screen.

Finally, we need to do some tests to verify that we have not altered the performance of using keywords and symbols as keys in a map for lookup.

Comment by Kyle Kingsbury [ 05/Jun/14 2:36 PM ]

> On another look, I think it would be useful to separate this ticket into two parts - the first is about optimizing keyword creation and lookup to avoid unnecessary work (avoiding symbol creation and interning, using Strings as keys in the cache). The second part is really about optimizing cache clearing. Do you think these can be separated into two tickets?

Created dev.clojure.org/jira/browse/CLJ-1439 for reduced intern cost

> Regarding the cache clearing part, have you tested a simpler strategy of just counting calls to clearCache() and running the clean scan every N calls? If that was almost as good, I'd be in favor of that over what is in the patch.

I'm not confident that this work will be merged, so I'm kinda reticent to go off and spend another N hours doing benchmarks.

> The kw-static paths version did not seem to be an improvement - perhaps you should try pulling them back together to simplify the code? Only worth it if there is a real improvement from it.

It was obsoleted by a later commit; only included it in the benchmark because you asked about the perf impact.

> On the various find methods - if you could retain their ordering and minimize noise in the diffs that would really help make it easier to screen.

Done.

> Finally, we need to do some tests to verify that we have not altered the performance of using keywords and symbols as keys in a map for lookup.

This doesn't touch the lookup path; costs are equivalent.

Comment by Alex Miller [ 09/Jun/14 1:53 PM ]

reduced patch with only the keyword cache clearing changes

Comment by Alex Miller [ 09/Jun/14 8:53 PM ]

Patch that spins cache cleaning into a future

Comment by Kyle Kingsbury [ 21/Jul/14 2:20 PM ]

Just as a followup: got bit by this issue again in a data analysis project today: JSON parsing thrashes the reference queue really hard. Merging this patch would definitely be appreciated. Yourkit screenshot here: http://aphyr.com/media/clojure-keyword-refqueue.png

Comment by Kyle Kingsbury [ 21/Jul/14 4:58 PM ]

Oh yeah, once these two are merged, here's a patch that speeds up my EDN parsing-heavy hadoop jobs by 2-3x. Avoids constructing the symbol every time.

--- a/src/jvm/clojure/lang/EdnReader.java
+++ b/src/jvm/clojure/lang/EdnReader.java
@@ -299,10 +299,9 @@ private static Object matchSymbol(String s){
                        return null;
                        }
                boolean isKeyword = s.charAt(0) == ':';
-               Symbol sym = Symbol.intern(s.substring(isKeyword ? 1 : 0));
                if(isKeyword)
-                       return Keyword.intern(sym);
-               return sym;
+                       return Keyword.intern(s.substring(1));
+               return Symbol.intern(s);
                }
        return null;
 }
Comment by Kyle Kingsbury [ 21/Jul/14 6:33 PM ]
public static void clearCache() {
  if(rq.poll() != null) {
    Agent.soloExecutor.submit(new Runnable() {
      public void run() {
        Util.clearCache(rq,table);
      }
    });
  }
}

This spawns literally hundreds of new threads – 30-40 at a time in my workload – which fight over the referencequeue. Also it causes a fair bit of contention on the executor itself during keyword-heavy allocation-all threads have to synchronize on the executor's queue-but that seems secondary to the cost of the cache-clearing threads themselves.

How about adding a single-thread executor to Agent for these sorts of housekeeping jobs?

Comment by Alex Miller [ 21/Jul/14 8:14 PM ]

I actually built another patch that did exactly that but never got around to attaching it; a single-threaded executor reserved solely for cache clearing. I tried actually making it completely independent but I found it was pretty easy for it to fall behind - it needs to be connected into the construction process to avoid blowing the queue up too big.

I have not been able to build an isolated test that actually showed any significant performance difference with just your cache-clearing change (what's currently attached as keyword-cache.diff) and not the other changes. I had many problems getting your test code to run but when I did get something to run I was not able to see any significant difference with just the keyword-cache.diff.

Comment by Kyle Kingsbury [ 22/Jul/14 6:43 PM ]

Managed to eliminate the refqueue contention by having only one thread involved in GCing at a time. Also doesn't require messing with background threads, and is less susceptible to the queue-overflow problem. Since the various extant patches don't apply cleanly on top of each other, I've re-written them in unified-kw-patch.diff, attached. Roughly doubles throughput compared to your patch, at least on a 24-core xeon running openjdk7.

http://aphyr.com/media/clojure-reduced-kw-refqueue-contention.png

Can you please reconsider merging? I've put over a hundred hours into writing, testing, and refining this patchset; it's been stable in production for months and has made a dramatic difference in several of our data-heavy Clojure programs.

Comment by Alex Miller [ 23/Jul/14 10:58 AM ]

Hey Kyle, I appreciate the time you've put into this. However, having a big giant patch tuned on a single use case is not an effective way to evolve the language. We need to separate and describe problems, then explore the solution space for each one, as independently as possible, while considering the impacts on all other use cases.

This particular ticket is concerned solely with the linear cleanup of the reference queue. Can you split out just a patch that deals with this issue? It would be helpful to have a test that demonstrates the performance problem and how this patch address the problem. My testing so far with the prior patch did not demonstrate any improvement.

It would also be helpful to have a squashed version of the complement of the changes related to interning on CLJ-1439 for consideration of that as a separate problem. (And maybe there is further splitting that could be done; I have not looked closely at the interning changes.)

Comment by Alex Miller [ 23/Jul/14 11:00 AM ]

The EdnReader changes, for example, should be a separate ticket.

Comment by Kyle Kingsbury [ 23/Jul/14 12:30 PM ]

Could you at least merge dev.clojure.org/jira/browse/CLJ-1439 first? I split it into a separate ticket over a month ago and these changes depend on it.

Comment by Alex Miller [ 23/Jul/14 1:27 PM ]

I would be happy to consider CLJ-1439 first. Can you update the patch there to be current and focused on the intern/cache?

Comment by Kyle Kingsbury [ 23/Jul/14 2:35 PM ]

The patch is current, and it is focused on the intern/cache.





[CLJ-308] protocol-ize with-open Created: 21/Apr/10  Updated: 23/Jul/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Assembla Importer Assignee: Unassigned
Resolution: Unresolved Votes: 6
Labels: io

Attachments: Text File 0001-Added-ClosableResource-protocol-for-with-open.patch    
Patch: Code
Approval: Triaged

 Description   

Good use (and documentation example) of protocols: make with-open aware of a Closable protocol for APIs that use a different close convention. See http://groups.google.com/group/clojure/browse_thread/thread/86c87e1fc4b1347c



 Comments   
Comment by Assembla Importer [ 24/Aug/10 4:39 PM ]

Converted from http://www.assembla.com/spaces/clojure/tickets/308

Comment by Tassilo Horn [ 23/Dec/11 5:11 AM ]

Added a CloseableResource protocol and extended it on java.io.Closeable (implemented by all Readers, Writers, Streams, Channels, Sockets). Use it in with-open.

All tests pass.

Comment by Tassilo Horn [ 23/Dec/11 7:14 AM ]

Seems to be related to Scopes (http://dev.clojure.org/jira/browse/CLJ-2).

Comment by Tassilo Horn [ 08/Mar/12 3:59 AM ]

Updated patch.

Comment by Andy Fingerhut [ 02/Apr/12 12:11 PM ]

Patch 0001-Added-ClosableResource-protocol-for-with-open.patch dated 08/Mar/12 applies, builds, and tests cleanly on latest master as of Apr 2 2012. Tassilo has signed a CA.

Comment by Tassilo Horn [ 13/Apr/12 11:23 AM ]

Updated patch to apply cleanly against master again.

Comment by Brandon Bloom [ 22/Jul/14 9:00 PM ]

I looked up this ticket because I ran in to a reflection warning: with-open does not hint it's binding with java.io.Closeable

Some feedback on the patch:

1) This is a breaking change for anyone relying on the close method to be duck-typed.

2) CloseableResource is a bit long. clojure.core.protocols.Closeable is plenty unambiguous.

3) Rather than extending CloseableResource to java.io.Closeable, you can use the little known (undocumented? unsupported?) :on-interface directive:

(defprotocol Closeable
  :on-interface java.io.Closeable
  (close [this]))

That would perform much better than the existing patch.

Comment by Tassilo Horn [ 23/Jul/14 7:12 AM ]

Hi Brandon, two questions:

Could 1) be circumvented somehow by providing a default implementation somehow? I guess the protocol could be extended upon Object with implementation (.close this), but that would give a reflection warning since Object has no close method. Probably one could extend upon Object and in the implementation search a "close" method using java.lang.reflect and throw an exception if none could be found?

Could you please tell me a bit more about the :on-interface option? How does that differ from extend? And how do I add the implementation, i.e., (.close this) with that option?





[CLJ-1470] Make Atom and ARef easy to subclass Created: 20/Jul/14  Updated: 23/Jul/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Aaron Craelius Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File CLJ-1470-v1.patch    
Patch: Code

 Description   

Atom is currently defined as final and ARef.validate() is package-private. This makes it impossible to define a subclass of an Atom and difficult subclass ARef (if validate() needs to be called).

I propose removing the final modifier from Atom, making ARef.validate() protected and also making Atom.state protected (it is currently package-private).

I'm not sure if there is a specific reason why Atom is final - if this is for performance reasons or to prevent someone from doing strange things with Atom's, but I can see a use case for sub-classing it.

One use-case is to create reactive Atom that allows derefs to be tracked (as in reagent). I have some Clojure (not Clojurescript) code where I'm trying to play with this idea and I've had to copy the entire Atom class (because it's sealed) and place it in the clojure.lang package (because ARef.validate() is package-private): https://github.com/aaronc/freactive/blob/master/src/java/clojure/lang/ReactiveAtom.java. In addition, I need to copy the defns for swap! and reset! into my own namespace. This seems a bit inconvenient.



 Comments   
Comment by Kevin Downey [ 23/Jul/14 12:55 AM ]

related to http://dev.clojure.org/jira/browse/CLJ-803





[CLJ-1422] Recur around try boxes primitives Created: 14/May/14  Updated: 22/Jul/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5, Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Kyle Kingsbury Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: compiler, performance, typehints


 Description   

Primitive function and recur variables can't pass through a (try) cleanly; they're boxed to Object instead. This causes reflection warnings for fns or loops that use primitive types.

user=> (set! *warn-on-reflection* true)
true
 
user=> (fn [] (loop [t 0] (recur t)))
#<user$eval676$fn__677 user$eval676$fn__677@3d80023a>
 
user=> (fn [] (loop [t 0] (recur (try t))))
NO_SOURCE_FILE:1 recur arg for primitive local: t is not matching primitive, had: Object, needed: long
Auto-boxing loop arg: t
#<user$eval680$fn__681 user$eval680$fn__681@5419323a>

user=> (fn [^long x] (recur (try x)))
NO_SOURCE_FILE:1 recur arg for primitive local: x is not matching primitive, had: Object, needed: long

CompilerException java.lang.IllegalArgumentException:  recur arg for primitive local: x is not matching primitive, had: Object, needed: long, compiling:(NO_SOURCE_PATH:1:1)


 Comments   
Comment by David James [ 15/Jun/14 10:27 PM ]

Without commenting on the most desirable behavior, the following code does not cause reflection warnings:

user=> (set! *warn-on-reflection* true)
true
user=> (fn [] (loop [t 0] (recur (long (try t)))))
#<user$eval673$fn__674 user$eval673$fn__674@4e56c411>
Comment by Nicola Mometto [ 16/Jun/14 6:33 AM ]

Similar ticket http://dev.clojure.org/jira/browse/CLJ-701

Comment by Kevin Downey [ 21/Jul/14 6:59 PM ]

try/catch in the compiler only implements Expr, not MaybePrimitiveExpr, looking at extending TryExpr with MaybePrimitiveExpr it seems simple enough, but it turns out recur analyzes it's arguments in the statement context, which causes (try ...) to essentially wrap itself in a function like ((fn [] (try ...))), at which point it is an invokeexpr which is much harder to add maybeprimitiveexpr too and it reduces to the same case as CLJ-701

Comment by Kevin Downey [ 22/Jul/14 9:27 PM ]

http://dev.clojure.org/jira/browse/CLJ-701 has a patch that I think solves this





[CLJ-1232] Functions with non-qualified return type hints force import of hinted classes when called from other namespace Created: 18/Jul/13  Updated: 22/Jul/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Tassilo Horn Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: typehints

Approval: Triaged

 Description   

You can add a type hint to function arglists to indicate the return type of a function like so.

user> (import '(java.util List))
java.util.List
user> (defn linkedlist ^List [] (java.util.LinkedList.))
#'user/linkedlist
user> (.size (linkedlist))
0

The problem is that now when I call `linkedlist` exactly as above from another namespace, I'll get an exception because java.util.List is not imported in there.

user> (in-ns 'user2)
#<Namespace user2>
user2> (refer 'user)
nil
user2> (.size (linkedlist))
CompilerException java.lang.IllegalArgumentException: Unable to resolve classname: List, compiling:(NO_SOURCE_PATH:1:1)
user2> (import '(java.util List)) ;; Too bad, need to import List here, too.
java.util.List
user2> (.size (linkedlist))
0

There are two workarounds: You can import the hinted type also in the calling namespace, or you always use fully qualified class names for return type hints. Clearly, the latter is preferable.

But clearly, that's a bug that should be fixed. It's not in analogy to type hints on function parameters which may be simple-named without having any consequences for callers from other namespaces.



 Comments   
Comment by Andy Fingerhut [ 16/Apr/14 3:47 PM ]

To make sure I understand, Nicola, in this ticket you are asking that the Clojure compiler change behavior so that the sample code works correctly with no exceptions, the same way as it would work correctly without exceptions if one of the workarounds were used?

Comment by Tassilo Horn [ 17/Apr/14 12:18 AM ]

Hi Andy. Tassilo here, not Nicola. But yes, the example should work as-is. When I'm allowed to use type hints with simple imported class names for arguments, then doing so for return values should work, too.

Comment by Rich Hickey [ 10/Jun/14 10:41 AM ]

Type hints on function params are only consumed by the function definition, i.e. in the same module as the import/alias. Type hints on returns are just metadata, they don't get 'compiled' and if the metadata is not useful to consumers in other namespaces, it's not a useful hint. So, if it's not a type in the auto-imported set (java.lang), it should be fully qualified.

Comment by Alex Miller [ 10/Jun/14 11:55 AM ]

Based on Rich's comment, this ticket should probably morph into an enhancement request on documentation, probably on http://clojure.org/java_interop#Java Interop-Type Hints.

Comment by Andy Fingerhut [ 10/Jun/14 3:13 PM ]

I would suggest something like the following for a documentation change, after this part of the text on the page Alex links in the previous comment:

For function return values, the type hint can be placed before the arguments vector:

(defn hinted
(^String [])
(^Integer [a])
(^java.util.List [a & args]))

-> #user/hinted

If the return value type hint is for a class that is outside of java.lang, which is the part auto-imported by Clojure, then it must be a fully qualified class name, e.g. java.util.List, not List.

Comment by Nicola Mometto [ 10/Jun/14 4:02 PM ]

I don't understand why we should enforce this complexity to the user.
Why can't we just make the Compiler (or even defn itself) update all the arglists tags with properly resolved ones? (that's what I'm doing in tools.analyzer.jvm)

Comment by Alexander Kiel [ 19/Jul/14 10:02 AM ]

I'm with Nicola here. I also think that defn should resolve the type hint according the imports of the namespace defn is used in.

Comment by Max Penet [ 22/Jul/14 7:06 AM ]

Same here, I was bit by this in the past. The current behavior is clearly counterintuitive.





[CLJ-1298] Add more type predicate fns to core Created: 21/Nov/13  Updated: 22/Jul/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5, Release 1.6
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Alex Fowler Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: None


 Description   

Add more built-in type predicates:

1) Definitely missing: (atom? x), (ref? x), (deref? x), (named? x), (map-entry? x), (lazy-seq? x).
2) Very good to have: (throwable? x), (exception? x), (pattern? x).

The first group is especially important for writing cleaner code with core Clojure.



 Comments   
Comment by Alex Miller [ 21/Nov/13 8:42 AM ]

In general many of the existing predicates map to interfaces. I'm guessing these would map to checks on the following types:

atom? = Atom (final class)
ref? = IRef (interface)
deref? = IDeref (interface)
named? = Named (interface, despite no I prefix)
map-entry? = IMapEntry (interface)
lazy-seq? = LazySeq (final class)

throwable? = Throwable
exception? = Exception, but this seems less useful as it feels like the right answer when you likely actually want throwable?
pattern? = java.util.regex.Pattern

Comment by Alex Fowler [ 21/Nov/13 9:02 AM ]

Yes, they do, and sometimes the code has many checks like (instance? clojure.lang.Atom x). Ok, you can write a little function (atom? x) but it has either to be written in all relevant namespaces or required/referred there from some extra namespace. All this is just a burden. For example, we have predicates like (var? x) or (future? x) which too map to Java classes, but having them abbreviated often makes possible to write a cleaner code.

I feel the first group to be especially significant for it being about core Clojure concepts like atom and ref. Having to fall to manual Java classes check to work with them feels inorganic. Of course we can, but why then do we have (var? x), (fn? x) and other? Imagine, for example:

(cond
(var? x) (...)
(fn? x) (...)
(instance? clojure.lang.Atom x) (...)
(or (instance? clojure.lang.Named x) (instance? clojure.lang.LazySeq x)) (...))

vs

(cond
(var? x) (...)
(fn? x) (...)
(atom? x) (...)
(or (named? x) (lazy-seq? x)) (...))

The second group is too, essential since these concepts are fundamental for the platform (but you're right with the (exception? x) one).

Comment by Alex Fowler [ 22/Nov/13 6:35 AM ]

Also, obviously I missed the (boolean? x) predicate in the original post. Did not even guess it is absent too until I occasionally got into it today. Currently the most clean way we have is to do (or (true? x) (false? x)). Needles to say, it looks weird next to the present (integer? x) or (float? x).

Comment by Brandon Bloom [ 22/Jul/14 1:02 AM ]

Predicates for core types are also very useful for portability to CLJS.

Comment by Brandon Bloom [ 22/Jul/14 1:05 AM ]

I'd be happy to provide a patch for this, but I'd prefer universal interface support where possible. Therefore, this ticket, in my mind, is behind http://dev.clojure.org/jira/browse/CLJ-803 etc.

Comment by Alex Miller [ 22/Jul/14 6:12 AM ]

I don't think it's worth making a ticket for this until Rich has looked at it and determined which parts are wanted.





[CLJ-891] make (symbol foo "bar") work with foo being a namespace Created: 05/Dec/11  Updated: 22/Jul/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Joe Gallo Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: None

Attachments: Text File 0001-add-Symbol.intern-arity-for-a-Namespace-and-String.patch    
Patch: Code

 Description   

I've run across the need to do (symbol ns "bar") a few times, and the existing approach (symbol (name (ns-name ns)) "bar") just doesn't seem like it ought to be the only way to do the job. Includes a patch to make this work, by adding a new arity to Symbol.intern().

Some discussion on this idea, here: https://groups.google.com/forum/#!topic/clojure/n25aZ5HA7hc/discussion



 Comments   
Comment by Stuart Halloway [ 09/Dec/11 9:39 AM ]

I am not sure I like this, but I would like a rethink of names and namespaces. Doing a lot of cross language work, it would be great to have protocols for "I have a name" and for "I have a namespace".

With such protocols in place, it would also be possible to separately consider implementing symbol et al in terms of them.

Comment by Kevin Downey [ 09/Dec/11 12:24 PM ]

Named being a protocol or an interface seems orthogonal to being able to create a symbol qualified with a namespace when you have a namespace in hand.

I don't think the patch goes far enough, not only should (symbol ns "foo") be supported, but also (symbol ns 'foo), given that (symbol 'foo) works and (symbol "foo") works, (symbol 'bar 'foo) should also work, but doesn't.

if Named is a protocol, and if you extend it to String, and if you make the symbol function create symbols from one or two Named things you still end up having to do (symbol (ns-name ns) 'foo) or (symbol (ns-name ns) "foo")

Comment by Joe Gallo [ 16/Dec/11 4:18 PM ]

Stuart, I'm not opposed to the idea of separate protocols for Named and Namespaced. Where should I go about creating a proposal to create those protocols and get them into clojure? I'm interested in doing the leg-work, or being a part of it. But as an outsider, I don't know what to do next – creating a ticket in Jira exhausted my knowledge of the process.

Comment by Frank Siebenlist [ 02/Mar/12 4:53 PM ]

The same enhancement that joe suggests for symbol, would also apply to keyword.

See: http://groups.google.com/group/clojure/browse_thread/thread/222e4abc16df8b20

Probably same/similar solution applies to both issues.

-FrankS.

Comment by Ambrose Bonnaire-Sergeant [ 22/Jul/14 1:10 AM ]

I did vote on this, but I retracted my vote. Strings for both arguments is clearly a correct implementation to me, but combinations of symbols/namespaces/named/namespaced things are not so clear.





[CLJ-1471] Option to print type info Created: 21/Jul/14  Updated: 21/Jul/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Pascal Germroth Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: pprint


 Description   

I've had an issue with defrecord-types being converted into ordinary maps somewhere, which was relatively hard to track down inside a deep structure since they are pprinted as the same thing by default.
The following code patches into the pprint dispatch and prints the type around values; it turned out to be quite useful, but feels hackish.
Maybe something like that would be useful to integrate into clojure.pprint directly (there are a number of cosmetic options already), i.e. into clojure.pprint/write-out.

Only printing (type) may not be enough in some cases; so an option to print all metadata would be nice.
Maybe something like :metadata nil as default, :metadata :type to print types (but also for non-IMetas, using (type) and :metadata true to print metadata for IMetas using (meta).

(defn pprint-with-type
  ([object] (pprint object *out*))
  ([object writer]
   ; keep original dispatch.
   ; calling it directly will print only that object,
   ; but return to our dispatch for subobjects.
   (let [dispatch clojure.pprint/*print-pprint-dispatch*]
     (binding [clojure.pprint/*print-pprint-dispatch*
               (fn [obj]
                 (if (instance? clojure.lang.IMeta obj)
                   (do (print "^{:type ")
                       (dispatch (type obj))
                       (print "} ")
                       (clojure.pprint/pprint-newline :fill)
                       (dispatch obj))
                   (do (print "(^:type ")
                       (dispatch (type obj))
                       (print " ")
                       (clojure.pprint/pprint-newline :fill)
                       (dispatch obj)
                       (print ")"))))]
       (clojure.pprint/pprint object writer)))))





[CLJ-373] update-in with empty key paths Created: 04/Jun/10  Updated: 20/Jul/14  Resolved: 10/Jun/14

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Assembla Importer Assignee: Unassigned
Resolution: Declined Votes: 7
Labels: None

Attachments: Text File 0001-CLJ-373-Alter-behaviour-of-update-in-with-empty-key-.patch     Text File 0001-Support-empty-path-in-update-in.-CLJ-373.patch     Text File clj-373-alter-behavior-of-update-in-with-empty-key-patch2.txt     Text File CLJ-373-nested-ops.patch    
Patch: Code and Test
Approval: Triaged
Waiting On: Chouser

 Description   

To the topic of get-in and update-in. While I realize this is not a bug it is odd and in my eyes unexpected and unwanted behavior.

get-in called with an empty path returns the hashmap it was given to walk through - this is as I expect and makes a lot of sense when dynamically (with generated pathes ) walking a hash map.

update-in behaves differently and while from the implementation side, it's behavior too makes sense, it does not work as expected (at least not for me) update-in with an empty map creates a new key 'nil' so:

(update-in {...} [] f) ist he same as (update-in {...} [nil] f) while (get-in {...} []) is not the same as (get-in {...} [nil]) and of cause differs from what update-in does.

For automatically walking trees the behavior of get-in makes a lot more sense since the current behavior of update-in forces you to check for empty paths and if they are empty fall back to plain assoc and get (or get-in since this works):

(if-let [r (butlast @path)]
(do
  (alter m update-in r dissoc (last @path))
  (alter m update-in r assoc {:name @sr} c))
(do
  (alter m dissoc (last @path))
  (alter m assoc {:name @sr} c)))

Next argument is that update-in with an empty map working on nil isn't easy to gasp, one needs to know the implementation details to realize that it works, I think 90% of the people reading update-in with [] will not instinctively know that it works on the key nil, so changing this would most likely not break any current code, and if it would the code would be bad anyway .

Chouser has, a very nice solution on the mailing list that would fix the problem I'm not sure if I'm entitled to post it here since I did not wrote it but it can be found in this thread: http://groups.google.com/group/clojure/browse_thread/thread/de5b20b8c3fe498b?hl=en

Regards,
Heinz



 Comments   
Comment by Assembla Importer [ 08/Oct/10 10:33 AM ]

Converted from http://www.assembla.com/spaces/clojure/tickets/373

Comment by Chouser [ 13/Nov/10 6:37 PM ]

I've attached my code from the g.group thread in the form of a patch, in case it's sufficient. (Thanks to abedra for the gentle kick in the pants.)

Comment by Stuart Halloway [ 29/Nov/10 8:20 AM ]

Looks to me like the patch is misusing if-let, e.g.

(when-let [[k & mk] []] "Doh!") 
=> Doh!

Please correct, and add tests for nil and [] keys (at least).

Comment by Scott Lowe [ 08/May/12 12:20 PM ]

I will write some tests and correct this.

Comment by Scott Lowe [ 09/May/12 8:39 PM ]

I'm sorry to report that my good intentions of wanting to help clear some of the project backlog has created more work by way of further questions. I'd also like to clarify the desired new behaviours for the test cases.

Heinz proposed that an empty key sequence will not create a new nil key in the returned map.
He also suggested that the following behaviour changes be made (compare old and new* behaviours):

 
(update-in {1 2} [] (constantly {2 3}))
{nil {2 3}, 1 2}

(update-in* {1 2} [] (constantly {2 3}))
{2 3}

(update-in {1 2} [] assoc  1 3)
{nil {1 3}, 1 2}

(update-in* {1 2} [] assoc  1 3)
{1 3}

Chouser also added that nil keys should be honoured, as before:

 
(update-in* {nil 2} [nil] (constantly 3))
{nil 3}

I've added a variety of tests to cover the existing behaviour and would like to confirm that the above is all that's required for new behaviour.

The patch from November 2010 didn't work, but I tweaked it with a when-let as Stuart suggested and placed a check for an empty sequence of keys before the when-let block; because essentially, the primary behaviour change boils down to simply handling an empty sequence of keys, in addition to the existing behaviours.

I'm not entirely convinced that these changes are a good thing, but at least there's now something concrete for discussion. Please have a look at what is there. The good news is that at least there are some tests covering update-in now.

Comment by Andy Fingerhut [ 24/May/12 10:35 PM ]

clj-373-alter-behavior-of-update-in-with-empty-key-patch2.txt dated May 24, 2012 supersedes patch 0001-CLJ-373Alter-behaviour-of-update-in-with-empty-key.patch dated May 9, 2012. It makes no changes to the contents of the patch except in the lines of context that have changed since the earlier patch was created. It applies cleanly to the latest master whereas the earlier patch no longer does. Builds and passes tests with Oracle/Apple JDK 1.6 on Mac OS X 10.6.8.

Comment by Fogus [ 15/Aug/12 11:21 AM ]

This patch creates a new error mode highlighted by the following:

(update-in {:a 1} [] inc)

; ClassCastException clojure.lang.PersistentArrayMap cannot be cast to java.lang.Number

The reason is that the code that executes in the event that the keys are empty (or nil) blindly assumes that the function given can be applied to the map itself. This is no problem in the case of assoc and the like, but clearly not for inc and many other useful functions.

The old behavior was to throw an NPE and if any code relies on that fact is now broken. Maybe this is not a problem, but I'm kicking it back to get a little more discussion and to request that whatever the ultimate fix happens to be, its behavioral changes and error modes are noted in the docstring for update-in.

Comment by Gunnar Völkel [ 08/Sep/12 6:11 AM ]

I vote for changing `update-in` to support empty key vectors.

Because I think "(apply f m args)" in case of an empty vector is the natural continuation of the usual behavior of update-in:

  • update-in with 2 keys the second level map is updated
  • update-in with 1 key the first level map (child of given map) is updated
  • update-in with no key the given map (zero level) is updated.

Otherwise, you will always have to wrap update-in in something like the following when the keys vector is computed:

(if (seq keys) (apply update-in m keys f args) (apply f m args))

To Fogus' last post: (update-in {:a {:b 1}} [:a] inc) fails similar and is not handled with a special exception.

Comment by Tim McCormack [ 25/Nov/13 11:45 AM ]

Fogus, if there was code relying on that brokenness, I'd say it was only working by accident.

Comment by Ambrose Bonnaire-Sergeant [ 05/Jan/14 9:13 AM ]

Attached new patch CLJ-373-nested-ops.patch.

It implements update-in with an empty path equivalent to (apply f m args).

Since they are clearly related, I changed assoc-in to throw an error on an empty path, and updated {update,get,assoc}-in's docstring to reflect these changes.

I changed the terminology of a "sequence of keys" to a "path of keys (a sequence)", and henceforth referred to the sequence as a key path, or the name of the related arg.

Comment by Andy Fingerhut [ 14/Feb/14 11:42 AM ]

Patch CLJ-373-nested-ops.patch dated Jan 5 2014 no longer applies cleanly to latest Clojure master as of Feb 14 2014. It did on Feb 7 2014. I haven't checked in detail, but this is probably simply due to some tests recently added to a test file that require updating some diff context lines.

Comment by Ambrose Bonnaire-Sergeant [ 24/Mar/14 12:39 PM ]

Updated with clean patch.

Comment by Timothy Baldridge [ 30/Jun/14 10:33 AM ]

Can we get some feedback on why this was rejected (with no comments)? Seems like a lot of work/interest went into this. Would be nice to know why all that was thrown out without ceremony.

Comment by Andy Fingerhut [ 01/Jul/14 6:24 PM ]

Standard disclaimer: I'm not in the loop. I have a guess where the loop is, but I'm nowhere near it.

Timothy, I doubt many people look at comments on tickets after they are closed, except perhaps Alex Miller and I. If you have as good a communication channel with Rich as Alex does, asking him may be the only way to find out. Other tickets with new functions proposed for clojure.core have been closed with comments similar to "we are not interested in adding this functionality to Clojure at this time". That might be all there is to know.

Comment by Anders Hovmöller [ 20/Jul/14 2:09 PM ]

I was bitten by this today. I had to insert an extra "(if (= (count path) 0) <do something else>...". Was quite surprised at the extra nil inserted into my map!





Generated at Sat Jul 26 10:23:35 CDT 2014 using JIRA 4.4#649-r158309.