<< Back to previous view

[CLJ-1449] Add clojure.string functions for portability to ClojureScript Created: 19/Jun/14  Updated: 03/Sep/15  Resolved: 03/Sep/15

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: Release 1.8

Type: Enhancement Priority: Major
Reporter: Bozhidar Batsov Assignee: Nola Stowe
Resolution: Completed Votes: 29
Labels: ft, string

Attachments: Text File add_functions_to_strings-2.patch     Text File add_functions_to_strings-3.patch     Text File add_functions_to_strings-4.patch     Text File add_functions_to_strings-5.patch     Text File add_functions_to_strings-6.patch     Text File add_functions_to_strings-7.patch     Text File add_functions_to_strings.patch     Text File clj-1449-more-v1.patch    
Patch: Code and Test
Approval: Ok


It would be useful if a few common functions from Java's String were available as Clojure functions for the purposes of increasing portability to other Clojure platforms like ClojureScript.

The functions below also cover the vast majority of cases where Clojure users currently drop into Java interop for String calls - this tends to be an issue for discoverability and learning. While the goal of this ticket is increased portability, improving that is a nice secondary benefit.

Proposed clojure.string fn java.lang.String method
index-of indexOf
last-index-of lastIndexOf
starts-with? startsWith
ends-with? endsWith
includes? contains


  • clj-1449-7.patch - uses nil to indicate not-found in index-of

Performance: Tested the following with criterium for execution mean time (all times in ns).

Java Clojure Java Clojure (-4 patch) Clojure (-6 patch)
(.indexOf "banana" "n") (clojure.string/index-of "banana" "n") 8.70 9.03 9.27
(.indexOf "banana" "n" 1) (clojure.string/index-of "banana" "n" 1) 7.29 7.61 7.66
(.indexOf "banana" (int \n)) (clojure.string/index-of "banana" \n) 5.34 6.20 6.20
(.indexOf "banana" (int \n) 1) (clojure.string/index-of "banana" \n 1) 5.38 6.19 6.24
(.startsWith "apple" "a") (clojure.string/starts-with? "apple" "a") 4.84 5.71 5.65
(.endsWith "apple" "e") (clojure.string/ends-with? "apple" "e") 4.82 5.68 5.70
(.contains "baked apple pie" "apple") (clojure.string/includes? "baked apple pie" "apple") 10.78 11.99 12.17

Screened by: Stu, who prefers nil over -1 - add_functions_to_strings-6.patch

Note: In both Java and JavaScript, indexOf will return -1 to indicate not-found, so this is (at least in these two cases) the status quo. The benefit of nil return in Clojure is that it can be used as a found/not-found condition. The benefit of a -1 return is that it matches Java/JavaScript and that the return type can be hinted as a primitive long, allowing you to feed it into some other arithmetic or string operation without boxing.

Comment by Alex Miller [ 19/Jun/14 12:53 PM ]

Re substring, there is a clojure.core/subs for this (predates the string ns I believe).

([s start] [s start end])
Returns the substring of s beginning at start inclusive, and ending
at end (defaults to length of string), exclusive.

Comment by Jozef Wagner [ 20/Jun/14 3:21 AM ]

As strings are collection of characters, you can use Clojure's sequence facilities to achieve such functionality:

user=> (= (first "asdf") \a)
user=> (= (last "asdf") \a)
Comment by Alex Miller [ 20/Jun/14 8:33 AM ]

Jozef, String.startsWith() checks for a prefix string, not just a prefix char.

Comment by Bozhidar Batsov [ 20/Jun/14 9:42 AM ]

Re substring, I know about subs, but it seems very odd that it's not in the string ns. After all most people will likely look for string-related functionality in clojure.string. I think it'd be best if `subs` was added to clojure.string and clojure.core/subs was deprecated.

Comment by Pierre Masci [ 01/Aug/14 5:27 AM ]

Hi, I was thinking the same about starts-with and .ends-with, as well as (.indexOf s "c") and (.lastIndexOf "c").

I read the whole Java String API recently, and these 4 functions seem to be the only ones that don't have an equivalent in Clojure.
It would be nice to have them.

Andy Fingerhut who maintains the Clojure Cheatsheet told me: "I maintain the cheatsheet, and I put .indexOf and .lastIndexOf on there since they are probably the most common thing I saw asked about that is in the Java API but not the Clojure API, for strings."
Which shows that there is a demand.

Because Clojure is being hosted on several platforms, and might be hosted on more in the future, I think these functions should be part of the de-facto ecosystem.

Comment by Andy Fingerhut [ 30/Aug/14 3:39 PM ]

Updating summary line and description to add contains? as well. I can back this off if it changes your mind about triaging it, Alex.

Comment by Andy Fingerhut [ 30/Aug/14 3:40 PM ]

Patch clj-1449-basic-v1.patch dated Aug 30 2014 adds starts-with? ends-with? contains? functions to clojure.string.

Patch clj-1449-more-v1.patch is the same, except it also replaces several Java method calls with calls to these Clojure functions.

Comment by Andy Fingerhut [ 05/Sep/14 1:02 PM ]

Patch clj-1449-basic-v1.patch dated Sep 5 2014 is identical to the patch I added recently called clj-1149-basic-v1.patch. It is simply renamed without the typo'd ticket number in the file name.

Comment by Yehonathan Sharvit [ 02/Dec/14 3:09 PM ]

What about an implementation that works also in cljs?

Comment by Bozhidar Batsov [ 02/Dec/14 3:11 PM ]

Once this is added to Clojure it will be implemented in ClojureScript as well.

Comment by Yehonathan Sharvit [ 02/Dec/14 3:22 PM ]

Great! Any idea when it will be added to Clojure?
Also, will it be automatically added to Clojurescript or someone will have to write a particular code for it.
The suggested patch relies on Java so I am curious to understand who is going to port the patch to cljs.

Comment by Bozhidar Batsov [ 02/Dec/14 3:27 PM ]

No idea when/if this will get merged. Upvote the ticket to improve the odds of this happening sooner.
Someone on the ClojureScript team will have to implement this in terms of JavaScript.

Comment by Alex Miller [ 02/Dec/14 4:01 PM ]

Some things that would be helpful:

1) It would be better to combine the two patches into a single patch - I think changing current uses into new users is a good thing to include. Also, please keep track of the "current" patch in the description.
2) Patch needs tests.
3) Per the instructions at the top of the clojure.string ns (and the rest of the functions), the majority of these functions are implemented to take the broader CharSequence interface. Similar to those implementations, you will need to provide a CharSequence implementation while also calling into the String functions when you actually have a String.
4) Consider return type hints - I'm not sure they're necessary here, but I would examine bytecode for typical calling situations to see if it would be helpful.
5) Check performance implications of the new versions vs the old with a good tool (like criterium). You've put an additional var invocation and (soon) type check in the calling path for these functions. I think providing a portable target is worth a small cost, but it would be good to know what the cost actually is.

I don't expect we will look at this until after 1.7 is released.

Comment by Andy Fingerhut [ 02/Dec/14 8:05 PM ]

Alex, all your comments make sense.

If you think a ready-and-waiting patch that does those things would improve the odds of the ticket being vetted by Rich, please let us know.

My guess is that his decision will be based upon the description, not any proposed patches. If that is your belief also, I'll wait until he makes that decision before working on a patch. Of course, any other contributor is welcome to work on one if they like.

Comment by Alex Miller [ 02/Dec/14 8:40 PM ]

Well nothing is certain of course, but I keep a special report of things I've "screened" prior to vetting that makes possible moving something straight from Triaged all the way through into Screened/Ok when Rich is able to look at them. This is a good candidate if things were in pristine condition.

That said, I don't know whether Rich will approve it or not, so it's up to you. I think the argument for portability is a strong one and complements the feature expression.

Comment by Alex Miller [ 14/May/15 8:55 AM ]

I'd love to have a really high-quality patch on this ticket to consider for 1.8 that took into account my comments above.

Also, it occurred to me that I don't think this should be called "contains?", overlapping the core function with a different meaning (contains value vs contains key). Maybe "includes?"?

Comment by Andy Fingerhut [ 14/May/15 9:14 AM ]

clojure.string already has 2 name conflicts with clojure.core, for which most people probably do something like (require '[clojure.string :as str]) to avoid that:

user=> (use 'clojure.string)
WARNING: reverse already refers to: #'clojure.core/reverse in namespace: user, being replaced by: #'clojure.string/reverse
WARNING: replace already refers to: #'clojure.core/replace in namespace: user, being replaced by: #'clojure.string/replace
Comment by Alex Miller [ 14/May/15 10:05 AM ]

I'm not concerned about overlapping the name. I'm concerned about overlapping it and meaning something different, particularly vs one of the most confusing functions in core already. clojure.core/contains? is better than linear time key search. clojure.string/whatever will be a linear time subsequence match.

Comment by Alex Miller [ 14/May/15 10:18 AM ]

Ruby uses "include?" for this.

Comment by Devin Walters [ 19/May/15 4:56 PM ]

I agree with Alex's comment about the overlap. Personally, I prefer the way "includes?" reads over "include?", but IMO either one is better than adding to the "contains?" confusion.

Comment by Bozhidar Batsov [ 20/May/15 12:10 AM ]

I'm fine with `includes?`. Ruby is famous for the bad English used in its core library.

Comment by Andrew Rosa [ 17/Jun/15 12:29 PM ]

Andy Fingerhut, since you are the author of the original patch do you desire to continue the work on it? If you prefer (or even need) to focus on different stuff, I would like to tackle this issue.

Comment by Andy Fingerhut [ 17/Jun/15 1:08 PM ]

Andrew Rosa, I am perfectly happy if someone else wants to work on a revised patch for this. If you are interested, go for it! And from Alex's comments, I believe my initial patch covers such a small fraction of what he wants, do not worry about keeping my name associated with any patch you submit – if yours is accepted, my patch will not have helped you in getting there.

Comment by Nola Stowe [ 03/Jul/15 9:01 AM ]

Andrew Rosa, I am interested on working on this. Are you currently working it?

Comment by Andrew Rosa [ 03/Jul/15 12:08 PM ]

Thanks Andy Fingerhut, didn't saw the answer here. I'm happy that Nola Stowe could take this one.

Comment by Nola Stowe [ 08/Jul/15 12:46 PM ]

Updated patch to rename methods as specified and add tests.

Comment by Bozhidar Batsov [ 08/Jul/15 3:26 PM ]

A few remarks:

1. added should say 1.8, not 1.9.
2. The docstrings could be improved a bit - normally they should end with a . and refer to all the params of the function in question.

Comment by Nola Stowe [ 08/Jul/15 3:29 PM ]

Thanks Bozhidar Batsov .. I will, and I also got feedback from the slack clojure-dev group (Alex, Ghadi, AndrewHr) so I will be submitting an improved patch soon

Comment by Andy Fingerhut [ 09/Jul/15 11:57 AM ]

Also, I am not sure, but Alex Miller made this comment earlier: "Per the instructions at the top of the clojure.string ns, functions should take the broader CharSequence interface instead of String. Similar to existing clojure.string functions, you will need to provide a CharSequence implementation while also calling into the String functions when you actually have a String."

Looking at the Java classes and interfaces, one of the classes that implements CharSequence is CharBuffer, and it has no indexOf or lastIndexOf methods. If one were to call the proposed index-of or last-index-of with a CharBuffer, I believe you would get an exception for an unknown method.

I don't know if Alex's sentence is essential to any patch to be considered, but wanted to point out that it seems like it effectively means writing your own implementation of each of these methods for some of the classes implementing the CharSequence interface.

Comment by Nola Stowe [ 09/Jul/15 4:15 PM ]

Andy, yes you are right. Alex helped me with my current changes (not posted yet) and where needed I cast the value to a string so that it would have the indexOf methods.

Comment by Nola Stowe [ 10/Jul/15 7:43 AM ]

Added type hints, using StringBuffer instead of string in tests.

Comment by Nola Stowe [ 10/Jul/15 11:53 PM ]

Use critium and did some bench marking, I am not familiar enough to say "hey its good or no, too slow". I tried an optimization but it didn't make much difference.

notes here:


Appreciate any feedback

Comment by Alex Miller [ 13/Jul/15 10:37 AM ]

Nola, some comments:

1) In index-of and last-index-of, I think we should use (instance? Character value) instead of char? - this will avoid a function invocation.

2) In index-of and last-index-of, in the branches where you know you have a Character, we should just invoke the proper function on the character instance directly rather than calling int. In this case that is (.charValue ^Character value). However, this function returns char so you still want the ^int hint.

3) In index-of and last-index-of, we should do a couple of things to maximally utilize primitive support for from-index. We should type-hint the incoming value as a ^long (only long and double are specialized cases in Clojure, not int). We should then use the function unchecked-int to optimally convert the long primitive to an int primitive.

4) We should type-hint the return value to a primitive long to avoid boxing the return.

Putting 1+2+3+4 together:

(defn index-of
  "Return index of value (string or char) in s, optionally searching forward from from-index."
  {:added "1.8"}
  (^long [^CharSequence s value]
   (if (instance? Character value)
     (.indexOf (.toString s) ^int (.charValue ^Character value))
     (.indexOf (.toString s) ^String value)))
  (^long [^CharSequence s value ^long from-index]
  (if (instance? Character value)
    (.indexOf (.toString s) ^int (.charValue ^Character value) (unchecked-int from-index))
    (.indexOf (.toString s) ^String value (unchecked-int from-index)))))

Similar changes in last-index-of.

5) In starts-with? and ends-with?, I would move the type hint for substr to the parameter declaration. This is more of a style preference on my part, no functional difference I believe.

6) On includes? I would do the same as #5, except .contains actually takes a CharSequence for the substr, so I would change the type hint from ^String to the broader ^CharSequence.

7) Can you edit the description and add a table with the "execution time mean" number for direct Java vs new Clojure fn for the patch after these changes? Please add tests for the from-index variants of index-of and last-index-of as well.

Comment by Nola Stowe [ 14/Jul/15 5:46 PM ]

optimizations as suggested by Alex Miller

Comment by Andy Fingerhut [ 14/Jul/15 5:50 PM ]

Nola, not a big deal, but Rich has requested that patch file names end with ".patch" or ".diff", I think because whatever he uses to view them recognizes that suffix and puts it in a different viewing/editing mode.

Comment by Nola Stowe [ 14/Jul/15 6:03 PM ]

Andy, so no need to keep around patches when used in comparisons?

Comment by Andy Fingerhut [ 14/Jul/15 8:31 PM ]

It is ok to have multiple versions of patches attached, but names like add_functions_to_strings-2.patch etc. would be preferred over names like add_functions_to_strings.patch-2, so the suffix is ".patch".

Comment by Nola Stowe [ 14/Jul/15 8:55 PM ]

original patch submitted by Nola

Comment by Nola Stowe [ 14/Jul/15 8:56 PM ]

Optimizations suggested by Alex

Comment by Alex Miller [ 14/Jul/15 9:44 PM ]

Nola, I re-did the timings on my machine - these were done with Java 1.8 and no Leiningen involved.

Comment by Alex Miller [ 14/Jul/15 9:57 PM ]

-4 patch is identical to -3 but removes one errant ^long type hint

Otherwise, looks good to me!

Comment by Alex Miller [ 17/Jul/15 10:24 PM ]

Rich, since you didn't put any comments on here, I'm not sure if there's anything else you wanted, so marking screened.

Comment by Stuart Halloway [ 19/Jul/15 7:36 AM ]

I dislike the approach here for several reasons:

  • floats Java-isms (-1 for missing instead of nil?) up into the Clojure API
  • makes narrow string fns out of things that could/should be seq fns

Stepping back, this is all library work. Why should it be in core? If this started in a contrib, it could evolve much more freely.

Comment by Alex Miller [ 19/Jul/15 8:49 AM ]

re Java-isms: totally fair comment worth changing

re narrow string fns: if these were seq fns, they would presumably take seqables of chars and return seqables of chars. When I have strings, I want to work with strings (as with clojure.string/join and clojure.string/reverse vs their seq equivalents).

re library: these particular functions are the most commonly used string functions where current users drop to Java interop. Note that all 5 of these functions have over time been added to the cheatsheet (http://clojure.org/cheatsheet) because they are commonly used. I believe there is significant value in including them in core and standardizing their name and behavior across to CLJS.

Comment by Bozhidar Batsov [ 19/Jul/15 10:29 AM ]

I'm obviously biased, but I totally agree with everything Alex said. Having one core string library and a separate contrib string library is just impractical. With clojure.string already part of the language it's much more sensible to extend it where/when needed instead of encouraging the proliferation of tons of alternative libraries (there are already a couple of those out there, mostly because essential functions are missing). Initial designs are rarely perfect, there's nothing wrong in revisiting and improving them.

I believe this is one of the top voted tickets, which clearly indicates that the Clojure community is quite supportive of this additions.

Comment by Michael Blume [ 19/Jul/15 12:26 PM ]

None of these functions return Strings so they could very easily be seq functions which use the String methods as a fast path when called with Strings, the trouble is then they wouldn't necessarily belong in clojure.string.

Comment by Bozhidar Batsov [ 19/Jul/15 1:49 PM ]

Such an argument can be made for existing functions like clojure.core/subs and clojure.string/reverse - we could have just went with generic functions that operate on seqs and converted the results to strings, but we didn't. It might be just me, but I'm really thinking this is being overanalysed. Sometimes it's really preferable to deal with some things as strings (not to mention more efficient).

Comment by Rich Hickey [ 20/Jul/15 9:39 AM ]

I completely agree with Alex and Bozhidar re: strings vs seqs. Other langs that have only lists and use them for strings are really bad at strings. So - string-specific fns are ok. What about the Java-isms?

Comment by Bozhidar Batsov [ 20/Jul/15 10:08 AM ]

We should address them for sure. Let's wait for a couple of days for Nola to update the patch. If she's too busy I'll do the update myself.

Comment by Nola Stowe [ 20/Jul/15 10:50 AM ]

I will be happy to work on it Saturday. Is nil an acceptable response when index-of "apple" "p" is not found? The other functions return booleans and I think those are ok.

Comment by Bozhidar Batsov [ 20/Jul/15 10:55 AM ]

Yep, we should go with nil.

Comment by Nola Stowe [ 25/Jul/15 4:33 PM ]

Changed index-of and last-index-of to return nil instead of -1 (java-ism), thus had to remove the type hint for the return value.

Seems to have slowed down the function compared to having the function return -1. My benchmarks: https://www.evernote.com/l/ABL2wead73BJZYAkpH5yxsfX9JM8cpUiNhw

Comment by Nola Stowe [ 25/Jul/15 9:05 PM ]

Updated patch to include return value in doc string for index-of last-index-of, otherwise same as last update.

Comment by Alex Miller [ 26/Jul/15 8:40 AM ]

Removing the hint causes the return type to be Object, which forces boxing in the "found" cases.

Comment by Nola Stowe [ 26/Jul/15 8:56 AM ]

Is it worth it to remove the java-ism of returning -1 when not found? Personally, I don't really like a function returning one of two things... found index or nil ... I prefer it being an int at all times.

Comment by Alex Miller [ 26/Jul/15 9:00 AM ]

There is one whitespace issue in the -5 patch:

[~/code/clojure (master)]$ git apply ~/Downloads/add_functions_to_strings-5.patch
/Users/alex/Downloads/add_functions_to_strings-5.patch:34: trailing whitespace.
  (let [result
warning: 1 line adds whitespace errors.

I updated the timings in the description to include both -4 and -5 - the boxing there does have a significant impact.

Comment by Nola Stowe [ 26/Jul/15 9:28 AM ]

Fixed whitespace issue.

Comment by Alex Miller [ 26/Jul/15 9:32 AM ]

Added -6 patch and timings that address most of the performance impact.

Comment by Stuart Halloway [ 27/Jul/15 9:35 AM ]

Alex: Note that all of these functions return numbers and booleans, not strings nor seqs of anything. This is partially why they were left out of the original string implementation.

It isn't clear to me why index-of should be purely a string fn, I regularly need index-of with other things, and both my book and JoC use index-of as an example of a nice-to-have.

Comment by Nicola Mometto [ 27/Jul/15 9:44 AM ]

Stuart, I don't think the fact that those functions don't return string is really relevant, the important fact is that they operate on strings (just like c.set/superset? or subset? don't return sets) and are frequently used on them.

Comment by Bozhidar Batsov [ 27/Jul/15 9:51 AM ]

I'm with Nicola on this one - we have to be practical. There's a reason why this is the top-voted JIRA ticket - Clojure programmers want those functions. That's says plenty.

Anyways, having a generic `index-of` is definitely not a bad idea.

Comment by Stuart Halloway [ 31/Jul/15 11:27 AM ]

How does this compare to making them all seq fns and adding them to clojure.core?

Comment by Andy Fingerhut [ 31/Jul/15 11:39 AM ]

Given Rich's comment earlier "I completely agree with Alex and Bozhidar re: strings vs seqs. Other langs that have only lists and use them for strings are really bad at strings. So - string-specific fns are ok.", do we need to write a separate patch that makes these all into seq fns and do performance comparisons between those vs. the string-specific ones?

Comment by Stuart Halloway [ 31/Jul/15 1:26 PM ]


Definitely not! I want to have a design conversation.

We have protocols, and could use them to provide core fns that are specialized to strings, so it isn't clear to me why we would have to be really bad at it.

Comment by Stuart Halloway [ 07/Aug/15 2:19 PM ]

Discussed with Rich. He does not want sequence fns with these performance characteristics, so string-specific fns it shall be.

Comment by Rich Hickey [ 08/Aug/15 10:11 AM ]

So which patch is screened?

Comment by Alex Miller [ 08/Aug/15 11:40 AM ]

Stu screened add_functions_to_strings-6.patch where index-of returns nil on not-found. The add_functions_to_strings-4.patch is almost identical but returns -1 from index-of (which matches Java and JavaScript index-of behavior and may be slightly faster for callers that can be monomorphic on primitive longs?).

Comment by Rich Hickey [ 24/Aug/15 11:27 AM ]

nil-returning index fns should document that return when not found

Comment by Nola Stowe [ 24/Aug/15 11:56 AM ]

adds better doc string to indicated nil is return when not found for index-of and last-index-of.

Comment by Alex Miller [ 24/Aug/15 11:59 AM ]

Added clj-1449-7.patch that is identical to -6 but adds statement of return expectation on index-of and last-index-of.

Comment by Alex Miller [ 24/Aug/15 12:19 PM ]

jinx - same patch! Replacing with Nola's.

Comment by Colin Taylor [ 30/Aug/15 6:48 PM ]

Bit late on this - but .equalsIgnoreCase (=ic? equals-ic ? ) is currently a visible wart for our portability.

Comment by Alex Miller [ 30/Aug/15 10:03 PM ]

Hi Colin, that's a good idea but you have missed the window on this ticket - feel free to file a different one.

Comment by Tom Marble [ 03/Sep/15 3:46 PM ]

I have just proposed an analogous patch to ClojureScript under

[CLJ-1586] Compiler doesn't preserve metadata for LazySeq literals Created: 12/Nov/14  Updated: 03/Sep/15  Resolved: 03/Sep/15

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: None
Fix Version/s: Release 1.8

Type: Defect Priority: Major
Reporter: Nicola Mometto Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: compiler, metadata, typehints

Attachments: Text File 0001-CLJ-1586-Compiler-doesn-t-preserve-metadata-for-lazy.patch     Text File 0001-Compiler-doesn-t-preserve-metadata-for-lazyseq-liter.patch     Text File clj-1586-2.patch    
Patch: Code and Test
Approval: Ok


The analyzer in Compiler.java forces evaluation of lazyseq literals, but loses the compile time original metadata of that form, meaning that a type hint will be lost.

Example demonstrating this issue:

user=> (set! *warn-on-reflection* true)
user=> (list '.substring (with-meta (concat '(identity) '("foo")) {:tag 'String}) 0)
(.substring (identity "foo") 0)
user=> (eval (list '.substring (with-meta (concat '(identity) '("foo")) {:tag 'String}) 0))
Reflection warning, NO_SOURCE_PATH:6:1 - call to method substring on java.lang.Object can't be resolved (no such method).

Forcing the concat call to an ASeq rather than a LazySeq fixes this issue:

user=> (eval (list '.hashCode (with-meta (seq (concat '(identity) '("foo"))) {:tag 'String})))

This ticket blocks http://dev.clojure.org/jira/browse/CLJ-1444 since clojure.core/sequence might return a lazyseq.

This bug affected both tools.analyzer and tools.reader and forced me to commit a fix in tools.reader to work around this issue, see: http://dev.clojure.org/jira/browse/TANAL-99

The proposed patch trivially preserves the form metadata after realizing the lazyseq

Approach: Keep a copy of the original form, and apply its metadata to the realized lazyseq
Patch: clj-1586-2.patch
Screened by: Alex Miller

Comment by Stuart Halloway [ 31/Jul/15 9:19 AM ]

Please add a test case.

Comment by Nicola Mometto [ 31/Jul/15 9:32 AM ]

Updated patch with testcase

Comment by Alex Miller [ 11/Aug/15 10:55 AM ]

clj-1586-2.patch is same as prior, just updated to apply to current master

[CLJ-1609] Fix an edge case in the Reflector's search for a public method declaration Created: 05/Dec/14  Updated: 03/Sep/15  Resolved: 03/Sep/15

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: Release 1.8

Type: Defect Priority: Major
Reporter: Jeremy Heiler Assignee: Unassigned
Resolution: Completed Votes: 1
Labels: interop

Attachments: Text File clj-1609-2.patch     Text File clj-1609-3.patch     Text File clj-1609-test.patch     Text File reflector_method_bug.patch    
Patch: Code and Test
Approval: Ok

// all: package demo;

public interface IBar {
  void stuff();

class Bar {
  public void stuff() { System.out.println("stuff"); }

class SubBar extends Bar implements IBar {
  // implements IBar, via non-public superclass Bar

public class Factory {
  public static IBar get() { return new SubBar(); }

Can't invoke from Clojure:

(import '[demo Factory IBar])
(def inst (Factory/get))  ;; instance of SubBar, inferred as IBar
(.stuff inst)
;; IllegalArgumentException Can't call public method of non-public class: public void demo.Bar.stuff()  clojure.lang.Reflector.invokeMatchingMethod (Reflector.java:88)

Cause: The Reflector is not taking into account that a non-public class can implement an interface, and have a non-public parent class contain an implementation of a method on that interface.

Approach: The solution I took is to pass in the target object's class instead of the declaring class of the method object.

Patch: clj-1609-3.patch + clj-1609-test.patch

Screened by: Fogus

Comment by Alex Miller [ 05/Dec/14 8:48 AM ]

Probably a dupe of CLJ-259. I'll probably dupe that one to this though.

Comment by Jeremy Heiler [ 05/Dec/14 1:33 PM ]

Thanks. Sorry for not finding that myself. CLJ-259 refers to CLJ-126, which I think is covered with this patch.

Comment by Jeremy Heiler [ 06/Dec/14 12:37 PM ]

I was looking into whether or not the target could be null, and it can be when invoking a static method. However, I don't think that code path would make it to the target.getClass() because non-public static methods aren't returned by getMethods().

Comment by Stuart Halloway [ 31/Jul/15 9:24 AM ]


The compile-tests phase of the build provides a point for compiling Java classes that that tests can then consume. Does that provide the hook you would need to add a test?

Comment by Alex Miller [ 07/Aug/15 9:10 AM ]

afaict, this ticket is incomplete pending a test from Stu's last comment so moving there

Comment by Alex Miller [ 18/Aug/15 4:58 PM ]

clj-1609-2.patch is identical to previous patch, just adds jira id to beginning of commit message

clj-1609-test.patch adds a reflection test that fails without the patch and passes with it.

Comment by Alex Miller [ 18/Aug/15 4:58 PM ]

should be ready to screen now

Comment by Fogus [ 21/Aug/15 12:53 PM ]

The patch and test are straight-forward and clean.

Comment by Rich Hickey [ 24/Aug/15 9:54 AM ]

Can I get some more context in this diff please?

Comment by Alex Miller [ 24/Aug/15 12:11 PM ]

Added clj-1609-3.patch which is identical to prior but has 15 lines of context instead.

[CLJ-1232] Functions with non-qualified return type hints force import of hinted classes when called from other namespace Created: 18/Jul/13  Updated: 03/Sep/15  Resolved: 03/Sep/15

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5, Release 1.6, Release 1.7
Fix Version/s: Release 1.8

Type: Defect Priority: Major
Reporter: Tassilo Horn Assignee: Unassigned
Resolution: Completed Votes: 8
Labels: compiler, typehints

Attachments: Text File 0001-auto-qualify-arglists-class-names.patch     Text File 0001-auto-qualify-arglists-class-names-v2.patch     Text File 0001-auto-qualify-arglists-class-names-v3.patch     Text File 0001-auto-qualify-arglists-class-names-v4.patch     Text File 0001-CLJ-1232-auto-qualify-arglists-class-names.patch     Text File 0001-throw-on-non-qualified-class-names-that-are-not-auto.patch    
Patch: Code and Test
Approval: Ok


You can add a type hint to function arglists to indicate the return type of a function like so.

user> (import '(java.util List))
user> (defn linkedlist ^List [] (java.util.LinkedList.))
user> (.size (linkedlist))

The problem is that now when I call `linkedlist` exactly as above from another namespace, I'll get an exception because java.util.List is not imported in there.

user> (in-ns 'user2)
#<Namespace user2>
user2> (refer 'user)
user2> (.size (linkedlist))
CompilerException java.lang.IllegalArgumentException: Unable to resolve classname: List, compiling:(NO_SOURCE_PATH:1:1)
user2> (import '(java.util List)) ;; Too bad, need to import List here, too.
user2> (.size (linkedlist))

There are two workarounds: You can import the hinted type also in the calling namespace, or you always use fully qualified class names for return type hints. Clearly, the latter is preferable.

Approach: Resolve the tags in the defn macro.

Patch: 0001-CLJ-1232-auto-qualify-arglists-class-names.patch

Screened by: Fogus

Alternate approach: Make the compiler resolve the return tags when necessary (tag is not a string, primitive tag (^long) or array tag (^longs)) and update the Var's :arglist appropriately. Patch: 0001-auto-qualify-arglists-class-names-v4.patch. Note that this patch had problems with Hystrix, which was using non-conformant arglists - Hystrix has since been patched.

Comment by Andy Fingerhut [ 16/Apr/14 3:47 PM ]

To make sure I understand, Nicola, in this ticket you are asking that the Clojure compiler change behavior so that the sample code works correctly with no exceptions, the same way as it would work correctly without exceptions if one of the workarounds were used?

Comment by Tassilo Horn [ 17/Apr/14 12:18 AM ]

Hi Andy. Tassilo here, not Nicola. But yes, the example should work as-is. When I'm allowed to use type hints with simple imported class names for arguments, then doing so for return values should work, too.

Comment by Rich Hickey [ 10/Jun/14 10:41 AM ]

Type hints on function params are only consumed by the function definition, i.e. in the same module as the import/alias. Type hints on returns are just metadata, they don't get 'compiled' and if the metadata is not useful to consumers in other namespaces, it's not a useful hint. So, if it's not a type in the auto-imported set (java.lang), it should be fully qualified.

Comment by Alex Miller [ 10/Jun/14 11:55 AM ]

Based on Rich's comment, this ticket should probably morph into an enhancement request on documentation, probably on http://clojure.org/java_interop#Java Interop-Type Hints.

Comment by Andy Fingerhut [ 10/Jun/14 3:13 PM ]

I would suggest something like the following for a documentation change, after this part of the text on the page Alex links in the previous comment:

For function return values, the type hint can be placed before the arguments vector:

(defn hinted
(^String [])
(^Integer [a])
(^java.util.List [a & args]))

-> #user/hinted

If the return value type hint is for a class that is outside of java.lang, which is the part auto-imported by Clojure, then it must be a fully qualified class name, e.g. java.util.List, not List.

Comment by Nicola Mometto [ 10/Jun/14 4:02 PM ]

I don't understand why we should enforce this complexity to the user.
Why can't we just make the Compiler (or even defn itself) update all the arglists tags with properly resolved ones? (that's what I'm doing in tools.analyzer.jvm)

Comment by Alexander Kiel [ 19/Jul/14 10:02 AM ]

I'm with Nicola here. I also think that defn should resolve the type hint according the imports of the namespace defn is used in.

Comment by Max Penet [ 22/Jul/14 7:06 AM ]

Same here, I was bit by this in the past. The current behavior is clearly counterintuitive.

Comment by Nicola Mometto [ 28/Aug/14 12:58 PM ]

Attached two patches implementing two different solutions:

  • 0001-auto-qualify-arglists-class-names.patch makes the compiler automatically qualify all the tags in the :arglists
  • 0001-throw-on-non-qualified-class-names-that-are-not-auto.patch makes the compiler throw an exception for all public defs whose return tag is a symbol representing a non-qualified class that is not in the auto-import list (approach proposed in IRC by Alex Miller)
Comment by Tassilo Horn [ 29/Aug/14 1:49 AM ]

For what it's worth, I'd prefer the first patch because the second doesn't help in situations where the caller lives in a namespace where the called function's return type hinted class is `ns-unmap`-ed. And there a good reasons for doing that. For example, Process is a java.lang class and Process is a pretty generic name. So in some namespace, I want to define my own Process deftype or defrecord. Without unmapping 'Process first to get rid of the java.lang.Process auto-import, I'd get an exception:

user> (deftype Process [])
IllegalStateException Process already refers to: class java.lang.Process in namespace: user  clojure.lang.Namespace.referenceClass (Namespace.java:140)

Now when I call some function from some library that has a `^Process` return type hint (meaning java.lang.Process there), I get the same exception as in my original report.

I can even get into troubles when only using standard Clojure functions because those have `^String` and `^Class` type hints. IMO, Class is also a pretty generic name I should be able to name my custom deftype/defrecord. And I might also want to have a custom String type/record in my astrophysics system.

Comment by Andy Fingerhut [ 30/Sep/14 4:39 PM ]

Not sure whether the root cause of this behavior is the same as the example in the description or not, but seems a little weird that even for fully qualified Java class names hinting the arg vector, it makes a difference whether it is done with defn or def:

Clojure 1.6.0
user=> (set! *warn-on-reflection* true)
user=> (defn f1 ^java.util.LinkedList [coll] (java.util.LinkedList. coll))
user=> (def f2 (fn ^java.util.LinkedList [coll] (java.util.LinkedList. coll)))
user=> (.size (f1 [2 3 4]))
user=> (.size (f2 [2 3 4]))
Reflection warning, NO_SOURCE_PATH:5:1 - reference to field size can't be resolved.
Comment by Alex Miller [ 30/Sep/14 6:21 PM ]

Andy, can you file that as a separate ticket?

Comment by Andy Fingerhut [ 30/Sep/14 9:08 PM ]

Created ticket CLJ-1543 for the issue raised in my comment earlier on 30 Sep 2014.

Comment by Andy Fingerhut [ 01/Oct/14 12:38 PM ]

Tassilo (or anyone), is there a reason to prefer putting the tag on the argument vector in your example? It seems that putting it on the Var name instead avoids this issue:

user=> (clojure-version)
user=> (set! *warn-on-reflection* true)
user=> (import '(java.util List))
user=> (defn ^List linkedlist [] (java.util.LinkedList.))
user=> (.size (linkedlist))
user=> (ns user2)
user2=> (refer 'user)
user2=> (.size (linkedlist))

I suppose that only allows a single type tag, rather than an independent one for each arity.

Comment by Tassilo Horn [ 02/Oct/14 3:16 AM ]

I wasn't aware of the fact that you can put it on the var's name. That's not documented at http://clojure.org/java_interop#Java Interop-Type Hints. But IMHO the documented version with putting the tag on the argument vector is more general since it supports different return type hints for the different arity version. In any case, if both forms are permitted then they should be equivalent in the case the function has only one arity.

Comment by Rich Hickey [ 16/Mar/15 12:02 PM ]

Please work on the simplest patch that resolves the names

Comment by Alex Miller [ 16/Mar/15 4:34 PM ]

Nicola, in this:

if (tag != null &&
                        !(tag instanceof String) &&
                        primClass((Symbol)tag) == null &&
                        !tagClass((Symbol) tag).getName().startsWith("["))
                            argvec = (IPersistentVector)((IObj)argvec).withMeta(RT.map(RT.TAG_KEY, Symbol.intern(tagClass((Symbol)tag).getName())));

doesn't tagClass already handle most of these cases properly already? Can this be simplified? Is there an optimization case in avoiding lookup for a dotted name?

Comment by Nicola Mometto [ 16/Mar/15 5:10 PM ]

Patch 0001-auto-qualify-arglists-class-names-v2.patch avoids doing unnecessary lookups (dotted names, special tags (primitive tags, array tags)) and adds a testcase

Comment by Michael Blume [ 20/May/15 1:13 PM ]

I'm seeing an odd failure with this patch and hystrix defcommands, will post a small reproduction shortly

Comment by Michael Blume [ 20/May/15 1:20 PM ]


passes lein check with 1.7 beta3, fails with v3 patch applied

Comment by Nicola Mometto [ 20/May/15 1:40 PM ]

During analysis the compiler understands only arglists in the form of (quote ([..]*)) (see https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Compiler.java#L557-L558), hystrix emits arglists in the form of (list (quote [..])*).

Not sure what to do about this.

Comment by Michael Blume [ 20/May/15 1:51 PM ]

Possibly just ask Hystrix not to do that?

Comment by Alex Miller [ 20/May/15 2:30 PM ]

test.generative uses non-standard arglists too. I haven't looked at the patch, but if it's sensitive to that, it's probably not good enough.

Comment by Nicola Mometto [ 20/May/15 6:51 PM ]

test.generative uses non-standard :tag, not :arglists

Comment by Alex Miller [ 20/May/15 10:22 PM ]

ah, yes. sorry.

Comment by Stuart Halloway [ 10/Jul/15 12:15 PM ]

Opened https://github.com/Netflix/Hystrix/issues/831 to see what is up with hystrix.

Comment by Stuart Halloway [ 31/Jul/15 9:16 AM ]

Can somebody with context update this patch to apply to latest master?

Comment by Nicola Mometto [ 31/Jul/15 9:37 AM ]

patch updated

Comment by Matt Jacobs [ 06/Aug/15 12:59 PM ]

I've released hystrix-clj 1.4.14, with a fix for https://github.com/Netflix/Hystrix/issues/831: https://github.com/Netflix/Hystrix/releases/tag/v1.4.14. If anything else is needed on the Hystrix side, please open another issue.

Comment by Fogus [ 07/Aug/15 11:03 AM ]

Applied the 0001-CLJ-1232-auto-qualify-arglists-class-names.patch to master (1.8) and ran through the example scenario to check the veracity of the change. Additionally, I ran a modified snippet using def and verified that it too worked. Finally, after checking the code it seems reasonable in implementation and scope.

Comment by Rich Hickey [ 08/Aug/15 9:56 AM ]

I'd rather not change the compiler and it seems hystrix was broken. Also please make it clear what the single strategy and patch is at the top of the ticket.

Comment by Alex Miller [ 11/Aug/15 5:01 PM ]

Made preferred approach clear in the description and moved back to vetted. I believe the other ticket was screened by Fogus so this needs to be re-screened with the preferred approach ticket.

Comment by Alex Miller [ 11/Aug/15 5:08 PM ]

Nicola and Fogus, would appreciate an eye as to whether I just made the proper alterations in the description.

Comment by Nicola Mometto [ 12/Aug/15 6:17 AM ]

Alex Miller Reading Fogus' last comment, it seems to me that he actually screened the current patch (the one who resolves tags in defn rather than in the compiler) "Applied the 0001-CLJ-1232-auto-qualify-arglists-class-names.patch to master [..]"

Comment by Fogus [ 14/Aug/15 9:18 AM ]

It would be better if the description said the word "preferred," but based on my reading resolving the tags in defn is the winner (0001-CLJ-1232-auto-qualify-arglists-class-names.patch). Thankfully that's the one that I screened as I agree that non-trivial changes to the compiler are to be avoided. That said, of course that patch does modify the compiler but since it's a change that pulls out some code into a useful method I let it through. To my mind that was a trivial (and justifiable change). Of course, if we want to avoid any compiler changes then this patch would have to be reworked.

Comment by Alex Miller [ 18/Aug/15 5:03 PM ]

Moving back to screened as fogus screened the preferred (by Rich) version.

[CLJ-1766] Serializing+deserializing lists breaks their hash Created: 21/Jun/15  Updated: 03/Sep/15  Resolved: 03/Sep/15

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6, Release 1.7
Fix Version/s: Release 1.8

Type: Defect Priority: Major
Reporter: Chris Vermilion Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: interop

Tested on OS X 10.10.3, Clojure 1.6.0, Java 1.8.0_45-b14, but don't think this is a factor.

Attachments: Text File clj-1766-2.patch     Text File CLJ-1766.patch     File serialization_test_mod.diff     File serialize-test.clj    
Patch: Code and Test
Approval: Ok


clojure.lang.PersistentList implements Serializable but a serialization roundtrip results in a hash of 0. This appears to be caused by ASeq marking its _hash property as transient; other collection base classes don't do this. I don't know if there is a rationale for the distinction, but the result is that serializing, then deserializing, a list results in its _hash property being 0 instead of either its previous, calculated value, or -1, which would trigger recalculation.

This means that any code that relies on a list's hash value can break in subtle ways.


(import '[java.io ByteArrayInputStream ByteArrayOutputStream ObjectInputStream ObjectOutputStream])

(defn obj->bytes [obj]
  (let [bytestream (ByteArrayOutputStream.)]
    (doto (ObjectOutputStream. bytestream) (.writeObject obj))
    (.toByteArray bytestream)))

(defn bytes->obj [bs]
  (.readObject (ObjectInputStream. (ByteArrayInputStream. bs))))

(defn round-trip [x] (bytes->obj (obj->bytes x)))
user=> (hash '(1))
user=> (hash (round-trip '(1)))
user=> #{'(1) (round-trip '(1))}
#{(1) (1)}
user=> (def m {'(1) 1})
user=> (= m (round-trip m))
user=> (= (round-trip m) m)

Approach: Use 0 as the "uncomputed" hash value, not -1. The auto initialized value on serialization construction will then automatically trigger a re-computation.

Alternate approaches:

  • Implement a readObject method that sets the _hash property to -1. This is somewhat complicated in the face of subclasses, etc.

Note: Also need to consider other classes that use -1 instead of 0 as the uncomputed hash value: APersistentMap, APersistentSet, APersistentVector, PersistentQueue - however I believe in those cases they are not transient and thus may avoid the issue. Perhaps they should be transient though.

Patch: clj-1766-2.patch
Screened by: Alex Miller

Comment by Chris Vermilion [ 21/Jun/15 1:10 PM ]

Yikes, sorry about the formatting above; JIRA newbie and looks like I can't edit. Also, I should have noted that the above functions require an import: (import '[java.io ByteArrayInputStream ByteArrayOutputStream ObjectInputStream ObjectOutputStream]).

Comment by Alex Miller [ 22/Jun/15 7:55 AM ]

Yes, this is a bug. My preference would be to use 0 to indicate "not computed" and thus sidestep this issue. The downside of changing from -1 to 0 is that serialization/deserialization is then broken between versions of Clojure (although maybe if it was already broken, that's not an issue). -1 is used like this for lazily computed hashes in many classes in Clojure so the issue should really be fixed across the board.

Comment by Alex Miller [ 22/Jun/15 8:22 AM ]

There are some serialization round-trip tests in clojure.test-clojure.serialization - seems like those should be updated to compare the .hashCode and hash before/after, which should catch this. I attached a diff (not a proper patch) with that change just as a demonstration, which indeed produces a bunch of failures. That should be incorporated into any fix, possibly along with other failures.

Comment by Alex Miller [ 31/Jul/15 9:04 AM ]

Also, don't worry about crediting me on that test change, please just incorporate it into whatever gets made here.

Comment by Andrew Rosa [ 31/Jul/15 9:31 AM ]

Alex Miller I hope to work on this issue this weekend. There are some conflict with the alpha release schedule?

Comment by Alex Miller [ 31/Jul/15 9:47 AM ]

No conflict - when it's ready we'll look at it.

Comment by Andrew Rosa [ 04/Aug/15 8:21 AM ]

As Alex Miller recomended, I followed the change-default-to-zero path.
Not only it sidesteps the serialization issue, it looks more aligned with
the semantics of transient. Of course, there is no guarantees
about how different JVM implementations will deal with it - sometimes
they will be serialized, sometimes they will not.

So, together with the patch I've made some manual tests between some versions. The script used is attached for further inspection.

Running on a 1.6 REPL has shown on the original described issue:

serialize-test=> *clojure-version*
{:major 1, :minor 6, :incremental 0, :qualifier nil}
serialize-test=> (def f "seq-1-6.dump")
serialize-test=> (def x {'(1) 1})
serialize-test=> (dump-bytes f (serialize x))
serialize-test=> (deserialize (slurp-bytes f))
{(1) 1}
serialize-test=> (hash x)
serialize-test=> (hash (deserialize (serialize x)))
serialize-test=> (hash (deserialize (slurp-bytes f)))
serialize-test=> (= x (deserialize (slurp-bytes f)))
serialize-test=> (= (deserialize (slurp-bytes f)) x)

Running on 1.8 master. Despite of the = behavior looking ok, the
hashes are different between roundtrips, like we saw on 1.6:

serialize-test=> *clojure-version*
{:major 1, :minor 8, :incremental 0, :qualifier "master", :interim true}
serialize-test=> (def f "seq-1-8-master.dump")
serialize-test=> (def x {'(1) 1})
serialize-test=> (dump-bytes f (serialize x))
serialize-test=> (deserialize (slurp-bytes f))
{(1) 1}
serialize-test=> (hash x)
serialize-test=> (hash (deserialize (serialize x)))
serialize-test=> (hash (deserialize (slurp-bytes f)))
serialize-test=> (= x (deserialize (slurp-bytes f)))
serialize-test=> (= (deserialize (slurp-bytes f)) x)

Running on 1.8 after patch the hashes are correctly computed on both cases:

serialize-test=> *clojure-version*
{:major 1, :minor 8, :incremental 0, :qualifier "master", :interim true}
serialize-test=> (def f "seq-1-8-patch.dump")
serialize-test=> (def x {'(1) 1})
serialize-test=> (dump-bytes f (serialize x))
serialize-test=> (deserialize (slurp-bytes f))
{(1) 1}
serialize-test=> (hash x)
serialize-test=> (hash (deserialize (serialize x)))
serialize-test=> (hash (deserialize (slurp-bytes f)))
serialize-test=> (= x (deserialize (slurp-bytes f)))
serialize-test=> (= (deserialize (slurp-bytes f)) x)

Please note I've dumped each serialized data into different files, so I could test the interoperability between those versions. What I found:

  • 1.6 serialization is already incompatible with 1.8, on both directions;
  • 1.8 data could be exchange between master and patched versions, not affecting version behavior (hashes are computed only on patched REPL).

Did I miss something or everything looks correct for you too? I'm open to do some more exhaustive testing if anyone could give me some directions about where to explore.

Comment by Rich Hickey [ 08/Aug/15 9:46 AM ]

This patch both fixes the serialization problem and changes the implementation for no reason related to the problem. The implementation works in place in order to avoid head-holding, which the implementation change reintroduces. Screeners - please make sure patches contain the minimal code to address the problem and don't incorporate other extraneous 'improvements'.

Comment by Ghadi Shayban [ 08/Aug/15 11:59 AM ]

Rich Hickey The only change I see is the removal of a commented-out impl.

Comment by Alex Miller [ 08/Aug/15 10:33 PM ]

I agree with Ghadi - there is no change in implementation here, just some commented (non-used) code was removed. Moving back to Screened.

Comment by Alex Miller [ 18/Aug/15 12:26 PM ]

Latest patch is identical, just does not remove the commented out code.

Generated at Thu Sep 03 20:46:19 CDT 2015 using JIRA 4.4#649-r158309.