Clojure

Add clojure.string functions for portability to ClojureScript

Details

  • Type: Enhancement Enhancement
  • Status: Open Open
  • Priority: Major Major
  • Resolution: Unresolved
  • Affects Version/s: Release 1.6
  • Fix Version/s: Release 1.8
  • Component/s: None
  • Labels:
  • Patch:
    Code
  • Approval:
    Vetted

Description

It would be useful if a few common functions from Java's String were available as Clojure functions for the purposes of increasing portability to other Clojure platforms like ClojureScript.

The functions below also cover the vast majority of cases where Clojure users currently drop into Java interop for String calls - this tends to be an issue for discoverability and learning. While the goal of this ticket is increased portability, improving that is a nice secondary benefit.

Proposed clojure.string fn java.lang.String method
index-of indexOf
last-index-of lastIndexOf
starts-with? startsWith
ends-with? endsWith
includes? contains

Patch:

  • add_functions_to_strings-4.patch - uses -1 to indicate not-found in index-of
  • add_functions_to_strings-6.patch - uses nil to indicate not-found in index-of

Performance: Tested the following with criterium for execution mean time (all times in ns).

Java Clojure Java Clojure (-4 patch) Clojure (-6 patch)
(.indexOf "banana" "n") (clojure.string/index-of "banana" "n") 8.70 9.03 9.27
(.indexOf "banana" "n" 1) (clojure.string/index-of "banana" "n" 1) 7.29 7.61 7.66
(.indexOf "banana" (int \n)) (clojure.string/index-of "banana" \n) 5.34 6.20 6.20
(.indexOf "banana" (int \n) 1) (clojure.string/index-of "banana" \n 1) 5.38 6.19 6.24
(.startsWith "apple" "a") (clojure.string/starts-with? "apple" "a") 4.84 5.71 5.65
(.endsWith "apple" "e") (clojure.string/ends-with? "apple" "e") 4.82 5.68 5.70
(.contains "baked apple pie" "apple") (clojure.string/includes? "baked apple pie" "apple") 10.78 11.99 12.17

Screened by:

Note: In both Java and JavaScript, indexOf will return -1 to indicate not-found, so this is (at least in these two cases) the status quo. The benefit of nil return in Clojure is that it can be used as a found/not-found condition. The benefit of a -1 return is that it matches Java/JavaScript and that the return type can be hinted as a primitive long, allowing you to feed it into some other arithmetic or string operation without boxing.

  1. add_functions_to_strings.patch
    08/Jul/15 12:46 PM
    3 kB
    Nola Stowe
  2. add_functions_to_strings-2.patch
    14/Jul/15 8:55 PM
    4 kB
    Nola Stowe
  3. add_functions_to_strings-3.patch
    14/Jul/15 8:56 PM
    5 kB
    Nola Stowe
  4. add_functions_to_strings-4.patch
    14/Jul/15 9:56 PM
    5 kB
    Alex Miller
  5. add_functions_to_strings-5.patch
    26/Jul/15 9:28 AM
    5 kB
    Nola Stowe
  6. add_functions_to_strings-6.patch
    26/Jul/15 9:32 AM
    5 kB
    Alex Miller
  7. clj-1449-more-v1.patch
    30/Aug/14 3:40 PM
    4 kB
    Andy Fingerhut

Activity

Hide
Alex Miller added a comment -

Re substring, there is a clojure.core/subs for this (predates the string ns I believe).

clojure.core/subs
([s start] [s start end])
Returns the substring of s beginning at start inclusive, and ending
at end (defaults to length of string), exclusive.

Show
Alex Miller added a comment - Re substring, there is a clojure.core/subs for this (predates the string ns I believe). clojure.core/subs ([s start] [s start end]) Returns the substring of s beginning at start inclusive, and ending at end (defaults to length of string), exclusive.
Hide
Jozef Wagner added a comment -

As strings are collection of characters, you can use Clojure's sequence facilities to achieve such functionality:

user=> (= (first "asdf") \a)
true
user=> (= (last "asdf") \a)
false
Show
Jozef Wagner added a comment - As strings are collection of characters, you can use Clojure's sequence facilities to achieve such functionality:
user=> (= (first "asdf") \a)
true
user=> (= (last "asdf") \a)
false
Hide
Alex Miller added a comment -

Jozef, String.startsWith() checks for a prefix string, not just a prefix char.

Show
Alex Miller added a comment - Jozef, String.startsWith() checks for a prefix string, not just a prefix char.
Hide
Bozhidar Batsov added a comment -

Re substring, I know about subs, but it seems very odd that it's not in the string ns. After all most people will likely look for string-related functionality in clojure.string. I think it'd be best if `subs` was added to clojure.string and clojure.core/subs was deprecated.

Show
Bozhidar Batsov added a comment - Re substring, I know about subs, but it seems very odd that it's not in the string ns. After all most people will likely look for string-related functionality in clojure.string. I think it'd be best if `subs` was added to clojure.string and clojure.core/subs was deprecated.
Hide
Pierre Masci added a comment -

Hi, I was thinking the same about starts-with and .ends-with, as well as (.indexOf s "c") and (.lastIndexOf "c").

I read the whole Java String API recently, and these 4 functions seem to be the only ones that don't have an equivalent in Clojure.
It would be nice to have them.

Andy Fingerhut who maintains the Clojure Cheatsheet told me: "I maintain the cheatsheet, and I put .indexOf and .lastIndexOf on there since they are probably the most common thing I saw asked about that is in the Java API but not the Clojure API, for strings."
Which shows that there is a demand.

Because Clojure is being hosted on several platforms, and might be hosted on more in the future, I think these functions should be part of the de-facto ecosystem.

Show
Pierre Masci added a comment - Hi, I was thinking the same about starts-with and .ends-with, as well as (.indexOf s "c") and (.lastIndexOf "c"). I read the whole Java String API recently, and these 4 functions seem to be the only ones that don't have an equivalent in Clojure. It would be nice to have them. Andy Fingerhut who maintains the Clojure Cheatsheet told me: "I maintain the cheatsheet, and I put .indexOf and .lastIndexOf on there since they are probably the most common thing I saw asked about that is in the Java API but not the Clojure API, for strings." Which shows that there is a demand. Because Clojure is being hosted on several platforms, and might be hosted on more in the future, I think these functions should be part of the de-facto ecosystem.
Hide
Andy Fingerhut added a comment -

Updating summary line and description to add contains? as well. I can back this off if it changes your mind about triaging it, Alex.

Show
Andy Fingerhut added a comment - Updating summary line and description to add contains? as well. I can back this off if it changes your mind about triaging it, Alex.
Hide
Andy Fingerhut added a comment -

Patch clj-1449-basic-v1.patch dated Aug 30 2014 adds starts-with? ends-with? contains? functions to clojure.string.

Patch clj-1449-more-v1.patch is the same, except it also replaces several Java method calls with calls to these Clojure functions.

Show
Andy Fingerhut added a comment - Patch clj-1449-basic-v1.patch dated Aug 30 2014 adds starts-with? ends-with? contains? functions to clojure.string. Patch clj-1449-more-v1.patch is the same, except it also replaces several Java method calls with calls to these Clojure functions.
Hide
Andy Fingerhut added a comment -

Patch clj-1449-basic-v1.patch dated Sep 5 2014 is identical to the patch I added recently called clj-1149-basic-v1.patch. It is simply renamed without the typo'd ticket number in the file name.

Show
Andy Fingerhut added a comment - Patch clj-1449-basic-v1.patch dated Sep 5 2014 is identical to the patch I added recently called clj-1149-basic-v1.patch. It is simply renamed without the typo'd ticket number in the file name.
Hide
Yehonathan Sharvit added a comment -

What about an implementation that works also in cljs?

Show
Yehonathan Sharvit added a comment - What about an implementation that works also in cljs?
Hide
Bozhidar Batsov added a comment -

Once this is added to Clojure it will be implemented in ClojureScript as well.

Show
Bozhidar Batsov added a comment - Once this is added to Clojure it will be implemented in ClojureScript as well.
Hide
Yehonathan Sharvit added a comment - - edited

Great! Any idea when it will be added to Clojure?
Also, will it be automatically added to Clojurescript or someone will have to write a particular code for it.
The suggested patch relies on Java so I am curious to understand who is going to port the patch to cljs.

Show
Yehonathan Sharvit added a comment - - edited Great! Any idea when it will be added to Clojure? Also, will it be automatically added to Clojurescript or someone will have to write a particular code for it. The suggested patch relies on Java so I am curious to understand who is going to port the patch to cljs.
Hide
Bozhidar Batsov added a comment -

No idea when/if this will get merged. Upvote the ticket to improve the odds of this happening sooner.
Someone on the ClojureScript team will have to implement this in terms of JavaScript.

Show
Bozhidar Batsov added a comment - No idea when/if this will get merged. Upvote the ticket to improve the odds of this happening sooner. Someone on the ClojureScript team will have to implement this in terms of JavaScript.
Hide
Alex Miller added a comment - - edited

Some things that would be helpful:

1) It would be better to combine the two patches into a single patch - I think changing current uses into new users is a good thing to include. Also, please keep track of the "current" patch in the description.
2) Patch needs tests.
3) Per the instructions at the top of the clojure.string ns (and the rest of the functions), the majority of these functions are implemented to take the broader CharSequence interface. Similar to those implementations, you will need to provide a CharSequence implementation while also calling into the String functions when you actually have a String.
4) Consider return type hints - I'm not sure they're necessary here, but I would examine bytecode for typical calling situations to see if it would be helpful.
5) Check performance implications of the new versions vs the old with a good tool (like criterium). You've put an additional var invocation and (soon) type check in the calling path for these functions. I think providing a portable target is worth a small cost, but it would be good to know what the cost actually is.

I don't expect we will look at this until after 1.7 is released.

Show
Alex Miller added a comment - - edited Some things that would be helpful: 1) It would be better to combine the two patches into a single patch - I think changing current uses into new users is a good thing to include. Also, please keep track of the "current" patch in the description. 2) Patch needs tests. 3) Per the instructions at the top of the clojure.string ns (and the rest of the functions), the majority of these functions are implemented to take the broader CharSequence interface. Similar to those implementations, you will need to provide a CharSequence implementation while also calling into the String functions when you actually have a String. 4) Consider return type hints - I'm not sure they're necessary here, but I would examine bytecode for typical calling situations to see if it would be helpful. 5) Check performance implications of the new versions vs the old with a good tool (like criterium). You've put an additional var invocation and (soon) type check in the calling path for these functions. I think providing a portable target is worth a small cost, but it would be good to know what the cost actually is. I don't expect we will look at this until after 1.7 is released.
Hide
Andy Fingerhut added a comment -

Alex, all your comments make sense.

If you think a ready-and-waiting patch that does those things would improve the odds of the ticket being vetted by Rich, please let us know.

My guess is that his decision will be based upon the description, not any proposed patches. If that is your belief also, I'll wait until he makes that decision before working on a patch. Of course, any other contributor is welcome to work on one if they like.

Show
Andy Fingerhut added a comment - Alex, all your comments make sense. If you think a ready-and-waiting patch that does those things would improve the odds of the ticket being vetted by Rich, please let us know. My guess is that his decision will be based upon the description, not any proposed patches. If that is your belief also, I'll wait until he makes that decision before working on a patch. Of course, any other contributor is welcome to work on one if they like.
Hide
Alex Miller added a comment -

Well nothing is certain of course, but I keep a special report of things I've "screened" prior to vetting that makes possible moving something straight from Triaged all the way through into Screened/Ok when Rich is able to look at them. This is a good candidate if things were in pristine condition.

That said, I don't know whether Rich will approve it or not, so it's up to you. I think the argument for portability is a strong one and complements the feature expression.

Show
Alex Miller added a comment - Well nothing is certain of course, but I keep a special report of things I've "screened" prior to vetting that makes possible moving something straight from Triaged all the way through into Screened/Ok when Rich is able to look at them. This is a good candidate if things were in pristine condition. That said, I don't know whether Rich will approve it or not, so it's up to you. I think the argument for portability is a strong one and complements the feature expression.
Hide
Alex Miller added a comment - - edited

I'd love to have a really high-quality patch on this ticket to consider for 1.8 that took into account my comments above.

Also, it occurred to me that I don't think this should be called "contains?", overlapping the core function with a different meaning (contains value vs contains key). Maybe "includes?"?

Show
Alex Miller added a comment - - edited I'd love to have a really high-quality patch on this ticket to consider for 1.8 that took into account my comments above. Also, it occurred to me that I don't think this should be called "contains?", overlapping the core function with a different meaning (contains value vs contains key). Maybe "includes?"?
Hide
Andy Fingerhut added a comment -

clojure.string already has 2 name conflicts with clojure.core, for which most people probably do something like (require '[clojure.string :as str]) to avoid that:

user=> (use 'clojure.string)
WARNING: reverse already refers to: #'clojure.core/reverse in namespace: user, being replaced by: #'clojure.string/reverse
WARNING: replace already refers to: #'clojure.core/replace in namespace: user, being replaced by: #'clojure.string/replace
nil
Show
Andy Fingerhut added a comment - clojure.string already has 2 name conflicts with clojure.core, for which most people probably do something like (require '[clojure.string :as str]) to avoid that:
user=> (use 'clojure.string)
WARNING: reverse already refers to: #'clojure.core/reverse in namespace: user, being replaced by: #'clojure.string/reverse
WARNING: replace already refers to: #'clojure.core/replace in namespace: user, being replaced by: #'clojure.string/replace
nil
Hide
Alex Miller added a comment -

I'm not concerned about overlapping the name. I'm concerned about overlapping it and meaning something different, particularly vs one of the most confusing functions in core already. clojure.core/contains? is better than linear time key search. clojure.string/whatever will be a linear time subsequence match.

Show
Alex Miller added a comment - I'm not concerned about overlapping the name. I'm concerned about overlapping it and meaning something different, particularly vs one of the most confusing functions in core already. clojure.core/contains? is better than linear time key search. clojure.string/whatever will be a linear time subsequence match.
Hide
Alex Miller added a comment -

Ruby uses "include?" for this.

Show
Alex Miller added a comment - Ruby uses "include?" for this.
Hide
Devin Walters added a comment -

I agree with Alex's comment about the overlap. Personally, I prefer the way "includes?" reads over "include?", but IMO either one is better than adding to the "contains?" confusion.

Show
Devin Walters added a comment - I agree with Alex's comment about the overlap. Personally, I prefer the way "includes?" reads over "include?", but IMO either one is better than adding to the "contains?" confusion.
Hide
Bozhidar Batsov added a comment -

I'm fine with `includes?`. Ruby is famous for the bad English used in its core library.

Show
Bozhidar Batsov added a comment - I'm fine with `includes?`. Ruby is famous for the bad English used in its core library.
Hide
Andrew Rosa added a comment -

Andy Fingerhut, since you are the author of the original patch do you desire to continue the work on it? If you prefer (or even need) to focus on different stuff, I would like to tackle this issue.

Show
Andrew Rosa added a comment - Andy Fingerhut, since you are the author of the original patch do you desire to continue the work on it? If you prefer (or even need) to focus on different stuff, I would like to tackle this issue.
Hide
Andy Fingerhut added a comment - - edited

Andrew Rosa, I am perfectly happy if someone else wants to work on a revised patch for this. If you are interested, go for it! And from Alex's comments, I believe my initial patch covers such a small fraction of what he wants, do not worry about keeping my name associated with any patch you submit – if yours is accepted, my patch will not have helped you in getting there.

Show
Andy Fingerhut added a comment - - edited Andrew Rosa, I am perfectly happy if someone else wants to work on a revised patch for this. If you are interested, go for it! And from Alex's comments, I believe my initial patch covers such a small fraction of what he wants, do not worry about keeping my name associated with any patch you submit – if yours is accepted, my patch will not have helped you in getting there.
Hide
Nola Stowe added a comment -

Andrew Rosa, I am interested on working on this. Are you currently working it?

Show
Nola Stowe added a comment - Andrew Rosa, I am interested on working on this. Are you currently working it?
Hide
Andrew Rosa added a comment -

Thanks Andy Fingerhut, didn't saw the answer here. I'm happy that Nola Stowe could take this one.

Show
Andrew Rosa added a comment - Thanks Andy Fingerhut, didn't saw the answer here. I'm happy that Nola Stowe could take this one.
Hide
Nola Stowe added a comment -

Updated patch to rename methods as specified and add tests.

Show
Nola Stowe added a comment - Updated patch to rename methods as specified and add tests.
Hide
Bozhidar Batsov added a comment -

A few remarks:

1. added should say 1.8, not 1.9.
2. The docstrings could be improved a bit - normally they should end with a . and refer to all the params of the function in question.

Show
Bozhidar Batsov added a comment - A few remarks: 1. added should say 1.8, not 1.9. 2. The docstrings could be improved a bit - normally they should end with a . and refer to all the params of the function in question.
Hide
Nola Stowe added a comment -

Thanks Bozhidar Batsov .. I will, and I also got feedback from the slack clojure-dev group (Alex, Ghadi, AndrewHr) so I will be submitting an improved patch soon

Show
Nola Stowe added a comment - Thanks Bozhidar Batsov .. I will, and I also got feedback from the slack clojure-dev group (Alex, Ghadi, AndrewHr) so I will be submitting an improved patch soon
Hide
Andy Fingerhut added a comment -

Also, I am not sure, but Alex Miller made this comment earlier: "Per the instructions at the top of the clojure.string ns, functions should take the broader CharSequence interface instead of String. Similar to existing clojure.string functions, you will need to provide a CharSequence implementation while also calling into the String functions when you actually have a String."

Looking at the Java classes and interfaces, one of the classes that implements CharSequence is CharBuffer, and it has no indexOf or lastIndexOf methods. If one were to call the proposed index-of or last-index-of with a CharBuffer, I believe you would get an exception for an unknown method.

I don't know if Alex's sentence is essential to any patch to be considered, but wanted to point out that it seems like it effectively means writing your own implementation of each of these methods for some of the classes implementing the CharSequence interface.

Show
Andy Fingerhut added a comment - Also, I am not sure, but Alex Miller made this comment earlier: "Per the instructions at the top of the clojure.string ns, functions should take the broader CharSequence interface instead of String. Similar to existing clojure.string functions, you will need to provide a CharSequence implementation while also calling into the String functions when you actually have a String." Looking at the Java classes and interfaces, one of the classes that implements CharSequence is CharBuffer, and it has no indexOf or lastIndexOf methods. If one were to call the proposed index-of or last-index-of with a CharBuffer, I believe you would get an exception for an unknown method. I don't know if Alex's sentence is essential to any patch to be considered, but wanted to point out that it seems like it effectively means writing your own implementation of each of these methods for some of the classes implementing the CharSequence interface.
Hide
Nola Stowe added a comment -

Andy, yes you are right. Alex helped me with my current changes (not posted yet) and where needed I cast the value to a string so that it would have the indexOf methods.

Show
Nola Stowe added a comment - Andy, yes you are right. Alex helped me with my current changes (not posted yet) and where needed I cast the value to a string so that it would have the indexOf methods.
Hide
Nola Stowe added a comment -

Added type hints, using StringBuffer instead of string in tests.

Show
Nola Stowe added a comment - Added type hints, using StringBuffer instead of string in tests.
Hide
Nola Stowe added a comment -

Use critium and did some bench marking, I am not familiar enough to say "hey its good or no, too slow". I tried an optimization but it didn't make much difference.

notes here:

https://www.evernote.com/l/ABLYZbDr8ElIs6fkODAd3poBfIm_Bfr5kLo

Appreciate any feedback

Show
Nola Stowe added a comment - Use critium and did some bench marking, I am not familiar enough to say "hey its good or no, too slow". I tried an optimization but it didn't make much difference. notes here: https://www.evernote.com/l/ABLYZbDr8ElIs6fkODAd3poBfIm_Bfr5kLo Appreciate any feedback
Hide
Alex Miller added a comment -

Nola, some comments:

1) In index-of and last-index-of, I think we should use (instance? Character value) instead of char? - this will avoid a function invocation.

2) In index-of and last-index-of, in the branches where you know you have a Character, we should just invoke the proper function on the character instance directly rather than calling int. In this case that is (.charValue ^Character value). However, this function returns char so you still want the ^int hint.

3) In index-of and last-index-of, we should do a couple of things to maximally utilize primitive support for from-index. We should type-hint the incoming value as a ^long (only long and double are specialized cases in Clojure, not int). We should then use the function unchecked-int to optimally convert the long primitive to an int primitive.

4) We should type-hint the return value to a primitive long to avoid boxing the return.

Putting 1+2+3+4 together:

(defn index-of
  "Return index of value (string or char) in s, optionally searching forward from from-index."
  {:added "1.8"}
  (^long [^CharSequence s value]
   (if (instance? Character value)
     (.indexOf (.toString s) ^int (.charValue ^Character value))
     (.indexOf (.toString s) ^String value)))
  (^long [^CharSequence s value ^long from-index]
  (if (instance? Character value)
    (.indexOf (.toString s) ^int (.charValue ^Character value) (unchecked-int from-index))
    (.indexOf (.toString s) ^String value (unchecked-int from-index)))))

Similar changes in last-index-of.

5) In starts-with? and ends-with?, I would move the type hint for substr to the parameter declaration. This is more of a style preference on my part, no functional difference I believe.

6) On includes? I would do the same as #5, except .contains actually takes a CharSequence for the substr, so I would change the type hint from ^String to the broader ^CharSequence.

7) Can you edit the description and add a table with the "execution time mean" number for direct Java vs new Clojure fn for the patch after these changes? Please add tests for the from-index variants of index-of and last-index-of as well.

Show
Alex Miller added a comment - Nola, some comments: 1) In index-of and last-index-of, I think we should use (instance? Character value) instead of char? - this will avoid a function invocation. 2) In index-of and last-index-of, in the branches where you know you have a Character, we should just invoke the proper function on the character instance directly rather than calling int. In this case that is (.charValue ^Character value). However, this function returns char so you still want the ^int hint. 3) In index-of and last-index-of, we should do a couple of things to maximally utilize primitive support for from-index. We should type-hint the incoming value as a ^long (only long and double are specialized cases in Clojure, not int). We should then use the function unchecked-int to optimally convert the long primitive to an int primitive. 4) We should type-hint the return value to a primitive long to avoid boxing the return. Putting 1+2+3+4 together:
(defn index-of
  "Return index of value (string or char) in s, optionally searching forward from from-index."
  {:added "1.8"}
  (^long [^CharSequence s value]
   (if (instance? Character value)
     (.indexOf (.toString s) ^int (.charValue ^Character value))
     (.indexOf (.toString s) ^String value)))
  (^long [^CharSequence s value ^long from-index]
  (if (instance? Character value)
    (.indexOf (.toString s) ^int (.charValue ^Character value) (unchecked-int from-index))
    (.indexOf (.toString s) ^String value (unchecked-int from-index)))))
Similar changes in last-index-of. 5) In starts-with? and ends-with?, I would move the type hint for substr to the parameter declaration. This is more of a style preference on my part, no functional difference I believe. 6) On includes? I would do the same as #5, except .contains actually takes a CharSequence for the substr, so I would change the type hint from ^String to the broader ^CharSequence. 7) Can you edit the description and add a table with the "execution time mean" number for direct Java vs new Clojure fn for the patch after these changes? Please add tests for the from-index variants of index-of and last-index-of as well.
Hide
Nola Stowe added a comment -

optimizations as suggested by Alex Miller

Show
Nola Stowe added a comment - optimizations as suggested by Alex Miller
Hide
Andy Fingerhut added a comment -

Nola, not a big deal, but Rich has requested that patch file names end with ".patch" or ".diff", I think because whatever he uses to view them recognizes that suffix and puts it in a different viewing/editing mode.

Show
Andy Fingerhut added a comment - Nola, not a big deal, but Rich has requested that patch file names end with ".patch" or ".diff", I think because whatever he uses to view them recognizes that suffix and puts it in a different viewing/editing mode.
Hide
Nola Stowe added a comment -

Andy, so no need to keep around patches when used in comparisons?

Show
Nola Stowe added a comment - Andy, so no need to keep around patches when used in comparisons?
Hide
Andy Fingerhut added a comment -

It is ok to have multiple versions of patches attached, but names like add_functions_to_strings-2.patch etc. would be preferred over names like add_functions_to_strings.patch-2, so the suffix is ".patch".

Show
Andy Fingerhut added a comment - It is ok to have multiple versions of patches attached, but names like add_functions_to_strings-2.patch etc. would be preferred over names like add_functions_to_strings.patch-2, so the suffix is ".patch".
Hide
Nola Stowe added a comment -

original patch submitted by Nola

Show
Nola Stowe added a comment - original patch submitted by Nola
Hide
Nola Stowe added a comment -

Optimizations suggested by Alex

Show
Nola Stowe added a comment - Optimizations suggested by Alex
Hide
Alex Miller added a comment -

Nola, I re-did the timings on my machine - these were done with Java 1.8 and no Leiningen involved.

Show
Alex Miller added a comment - Nola, I re-did the timings on my machine - these were done with Java 1.8 and no Leiningen involved.
Hide
Alex Miller added a comment -

-4 patch is identical to -3 but removes one errant ^long type hint

Otherwise, looks good to me!

Show
Alex Miller added a comment - -4 patch is identical to -3 but removes one errant ^long type hint Otherwise, looks good to me!
Hide
Alex Miller added a comment -

Rich, since you didn't put any comments on here, I'm not sure if there's anything else you wanted, so marking screened.

Show
Alex Miller added a comment - Rich, since you didn't put any comments on here, I'm not sure if there's anything else you wanted, so marking screened.
Hide
Stuart Halloway added a comment -

I dislike the approach here for several reasons:

  • floats Java-isms (-1 for missing instead of nil?) up into the Clojure API
  • makes narrow string fns out of things that could/should be seq fns

Stepping back, this is all library work. Why should it be in core? If this started in a contrib, it could evolve much more freely.

Show
Stuart Halloway added a comment - I dislike the approach here for several reasons:
  • floats Java-isms (-1 for missing instead of nil?) up into the Clojure API
  • makes narrow string fns out of things that could/should be seq fns
Stepping back, this is all library work. Why should it be in core? If this started in a contrib, it could evolve much more freely.
Hide
Alex Miller added a comment -

re Java-isms: totally fair comment worth changing

re narrow string fns: if these were seq fns, they would presumably take seqables of chars and return seqables of chars. When I have strings, I want to work with strings (as with clojure.string/join and clojure.string/reverse vs their seq equivalents).

re library: these particular functions are the most commonly used string functions where current users drop to Java interop. Note that all 5 of these functions have over time been added to the cheatsheet (http://clojure.org/cheatsheet) because they are commonly used. I believe there is significant value in including them in core and standardizing their name and behavior across to CLJS.

Show
Alex Miller added a comment - re Java-isms: totally fair comment worth changing re narrow string fns: if these were seq fns, they would presumably take seqables of chars and return seqables of chars. When I have strings, I want to work with strings (as with clojure.string/join and clojure.string/reverse vs their seq equivalents). re library: these particular functions are the most commonly used string functions where current users drop to Java interop. Note that all 5 of these functions have over time been added to the cheatsheet (http://clojure.org/cheatsheet) because they are commonly used. I believe there is significant value in including them in core and standardizing their name and behavior across to CLJS.
Hide
Bozhidar Batsov added a comment -

I'm obviously biased, but I totally agree with everything Alex said. Having one core string library and a separate contrib string library is just impractical. With clojure.string already part of the language it's much more sensible to extend it where/when needed instead of encouraging the proliferation of tons of alternative libraries (there are already a couple of those out there, mostly because essential functions are missing). Initial designs are rarely perfect, there's nothing wrong in revisiting and improving them.

I believe this is one of the top voted tickets, which clearly indicates that the Clojure community is quite supportive of this additions.

Show
Bozhidar Batsov added a comment - I'm obviously biased, but I totally agree with everything Alex said. Having one core string library and a separate contrib string library is just impractical. With clojure.string already part of the language it's much more sensible to extend it where/when needed instead of encouraging the proliferation of tons of alternative libraries (there are already a couple of those out there, mostly because essential functions are missing). Initial designs are rarely perfect, there's nothing wrong in revisiting and improving them. I believe this is one of the top voted tickets, which clearly indicates that the Clojure community is quite supportive of this additions.
Hide
Michael Blume added a comment -

None of these functions return Strings so they could very easily be seq functions which use the String methods as a fast path when called with Strings, the trouble is then they wouldn't necessarily belong in clojure.string.

Show
Michael Blume added a comment - None of these functions return Strings so they could very easily be seq functions which use the String methods as a fast path when called with Strings, the trouble is then they wouldn't necessarily belong in clojure.string.
Hide
Bozhidar Batsov added a comment -

Such an argument can be made for existing functions like clojure.core/subs and clojure.string/reverse - we could have just went with generic functions that operate on seqs and converted the results to strings, but we didn't. It might be just me, but I'm really thinking this is being overanalysed. Sometimes it's really preferable to deal with some things as strings (not to mention more efficient).

Show
Bozhidar Batsov added a comment - Such an argument can be made for existing functions like clojure.core/subs and clojure.string/reverse - we could have just went with generic functions that operate on seqs and converted the results to strings, but we didn't. It might be just me, but I'm really thinking this is being overanalysed. Sometimes it's really preferable to deal with some things as strings (not to mention more efficient).
Hide
Rich Hickey added a comment -

I completely agree with Alex and Bozhidar re: strings vs seqs. Other langs that have only lists and use them for strings are really bad at strings. So - string-specific fns are ok. What about the Java-isms?

Show
Rich Hickey added a comment - I completely agree with Alex and Bozhidar re: strings vs seqs. Other langs that have only lists and use them for strings are really bad at strings. So - string-specific fns are ok. What about the Java-isms?
Hide
Bozhidar Batsov added a comment -

We should address them for sure. Let's wait for a couple of days for Nola to update the patch. If she's too busy I'll do the update myself.

Show
Bozhidar Batsov added a comment - We should address them for sure. Let's wait for a couple of days for Nola to update the patch. If she's too busy I'll do the update myself.
Hide
Nola Stowe added a comment -

I will be happy to work on it Saturday. Is nil an acceptable response when index-of "apple" "p" is not found? The other functions return booleans and I think those are ok.

Show
Nola Stowe added a comment - I will be happy to work on it Saturday. Is nil an acceptable response when index-of "apple" "p" is not found? The other functions return booleans and I think those are ok.
Hide
Bozhidar Batsov added a comment -

Yep, we should go with nil.

Show
Bozhidar Batsov added a comment - Yep, we should go with nil.
Hide
Nola Stowe added a comment -

Changed index-of and last-index-of to return nil instead of -1 (java-ism), thus had to remove the type hint for the return value.

Seems to have slowed down the function compared to having the function return -1. My benchmarks: https://www.evernote.com/l/ABL2wead73BJZYAkpH5yxsfX9JM8cpUiNhw

Show
Nola Stowe added a comment - Changed index-of and last-index-of to return nil instead of -1 (java-ism), thus had to remove the type hint for the return value. Seems to have slowed down the function compared to having the function return -1. My benchmarks: https://www.evernote.com/l/ABL2wead73BJZYAkpH5yxsfX9JM8cpUiNhw
Hide
Nola Stowe added a comment -

Updated patch to include return value in doc string for index-of last-index-of, otherwise same as last update.

Show
Nola Stowe added a comment - Updated patch to include return value in doc string for index-of last-index-of, otherwise same as last update.
Hide
Alex Miller added a comment -

Removing the hint causes the return type to be Object, which forces boxing in the "found" cases.

Show
Alex Miller added a comment - Removing the hint causes the return type to be Object, which forces boxing in the "found" cases.
Hide
Nola Stowe added a comment -

Is it worth it to remove the java-ism of returning -1 when not found? Personally, I don't really like a function returning one of two things... found index or nil ... I prefer it being an int at all times.

Show
Nola Stowe added a comment - Is it worth it to remove the java-ism of returning -1 when not found? Personally, I don't really like a function returning one of two things... found index or nil ... I prefer it being an int at all times.
Hide
Alex Miller added a comment - - edited

There is one whitespace issue in the -5 patch:

[~/code/clojure (master)]$ git apply ~/Downloads/add_functions_to_strings-5.patch
/Users/alex/Downloads/add_functions_to_strings-5.patch:34: trailing whitespace.
  (let [result
warning: 1 line adds whitespace errors.

I updated the timings in the description to include both -4 and -5 - the boxing there does have a significant impact.

Show
Alex Miller added a comment - - edited There is one whitespace issue in the -5 patch:
[~/code/clojure (master)]$ git apply ~/Downloads/add_functions_to_strings-5.patch
/Users/alex/Downloads/add_functions_to_strings-5.patch:34: trailing whitespace.
  (let [result
warning: 1 line adds whitespace errors.
I updated the timings in the description to include both -4 and -5 - the boxing there does have a significant impact.
Hide
Nola Stowe added a comment -

Fixed whitespace issue.

Show
Nola Stowe added a comment - Fixed whitespace issue.
Hide
Alex Miller added a comment -

Added -6 patch and timings that address most of the performance impact.

Show
Alex Miller added a comment - Added -6 patch and timings that address most of the performance impact.
Hide
Stuart Halloway added a comment -

Alex: Note that all of these functions return numbers and booleans, not strings nor seqs of anything. This is partially why they were left out of the original string implementation.

It isn't clear to me why index-of should be purely a string fn, I regularly need index-of with other things, and both my book and JoC use index-of as an example of a nice-to-have.

Show
Stuart Halloway added a comment - Alex: Note that all of these functions return numbers and booleans, not strings nor seqs of anything. This is partially why they were left out of the original string implementation. It isn't clear to me why index-of should be purely a string fn, I regularly need index-of with other things, and both my book and JoC use index-of as an example of a nice-to-have.
Hide
Nicola Mometto added a comment -

Stuart, I don't think the fact that those functions don't return string is really relevant, the important fact is that they operate on strings (just like c.set/superset? or subset? don't return sets) and are frequently used on them.

Show
Nicola Mometto added a comment - Stuart, I don't think the fact that those functions don't return string is really relevant, the important fact is that they operate on strings (just like c.set/superset? or subset? don't return sets) and are frequently used on them.
Hide
Bozhidar Batsov added a comment -

I'm with Nicola on this one - we have to be practical. There's a reason why this is the top-voted JIRA ticket - Clojure programmers want those functions. That's says plenty.

Anyways, having a generic `index-of` is definitely not a bad idea.

Show
Bozhidar Batsov added a comment - I'm with Nicola on this one - we have to be practical. There's a reason why this is the top-voted JIRA ticket - Clojure programmers want those functions. That's says plenty. Anyways, having a generic `index-of` is definitely not a bad idea.

People

Vote (29)
Watch (4)

Dates

  • Created:
    Updated: