Clojure

Regexp are never equal

Details

  • Type: Enhancement Enhancement
  • Status: Closed Closed
  • Priority: Minor Minor
  • Resolution: Declined
  • Affects Version/s: Release 1.5
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None
  • Patch:
    Code
  • Approval:
    Triaged

Description

The following (= #"asd" #"asd") will return false in CLJ, even if they are the same.

Consequently, everything with a regexp in it (lists, vectors, maps...) will never be equal.

It seems to have been fixed for CLJS, but not for CLJ.
https://github.com/clojure/clojurescript/commit/e35c3a57472fa62ae41591418a73794dc8ac6dde

Activity

Hide
Omer Iqbal added a comment -

added an implementation for the equiv method if both args are java.util.regex.Pattern

Show
Omer Iqbal added a comment - added an implementation for the equiv method if both args are java.util.regex.Pattern
Hide
Andy Fingerhut added a comment -

Omer, could you also include in your patch a few test cases? At least one that validates that two regex's that should be equal, are equal, and another that two regex's that should be different, are non-equal. Preferably the first of those tests should fail with the existing Clojure code, and pass with your changes.

Show
Andy Fingerhut added a comment - Omer, could you also include in your patch a few test cases? At least one that validates that two regex's that should be equal, are equal, and another that two regex's that should be different, are non-equal. Preferably the first of those tests should fail with the existing Clojure code, and pass with your changes.
Hide
Omer Iqbal added a comment -

I updated the patch with some tests. Hope I added them in the correct file. I also changed the implementation a bit, by instead of adding another implementation of equiv with a different signature, checking the type of the Object in the equiv method with signature (Object k1, Object k2).
For the sake of consistency I also added an EquivPred implementation, though I'm not entirely sure when its used.

Show
Omer Iqbal added a comment - I updated the patch with some tests. Hope I added them in the correct file. I also changed the implementation a bit, by instead of adding another implementation of equiv with a different signature, checking the type of the Object in the equiv method with signature (Object k1, Object k2). For the sake of consistency I also added an EquivPred implementation, though I'm not entirely sure when its used.
Hide
Andy Fingerhut added a comment -

All comments here refer to the patch named fix-CLJ-1182.diff dated Mar 13, 2013.

The location for the new tests looks reasonable. However, note that your new patch has your old changes plus some new ones, not just the new ones. In particular, the new signature for equiv is still in your latest patch. You should start from a clean pull of the latest Clojure master and make only the changes you want when creating a patch, not build on top of previous changes you have made.

Also, there are several whitespace-only changes in your patch that should not be included.

Show
Andy Fingerhut added a comment - All comments here refer to the patch named fix-CLJ-1182.diff dated Mar 13, 2013. The location for the new tests looks reasonable. However, note that your new patch has your old changes plus some new ones, not just the new ones. In particular, the new signature for equiv is still in your latest patch. You should start from a clean pull of the latest Clojure master and make only the changes you want when creating a patch, not build on top of previous changes you have made. Also, there are several whitespace-only changes in your patch that should not be included.
Hide
Omer Iqbal added a comment -

I uploaded a clean patch, removing the whitespace diff and only adding the latest changes. Thanks for clarifying the workflow!
Just to clarify, this refers to the patch named fix-CLJ-1182.diff dated Mar 13 1:34 PM

Show
Omer Iqbal added a comment - I uploaded a clean patch, removing the whitespace diff and only adding the latest changes. Thanks for clarifying the workflow! Just to clarify, this refers to the patch named fix-CLJ-1182.diff dated Mar 13 1:34 PM
Hide
Stuart Halloway added a comment -

I am recategorizing this as an enhancement, because if this is a bug it is a bug in Java – the Java Patterns class documents being immutable, but apparently does not implement .equals.

Other recent "make Clojure more complicated to work around problems in Java" patches have been rejected, and I suspect this one will be too, because it might impact the performance of equality everywhere.

Show
Stuart Halloway added a comment - I am recategorizing this as an enhancement, because if this is a bug it is a bug in Java – the Java Patterns class documents being immutable, but apparently does not implement .equals. Other recent "make Clojure more complicated to work around problems in Java" patches have been rejected, and I suspect this one will be too, because it might impact the performance of equality everywhere.
Hide
Stuart Halloway added a comment -

At first pass, Rich and I both believe that, as regex equality is undecidable, that Java made the right choice in using identity for equality, that this ticket should be declined, and the ClojureScript should revert to match.

But leaving this ticket open for now so that ClojureScript dev can weigh in.

Show
Stuart Halloway added a comment - At first pass, Rich and I both believe that, as regex equality is undecidable, that Java made the right choice in using identity for equality, that this ticket should be declined, and the ClojureScript should revert to match. But leaving this ticket open for now so that ClojureScript dev can weigh in.
Hide
Michael Drogalis added a comment -

What do you mean when you say "undecidable"?

Show
Michael Drogalis added a comment - What do you mean when you say "undecidable"?
Hide
Alexander Redington added a comment -

If Regex instances were interned by the reader, but still used identity for equality, then code like

(= #"abc" #"abc")

would return true, but it wouldn't diverge in the definition of equality for regular expressions between Java and Clojure.

Show
Alexander Redington added a comment - If Regex instances were interned by the reader, but still used identity for equality, then code like (= #"abc" #"abc") would return true, but it wouldn't diverge in the definition of equality for regular expressions between Java and Clojure.
Hide
Fogus added a comment -

Undecidable means that for any given regular expression, there is no single way to write it. For example #"[a]" #"a" both match the same strings, but are they the same? Maybe. But how can we decide if /any/ given regular expression matches all of the same strings as any other? The answer is, you can't. Java does provide a Pattern#pattern method that returns the string that was used to build it, but I'm not sure if that could/should be used as a basis for equality given the potential perf hit.

Show
Fogus added a comment - Undecidable means that for any given regular expression, there is no single way to write it. For example #"[a]" #"a" both match the same strings, but are they the same? Maybe. But how can we decide if /any/ given regular expression matches all of the same strings as any other? The answer is, you can't. Java does provide a Pattern#pattern method that returns the string that was used to build it, but I'm not sure if that could/should be used as a basis for equality given the potential perf hit.
Hide
Herwig Hochleitner added a comment -

I posted in Stu's thread: https://groups.google.com/d/topic/clojure-dev/OTPNJQbPtds/discussion
TL;DR: Disagree with undecidability, agree with reverting to identity based equality

Show
Herwig Hochleitner added a comment - I posted in Stu's thread: https://groups.google.com/d/topic/clojure-dev/OTPNJQbPtds/discussion TL;DR: Disagree with undecidability, agree with reverting to identity based equality
Hide
Michael Drogalis added a comment -

That makes sense to me. Thanks Fogus.

Show
Michael Drogalis added a comment - That makes sense to me. Thanks Fogus.
Hide
Herwig Hochleitner added a comment -

From my post to the ml thread, it might not be entirely clear, why I don't think we should implement equality for host regexes:

It would involve parsing and would leave a lot of room for errors and platform-idiosycracies to leak. And it would be different for every platform.

As Alexander said, I also think this ticket could be resolved by interning regex literals, akin to keywords. That, however, would warrant a design page first, because there are tradeoffs to be made about how far the interning should go.

Show
Herwig Hochleitner added a comment - From my post to the ml thread, it might not be entirely clear, why I don't think we should implement equality for host regexes: It would involve parsing and would leave a lot of room for errors and platform-idiosycracies to leak. And it would be different for every platform. As Alexander said, I also think this ticket could be resolved by interning regex literals, akin to keywords. That, however, would warrant a design page first, because there are tradeoffs to be made about how far the interning should go.
Hide
Rich Hickey added a comment -

Why are we spending time on this? Where is the problem statement? Does someone actually need this for a real world purpose not solved by using regex strings as keys?

Show
Rich Hickey added a comment - Why are we spending time on this? Where is the problem statement? Does someone actually need this for a real world purpose not solved by using regex strings as keys?
Hide
Michael Drogalis added a comment -

It was prompted as a matter of consistency, which I think is valid. I can't think of a good reason to use regex's as keys though.

Show
Michael Drogalis added a comment - It was prompted as a matter of consistency, which I think is valid. I can't think of a good reason to use regex's as keys though.
Hide
Trevor Hartman added a comment -

This issue surfaced as a bug in my code, e.g.: (get {#"^named$" 1} #"^named$") ;=> nil ; surprise!

Show
Trevor Hartman added a comment - This issue surfaced as a bug in my code, e.g.: (get {#"^named$" 1} #"^named$") ;=> nil ; surprise!

People

Vote (0)
Watch (3)

Dates

  • Created:
    Updated:
    Resolved: