<< Back to previous view

[CLJ-1182] Regexp are never equal Created: 12/Mar/13  Updated: 11/Apr/14  Resolved: 05/Sep/13

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Christian Fortin Assignee: Omer Iqbal
Resolution: Declined Votes: 0
Labels: None

Attachments: File fix-CLJ-1182.diff    
Patch: Code
Approval: Triaged


The following (= #"asd" #"asd") will return false in CLJ, even if they are the same.

Consequently, everything with a regexp in it (lists, vectors, maps...) will never be equal.

It seems to have been fixed for CLJS, but not for CLJ.

Comment by Omer Iqbal [ 12/Mar/13 4:08 PM ]

added an implementation for the equiv method if both args are java.util.regex.Pattern

Comment by Andy Fingerhut [ 12/Mar/13 5:54 PM ]

Omer, could you also include in your patch a few test cases? At least one that validates that two regex's that should be equal, are equal, and another that two regex's that should be different, are non-equal. Preferably the first of those tests should fail with the existing Clojure code, and pass with your changes.

Comment by Omer Iqbal [ 13/Mar/13 5:39 AM ]

I updated the patch with some tests. Hope I added them in the correct file. I also changed the implementation a bit, by instead of adding another implementation of equiv with a different signature, checking the type of the Object in the equiv method with signature (Object k1, Object k2).
For the sake of consistency I also added an EquivPred implementation, though I'm not entirely sure when its used.

Comment by Andy Fingerhut [ 13/Mar/13 1:04 PM ]

All comments here refer to the patch named fix-CLJ-1182.diff dated Mar 13, 2013.

The location for the new tests looks reasonable. However, note that your new patch has your old changes plus some new ones, not just the new ones. In particular, the new signature for equiv is still in your latest patch. You should start from a clean pull of the latest Clojure master and make only the changes you want when creating a patch, not build on top of previous changes you have made.

Also, there are several whitespace-only changes in your patch that should not be included.

Comment by Omer Iqbal [ 13/Mar/13 1:39 PM ]

I uploaded a clean patch, removing the whitespace diff and only adding the latest changes. Thanks for clarifying the workflow!
Just to clarify, this refers to the patch named fix-CLJ-1182.diff dated Mar 13 1:34 PM

Comment by Stuart Halloway [ 29/Mar/13 5:46 AM ]

I am recategorizing this as an enhancement, because if this is a bug it is a bug in Java – the Java Patterns class documents being immutable, but apparently does not implement .equals.

Other recent "make Clojure more complicated to work around problems in Java" patches have been rejected, and I suspect this one will be too, because it might impact the performance of equality everywhere.

Comment by Stuart Halloway [ 12/Apr/13 9:04 AM ]

At first pass, Rich and I both believe that, as regex equality is undecidable, that Java made the right choice in using identity for equality, that this ticket should be declined, and the ClojureScript should revert to match.

But leaving this ticket open for now so that ClojureScript dev can weigh in.

Comment by Michael Drogalis [ 12/Apr/13 9:32 AM ]

What do you mean when you say "undecidable"?

Comment by Alexander Redington [ 12/Apr/13 10:03 AM ]

If Regex instances were interned by the reader, but still used identity for equality, then code like

(= #"abc" #"abc")

would return true, but it wouldn't diverge in the definition of equality for regular expressions between Java and Clojure.

Comment by Fogus [ 12/Apr/13 10:13 AM ]

Undecidable means that for any given regular expression, there is no single way to write it. For example #"[a]" #"a" both match the same strings, but are they the same? Maybe. But how can we decide if /any/ given regular expression matches all of the same strings as any other? The answer is, you can't. Java does provide a Pattern#pattern method that returns the string that was used to build it, but I'm not sure if that could/should be used as a basis for equality given the potential perf hit.

Comment by Herwig Hochleitner [ 12/Apr/13 10:31 AM ]

I posted in Stu's thread: https://groups.google.com/d/topic/clojure-dev/OTPNJQbPtds/discussion
TL;DR: Disagree with undecidability, agree with reverting to identity based equality

Comment by Michael Drogalis [ 12/Apr/13 10:32 AM ]

That makes sense to me. Thanks Fogus.

Comment by Herwig Hochleitner [ 12/Apr/13 9:42 PM ]

From my post to the ml thread, it might not be entirely clear, why I don't think we should implement equality for host regexes:

It would involve parsing and would leave a lot of room for errors and platform-idiosycracies to leak. And it would be different for every platform.

As Alexander said, I also think this ticket could be resolved by interning regex literals, akin to keywords. That, however, would warrant a design page first, because there are tradeoffs to be made about how far the interning should go.

Comment by Rich Hickey [ 13/Apr/13 8:51 AM ]

Why are we spending time on this? Where is the problem statement? Does someone actually need this for a real world purpose not solved by using regex strings as keys?

Comment by Michael Drogalis [ 13/Apr/13 9:13 PM ]

It was prompted as a matter of consistency, which I think is valid. I can't think of a good reason to use regex's as keys though.

Comment by Trevor Hartman [ 11/Apr/14 11:13 AM ]

This issue surfaced as a bug in my code, e.g.: (get {#"^named$" 1} #"^named$") ;=> nil ; surprise!

Generated at Tue Sep 26 14:48:17 CDT 2017 using JIRA 4.4#649-r158309.