<< Back to previous view

[MATCH-48] Guards cannot be fn expressions Created: 10/Jan/12  Updated: 28/Jul/13  Resolved: 25/Feb/12

Status: Closed
Project: core.match
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Chris Gray Assignee: David Nolen
Resolution: Completed Votes: 0
Labels: None

Attachments: Text File 0001-Added-new-multimethod-mutually-exclusive-inequality.patch     Text File 0001-Failing-test.patch     Text File 0002-Use-the-new-multimethod-and-add-some-tests.patch     File new-patches.diff    
Patch: Code and Test

 Description   

Anonymous function literals as guards (such as (a :when #(odd? %))) seem to confuse the match compiler. The attached test shows how.



 Comments   
Comment by David Nolen [ 10/Jan/12 10:04 PM ]

I'm on the fence about allowing inline fn expressions and fn literals as guards. The problem is that they can't be checked for equality and thus tests cannot be shared across guard patterns. I need to think about it some more but I don't consider this high priority in the near future. Any decision will have to take into consideration the goal of predicate dispatch.

Comment by Chris Gray [ 11/Jan/12 12:41 PM ]

Sorry, I think the fact that the functions in my earlier example were function literals was a bit of a red herring. The following test also fails.

(deftest guard-pattern-match-5
  (is (=
       (let [oddp odd?]
         (match [1 2]
                [a :when odd? b :when odd?] :a1
                [a :when oddp _] :a2
                [_ b :when even?] :a3
                :else :a4))
       :a2)))
Comment by David Nolen [ 11/Jan/12 12:44 PM ]

The earlier examples where not a red herring. This is likely a separate issue.

Comment by Chris Gray [ 11/Jan/12 12:50 PM ]

I really don't see how, given that there seems to be no code that specializes on the type of function given to a guard. My guess is that when guard-pattern-match-5 succeeds, guard-pattern-match-4 will succeed as well.

Comment by David Nolen [ 11/Jan/12 1:00 PM ]

Oh sorry you are right. This is exactly same problem. We can't know that odd? and oddp are the same. Again this is not something I'm interested in fixing without a lot more consideration.

Basically functions can't be tested for equality like types can. This means that the presence of a guard must create a backtrack point. However if we make guards work a little more like types (you have to declare them ahead of time) we lose a little bit of convenience but gain a lot of reasoning power and can share tests and avoid these pitfalls.

This discussion probably needs a design page.

Comment by Chris Gray [ 11/Jan/12 6:11 PM ]

Two more examples of the same problem, this time not using guards:

(deftest unequal-equal-tests
  (is (=
       (match ["foo" "bar"]
              [#".*" #"baz"] :a1
              [#"foo" _] :a2
              [_ "bar"] :a3
              :else :a4)
       :a2)))

(deftest unequal-equal-tests-2
  (is (=
       (let [a 1]
        (match [1 2]
               [(:or 1 2) 3] :a1
               [(:or 2 a) 2] :a2
               [1 _] :a3
               :else :a4))
       :a2)))
Comment by David Nolen [ 11/Jan/12 6:58 PM ]

They are not the same problem. These don't work for very different reasons. The first I probably won't even address and will leave for the community to deal with - I don't need robust pattern matching on regexes.

The second example is a legitimate bug around matching locals which is unrelated to this ticket. Feel free to open a new one for it.

Comment by Chris Gray [ 11/Jan/12 7:15 PM ]

Yes, you're right, the second is unrelated.

Comment by Chris Gray [ 12/Jan/12 11:20 AM ]

On further reflection, what all these examples show is that Maranget's algorithm is only correct for literals whose equality you can test at compile time. Thus, not even locals will work using his algorithm. Regexes and functions will certainly not work correctly 100% of the time.

What happens is that when multiple unequal tests in the same column can return a truthy value, you end up with a decision forest rather than a decision tree. If the first decision tree in the forest has the first and third end-states, while the second tree has the second end-state, if you end up in the third end-state, you must still check the second decision tree before you decide which end-state is actually correct.

This is a shame, since it means that compiling the matches becomes more complex. On the other hand, it seems like a great subject for a paper at programming-language conference, so there's always that.

On a serious note, though, this bug is major, and you should consider removing support for at least guards, locals, and regexes until it is fixed. The bugs that arise from it in the end-user's code are really hard to track down – it's as if `or` or `and` were broken 10% of the time.

Comment by David Nolen [ 12/Jan/12 11:29 AM ]

There is nothing wrong with Maranget's algorithm. We just have to be sure that we create a backtrack point - that's all.

Functions cannot be fixed because function equality is undecideable. So for guards we might create a backtrack point. I've already updated the README to describe what works at the moment. I have a branch which throws an error if :when is not given a vector of symbols or a symbol. It should probably be improved so that symbols are known top levels (no reassigning fns to locals).

Regex equality can probably be made to work but I'm not going to do it. (On further thought we can probably make patterns create backtrack points by default, can be overridden for those willing to make their patterns highly optimizeable)

Locals can be fixed, we'll definitely create a backtrack point for these.

Comment by Chris Gray [ 12/Jan/12 11:54 AM ]

I'm sorry, but this problem will not be solved by backtracking alone. At least not with the backtracking mechanism that currently exists.

With backtracking, you are still treating the problem as though you have a decision tree. A decision tree requires that all the tests at its nodes are mutually exclusive.

By assuming that you have a decision tree, once a match is found in the tree, that match is returned. As my last comment pointed out, that is not sufficient. You must also check to see if there is a match in an end-state that was declared earlier. I really don't see that that's possible given the current backtracking system.

Comment by David Nolen [ 12/Jan/12 1:26 PM ]

I don't follow you. Maranget's algorithm is not sufficient for pattern matching if we don't constrain columns to specific types. There are many things already in place to deal with the shortcomings of Maranget's approach given what we want to accomplish - for example, we actually have a grouping pass. This is the same approach that Racket uses as far as I can tell and they don't have any problems. Certainly none of the patterns you shown thus that far (besides fns exprs) pose any challenges that I can see.

Comment by Chris Gray [ 12/Jan/12 1:42 PM ]

Sorry, I was editing my comment as you made yours. I hope it is more clear now.

I guess I don't totally understand your code still, so I will try to rectify that before commenting again. From what I have seen, though, you are trying to build a decision tree. What I am saying is that isn't possible at compile time, since you can't ensure that the nodes of the tree are mutually exclusive.

Comment by David Nolen [ 12/Jan/12 1:51 PM ]

All known constructors are considered mutually exclusive. We group all constructors in a column preserving order as closely as possible. Decisions trees are created for these constructors. If we cannot know at compile time whether something is mutually exclusive (wildcards, locals), we create a backtrack point to handle them.

Consider that if we only have backtrack points (no trees) all tests could potentially be tried. Our approach is a hybrid one - we don't rely only on decisions trees and we don't rely only on backtracking.

When we get to integrating pattern matching on interfaces, protocols ambiguities of course become possibly. But even this can probably be handled reasonably with something like "prefers".

Comment by Chris Gray [ 12/Jan/12 3:07 PM ]

Consider the following pair of decision trees:

  1
 /
a
 \
  3*

  2*
 /
b--4*
 \
  5*

Where the numbers are the order in which the terminals appear, and they have stars beside them if they match. The correct terminal for core.match to return in this case is the second one. Currently, the code would return the third terminal. Suppose, however, that backtracking was added so that the tree rooted at b was checked for matches as well. This is certainly possible, though a lot of information would need to be kept about the match already found (which is what I meant about things not working with the current backtracking system). Also, you must ensure that the testing stops when you hit the second terminal, for two reasons – first, not doing so would imply that all terminals are checked, and second, the test to distinguish 4 from 5 could throw an exception. For similar reasons, the return value of the third terminal can't be computed – it could be a very long computation, or it could throw an exception.

Comment by David Nolen [ 12/Jan/12 3:28 PM ]

If the correct terminal is the second one, we will return the second one. No information needs to be kept around. I suggest you take a closer look at the code at this point.

Comment by Chris Gray [ 12/Jan/12 6:58 PM ]

Aah, you're right. (I think.) Might it be more accurate to say that the situation I proposed can't happen? That is, no two trees are created where there are lower-numbered terminals in the second tree than in the first tree?

Is the plan to add backtrack points for everything where equality can't be determined at compile time?

Comment by Chris Gray [ 12/Jan/12 11:27 PM ]

These patches implement the proper backtracking for tests that are not mutually exclusive.

Comment by David Nolen [ 13/Jan/12 9:15 AM ]

Wow, this is great. I've skimmed over the patches and they look pretty good. I will go over them more closely as soon as I can - there are a couple changes we should probably make. Thanks!

Comment by Chris Gray [ 13/Jan/12 11:08 AM ]

No problem. My apologies for the persistent misunderstanding.

Comment by David Nolen [ 14/Feb/12 4:39 PM ]

Sorry for the epic delay. Here are my notes on that patches:

1. Let's rename mutually-exclusive-inequality? to comparable?
2. I want GuardPatterns to comparable, I'm not going to support arbitrary fns - not worth it IMO.
As a compromise I'd be willing to rename :when patterns -> PredicatePatterns and GuardPatterns
can be allowed via :guard.
3. No need to declare OrPatterns as comparable - they aren't real patterns
4. No need to exclude VectorPatterns - the way that pattern matching works makes this unnecessary

If you make these changes I promise to apply these in a timely manner

Comment by Chris Gray [ 15/Feb/12 12:50 PM ]

Okay, patch uploaded. There is a fairly significant portion of it that is just cut and paste, which I'm not so happy about, but I don't think there's a way to do type inheritance, so I did the easier thing.

Comment by David Nolen [ 25/Feb/12 6:49 PM ]

Fixed, https://github.com/clojure/core.match/commit/ac92c6df3f70f56fbe12f9d3f46585e66102c50b

addressed MATCH-50

Generated at Tue Sep 23 11:26:17 CDT 2014 using JIRA 4.4#649-r158309.