<< Back to previous view

[NREPL-53] Middleware linearization is nondeterministically wrong Created: 02/May/14  Updated: 19/Jan/15  Resolved: 12/Jan/15

Status: Closed
Project: tools.nrepl
Component/s: None
Affects Version/s: 0.2.3
Fix Version/s: 0.2.7

Type: Defect Priority: Critical
Reporter: Gary Fredericks Assignee: Chas Emerick
Resolution: Completed Votes: 0
Labels: None

clojure 1.6.0

Attachments: Text File NREPL-53.patch    


I've had trouble getting my middleware ordered correctly, and have wittled it down to the following example which seems to only fail in some processes. I.e., it is deterministic within a particular jvm, but across jvms the result can change. My guess is that this has to do with hashing vars.

This code prints true (signifying a correct result) roughly 1/3 of the time and false otherwise.

(ns chas.core
  (:require [clojure.tools.nrepl.middleware :as mid]

(def wrap-foo identity)
(mid/set-descriptor! #'wrap-foo
                     {:expects #{#'clojure.tools.nrepl.middleware.session/session "eval"}})

(def wrap-bar identity)
(mid/set-descriptor! #'wrap-bar
                     {:requires #{#'clojure.tools.nrepl.middleware.session/session},
                      :expects #{"describe" #'clojure.tools.nrepl.middleware.pr-values/pr-values},
                      :handles {"stacktrace" {}}})

(defn -main
   ;; should return true, since #'wrap-foo belongs later in the stack than
   ;; #'clojure.tools.nrepl.middleware.session/session due to its :expects
   ;; clause
   (->> [#'wrap-foo
        (filter #{#'clojure.tools.nrepl.middleware.session/session #'wrap-foo})
        (= [#'clojure.tools.nrepl.middleware.session/session #'wrap-foo]))))

Leiningen makes it difficult to use forked versions of nrepl, so as a hackier workaround until this issue is resolved I've created this Leiningen plugin.

Comment by Gary Fredericks [ 02/May/14 10:35 PM ]

Interesting followup observation I made with Alan Malloy in IRC: if we remove "eval" from the :expects clause, it seems to return consistently true. On the other hand if we instead add #'clojure.tools.nrepl.middleware.interruptible-eval/interruptible-eval to the middleware list, suddenly it returns consistently false.

Comment by Gary Trakhman [ 04/Jun/14 3:35 PM ]

I should add my own experience report with this, hope to dig in to it someday soon: https://groups.google.com/d/msg/clojure/nUBBbYZHuTE/ScLBH-A2HkoJ

In my case, it was incorrectness, not indeterminism.

Comment by Chas Emerick [ 19/Aug/14 1:59 PM ]

Half of the middleware linearization stuff is too clever, the other half is really dumb. :-/

I'd love a patch for this. Absent one, I'll have to dig into it, but I'm not going to hold up 0.2.4 any further for that opportunity to come around.

Comment by Gary Fredericks [ 19/Aug/14 2:12 PM ]

Is this a fair summary of the requirements?

1) Should be deterministic
2) Should only return a result that respects the :expects and :requires clauses – if this is impossible it should throw an exception explaining why

Comment by Chas Emerick [ 19/Aug/14 2:58 PM ]

Yes, that's fair. I might add (3) try to not break anything significant that might rely upon undocumented behaviour, but I doubt that's helpful.

Comment by Gary Fredericks [ 25/Sep/14 9:29 PM ]

I've written generative tests for this, but test.check requires clojure 1.4 (while nrepl tests run on 1.2).

If I wrap them in code that NOOPs for old clojures, I imagine they will they get run in CI? Is that a reasonable approach?

Comment by Colin Jones [ 26/Sep/14 7:50 AM ]

That approach sounds great to me.

Comment by Gary Fredericks [ 26/Sep/14 10:06 AM ]

Given I've coded it that way, is there an easy way for me to run the tests without tweaking the pom.xml to specify 1.4?

Comment by Gary Fredericks [ 30/Sep/14 10:11 PM ]

Okay I think I finally tracked this down.

Basically the conj-sorted function was not a valid way of doing a topological sort.

I replaced it with an actual topological sort, that still pushes independent middlewares toward the end.

The generative tests are now passing, as are the rest of the tests.

At some point I'll circle back to my original problem and see if that's solved.

I still don't fully understand all of the linearization code, so it's possible that the problem is more subtle than this.

The fix is on a github branch here. I might want to wait on another release of test.check before making a patch, since it should have some new features that make the generative tests better (in particular I'd be able to remove the dependency on my test.chuck utility library, which obviously shouldn't be in the tools.nrepl codebase).

Comment by Gary Fredericks [ 04/Oct/14 10:07 AM ]

Attached a patch that contains the aforementioned topological sort, with a single regression test (and no generative tests).

Things to note:

  • I created a wrapper for clojure.core/ex-info so that in the case of a dependency cycle we can include with the exception message an example cycle, but don't have to assume clojure 1.4. If assuming clojure 1.4 is okay I can take that out and use clojure.core/ex-info directly.
  • I tried adding a test to verify that a dependency cycle creates an exception, but the test failed, and when I dug into it I discovered that the extend-deps function was doing strange things I did not expect, which means either it is buggy or I do not understand what it is supposed to do. See this commit for the failing test.
  • The topological sort now explicitly pushes middlewares with no dependencies to the end of the stack, instead of the more implicit fashion in the older code.
  • This patch does fix the problem I described in the description
Comment by Chas Emerick [ 12/Jan/15 8:23 AM ]

Great, thanks for the work! Did you have specification tests written, and then stripped them out? If so, I'd love to include them.

Comment by Gary Fredericks [ 12/Jan/15 10:05 AM ]

There are some generative tests somewhere that I think Colin was helping with – is that what you're referring to?

Comment by Chas Emerick [ 12/Jan/15 10:15 AM ]

No, you mentioned your test.chuck library, and waiting for a new test.check release before making a patch...it sounded like you had spec tests written, but then removed them for the patch.

Comment by Colin Jones [ 12/Jan/15 10:21 AM ]

Yeah I was trying to help with those, but got kind of hung up on what happens when there's a dependency cycle (a stack overflow, IIRC, instead of the `IllegalArgumentException` with a nice error message that I'd have wanted). Also sidetracked by other priorities - sorry Gary!

But I think that's a separate problem with `clojure.tools.nrepl.middleware/dependencies` not tracking nodes it has already visited: not caused by Gary's patch.

Comment by Chas Emerick [ 12/Jan/15 10:32 AM ]

OK, thanks for the clarification. I'll merge this now, but wait a few days to release to see if anything breaks from people using the snapshot. (Perhaps a good/expected thing given the shortcomings of conj-sorted.)

Comment by Gary Fredericks [ 12/Jan/15 11:10 AM ]

Oh right, the issue was that the generators for the tests were written using this helper macro that I've become quite attached to, and so far I haven't been able to convince Reid that it should be in test.check proper.

The tools.nrepl tests are here, and you can imagine how readable this would be without the for macro.

Comment by Chas Emerick [ 12/Jan/15 11:19 AM ]

Applied to master @ 8547857, now available in 0.2.7-SNAPSHOT in the OSS snapshots repo: https://oss.sonatype.org/content/repositories/snapshots

Comment by Gary Fredericks [ 12/Jan/15 11:32 AM ]

Thanks Chas!

Comment by Gary Fredericks [ 18/Jan/15 4:28 PM ]

Wait is there actually a problem with using a non-contrib library if it's just for tests?

Comment by Chas Emerick [ 19/Jan/15 4:04 AM ]

I can't see why that'd be a problem. Let's have the test-scoped dependency be a non-SNAPSHOT though.

Comment by Chas Emerick [ 19/Jan/15 4:08 AM ]

That said, if you can write the tests you want without using test.chuck, that'd be even better, just so that later maintainers don't have to grok those helpers. I see the utility of them (especially for), but it could still be a barrier compared to vanilla simple-check bits.

Comment by Gary Fredericks [ 19/Jan/15 7:56 AM ]

Okay, I've got a refactoring of the tests without test.chuck, I just have a bit of prep to do and then I'll make another ticket/patch.

Generated at Tue Jan 23 02:30:44 CST 2018 using JIRA 4.4#649-r158309.