<< Back to previous view

[NREPL-53] Middleware linearization is nondeterministically wrong Created: 02/May/14  Updated: 19/Nov/14

Status: Open
Project: tools.nrepl
Component/s: None
Affects Version/s: 0.2.3
Fix Version/s: None

Type: Defect Priority: Critical
Reporter: Gary Fredericks Assignee: Chas Emerick
Resolution: Unresolved Votes: 0
Labels: None

clojure 1.6.0

Attachments: Text File NREPL-53.patch    


I've had trouble getting my middleware ordered correctly, and have wittled it down to the following example which seems to only fail in some processes. I.e., it is deterministic within a particular jvm, but across jvms the result can change. My guess is that this has to do with hashing vars.

This code prints true (signifying a correct result) roughly 1/3 of the time and false otherwise.

(ns chas.core
  (:require [clojure.tools.nrepl.middleware :as mid]

(def wrap-foo identity)
(mid/set-descriptor! #'wrap-foo
                     {:expects #{#'clojure.tools.nrepl.middleware.session/session "eval"}})

(def wrap-bar identity)
(mid/set-descriptor! #'wrap-bar
                     {:requires #{#'clojure.tools.nrepl.middleware.session/session},
                      :expects #{"describe" #'clojure.tools.nrepl.middleware.pr-values/pr-values},
                      :handles {"stacktrace" {}}})

(defn -main
   ;; should return true, since #'wrap-foo belongs later in the stack than
   ;; #'clojure.tools.nrepl.middleware.session/session due to its :expects
   ;; clause
   (->> [#'wrap-foo
        (filter #{#'clojure.tools.nrepl.middleware.session/session #'wrap-foo})
        (= [#'clojure.tools.nrepl.middleware.session/session #'wrap-foo]))))

Leiningen makes it difficult to use forked versions of nrepl, so as a hackier workaround until this issue is resolved I've created this Leiningen plugin.

Comment by Gary Fredericks [ 02/May/14 10:35 PM ]

Interesting followup observation I made with Alan Malloy in IRC: if we remove "eval" from the :expects clause, it seems to return consistently true. On the other hand if we instead add #'clojure.tools.nrepl.middleware.interruptible-eval/interruptible-eval to the middleware list, suddenly it returns consistently false.

Comment by Gary Trakhman [ 04/Jun/14 3:35 PM ]

I should add my own experience report with this, hope to dig in to it someday soon: https://groups.google.com/d/msg/clojure/nUBBbYZHuTE/ScLBH-A2HkoJ

In my case, it was incorrectness, not indeterminism.

Comment by Chas Emerick [ 19/Aug/14 1:59 PM ]

Half of the middleware linearization stuff is too clever, the other half is really dumb. :-/

I'd love a patch for this. Absent one, I'll have to dig into it, but I'm not going to hold up 0.2.4 any further for that opportunity to come around.

Comment by Gary Fredericks [ 19/Aug/14 2:12 PM ]

Is this a fair summary of the requirements?

1) Should be deterministic
2) Should only return a result that respects the :expects and :requires clauses – if this is impossible it should throw an exception explaining why

Comment by Chas Emerick [ 19/Aug/14 2:58 PM ]

Yes, that's fair. I might add (3) try to not break anything significant that might rely upon undocumented behaviour, but I doubt that's helpful.

Comment by Gary Fredericks [ 25/Sep/14 9:29 PM ]

I've written generative tests for this, but test.check requires clojure 1.4 (while nrepl tests run on 1.2).

If I wrap them in code that NOOPs for old clojures, I imagine they will they get run in CI? Is that a reasonable approach?

Comment by Colin Jones [ 26/Sep/14 7:50 AM ]

That approach sounds great to me.

Comment by Gary Fredericks [ 26/Sep/14 10:06 AM ]

Given I've coded it that way, is there an easy way for me to run the tests without tweaking the pom.xml to specify 1.4?

Comment by Gary Fredericks [ 30/Sep/14 10:11 PM ]

Okay I think I finally tracked this down.

Basically the conj-sorted function was not a valid way of doing a topological sort.

I replaced it with an actual topological sort, that still pushes independent middlewares toward the end.

The generative tests are now passing, as are the rest of the tests.

At some point I'll circle back to my original problem and see if that's solved.

I still don't fully understand all of the linearization code, so it's possible that the problem is more subtle than this.

The fix is on a github branch here. I might want to wait on another release of test.check before making a patch, since it should have some new features that make the generative tests better (in particular I'd be able to remove the dependency on my test.chuck utility library, which obviously shouldn't be in the tools.nrepl codebase).

Comment by Gary Fredericks [ 04/Oct/14 10:07 AM ]

Attached a patch that contains the aforementioned topological sort, with a single regression test (and no generative tests).

Things to note:

  • I created a wrapper for clojure.core/ex-info so that in the case of a dependency cycle we can include with the exception message an example cycle, but don't have to assume clojure 1.4. If assuming clojure 1.4 is okay I can take that out and use clojure.core/ex-info directly.
  • I tried adding a test to verify that a dependency cycle creates an exception, but the test failed, and when I dug into it I discovered that the extend-deps function was doing strange things I did not expect, which means either it is buggy or I do not understand what it is supposed to do. See this commit for the failing test.
  • The topological sort now explicitly pushes middlewares with no dependencies to the end of the stack, instead of the more implicit fashion in the older code.
  • This patch does fix the problem I described in the description
Generated at Mon Dec 22 05:29:24 CST 2014 using JIRA 4.4#649-r158309.