<< Back to previous view

[ASYNC-64] Race condition when closing mults Created: 29/Apr/14  Updated: 16/Oct/14

Status: Open
Project: core.async
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: James Reeves Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: mult

Approval: Triaged

 Description   

When a mult is tapped at around the same time as the source channel is closed, the tapped channel may not be closed.

(require '[clojure.core.async :refer (chan mult tap close!)])
(let [s (chan)
      m (mult s)
      c (chan)]
  (tap m c)
  (close! s)
  (impl/closed? c))

The above code will sometimes return true, and sometimes return false.

Cause: This is caused by the following code in the mult function:

(if (nil? val)
  (doseq [[c close?] @cs]
    (when close? (close! c)))

Any channels tapped after cs is dereferenced will not be closed.

Approach: A possible solution to this could be to always close channels tapped to a closed source. i.e.

(let [s (chan)
      m (mult s)
      c (chan)]
  (close! s)
  (tap m c))  ;; will always close c

This could be achieved by adding a flag to the cs atom to denote whether the mult is open or closed. If it's closed, any tapped channel is closed automatically.



 Comments   
Comment by James Reeves [ 30/Apr/14 6:05 AM ]

For reference, below is the custom fix for mult I'm using:

(defn mult [ch]
  (let [state (atom [true {}])
        m (reify
            Mux
            (muxch* [_] ch)
            Mult
            (tap* [_ ch close?]
              (let [add-ch    (fn [[o? cs]] [o? (if o? (assoc cs ch close?) cs)])
                    [open? _] (swap! state add-ch)]
                (when-not open? (close! ch))
                nil))
            (untap* [_ ch]
              (swap! state (fn [[open? cs]] [open? (dissoc cs ch)]))
              nil)
            (untap-all* [_]
              (swap! state (fn [[open? _]] [open? {}]))))
        dchan (chan 1)
        dctr (atom nil)
        done (fn [_] (when (zero? (swap! dctr dec))
                       (put! dchan true)))]
    (go-loop []
      (let [val (<! ch)]
        (if (nil? val)
          (let [[_ cs] (swap! state (fn [[_ cs]] [false cs]))]
            (doseq [[c close?] cs]
              (when close? (close! c))))
          (let [chs (keys (second @state))]
            (reset! dctr (count chs))
            (doseq [c chs]
              (when-not (put! c val done)
                (swap! dctr dec)
                (untap* m c)))
            (when (seq chs)
              (<! dchan))
            (recur)))))
    m))
Comment by David Nolen [ 14/Oct/14 6:10 AM ]

Is this also fixed in master? Thanks.

Comment by Ghadi Shayban [ 15/Oct/14 11:09 PM ]

I understand the scenario, but honestly I'm not sure this is a bug in mult or the usage. A channel shouldn't be expected to always yield a take. The consumer of the "late tap" can guard against it with alts or some other mechanism, and also you can enforce a no-late-taps through a policy on the "production" side of things.

Rich Hickey can you weigh in?

Comment by James Reeves [ 16/Oct/14 3:51 AM ]

The "tap" function currently has an explicit "close?" flag, and if a tapped channel isn't guaranteed to close when the source channel closes, that argument probably shouldn't exist. Also, if auto-closing taps is taken out, should we remove the "close?" argument on "sub" as well?

Comment by Ghadi Shayban [ 16/Oct/14 11:34 AM ]

It's more than respecting the flag. Related to the close behavior, channels can tap and untap without receiving anything while the mult process happily distributes a value to another set of channels (like the ABA problem). Could also make it an error to tap after the close is distributed to the last deref'ed set of channels. That is different than the familiar permanent nil receive, but mults already differ from simple channels.





[ASYNC-58] mult channel deadlocks when untapping a consuming channel whilst messages are being queued/blocked Created: 20/Feb/14  Updated: 23/Jun/14

Status: Open
Project: core.async
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Mathieu Gauthron Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: deadlock, mult, untap
Environment:

Mac 10.7.5; java version "1.7.0_40"; [org.clojure/clojure "1.5.1"]; [org.clojure/core.async "0.1.267.0-0d7780-alpha"]; Tested with cider and emacs 24.3


Approval: Triaged

 Description   

I have two (or more) listeners tapped onto a mult channel. I want to use them all then have one (or more) of them to leave at will without blocking the other consumer(s) or the publisher. Initially they work fine until one of them wants to stop listening. I thought the listener which drops out needs to (be a good citizen and) untap its channel from mult (otherwise a deadlock is systematic). However if messages are put into the mult before the leaving listener has had a chance to untap its channel, it creates a deadlock on the main thread (which is putting more messages simultaneously). I do not find a way to guarantee that I can untap the channel in time to avoid this race condition.

Once I have reproduced the deadlock, the repl is frozen until I interrupt with ctrl-c.
I have also tried to close the tapped channel before untapping it but the result was the same.

In the following snippet, the last (println "I'm done. You will never see this") is never reached. The publisher and the remaining consumer (consumer 1) are deadlocked even though consumer 2 was trying to leave in good terms.

(require '[clojure.core.async :refer (chan go <! >!! mult tap untap)])
(let [to-mult (chan 1)
      m (mult to-mult)]

  ;;consumer 1
  (let [c (chan 1)]
    (tap m c)
    (go (loop []
          (when-let [v (<! c)]
            (println "1 Got! " v)
            (recur))
          (println "1 Exiting!"))))

  ;;consumer 2
  (let [c (chan 1)]
    (tap m c)
    (go (loop []
          (when-let [v (<! c)]
            (when (= v 42)  ;; exit when value is not 42
              (println "2 Got! " v)
              (recur)))
          (println "2 about to leave!")
          (Thread/sleep 5000) ;; wait a bit to exacerbate the race condition
          (untap m c) ;; before unsubscribing this reader
          (println "2 Exiting."))))

   (println "about to put a few messages that work")
   (doseq [a (range 10)]
     (>!! to-mult 42))
   (println "about to put a message that will force the exit of 2")
   (>!! to-mult 43)
   (println "about to put a few more messages before reader 2 is unsubscribed to show the deadlock")
   (doseq [a (range 10)]
     (println "putting msg" a)
     (>!! to-mult 42))
   (println "I'm done. You will never see this"))
about to put a few messages that work
2 Got!  42
1 Got!  42
2 Got!  42
1 Got!  42
1 Got!  42
2 Got!  42
1 Got!  42
1 Got!  42
2 Got!  42
2 Got!  42
2 Got!  42
2 Got!  1 Got!  42
422 Got!  42

1 Got!  42
1 Got!  42
2 Got!  42
1 Got!  42
about to put a message that will force the exit of 2
1 Got!  42
2 Got!  about to put a few more messages before reader 2 is unsubscribed to show the deadlock
42
putting msg 1 Got!  0
2 about to leave!
43
1 Got!  42
putting msg 1
putting msg 2
putting msg 3
1 Got!  42
2 Exiting.


 Comments   
Comment by Ghadi Shayban [ 22/Apr/14 10:18 AM ]

Mathieu, this is probably expected. It's important to note that to guarantee correct ordering/flow when using a mult, you should enforce it on the source/producer side of the mult, and not asynchronously on the tap side.

Mult will deref a stable set taps just before distributing a value to them, and does not adjust dynamically during value distribution except when a tap has been closed [1]. If you would like to stably untap without closing the tap you can/should let the 'producer' do it in an ordered fashion in between values on the input channel.

Knowing that a put occurred to a closed channel is new on release 0.1.278.

In general, walking away on the consuming side of a channel is tricky. Depending on the semantics of your processes, if the producer side of a channel isn't aware that a close! can happen from the consumer side, you might have to launch a draining operation.

(defn drain [c] (go (when (some? (<! c)) (recur))))

Golang disallows closing a read-only channel FWIW [2]

Better documentation is probably warranted.

[1] https://github.com/clojure/core.async/blob/master/src/main/clojure/clojure/core/async.clj#L680-L682
[2] http://golang.org/ref/spec#Close





Generated at Sun Nov 23 14:24:43 CST 2014 using JIRA 4.4#649-r158309.