<< Back to previous view

[NREPL-10] SocketExceptions on transport/bencode when the other end has gone away Created: 19/Feb/12  Updated: 12/Oct/12  Resolved: 12/Oct/12

Status: Closed
Project: tools.nrepl
Component/s: None
Affects Version/s: None
Fix Version/s: 0.2.0, 0.2.0-RC1

Type: Defect Priority: Minor
Reporter: Colin Jones Assignee: Chas Emerick
Resolution: Completed Votes: 0
Labels: None


 Description   

When running the tests, I'm occasionally getting SocketExceptions (which don't fail the test suite) with the message "Broken pipe". I think these stem from one end of the nREPL socket trying to write to the socket after the other end has gone away.

There's an existing test (ensure-closeable) that explicitly expects a SocketException to be raised if the socket has had .close called directly on it, rather than having the other end hang up, which I think is what the "Broken pipe" tells us. I'm curious as to whether this is important for some reason I haven't seen, or whether catching all SocketExceptions on transport/bencode's write function and returning nil would be appropriate.

If the end doing the socket closing actually should affect the behavior, we could sniff the exception message, but that feels a bit yucky.

I can actually reproduce this in "real life" by firing up an nREPL server / client (`reply --nrepl --port 9999`), then attaching to it (`reply --nrepl --attach 9999`), and in the attached client, run (Thread/sleep 10000) and immediately kill the process. Then the "Broken pipe" exception shows up in the original REPL with the server in the same process.

I've got a patch for this in my bitbucket fork (socket-exception branch): https://bitbucket.org/trptcolin/nrepl/changeset/4a432f3ce95a



 Comments   
Comment by Chas Emerick [ 21/Feb/12 5:35 AM ]

What do we really want to happen when a SocketException (or really, any error) occurs?

I'm all for better error handling/reporting, but it shouldn't obfuscate what's going wrong. Catching and squelching isn't doing anyone any favors. That could be particularly confusing if done at the transport level. "You mean all the messages I've been sending have been dropped on the floor silently, and I was supposed to be checking for a nil return?"

Comment by Colin Jones [ 26/Feb/12 12:46 PM ]

Yeah, you're right. Although, isn't checking for a nil return already necessary to account for timeouts?

What about having a server-side transport error handler that's rebindable, and re-throws by default? Or maybe we could collect errors somewhere, similar to what clojure.core/agent-error does.

The more I think about it, the less important this seems, though - mainly an annoyance when running tests.

Comment by Chas Emerick [ 12/Oct/12 12:51 AM ]

Was simply a disconnection stacktrace due to a client closing up prior to an evaluation response being written to the transport.

Fixed in https://github.com/clojure/tools.nrepl/commit/2d90bdd4fe14298bbfcaab52ecdea48781c71604

Generated at Fri Oct 24 03:37:45 CDT 2014 using JIRA 4.4#649-r158309.