
[NREPL-30] Investigate hangs when server goes away
Created: 14/Sep/12  Updated: 08/Oct/12  Resolved: 08/Oct/12

Status: Closed
Project: tools.nrepl
Component/s: None
Affects Version/s: 0.2.0-beta9, 0.2.0-beta10
Fix Version/s: 0.2.0-beta10

Type: Task
Priority: Major
Reporter: Chas Emerick
Assignee: Chas Emerick
Resolution: Completed
Votes: 0
Labels: None


 Description   

See https://github.com/trptcolin/reply/issues/84#issuecomment-8481753 and maybe http://code.google.com/p/counterclockwise/issues/detail?id=405



 Comments   
Comment by Colin Jones [ 14/Sep/12 9:03 AM ]

If it helps, I did observe in some testing that after the server side of the socket goes down, writing to and flushing the client socket's output stream twice would raise an exception on the client; the first write alone did not. Relying on that seems a little keepalive-ish, and I'm not sure whether that sort of thing would do other transports any good.
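The behaviour described above can be reproduced at the plain TCP level, outside nREPL entirely. The following is a minimal Python sketch, not code from tools.nrepl or reply; the function name `writes_until_error` and its parameters are made up for illustration. It opens a loopback connection, closes the server side, and counts how many client writes succeed before the operating system reports an error. The exact count and the exception subclass depend on the platform and on timing, so the sketch does not assume a particular number.

```python
import socket
import time


def writes_until_error(max_writes=50, pause=0.05):
    """Return (successful_writes, error) for a client whose peer has closed.

    Hypothetical helper for illustration only; not part of any nREPL client.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # Simulate the server going away.
    server_side.close()
    listener.close()

    successful = 0
    error = None
    try:
        for _ in range(max_writes):
            try:
                client.sendall(b"ping")
                successful += 1
            except OSError as exc:
                # Typically a broken-pipe or connection-reset error.
                error = exc
                break
            # Give the peer's reset time to arrive before the next write.
            time.sleep(pause)
    finally:
        client.close()
    return successful, error


if __name__ == "__main__":
    count, err = writes_until_error()
    print(count, type(err).__name__)
```

A likely explanation for needing more than one write is that the first write is accepted into the local send buffer while the client has only seen an orderly close from the peer; the peer then answers with a reset, and only a later write observes it.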

Comment by Chas Emerick [ 03/Oct/12 12:37 PM ]

A fix for this is on master now, with commit 8a5dad2045434fcc06f2878de55f7dcdefa01a1b. There were a number of issues in the implementation of FnTransport that were preventing exceptions from bubbling up as they should have been.
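For readers unfamiliar with how a swallowed exception turns into a hang, here is a minimal Python sketch of one common pattern for a queue-backed transport. It is an assumption-laden illustration, not the code in the commit above and not the tools.nrepl API: the class `QueueTransport` and the `read_fn` parameter are hypothetical names. A background thread reads messages into a queue; if that thread fails and nothing forwards the failure, a consumer blocked on the queue waits forever. Forwarding the exception through the same queue lets `recv` raise it instead.

```python
import queue
import threading


class QueueTransport:
    """Hypothetical transport: a reader thread feeds a queue that recv() drains."""

    def __init__(self, read_fn):
        self._queue = queue.Queue()
        self._failure = None  # remembered so later recv() calls also fail fast
        worker = threading.Thread(target=self._pump, args=(read_fn,), daemon=True)
        worker.start()

    def _pump(self, read_fn):
        try:
            while True:
                self._queue.put(read_fn())
        except Exception as exc:
            # Hand the failure to the consumer instead of dying silently,
            # which would leave recv() blocked with nothing to wake it.
            self._queue.put(exc)

    def recv(self, timeout=None):
        """Return the next message, None on timeout, or raise the read failure."""
        if self._failure is not None:
            raise self._failure
        try:
            item = self._queue.get(timeout=timeout)
        except queue.Empty:
            return None
        if isinstance(item, Exception):
            self._failure = item
            raise item
        return item
```

With this shape, a caller that loops on `recv` sees the connection error on the call after the last good message, and on every call after that, rather than blocking indefinitely.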

No APIs were harmed in the course of implementing a fix. Also, no keepalive mechanism was required (which makes me slightly suspicious that the issue has actually been fixed, but we'll see). To be clear, I mean an application-level keepalive here, not TCP keepalive; the latter is irrelevant to the issue, and wouldn't help even if it were reliably standard/present.

New tests have been added, and manual verification against nREPL servers running in different processes looks good too. I'll put out a 0.2.0-beta10 release shortly so that people can start banging on this to see if it resolves issues in downstream projects/tools.

Note that I also found that it sometimes requires two sends (messages here, not writes to a socket) to provoke an error when writing to a transport that had been connected to a server that was closed or otherwise went away. Maybe it's a buffering issue? I don't have a good theory. Check out e.g. nrepl-test/transports-fail-on-disconnects if you want to play around with it.

Comment by Chas Emerick [ 04/Oct/12 5:06 PM ]

Note that the bencode transport should throw a java.net.SocketException when its connection is lost.

Comment by Chas Emerick [ 08/Oct/12 12:42 PM ]

Good reports received from downstream projects on the fix for this. Calling it good.
