[NREPL-30] Investigate hangs when server goes away Created: 14/Sep/12 Updated: 08/Oct/12 Resolved: 08/Oct/12 |
|
| Status: | Closed |
| Project: | tools.nrepl |
| Component/s: | None |
| Affects Version/s: | 0.2.0-beta9, 0.2.0-beta10 |
| Fix Version/s: | 0.2.0-beta10 |
| Type: | Task | Priority: | Major |
| Reporter: | Chas Emerick | Assignee: | Chas Emerick |
| Resolution: | Completed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
See https://github.com/trptcolin/reply/issues/84#issuecomment-8481753 and maybe http://code.google.com/p/counterclockwise/issues/detail?id=405 |
| Comments |
| Comment by Colin Jones [ 14/Sep/12 9:03 AM ] |
|
If it helps, I did observe in some testing that after the server side of the socket goes down, writing to and flushing the client socket's output stream twice would bubble up a client exception. That seems a little keepalive-ish, but I'm not sure whether that sort of thing would do other transports any good. |
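The "two writes" behaviour described above can be reproduced outside nREPL with plain TCP sockets. The following is a minimal sketch, not code from tools.nrepl or reply: it is written in Python for brevity, and `sends_until_error`, `max_attempts`, and `pause` are hypothetical names chosen for illustration. The assumption is that the first write after the peer closes is usually accepted by the local TCP stack, and that the failure only becomes visible on a later write, once the peer's reset has arrived.

```python
import socket
import time


def sends_until_error(max_attempts=50, pause=0.05):
    """Count how many sends it takes before a dead peer is reported.

    Hypothetical helper for illustration only. Returns the 1-based
    number of the send that raised, or None if no send failed.
    """
    # Listen on an ephemeral loopback port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # Simulate the server going away: close its end of the connection.
    server_side.close()
    listener.close()

    try:
        for attempt in range(1, max_attempts + 1):
            try:
                client.sendall(b"x")
            except OSError:
                # BrokenPipeError and ConnectionResetError are both OSError.
                return attempt
            # Give the peer's reset time to arrive before trying again.
            time.sleep(pause)
        return None
    finally:
        client.close()
```

Calling `sends_until_error()` returns the number of the send on which the operating system finally reported the broken connection. The exact count can vary with platform and timing, which is why the sketch loops rather than assuming a fixed number of writes.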
| Comment by Chas Emerick [ 03/Oct/12 12:37 PM ] |
|
A fix for this is on master now, with commit 8a5dad2045434fcc06f2878de55f7dcdefa01a1b. There were a number of issues in the implementation of FnTransport that were preventing exceptions from bubbling up as they should have been. No APIs were harmed in the course of implementing a fix.

Also, no keepalive mechanism was required (which makes me slightly suspicious about whether the issue has actually been fixed, but we'll see). To be clear, I don't mean TCP keepalive; that's irrelevant to the issue, and wouldn't help even if it were reliably standard/present.

New tests added, and manual verification against nREPL servers running in different processes looks good too. I'll put out a 0.2.0-beta10 shortly so that people can start banging on this to see if it resolves issues in downstream projects/tools.

Note that I also found that it sometimes requires two sends (messages here, not writes to a socket) to provoke an error when writing to a transport that had been connected to a server that was closed or otherwise went away. Maybe it's a buffering issue? I don't have a good theory. Check out e.g. nrepl-test/transports-fail-on-disconnects if you want to play around with it. |
| Comment by Chas Emerick [ 04/Oct/12 5:06 PM ] |
|
Note that the bencode transport should throw a java.net.SocketException when its connection is lost. |
| Comment by Chas Emerick [ 08/Oct/12 12:42 PM ] |
|
Good reports received from downstream projects on the fix for this. Calling it good. |