tools.nrepl

Investigate hangs when server goes away

Details

  • Type: Task
  • Status: Closed
  • Priority: Major
  • Resolution: Completed
  • Affects Version/s: 0.2.0-beta9, 0.2.0-beta10
  • Fix Version/s: 0.2.0-beta10
  • Component/s: None
  • Labels: None

Activity

Colin Jones added a comment -

If it helps, I did observe in some testing that after the server side of the socket goes down, writing to and flushing the client socket's output stream twice would bubble up a client exception. That seems a little keepalive-ish, but I'm not sure whether that sort of thing would do other transports any good.
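The observation above can be reproduced outside nREPL with plain JVM sockets. The following is a minimal sketch, not code from tools.nrepl: the class name `ClosedPeerWriteDemo` and the method `writesUntilFailure` are hypothetical, and the 20 ms pause between writes is an arbitrary assumption. It connects to a throwaway loopback server, closes the server side, and then writes and flushes single bytes until one of the writes throws.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ClosedPeerWriteDemo {

    /**
     * Connects to a throwaway local server, has the server close its side,
     * then writes and flushes one byte at a time until a write fails.
     * Returns the 1-based number of the write that threw, or -1 if no
     * write failed within maxAttempts.
     */
    static int writesUntilFailure(int maxAttempts) throws IOException, InterruptedException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            // Accept and immediately close: this simulates the server going away.
            server.accept().close();

            OutputStream out = client.getOutputStream();
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    out.write('x');
                    out.flush();
                } catch (IOException e) {
                    // Typically a java.net.SocketException (broken pipe / connection reset).
                    return attempt;
                }
                // Give the peer's reset a moment to arrive before the next write.
                Thread.sleep(20);
            }
            return -1;
        }
    }

    public static void main(String[] args) throws Exception {
        int n = writesUntilFailure(50);
        System.out.println(n == -1 ? "no write failed" : "write #" + n + " failed");
    }
}
```

The first write after the peer closes usually succeeds locally, because the operating system accepts the bytes before it learns that the other end is gone; the failure tends to surface on a later write, often the second. The exact count depends on the platform and on timing, so the sketch reports it rather than assuming a fixed number.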

Chas Emerick added a comment -

A fix for this is on master now, with commit 8a5dad2045434fcc06f2878de55f7dcdefa01a1b. There were a number of issues in the implementation of FnTransport that were preventing exceptions from bubbling up as they should have been.

No APIs were harmed in the course of implementing the fix. Also, no keepalive mechanism was required, which makes me slightly suspicious about whether the issue has actually been fixed, but we'll see. (To be clear, I don't mean TCP keepalive; that's irrelevant to the issue, and wouldn't help even if it were reliably standard/present.)

New tests added, and manual verification against nREPL servers running in different processes looks good too. I'll put out an 0.2.0-beta10 shortly so that people can start banging on this to see if it resolves issues in downstream projects/tools.

Note that I also found that it sometimes requires two sends (messages here, not writes to a socket) to provoke an error when writing to a transport that had been connected to a server that was closed or otherwise went away. Maybe it's a buffering issue? I don't have a good theory. Check out e.g. nrepl-test/transports-fail-on-disconnects if you want to play around with it.

Chas Emerick added a comment -

Note that the bencode transport should throw a java.net.SocketException when its connection is lost.

Chas Emerick added a comment -

Good reports received from downstream projects on the fix for this. Calling it good.
