[BUGS] SSH timeout

Tue Jan 1 20:04:44 EST 2008

Andrew Reilly wrote:
> On Mon, 31 Dec 2007 23:26:47 +1100
> Jerahmy Pocott <quakenet1 at optusnet.com.au> wrote:
> 
>> Hello all, and a happy new year!
>>
>> On this remote 4.2 stable box I'm having an issue with SSH timing out  
>> or something.. It seems to be when I'm running a long command where I  
>> don't provide any input for a few hours, like buildworld or a large  
>> tar, cp, etc. I come back to find the message:
>>
>> Read from remote host x.x.x.x: Connection reset by peer
>> Connection to x.x.x.x closed.
>>
>> Looking at the various configuration files there doesn't seem to be  
>> any timeout set, I was wondering if maybe some default is compiled in?  
>> Of course it could be a gateway closing the connection I guess.. But  
>> I've never had this problem on older boxes.. Maybe I'm not looking in
>> the right spot?
> 
> It hides, in the ssh_config man page, sort-of.  Try setting -o
> ServerAliveInterval to 30 or so (which means that ssh will send a
> message to the sshd server every 30 seconds, to see if it's
> alive (if nothing else has happened in the meantime)).  The
> default is 0, which means that this doesn't happen.  These
> messages are inside the encrypted channel, so firewalls and
> what-not should not be in a position to discard them, and so
> should keep the channel open.
> 

I remember striking this problem when I first started managing systems 
out on the Internet via ssh - I had only been using telnet on local 
systems before that.

"Connection reset by peer" suggested to me that the server was 
controlling the disconnect, so I figured that the place to fix this was 
the server (rather than the client as suggested by Andrew). After some 
research, I attacked /etc/sshd_config on the server. My standard hack on 
any server I manage is:

-#TCPKeepAlive yes
+TCPKeepAlive no
...
-#ClientAliveInterval 0
+ClientAliveInterval 5m

Check the man page for sshd_config for a description of those two knobs. 
I suggest that you read the blurb for TCPKeepAlive first. Setting the 
ClientAliveInterval to a large value like 5m means that a temporary 
network outage (up to 15 minutes) won't cause a disconnection of the 
client - but that's just my preference for my circumstances. The 
duration of ClientAliveInterval, and the value of ClientAliveCountMax 
(default 3) can obviously be seasoned to taste.
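
To spell out the arithmetic: sshd sends a probe once the client has been
silent for ClientAliveInterval, and drops the session after
ClientAliveCountMax probes go unanswered, so the tolerated outage is
roughly 5m x 3 = 15 minutes. A sketch of the resulting fragment follows.
ClientAliveCountMax is not in the diff above; it is written out here at
its default only to make the calculation visible.

```
# sshd_config fragment (illustrative; sshd must be restarted or
# signalled to re-read its configuration)
TCPKeepAlive no
ClientAliveInterval 5m
# Assumption: left at the default, shown only for the arithmetic.
ClientAliveCountMax 3
# Unresponsive client is dropped after about 5m x 3 = 15 minutes.
```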

I have not had any ssh sessions disconnected since making these changes.

Andrew's solution is essentially the same (from the client side) for ssh 
keepalives, but my suggestion disables the problematic TCPKeepAlive as well.
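
For comparison, a minimal sketch of the client-side equivalent, assuming
the client's OpenSSH is recent enough to support these options and that a
per-user ~/.ssh/config is in use. The "Host *" pattern and the
ServerAliveCountMax line are illustrative assumptions; only the interval
of 30 comes from Andrew's suggestion.

```
# ~/.ssh/config fragment (illustrative)
Host *
    # Probe the server inside the encrypted channel after 30s of silence.
    ServerAliveInterval 30
    # Assumption: default value, written out for clarity.
    ServerAliveCountMax 3
    # Optional: mirror the server-side change discussed above.
    TCPKeepAlive no
```

The same interval can be set for a single session with
"ssh -o ServerAliveInterval=30" followed by the host, as Andrew describes.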

>> My current workaround has been to run any long processes in the
>> background and pipe their output to a file, but this doesn't always  
>> capture all the messages, especially with things like buildworld..
> 
> Others have mentioned redirection of stderr as well, which should
> get everything.  Two other options you might want to try are
> explicitly for this purpose: nohup+script and screen.  Script
> should be in the base system, screen will be in ports.  They're a
> bit different, in that script will also record stdin (if any) in
> the typescript file, whereas screen maintains a "virtual
> terminal", server side.  It doesn't log anything, but it'll stick
> around if your session disconnects, and you can re-connect to it
> later.  The important part is that the job that you run from
> inside it doesn't get the HUP signal, and so doesn't stop.
> 
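
A minimal sketch of the nohup-plus-redirection approach mentioned above,
assuming a Bourne-style shell. The sh -c command is a hypothetical
stand-in for the real long job (buildworld, tar, etc.), and job.log is an
arbitrary file name.

```shell
# Stand-in for a long job: writes to both stdout and stderr.
# Replace the sh -c '...' part with the real command.
# "2>&1" sends stderr to the same file as stdout, so nothing is lost;
# nohup keeps the job from being stopped by the HUP signal.
nohup sh -c 'echo "progress on stdout"; echo "warning on stderr" >&2' > job.log 2>&1 &

# Only needed in this demo, so the log is complete before we read it.
wait

cat job.log
```

With a real job you would skip the wait, log out, and inspect the log
file (for example with tail -f) after reconnecting.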
> Hope this helps.
> 
> Cheers and happy new year to you (all) too!
> 

New Year Greetings to all from me as well.

-- 
John Marshall