I use renci.sshnet in a multi-threaded environment with many concurrent connections. Under heavy testing we saw a leak of around 1,500 event handles a minute. The leak was still evident under normal use, and heavy use caused the system to run out of resources and crash after weeks of continuous operation.
Initially it was puzzling why Dispose was not being called on several of the classes that create event handles for synchronization. For clarity, I will split the proposed code changes up file by file, discussing only the fix rationale relevant to the file at hand:
> Using Revision 28765 - SshCommand.cs
The crux of the event handle leak was circular references between the various Session, Channel and Command objects. Loosely speaking, the Session object contained the Channel objects and the Channel objects contained the Command objects. The circular references arose from the Channel and Command objects registering event handler callbacks on the Session object, as well as the Command object registering event handlers on the Channel object. Even though the Channel and Command objects were no longer referenced from any explicit code scope, the event handler registrations on the Session and Channel objects kept them from being garbage collected. Likewise, the Channel and Command objects' back references to the Session object kept the Session object from being garbage collected. No garbage collection, no dispose; no dispose, no freeing of the event handles.
In the original code, the event handlers were removed from the Session object in the Channel and Command destructors, which were never called by the garbage collector because of the circular references. Once the event handlers were removed from the Session object earlier in the life cycle of the Channel and Command objects, the garbage collector began calling the Channel and Command destructors, which called their respective __Dispose()__ methods to clean up system resources.
Starting on line 144, remove the two event handler registrations on the Session object from the constructor:
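The retention effect can be reproduced in isolation. The sketch below is not SSH.NET code: `FakeSession`, `FakeCommand` and `LeakDemo` are hypothetical stand-ins made up for illustration. It only shows that each handler registered on a long-lived object's event adds a delegate that references the subscriber, so the subscriber stays reachable from that object until it unregisters.

```csharp
using System;

// Hypothetical stand-in for a long-lived Session; not the SSH.NET class.
public class FakeSession
{
    public event EventHandler Disconnected;

    // Number of handlers currently registered on the event.
    public int HandlerCount
    {
        get
        {
            EventHandler handlers = Disconnected;
            return handlers == null ? 0 : handlers.GetInvocationList().Length;
        }
    }
}

// Hypothetical stand-in for a short-lived Command.
public class FakeCommand
{
    private readonly FakeSession _session;

    public FakeCommand(FakeSession session)
    {
        _session = session;
        // From here on the session's delegate list references this command.
        _session.Disconnected += Session_Disconnected;
    }

    // Removes the session's reference to this command.
    public void Unregister()
    {
        _session.Disconnected -= Session_Disconnected;
    }

    private void Session_Disconnected(object sender, EventArgs e) { }
}

public static class LeakDemo
{
    // Creates commands that the caller never keeps a reference to,
    // then reports how many handlers the session still holds.
    public static int RegisterAndForget(FakeSession session, int count)
    {
        for (int i = 0; i < count; i++)
        {
            new FakeCommand(session);
        }
        return session.HandlerCount;
    }
}
```

In this sketch the commands created by `RegisterAndForget` are dropped by the caller immediately, yet the session still holds one delegate per command. Only `Unregister()` releases that reference.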
```
internal SshCommand(Session session, string commandText)
{
    if (session == null)
        throw new ArgumentNullException("session");
    if (commandText == null)
        throw new ArgumentNullException("commandText");

    this._session = session;
    this.CommandText = commandText;
    this.CommandTimeout = new TimeSpan(0, 0, 0, 0, -1);

    //! -- Shorten the eventhandling window by moving these to the BeginExecute() function --
    //! this._session.Disconnected += Session_Disconnected;
    //! this._session.ErrorOccured += Session_ErrorOccured;
}
```
Instead of keeping the event handlers registered for the object's entire life, keep them registered only within the __Begin/EndExecute()__ operating window, starting on line 215:
```
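public IAsyncResult BeginExecute(AsyncCallback callback, object state)
{
    // Register on the session only for the Begin/EndExecute() window
    this._session.Disconnected += Session_Disconnected;
    this._session.ErrorOccured += Session_ErrorOccured;

    // ... remainder of BeginExecute() unchanged ...
}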
```
The companion __EndExecute()__ method needed more than just unregistering the event handlers. First, we wrapped the function body in a __try/finally__ block and placed the event handler unregistration in the __finally__ section. Next, we moved _this.channel = null_ out of __EndExecute()__, because we found that the member was being used outside of the __Begin/EndExecute()__ scope. The explicit __Dispose()__ call on the _asyncResult_ object's wait handle, made just before _asyncResult_ is set to null, was left in from a failed fix attempt because it still has value: it returns unused system resources ahead of garbage collection. Lastly, we added to the __finally__ section the unregistration of the Command object's event handlers from the Channel object.
```
public string EndExecute(IAsyncResult asyncResult)
{
    try
    {
        if (this._asyncResult == asyncResult && this._asyncResult != null)
        {
            lock (this._endExecuteLock)
            {
                // Re-check under the lock in case another thread already ended the operation
                if (this._asyncResult != null)
                {
                    // Make sure the operation has completed; if not, wait for it to finish
                    this.WaitHandle(this._asyncResult.AsyncWaitHandle);

                    if (this._channel.IsOpen)
                    {
                        this._channel.SendEof();
                        this._channel.Close();
                    }

                    // Release the wait handle now rather than waiting for garbage collection
                    this._asyncResult.AsyncWaitHandle.Dispose();
                    this._asyncResult = null;

                    return this.Result;
                }
            }
        }
    }
    finally
    {
        // Always unregister, whether the method returns or throws
        if (this._channel != null)
        {
            this._channel.DataReceived -= Channel_DataReceived;
            this._channel.ExtendedDataReceived -= Channel_ExtendedDataReceived;
            this._channel.RequestReceived -= Channel_RequestReceived;
            this._channel.Closed -= Channel_Closed;
        }

        this._session.Disconnected -= Session_Disconnected;
        this._session.ErrorOccured -= Session_ErrorOccured;
    }

    throw new ArgumentException("Either the IAsyncResult object did not come from the corresponding async method on this type, or EndExecute was called multiple times with the same IAsyncResult.");
}
```
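The register-on-begin, unregister-in-__finally__ pattern can also be shown in a reduced, synchronous form. This is only a sketch under assumptions: `Publisher`, `ScopedSubscriber` and `Run` are hypothetical names invented for the example, not SSH.NET types or members, and the real Begin/EndExecute pair splits the same window across two calls.

```csharp
using System;

// Hypothetical long-lived event source; not an SSH.NET type.
public class Publisher
{
    public event EventHandler ErrorOccured;

    // Number of handlers currently registered on the event.
    public int HandlerCount
    {
        get
        {
            EventHandler handlers = ErrorOccured;
            return handlers == null ? 0 : handlers.GetInvocationList().Length;
        }
    }
}

// Hypothetical subscriber that stays registered only while an operation runs.
public class ScopedSubscriber
{
    private readonly Publisher _publisher;

    public ScopedSubscriber(Publisher publisher)
    {
        if (publisher == null)
            throw new ArgumentNullException("publisher");

        // No event registration in the constructor.
        _publisher = publisher;
    }

    public T Run<T>(Func<T> operation)
    {
        _publisher.ErrorOccured += Publisher_ErrorOccured;
        try
        {
            return operation();
        }
        finally
        {
            // Runs on normal return and when operation() throws.
            _publisher.ErrorOccured -= Publisher_ErrorOccured;
        }
    }

    private void Publisher_ErrorOccured(object sender, EventArgs e) { }
}
```

Removing a handler that is not currently registered is a no-op in C#, which is why a __finally__ block like this one can unregister unconditionally.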
> _more for this file in Part III_
Comments: I think we took care of most of the issues you mentioned.
I did not move the session event subscription to BeginExecute.
Leaving it in the ctor allows us to pass more meaningful information as to the cause of a disconnect in the near future. Moving it to BeginExecute would mean you can only throw an exception saying the session has disconnected.
I may still change my mind on this though.
Thanks a lot for diving into this!