Thread leak in GridContinuousProcessor?

Mirko Raner

Hi all,

we're having issues with a large number of continuous-buffer-checker threads sticking around, which eventually causes our server to run out of thread handles. When we are no longer interested in updates from a continuous query, we call .close() on the corresponding QueryCursor.

However, we keep accumulating continuous-buffer-checker threads that do nothing except hog system resources. I did some initial debugging and found that GridContinuousProcessor creates and starts these threads and also stores them in a Collection called "threads". As far as I can tell, GridContinuousProcessor only adds to this collection and never removes anything from it. So, at the very least, this looks like a memory leak to me, although if the threads completed their bodies, the underlying system resources would still be released. Unfortunately, that does not seem to be the case: the threads are still running. In the scenarios that I ran in the debugger, GridWorker.isCancelled() never returns true, and the thread body keeps looping after each sleep interval. Maybe the termination criterion should be isDone() instead?

I also followed the code path that starts at closing the cursor. The call eventually reaches onClose() in the anonymous GridCloseableAdapterIterator that is nested inside IgniteCacheProxy, where it calls the cancel() method on the underlying CacheQueryFuture. The future is already in state DONE, so onCancelled() returns false and cancelQuery() is never called. As a result, the thread keeps looping forever.

Please let me know if my reasoning is incorrect or if I am making a wrong assumption. To me, it looks like there is something wrong with the cancellation/termination mechanism.

After some basic usage scenarios, the thread dump for our application shows 116 continuous-buffer-checker threads that are in Thread.sleep().

Other than calling QueryCursor.close(), are we missing any additional clean-up steps that are necessary for the threads to be reclaimed?
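
To make the lifecycle described above concrete, here is a minimal, self-contained sketch of a periodic checker thread with the clean-up one would expect when a query is closed. It is not Ignite's code: the class CheckerLifecycleSketch, its Checker and Entry types, and the start/stop/liveThreads methods are all hypothetical names chosen for illustration. Only the thread name prefix and the "loop until cancelled, then sleep" shape are taken from the description in this post.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified model of a checker-thread registry; NOT Ignite's actual implementation. */
public class CheckerLifecycleSketch {
    /** Hypothetical worker: sleeps for the interval and loops until it is cancelled. */
    static final class Checker implements Runnable {
        private volatile boolean cancelled;
        private final long intervalMs;

        Checker(long intervalMs) { this.intervalMs = intervalMs; }

        void cancel() { cancelled = true; }

        @Override public void run() {
            while (!cancelled) {
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    return; // interrupted while sleeping: leave the loop
                }
                // A real checker would flush stale buffers here.
            }
        }
    }

    /** Pairs a worker with the thread that runs it. */
    private static final class Entry {
        final Checker checker;
        final Thread thread;

        Entry(Checker checker, Thread thread) {
            this.checker = checker;
            this.thread = thread;
        }
    }

    private final Map<Integer, Entry> entries = new ConcurrentHashMap<>();
    private final AtomicInteger ids = new AtomicInteger();

    /** Starts a checker thread for a new query and returns its id. */
    public int start(long intervalMs) {
        int id = ids.incrementAndGet();
        Checker checker = new Checker(intervalMs);
        Thread thread = new Thread(checker, "continuous-buffer-checker-" + id);
        thread.setDaemon(true);
        entries.put(id, new Entry(checker, thread));
        thread.start();
        return id;
    }

    /** Cancels the worker, wakes it from sleep, waits for it to exit, and drops the reference. */
    public boolean stop(int id) {
        Entry e = entries.remove(id);
        if (e == null)
            return false;
        e.checker.cancel();
        e.thread.interrupt();
        try {
            e.thread.join(5000);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return !e.thread.isAlive();
    }

    /** Number of checker threads the registry still holds a reference to. */
    public int tracked() { return entries.size(); }

    /** Counts live threads in this JVM whose name starts with the given prefix. */
    public static long liveThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
            .filter(t -> t.isAlive() && t.getName().startsWith(prefix))
            .count();
    }

    public static void main(String[] args) {
        CheckerLifecycleSketch registry = new CheckerLifecycleSketch();
        int id = registry.start(100);
        System.out.println("live after start: " + liveThreads("continuous-buffer-checker-"));
        registry.stop(id);
        System.out.println("live after stop: " + liveThreads("continuous-buffer-checker-"));
    }
}
```

In this model, skipping the stop() call on close reproduces both symptoms: the entry stays in the map, and the thread keeps waking up from Thread.sleep(). The liveThreads helper is also a quick way to watch the count from inside a running application without taking a full thread dump.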
vkulichenko

Re: Thread leak in GridContinuousProcessor?

Hi Mirko,

Great catch! These threads are indeed not cleaned up properly. I will fix it in the next release.

As a workaround, I suggest setting timeInterval to zero for now. That will work only if you can live without time-based flushing, of course.

-Val
Mirko Raner

Re: Thread leak in GridContinuousProcessor?

Thanks for confirming so quickly, Val.

I don't think setting the time interval to zero will help in our case; the thread body will still keep looping.
What is the schedule for the next release of Ignite? Could we download a nightly build (i.e., non-release) once the fix is committed? I guess as a last resort we can always build Ignite from source...
vkulichenko

Re: Thread leak in GridContinuousProcessor?

Mirko,

If timeInterval is 0, this thread is never created, so there is nothing to clean up.
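
As a rough illustration of why the workaround avoids the leak, the sketch below starts a checker thread only when the interval is positive. It is a simplified model, not the actual GridContinuousProcessor code: the class TimeIntervalSketch, the method maybeStartChecker, and the thread name are hypothetical.

```java
import java.util.Optional;

/** Simplified model of "no time interval, no checker thread"; NOT Ignite's actual code. */
public class TimeIntervalSketch {
    /**
     * Hypothetical helper: starts a daemon checker thread only when timeIntervalMs > 0.
     * With an interval of zero there is no time-based flushing, so no thread is created
     * and nothing needs to be cleaned up later.
     */
    public static Optional<Thread> maybeStartChecker(long timeIntervalMs) {
        if (timeIntervalMs <= 0)
            return Optional.empty();

        Thread checker = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted())
                    Thread.sleep(timeIntervalMs); // a real checker would flush buffers here
            } catch (InterruptedException e) {
                // interrupted while sleeping: exit the thread
            }
        }, "buffer-checker-sketch");
        checker.setDaemon(true);
        checker.start();
        return Optional.of(checker);
    }

    public static void main(String[] args) {
        System.out.println("interval 0 starts a thread: " + maybeStartChecker(0).isPresent());
        Optional<Thread> started = maybeStartChecker(100);
        System.out.println("interval 100 starts a thread: " + started.isPresent());
        started.ifPresent(Thread::interrupt); // stop the demo thread again
    }
}
```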

Anyway, I made the fix in the master branch. You will be able to download a nightly build from here when it's ready: https://builds.apache.org/view/H-L/view/Ignite/job/Ignite-nightly/lastSuccessfulBuild/. Or you can check out master and build it yourself.

-Val