What happens when a client gets disconnected

MattNohelty

What happens when a client gets disconnected

Sorry for the long delay in responding to this issue
(http://apache-ignite-users.70518.x6.nabble.com/What-happens-when-a-client-gets-disconnected-tc27959.html).
I will work on replicating this issue in a more controlled test environment
and try to grab thread dumps from there.

In a previous post you mentioned that the blocking in this thread dump
should only happen when a data node is affected, which is usually a server
node, and you also said that near cache consistency is observed
continuously. If we have near caching enabled, does that mean clients
become data nodes? If that's the case, does that explain why we are seeing
blocking when a client crashes or hangs?

Assuming this is related to near caching, is there any configuration to
adjust this behavior to give us availability over perfect consistency?
Having a failure on one client ripple across the entire system and
effectively take down all other clients of that cluster is a major problem.
We obviously want to avoid problems like an OOM error or a big GC pause in
the client application, but if these things happen, we need to be able to
absorb them gracefully and limit the blast radius to just that client
node.
aealexsandrov

Re: What happens when a client gets disconnected

Hi,

I think you should provide the full client and server logs, the
configuration files, and, if possible, a reproducer for the case where a
client node with a near cache was able to crash the whole cluster.

This looks like it could be a real issue, and the best way forward would be
to raise a JIRA ticket for it after the provided data has been analyzed.

BR,
Andrei

On 2019/07/31 14:54:42, Matt Nohelty <[hidden email]> wrote:
> Sorry for the long delay in responding to this issue. I will work on
> replicating this issue in a more controlled test environment and try to
> grab thread dumps from there.
>
> In a previous post you mentioned that the blocking in this thread dump
> should only happen when a data node is affected which is usually a server
> node and you also said that near cache consistency is observed
> continuously. If we have near caching enabled, does that mean clients
> become data nodes? If that's the case, does that explain why we are seeing
> blocking when a client crashes or hangs?
>
> Assuming this is related to near caching, is there any configuration to
> adjust this behavior to give us availability over perfect consistency?
> Having a failure on one client ripple across the entire system and
> effectively take down all other clients of that cluster is a major problem.
> We obviously want to avoid problems like an OOM error or a big GC pause in
> the client application but if these things happen we need to be able to
> absorb these gracefully and limit the blast radius to just that client
> node.