mikle-a

Ignite client stuck

Hi guys!

I have an Ignite cluster of 3 nodes and a couple of applications running in
client mode with the default communication SPI configuration. One morning I
found one of the applications stuck, with the following messages in the log:

2020-03-15 07:42:20.881 [grid-timeout-worker-#103]  ERROR o.a.ignite.internal.util.typedef.G:137 - Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=tcp-comm-worker, blockedFor=932s]
2020-03-15 07:42:20.882 [grid-timeout-worker-#103]  WARN  o.a.ignite.internal.util.typedef.G:127 - Thread [name="tcp-comm-worker-#1", id=164, state=WAITING, blockCnt=19, waitCnt=13020]

2020-03-15 07:42:20.882 [grid-timeout-worker-#103]  ERROR ROOT:137 - Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1584257208296]]]

Thread [name="tcp-comm-worker-#1", id=164, state=WAITING, blockCnt=19, waitCnt=13020]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
        at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2911)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.access$6000(TcpCommunicationSpi.java:271)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:4489)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:4294)
        at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:120)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi$5.body(TcpCommunicationSpi.java:2237)
        at o.a.i.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
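For reference, the trace shows tcp-comm-worker in state WAITING, parked via LockSupport.park (the untimed variant, as opposed to parkNanos) inside GridFutureAdapter.get, called from TcpCommunicationSpi.reserveClient: the thread was waiting on a future with no upper bound on the wait. The plain-Java sketch below uses java.util.concurrent.CompletableFuture as a stand-in (it is not Ignite code, and the helper getOrNull is hypothetical) to contrast such an untimed wait with a bounded get(timeout, unit).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWaitDemo {

    /**
     * Hypothetical helper: waits for the future at most timeoutMs milliseconds.
     * Returns the value, or null if the future did not complete in time,
     * instead of parking the calling thread indefinitely like an untimed get().
     */
    static <T> T getOrNull(CompletableFuture<T> fut, long timeoutMs)
            throws InterruptedException, ExecutionException {
        try {
            return fut.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null; // the caller can log, retry, or give up
        }
    }

    public static void main(String[] args) throws Exception {
        // A future that nobody ever completes: an untimed neverCompleted.get()
        // would leave the caller in state WAITING forever.
        CompletableFuture<String> neverCompleted = new CompletableFuture<>();
        System.out.println("pending: " + getOrNull(neverCompleted, 100));

        // An already-completed future returns immediately.
        CompletableFuture<String> completed = CompletableFuture.completedFuture("connected");
        System.out.println("done: " + getOrNull(completed, 100));
    }
}
```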

I was able to connect to the server via SSH and noticed no connectivity
problems. Restarting the application fixed it.

What are the possible reasons for this behavior? Could you please advise any
measures to prevent this situation in the future, some kind of timeout
maybe?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
akorensh

Re: Ignite client stuck

Hi,

Ignite checks that each critical worker thread is alive and keeps updating
its heartbeat timestamp. If that is not the case, the worker is regarded as
blocked and Ignite prints a message to the log file. The allowed period of
inactivity is specified by the
IgniteConfiguration.systemWorkerBlockedTimeout property.

More info here:
https://apacheignite.readme.io/docs/critical-failures-handling#section-critical-workers-health-check
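A simplified model of this health check may help when reading the log. The class below is hypothetical (it is not Ignite's implementation): each worker records a heartbeat timestamp, and a checker flags the worker once the time since its last heartbeat exceeds a timeout. Fed the heartbeatTs from the client log and the time of the error message (assuming the log timestamps are in UTC), it reproduces the reported blockedFor=932s; the 10-second timeout is only illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical, simplified model of a heartbeat-based health check
 * (not Ignite's implementation).
 */
public class HeartbeatWatchdog {
    private final long blockedTimeoutMs;
    private final Map<String, Long> heartbeats = new ConcurrentHashMap<>();

    public HeartbeatWatchdog(long blockedTimeoutMs) {
        this.blockedTimeoutMs = blockedTimeoutMs;
    }

    /** Called by a worker each time it makes progress. */
    public void heartbeat(String worker, long nowMs) {
        heartbeats.put(worker, nowMs);
    }

    /** Milliseconds elapsed since the worker's last heartbeat. */
    public long blockedForMs(String worker, long nowMs) {
        Long last = heartbeats.get(worker);
        if (last == null) {
            throw new IllegalArgumentException("unknown worker: " + worker);
        }
        return nowMs - last;
    }

    /** True if the worker has been silent for longer than the timeout. */
    public boolean isBlocked(String worker, long nowMs) {
        return blockedForMs(worker, nowMs) > blockedTimeoutMs;
    }

    public static void main(String[] args) {
        HeartbeatWatchdog wd = new HeartbeatWatchdog(10_000); // illustrative 10 s timeout
        wd.heartbeat("tcp-comm-worker", 1584257208296L);      // heartbeatTs from the log
        long now = 1584258140881L;                             // 2020-03-15 07:42:20.881 UTC
        System.out.println("blockedFor=" + wd.blockedForMs("tcp-comm-worker", now) / 1000 + "s");
        System.out.println("blocked=" + wd.isBlocked("tcp-comm-worker", now));
    }
}
```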

In your case the communication worker was detected as blocked
[threadName=tcp-comm-worker, blockedFor=932s].

Check your network to make sure that all ports are open and all hosts are
reachable.
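One way to check this from the client host is to open a TCP connection to each server's discovery and communication ports. The sketch below assumes Ignite's default ports (47500 for TcpDiscoverySpi, 47100 for TcpCommunicationSpi) and uses placeholder host names (server-1 and so on are hypothetical); substitute the values from your own configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isTcpReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, unknown host, ...
        }
    }

    public static void main(String[] args) {
        // Placeholder host names; 47500/47100 are the default discovery/communication
        // ports and must be adjusted if your configuration overrides them.
        String[] hosts = {"server-1", "server-2", "server-3"};
        int[] ports = {47500, 47100};
        for (String host : hosts) {
            for (int port : ports) {
                System.out.printf("%s:%d -> %s%n", host, port,
                        isTcpReachable(host, port, 2_000) ? "open" : "NOT reachable");
            }
        }
    }
}
```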

Thanks, Alex

mikle-a

Re: Ignite client stuck

Hi!

Thanks a lot for your reply.

I am absolutely sure that all ports are open and hosts are reachable
because:
1) It had been working before.
2) I was able to ping all other nodes from the client host while it was stuck.
3) It started working after a restart.

Any ideas?

ibelyakov

Re: Ignite client stuck

Do you have server logs for the period of time when you were observing
"Blocked system-critical thread has been detected" error on the client?
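If the server logs are large, it can help to cut out just the window between the client worker's last heartbeat (heartbeatTs=1584257208296, i.e. 07:26:48 if the client log is in UTC) and the error at 07:42:20. The sketch below is hypothetical tooling: it assumes the server logs use the same "yyyy-MM-dd HH:mm:ss.SSS" line prefix and time zone as the client log, and it keeps untimestamped continuation lines (such as stack-trace frames) with the entry they belong to.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

public class LogWindow {
    // Matches the "2020-03-15 07:42:20.881" prefix used by the client log (assumed for servers too).
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    /** Parses the 23-character timestamp prefix, or returns null if the line has none. */
    static LocalDateTime parseTimestamp(String line) {
        if (line.length() < 23) return null;
        try {
            return LocalDateTime.parse(line.substring(0, 23), TS);
        } catch (DateTimeParseException e) {
            return null; // continuation line, e.g. a stack-trace frame
        }
    }

    /** Keeps entries whose timestamp lies in [from, to], including their continuation lines. */
    static List<String> linesBetween(List<String> lines, LocalDateTime from, LocalDateTime to) {
        List<String> result = new ArrayList<>();
        boolean inWindow = false;
        for (String line : lines) {
            LocalDateTime t = parseTimestamp(line);
            if (t != null) {
                inWindow = !t.isBefore(from) && !t.isAfter(to);
            }
            if (inWindow) result.add(line);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Window from the last heartbeat to the error, per the client log (UTC assumed).
        LocalDateTime from = LocalDateTime.of(2020, 3, 15, 7, 26, 48);
        LocalDateTime to = LocalDateTime.of(2020, 3, 15, 7, 42, 21);
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        linesBetween(lines, from, to).forEach(System.out::println);
    }
}
```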
