Blocked system-critical thread has been detected

AravindJP

I have a Kubernetes cluster (on GCP) running Apache Ignite 2.8.1 (upgraded
from 2.8.0) with GridGain Control Center installed. For the last week the
Ignite cluster has had zero load (no read/write requests), but I am seeing
this exception on my cluster nodes, with a lot of threads in the TIMED_WAITING
and WAITING states. Any clue why this happens? This is the second time it has
happened without any load on the cluster. Last week I had the same issue,
restarted the cluster, and kept it idle to confirm this behaviour. I have
uploaded the complete log as well:

logs-asia-ignite.gz
<http://apache-ignite-users.70518.x6.nabble.com/file/t2807/logs-asia-ignite.gz>
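
To see how many threads are in each state, a minimal stdlib-only Java sketch (the class name `ThreadStateSummary` is hypothetical) can count the live threads of a JVM using `ThreadMXBean.dumpAllThreads` and list which threads are waiting on a lock and who owns it. It only inspects the JVM it runs in; for a running Ignite node, `jstack <pid>` produces a full thread dump. Idle pool threads normally park in WAITING or TIMED_WAITING, so a large number of threads in those states is not by itself a sign of a problem.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class ThreadStateSummary {
    /** Counts the live threads of the current JVM by Thread.State. */
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // (true, true): also collect locked monitors and ownable synchronizers per thread
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    /** Prints every thread that is waiting on a lock, with the lock and its owner (if any). */
    static void printWaiters() {
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
            if (info.getLockName() != null) {
                String owner = info.getLockOwnerName();
                System.out.println(info.getThreadName() + " [" + info.getThreadState() + "] waits on "
                        + info.getLockName() + ", owned by " + (owner == null ? "nobody" : owner));
            }
        }
    }

    public static void main(String[] args) {
        countByState().forEach((state, n) -> System.out.println(state + ": " + n));
        printWaiters();
    }
}
```
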




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
aealexsandrov

Re: Blocked system-critical thread has been detected

Hi,

Your log doesn't contain full thread dumps, and some information is missing
(e.g. topology snapshots). However, I can see that the checkpoint thread was
blocked for a long time:

[02:45:50,849][SEVERE][tcp-disco-msg-worker-[3dac150e
10.20.4.18:47500]-#2][G] Blocked system-critical thread has been detected.
This can lead to cluster-wide undefined behaviour
[workerName=db-checkpoint-thread, threadName=db-checkpoint-thread-#54,
blockedFor=172s]

However, it was not blocked for longer than 3 minutes (blockedFor=172s).
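
As far as I understand, Ignite's system-critical workers record a heartbeat as they make progress, and another thread (here `tcp-disco-msg-worker`) reports any worker whose last heartbeat is older than a threshold; `blockedFor` is the time since that heartbeat. A simplified, hypothetical Java sketch of that idea follows. It is not Ignite's implementation; the class `HeartbeatWatchdog` and the 10 s threshold are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical watchdog: reports workers whose last heartbeat is older than a threshold. */
public class HeartbeatWatchdog {
    private final Map<String, Long> lastHeartbeatMs = new ConcurrentHashMap<>();
    private final long blockedThresholdMs;

    public HeartbeatWatchdog(long blockedThresholdMs) {
        this.blockedThresholdMs = blockedThresholdMs;
    }

    /** Called by a worker each time it makes progress. */
    public void heartbeat(String worker, long nowMs) {
        lastHeartbeatMs.put(worker, nowMs);
    }

    /** Returns "name blockedFor=Ns" for every worker silent for longer than the threshold. */
    public List<String> check(long nowMs) {
        List<String> blocked = new ArrayList<>();
        lastHeartbeatMs.forEach((worker, ts) -> {
            long blockedForMs = nowMs - ts;
            if (blockedForMs > blockedThresholdMs) {
                blocked.add(worker + " blockedFor=" + (blockedForMs / 1000) + "s");
            }
        });
        return blocked;
    }

    public static void main(String[] args) {
        HeartbeatWatchdog wd = new HeartbeatWatchdog(10_000); // assumed 10 s threshold
        wd.heartbeat("db-checkpoint-thread", 0);
        wd.heartbeat("tcp-disco-msg-worker", 171_000);
        // 172 s after its last heartbeat, only the checkpoint worker is reported
        System.out.println(wd.check(172_000));
    }
}
```

Passing the current time in explicitly, rather than reading the clock inside `check`, keeps the sketch deterministic and easy to test.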

My guess is that the checkpoint lock cannot be acquired until some other
operation times out. It could be a network-related timeout or some operation
timeout.
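
To illustrate that guess, here is a hypothetical stdlib-only sketch: an operation holds the shared side of a `ReentrantReadWriteLock` while it waits for something that only ends when its timeout fires, and a thread that needs the exclusive side (standing in for the checkpointer) is blocked for about as long. Modelling the checkpoint lock as a read-write lock is an assumption made for illustration; this is not Ignite code, and `CheckpointLockDemo` is a made-up name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckpointLockDemo {
    /** Returns how long (ms) the "checkpointer" waited for the write lock. */
    static long writerWaitMs(long operationTimeoutMs) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch readerHoldsLock = new CountDownLatch(1);

        Thread operation = new Thread(() -> {
            lock.readLock().lock();
            try {
                readerHoldsLock.countDown();
                // Stand-in for waiting on a response that never arrives:
                // the wait ends only when the timeout elapses.
                Thread.sleep(operationTimeoutMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        }, "operation");
        operation.start();
        readerHoldsLock.await(); // make sure the operation holds the read lock first

        long start = System.nanoTime();
        lock.writeLock().lock(); // blocks until the operation releases its read lock
        try {
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            lock.writeLock().unlock();
            operation.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("checkpointer waited ~" + writerWaitMs(500) + " ms");
    }
}
```

With a 500 ms stand-in timeout the writer waits about 500 ms. By the same reasoning, an operation that waits on a 3-minute timeout while holding the lock would keep the checkpointer blocked for up to about 3 minutes, which is consistent with blockedFor=172s.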

So please check your configuration, find where you have a 3-minute timeout,
and check what is related to it.

BR,
Andrei


