partition-exchanger system-critical thread blocked

classic Classic list List threaded Threaded
16 messages Options
Kamlesh Joshi Kamlesh Joshi
Reply | Threaded
Open this post in threaded view
|

partition-exchanger system-critical thread blocked

Hi Igniters,

 

Sometimes 'Blocked system-critical thread' errors are getting printed in prod cluster logs. As per below logs, it's saying exchange-worker blocked for 60s. There is no node join/left or cluster activation or cache creation, still why partition exchange is triggered. Even it is triggered,  it is blocked for 60s which is huge time from prod perspective.

 

Below are error details,

[2020-11-09T10:55:18,888][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=partition-exchanger, blockedFor=60s]

[2020-11-09T10:55:18,888][WARN ][sys-stripe-118-#119%EDIFCustomerCC%][G] Thread [name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=TIMED_WAITING, blockCnt=16, waitCnt=2469961]

 

Cluster is responding but these errors priting, we are not understanding what's the cause, could you please help us.

 

Below is log snippet,

 

[2020-11-09T10:51:21,458][INFO ][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=3bd28c8f-d9c9-4110-ab8c-24cf3a5c44a3, startPtr=FileWALPointer [idx=1279664, fileOff=28148165, len=49557], checkpointLockWait=0ms, checkpointLockHoldTime=34ms, walCpRecordFsyncDuration=7ms, pages=84152, reason='timeout']

[2020-11-09T10:51:23,499][INFO ][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=3bd28c8f-d9c9-4110-ab8c-24cf3a5c44a3, pages=84152, markPos=FileWALPointer [idx=1279664, fileOff=28148165, len=49557], walSegmentsCleared=5, walSegmentsCovered=[1279658 - 1279663], markDuration=79ms, pagesWrite=1195ms, fsync=845ms, total=2119ms]

[2020-11-09T10:51:36,788][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Starting to copy WAL segment [absIdx=1279664, segIdx=4, origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000004.wal, dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279664.wal]

[2020-11-09T10:51:36,954][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Copied file [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000004.wal, dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279664.wal]

[2020-11-09T10:52:02,018][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Starting to copy WAL segment [absIdx=1279665, segIdx=5, origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000005.wal, dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279665.wal]

[2020-11-09T10:52:02,200][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Copied file [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000005.wal, dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279665.wal]

[2020-11-09T10:52:36,541][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Starting to copy WAL segment [absIdx=1279666, segIdx=6, origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000006.wal, dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279666.wal]

[2020-11-09T10:52:36,703][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Copied file [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000006.wal, dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279666.wal]

[2020-11-09T10:53:11,068][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Starting to copy WAL segment [absIdx=1279667, segIdx=7, origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000007.wal, dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279667.wal]

[2020-11-09T10:53:11,239][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Copied file [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000007.wal, dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279667.wal]

[2020-11-09T10:53:45,694][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Starting to copy WAL segment [absIdx=1279668, segIdx=8, origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000008.wal, dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279668.wal]

[2020-11-09T10:53:45,868][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Copied file [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000008.wal, dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279668.wal]

[2020-11-09T10:54:21,411][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Starting to copy WAL segment [absIdx=1279669, segIdx=9, origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000009.wal, dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279669.wal]

[2020-11-09T10:54:21,480][INFO ][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=33221fc2-5ab0-4929-9d23-8940e0552272, startPtr=FileWALPointer [idx=1279670, fileOff=5561792, len=49557], checkpointLockWait=0ms, checkpointLockHoldTime=51ms, walCpRecordFsyncDuration=5ms, pages=86602, reason='timeout']

[2020-11-09T10:54:21,578][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Copied file [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000009.wal, dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279669.wal]

[2020-11-09T10:54:23,624][INFO ][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=33221fc2-5ab0-4929-9d23-8940e0552272, pages=86602, markPos=FileWALPointer [idx=1279670, fileOff=5561792, len=49557], walSegmentsCleared=6, walSegmentsCovered=[1279664 - 1279669], markDuration=94ms, pagesWrite=1219ms, fsync=925ms, total=2238ms]

[2020-11-09T10:54:45,727][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Starting to copy WAL segment [absIdx=1279670, segIdx=0, origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000000.wal, dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279670.wal]

[2020-11-09T10:54:45,886][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Copied file [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000000.wal, dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279670.wal]

[2020-11-09T10:55:17,146][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Starting to copy WAL segment [absIdx=1279671, segIdx=1, origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000001.wal, dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279671.wal]

[2020-11-09T10:55:17,343][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Copied file [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000001.wal, dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279671.wal]

[2020-11-09T10:55:18,888][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=partition-exchanger, blockedFor=60s]

[2020-11-09T10:55:18,888][WARN ][sys-stripe-118-#119%EDIFCustomerCC%][G] Thread [name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=TIMED_WAITING, blockCnt=16, waitCnt=2469961]

    Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@223507fb, ownerName=null, ownerId=-1]

 

[2020-11-09T10:55:18,889][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=partition-exchanger, igniteInstanceName=EDIFCustomerCC, finished=false, heartbeatTs=1604899458881]]]

org.apache.ignite.IgniteException: GridWorker [name=partition-exchanger, igniteInstanceName=EDIFCustomerCC, finished=false, heartbeatTs=1604899458881]

                at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831) [ignite-core-2.7.6.jar:2.7.6]

                at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826) [ignite-core-2.7.6.jar:2.7.6]

                at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233) [ignite-core-2.7.6.jar:2.7.6]

                at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) [ignite-core-2.7.6.jar:2.7.6]

                at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:513) [ignite-core-2.7.6.jar:2.7.6]

                at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.7.6.jar:2.7.6]

                at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]

[2020-11-09T10:55:18,894][WARN ][sys-stripe-118-#119%EDIFCustomerCC%][FailureProcessor] No deadlocked threads detected.

[2020-11-09T10:55:21,404][WARN ][jvm-pause-detector-worker][IgniteKernal%EDIFCustomerCC] Possible too long JVM pause: 2477 milliseconds.

[2020-11-09T10:55:21,422][WARN ][sys-stripe-118-#119%EDIFCustomerCC%][FailureProcessor] Thread dump at 2020/11/09 10:55:21 IST

Thread [name="sys-#213675%EDIFCustomerCC%", id=369652, state=TIMED_WAITING, blockCnt=0, waitCnt=1]

    Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@42c969fd, ownerName=null, ownerId=-1]

        at sun.misc.Unsafe.park(Native Method)

        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)

        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)

        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)

 

Thread [name="sys-#213674%EDIFCustomerCC%", id=369651, state=TIMED_WAITING, blockCnt=0, waitCnt=1]

    Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@42c969fd, ownerName=null, ownerId=-1]

        at sun.misc.Unsafe.park(Native Method)

        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)

        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)

        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)

 

 

Thanks and Regards,

Kamlesh Joshi

 


"Confidentiality Warning: This message and any attachments are intended only for the use of the intended recipient(s), are confidential and may be privileged. If you are not the intended recipient, you are hereby notified that any review, re-transmission, conversion to hard copy, copying, circulation or other use of this message and any attachments is strictly prohibited. If you are not the intended recipient, please notify the sender immediately by return email and delete this message and any attachments from your system.

Virus Warning: Although the company has taken reasonable precautions to ensure no viruses are present in this email. The company cannot accept responsibility for any loss or damage arising from the use of this email or attachment."

ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: partition-exchanger system-critical thread blocked

Hello!

Can you please find an actual stack trace from partition-exchanger thread in that log?

One that starts with Thread [name="partition-exchanger" ?

Regards,
--
Ilya Kasnacheev


вт, 10 нояб. 2020 г. в 15:54, Kamlesh Joshi <[hidden email]>:

Hi Igniters,

 

Sometimes 'Blocked system-critical thread' errors are getting printed in prod cluster logs. As per below logs, it's saying exchange-worker blocked for 60s. There is no node join/left or cluster activation or cache creation, still why partition exchange is triggered. Even it is triggered,  it is blocked for 60s which is huge time from prod perspective.

 

Below are error details,

[2020-11-09T10:55:18,888][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=partition-exchanger, blockedFor=60s]

[2020-11-09T10:55:18,888][WARN ][sys-stripe-118-#119%EDIFCustomerCC%][G] Thread [name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=TIMED_WAITING, blockCnt=16, waitCnt=2469961]

 

Cluster is responding but these errors priting, we are not understanding what's the cause, could you please help us.

 

Below is log snippet,

 

[2020-11-09T10:51:21,458][INFO ][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=3bd28c8f-d9c9-4110-ab8c-24cf3a5c44a3, startPtr=FileWALPointer [idx=1279664, fileOff=28148165, len=49557], checkpointLockWait=0ms, checkpointLockHoldTime=34ms, walCpRecordFsyncDuration=7ms, pages=84152, reason='timeout']

[2020-11-09T10:51:23,499][INFO ][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=3bd28c8f-d9c9-4110-ab8c-24cf3a5c44a3, pages=84152, markPos=FileWALPointer [idx=1279664, fileOff=28148165, len=49557], walSegmentsCleared=5, walSegmentsCovered=[1279658 - 1279663], markDuration=79ms, pagesWrite=1195ms, fsync=845ms, total=2119ms]

[2020-11-09T10:51:36,788][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Starting to copy WAL segment [absIdx=1279664, segIdx=4, origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000004.wal, dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279664.wal]

[2020-11-09T10:51:36,954][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Copied file [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000004.wal, dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279664.wal]

[2020-11-09T10:52:02,018][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Starting to copy WAL segment [absIdx=1279665, segIdx=5, origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000005.wal, dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279665.wal]

[2020-11-09T10:52:02,200][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Copied file [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000005.wal, dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279665.wal]

[2020-11-09T10:52:36,541][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Starting to copy WAL segment [absIdx=1279666, segIdx=6, origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000006.wal, dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279666.wal]

[2020-11-09T10:52:36,703][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Copied file [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000006.wal, dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279666.wal]

[2020-11-09T10:53:11,068][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Starting to copy WAL segment [absIdx=1279667, segIdx=7, origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000007.wal, dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279667.wal]

[2020-11-09T10:53:11,239][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Copied file [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000007.wal, dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279667.wal]

[2020-11-09T10:53:45,694][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Starting to copy WAL segment [absIdx=1279668, segIdx=8, origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000008.wal, dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279668.wal]

[2020-11-09T10:53:45,868][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Copied file [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000008.wal, dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279668.wal]

[2020-11-09T10:54:21,411][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Starting to copy WAL segment [absIdx=1279669, segIdx=9, origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000009.wal, dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279669.wal]

[2020-11-09T10:54:21,480][INFO ][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=33221fc2-5ab0-4929-9d23-8940e0552272, startPtr=FileWALPointer [idx=1279670, fileOff=5561792, len=49557], checkpointLockWait=0ms, checkpointLockHoldTime=51ms, walCpRecordFsyncDuration=5ms, pages=86602, reason='timeout']

[2020-11-09T10:54:21,578][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Copied file [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000009.wal, dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279669.wal]

[2020-11-09T10:54:23,624][INFO ][db-checkpoint-thread-#384%EDIFCustomerCC%][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=33221fc2-5ab0-4929-9d23-8940e0552272, pages=86602, markPos=FileWALPointer [idx=1279670, fileOff=5561792, len=49557], walSegmentsCleared=6, walSegmentsCovered=[1279664 - 1279669], markDuration=94ms, pagesWrite=1219ms, fsync=925ms, total=2238ms]

[2020-11-09T10:54:45,727][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Starting to copy WAL segment [absIdx=1279670, segIdx=0, origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000000.wal, dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279670.wal]

[2020-11-09T10:54:45,886][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Copied file [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000000.wal, dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279670.wal]

[2020-11-09T10:55:17,146][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Starting to copy WAL segment [absIdx=1279671, segIdx=1, origFile=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000001.wal, dstFile=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279671.wal]

[2020-11-09T10:55:17,343][INFO ][wal-file-archiver%EDIFCustomerCC-#345%EDIFCustomerCC%][FileWriteAheadLogManager] Copied file [src=/datastore1/wal/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000000000001.wal, dst=/datastore1/archive/node00-651e4920-2fb4-4dd5-9258-52a9f359ac35/0000000001279671.wal]

[2020-11-09T10:55:18,888][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=partition-exchanger, blockedFor=60s]

[2020-11-09T10:55:18,888][WARN ][sys-stripe-118-#119%EDIFCustomerCC%][G] Thread [name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=TIMED_WAITING, blockCnt=16, waitCnt=2469961]

    Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@223507fb, ownerName=null, ownerId=-1]

 

[2020-11-09T10:55:18,889][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=partition-exchanger, igniteInstanceName=EDIFCustomerCC, finished=false, heartbeatTs=1604899458881]]]

org.apache.ignite.IgniteException: GridWorker [name=partition-exchanger, igniteInstanceName=EDIFCustomerCC, finished=false, heartbeatTs=1604899458881]

                at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831) [ignite-core-2.7.6.jar:2.7.6]

                at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826) [ignite-core-2.7.6.jar:2.7.6]

                at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233) [ignite-core-2.7.6.jar:2.7.6]

                at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297) [ignite-core-2.7.6.jar:2.7.6]

                at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:513) [ignite-core-2.7.6.jar:2.7.6]

                at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.7.6.jar:2.7.6]

                at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]

[2020-11-09T10:55:18,894][WARN ][sys-stripe-118-#119%EDIFCustomerCC%][FailureProcessor] No deadlocked threads detected.

[2020-11-09T10:55:21,404][WARN ][jvm-pause-detector-worker][IgniteKernal%EDIFCustomerCC] Possible too long JVM pause: 2477 milliseconds.

[2020-11-09T10:55:21,422][WARN ][sys-stripe-118-#119%EDIFCustomerCC%][FailureProcessor] Thread dump at 2020/11/09 10:55:21 IST

Thread [name="sys-#213675%EDIFCustomerCC%", id=369652, state=TIMED_WAITING, blockCnt=0, waitCnt=1]

    Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@42c969fd, ownerName=null, ownerId=-1]

        at sun.misc.Unsafe.park(Native Method)

        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)

        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)

        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)

 

Thread [name="sys-#213674%EDIFCustomerCC%", id=369651, state=TIMED_WAITING, blockCnt=0, waitCnt=1]

    Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@42c969fd, ownerName=null, ownerId=-1]

        at sun.misc.Unsafe.park(Native Method)

        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)

        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)

        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)

 

 

Thanks and Regards,

Kamlesh Joshi

 


"Confidentiality Warning: This message and any attachments are intended only for the use of the intended recipient(s), are confidential and may be privileged. If you are not the intended recipient, you are hereby notified that any review, re-transmission, conversion to hard copy, copying, circulation or other use of this message and any attachments is strictly prohibited. If you are not the intended recipient, please notify the sender immediately by return email and delete this message and any attachments from your system.

Virus Warning: Although the company has taken reasonable precautions to ensure no viruses are present in this email. The company cannot accept responsibility for any loss or damage arising from the use of this email or attachment."

Gangaiah Gundeboina Gangaiah Gundeboina
Reply | Threaded
Open this post in threaded view
|

Re: partition-exchanger system-critical thread blocked

HI Ilya Kasnacheev,

Below are log entries with  the tread name 'partition-exchanger'

############################

Line 41311:
[2020-11-09T06:44:13,605][ERROR][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G]
Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [threadName=partition-exchanger,
blockedFor=60s]
                Line 41315:
[2020-11-09T06:44:13,606][ERROR][tcp-disco-msg-worker-#2%EDIFCustomerCC%][]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
[name=partition-exchanger, igniteInstanceName=EDIFCustomerCC,
finished=false, heartbeatTs=1604884393601]]]
                Line 41316: org.apache.ignite.IgniteException: GridWorker
[name=partition-exchanger, igniteInstanceName=EDIFCustomerCC,
finished=false, heartbeatTs=1604884393601]
                Line 47325:
[2020-11-09T10:55:18,888][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][G]
Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [threadName=partition-exchanger,
blockedFor=60s]
                Line 47329:
[2020-11-09T10:55:18,889][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
[name=partition-exchanger, igniteInstanceName=EDIFCustomerCC,
finished=false, heartbeatTs=1604899458881]]]
                Line 47330: org.apache.ignite.IgniteException: GridWorker
[name=partition-exchanger, igniteInstanceName=EDIFCustomerCC,
finished=false, heartbeatTs=1604899458881]
#################################################


Below is the stack trace for 'exchange-worker-#344' worker thread,

#################################################
Line 41109: [2020-11-09T04:11:42,068][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=2791, minorTopVer=0], force=false, evt=NODE_JOINED,
node=f11cfea2-5ece-4867-b276-e94fa8458f47]
                Line 41116: [2020-11-09T04:11:52,060][INFO
][exchange-worker-#344%EDIFCustomerCC%][time] Started exchange init
[topVer=AffinityTopologyVersion [topVer=2792, minorTopVer=0],
mvccCrd=MvccCoordinator [nodeId=08260e5f-ae8d-44f2-b10a-dc3490133ee8,
crdVer=1602946449957, topVer=AffinityTopologyVersion [topVer=6,
minorTopVer=0]], mvccCrdChange=false, crd=false, evt=NODE_JOINED,
evtNode=5c548b34-defa-4fcd-9bc7-364a4fbec8da, customEvt=null,
allowMerge=true]
                Line 41117: [2020-11-09T04:11:52,062][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=2792,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=2792, minorTopVer=0],
err=null]
                Line 41118: [2020-11-09T04:11:52,066][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=30b55ea5-18a4-4c15-b45f-7fe420ac00bd,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=2792, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=5c548b34-defa-4fcd-9bc7-364a4fbec8da, addrs=[0:0:0:0:0:0:0:1%lo,
AA.BB.CC.DD, 127.0.0.1, 2405:200:a10:fc05:a2d3:c1ff:fef2:25a0%eth0],
sockAddrs=[0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0,
2405:200:a10:fc05:a2d3:c1ff:fef2:25a0%eth0:0, AA.BB.CC.DD/AA.BB.CC.DD:0],
discPort=0, order=2792, intOrder=1527, lastExchangeTime=1604875311250,
loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=true], done=true],
topVer=AffinityTopologyVersion [topVer=2792, minorTopVer=0],
durationFromInit=0]
                Line 41119: [2020-11-09T04:11:52,067][INFO
][exchange-worker-#344%EDIFCustomerCC%][time] Finished exchange init
[topVer=AffinityTopologyVersion [topVer=2792, minorTopVer=0], crd=false]
                Line 41120: [2020-11-09T04:11:52,072][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=2792, minorTopVer=0], force=false, evt=NODE_JOINED,
node=5c548b34-defa-4fcd-9bc7-364a4fbec8da]
                Line 41124: [2020-11-09T04:11:52,451][INFO
][exchange-worker-#344%EDIFCustomerCC%][time] Started exchange init
[topVer=AffinityTopologyVersion [topVer=2793, minorTopVer=0],
mvccCrd=MvccCoordinator [nodeId=08260e5f-ae8d-44f2-b10a-dc3490133ee8,
crdVer=1602946449957, topVer=AffinityTopologyVersion [topVer=6,
minorTopVer=0]], mvccCrdChange=false, crd=false, evt=NODE_JOINED,
evtNode=076584a1-154f-4e3c-8f45-6ee096a71868, customEvt=null,
allowMerge=true]
                Line 41125: [2020-11-09T04:11:52,453][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=2793,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=2793, minorTopVer=0],
err=null]
                Line 41126: [2020-11-09T04:11:52,458][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=30b55ea5-18a4-4c15-b45f-7fe420ac00bd,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=2793, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=076584a1-154f-4e3c-8f45-6ee096a71868, addrs=[0:0:0:0:0:0:0:1%lo,
AA.BB.CC.DD, 127.0.0.1, 2405:200:a10:fc05:a2d3:c1ff:fef2:25a0%eth0],
sockAddrs=[0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0,
2405:200:a10:fc05:a2d3:c1ff:fef2:25a0%eth0:0, AA.BB.CC.DD/AA.BB.CC.DD:0],
discPort=0, order=2793, intOrder=1528, lastExchangeTime=1604875311403,
loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=true], done=true],
topVer=AffinityTopologyVersion [topVer=2793, minorTopVer=0],
durationFromInit=10]
                Line 41127: [2020-11-09T04:11:52,458][INFO
][exchange-worker-#344%EDIFCustomerCC%][time] Finished exchange init
[topVer=AffinityTopologyVersion [topVer=2793, minorTopVer=0], crd=false]
                Line 41128: [2020-11-09T04:11:52,472][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=2793, minorTopVer=0], force=false, evt=NODE_JOINED,
node=076584a1-154f-4e3c-8f45-6ee096a71868]
                Line 41136: [2020-11-09T04:12:13,569][INFO
][exchange-worker-#344%EDIFCustomerCC%][time] Started exchange init
[topVer=AffinityTopologyVersion [topVer=2794, minorTopVer=0],
mvccCrd=MvccCoordinator [nodeId=08260e5f-ae8d-44f2-b10a-dc3490133ee8,
crdVer=1602946449957, topVer=AffinityTopologyVersion [topVer=6,
minorTopVer=0]], mvccCrdChange=false, crd=false, evt=NODE_JOINED,
evtNode=5ee5d2c6-76d6-4aba-a160-d084d3ad9e21, customEvt=null,
allowMerge=true]
                Line 41137: [2020-11-09T04:12:13,571][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=2794,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=2794, minorTopVer=0],
err=null]
                Line 41138: [2020-11-09T04:12:13,575][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=30b55ea5-18a4-4c15-b45f-7fe420ac00bd,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=2794, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=5ee5d2c6-76d6-4aba-a160-d084d3ad9e21, addrs=[0:0:0:0:0:0:0:1%lo,
AA.BB.CC.DD, 127.0.0.1], sockAddrs=[AA.BB.CC.DD/AA.BB.CC.DD:0,
0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0], discPort=0, order=2794, intOrder=1529,
lastExchangeTime=1604875332557, loc=false, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=true], done=true], topVer=AffinityTopologyVersion [topVer=2794,
minorTopVer=0], durationFromInit=0]
                Line 41139: [2020-11-09T04:12:13,576][INFO
][exchange-worker-#344%EDIFCustomerCC%][time] Finished exchange init
[topVer=AffinityTopologyVersion [topVer=2794, minorTopVer=0], crd=false]
                Line 41140: [2020-11-09T04:12:13,586][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=2794, minorTopVer=0], force=false, evt=NODE_JOINED,
node=5ee5d2c6-76d6-4aba-a160-d084d3ad9e21]
                Line 41312: [2020-11-09T06:44:13,606][WARN
][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G] Thread
[name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=TIMED_WAITING,
blockCnt=16, waitCnt=2469710]
                Line 43302: Thread
[name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=RUNNABLE,
blockCnt=16, waitCnt=2469710]
                Line 47326: [2020-11-09T10:55:18,888][WARN
][sys-stripe-118-#119%EDIFCustomerCC%][G] Thread
[name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=TIMED_WAITING,
blockCnt=16, waitCnt=2469961]
                Line 49473: Thread
[name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=TIMED_WAITING,
blockCnt=16, waitCnt=2469961]

Thanks and Regards,
Gangaiah



-----
Thanks and Regards,
Gangaiah
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: partition-exchanger system-critical thread blocked

Hello!

We need to see all the lines after the following (but before the next Thread):

                Line 41312: [2020-11-09T06:44:13,606][WARN
][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G] Thread
[name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=TIMED_WAITING,
blockCnt=16, waitCnt=2469710]
                Line 43302: Thread
[name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=RUNNABLE,
blockCnt=16, waitCnt=2469710]
                Line 47326: [2020-11-09T10:55:18,888][WARN
][sys-stripe-118-#119%EDIFCustomerCC%][G] Thread
[name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=TIMED_WAITING,
blockCnt=16, waitCnt=2469961]
                Line 49473: Thread
[name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=TIMED_WAITING,
blockCnt=16, waitCnt=2469961]

Regards,
--
Ilya Kasnacheev


чт, 12 нояб. 2020 г. в 19:57, Gangaiah Gundeboina <[hidden email]>:
HI Ilya Kasnacheev,

Below are log entries with  the tread name 'partition-exchanger'

############################

Line 41311:
[2020-11-09T06:44:13,605][ERROR][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G]
Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [threadName=partition-exchanger,
blockedFor=60s]
                Line 41315:
[2020-11-09T06:44:13,606][ERROR][tcp-disco-msg-worker-#2%EDIFCustomerCC%][]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
[name=partition-exchanger, igniteInstanceName=EDIFCustomerCC,
finished=false, heartbeatTs=1604884393601]]]
                Line 41316: org.apache.ignite.IgniteException: GridWorker
[name=partition-exchanger, igniteInstanceName=EDIFCustomerCC,
finished=false, heartbeatTs=1604884393601]
                Line 47325:
[2020-11-09T10:55:18,888][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][G]
Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [threadName=partition-exchanger,
blockedFor=60s]
                Line 47329:
[2020-11-09T10:55:18,889][ERROR][sys-stripe-118-#119%EDIFCustomerCC%][]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker
[name=partition-exchanger, igniteInstanceName=EDIFCustomerCC,
finished=false, heartbeatTs=1604899458881]]]
                Line 47330: org.apache.ignite.IgniteException: GridWorker
[name=partition-exchanger, igniteInstanceName=EDIFCustomerCC,
finished=false, heartbeatTs=1604899458881]
#################################################


Below is the stack trace for 'exchange-worker-#344' worker thread,

#################################################
Line 41109: [2020-11-09T04:11:42,068][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=2791, minorTopVer=0], force=false, evt=NODE_JOINED,
node=f11cfea2-5ece-4867-b276-e94fa8458f47]
                Line 41116: [2020-11-09T04:11:52,060][INFO
][exchange-worker-#344%EDIFCustomerCC%][time] Started exchange init
[topVer=AffinityTopologyVersion [topVer=2792, minorTopVer=0],
mvccCrd=MvccCoordinator [nodeId=08260e5f-ae8d-44f2-b10a-dc3490133ee8,
crdVer=1602946449957, topVer=AffinityTopologyVersion [topVer=6,
minorTopVer=0]], mvccCrdChange=false, crd=false, evt=NODE_JOINED,
evtNode=5c548b34-defa-4fcd-9bc7-364a4fbec8da, customEvt=null,
allowMerge=true]
                Line 41117: [2020-11-09T04:11:52,062][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=2792,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=2792, minorTopVer=0],
err=null]
                Line 41118: [2020-11-09T04:11:52,066][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=30b55ea5-18a4-4c15-b45f-7fe420ac00bd,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=2792, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=5c548b34-defa-4fcd-9bc7-364a4fbec8da, addrs=[0:0:0:0:0:0:0:1%lo,
AA.BB.CC.DD, 127.0.0.1, 2405:200:a10:fc05:a2d3:c1ff:fef2:25a0%eth0],
sockAddrs=[0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0,
2405:200:a10:fc05:a2d3:c1ff:fef2:25a0%eth0:0, AA.BB.CC.DD/AA.BB.CC.DD:0],
discPort=0, order=2792, intOrder=1527, lastExchangeTime=1604875311250,
loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=true], done=true],
topVer=AffinityTopologyVersion [topVer=2792, minorTopVer=0],
durationFromInit=0]
                Line 41119: [2020-11-09T04:11:52,067][INFO
][exchange-worker-#344%EDIFCustomerCC%][time] Finished exchange init
[topVer=AffinityTopologyVersion [topVer=2792, minorTopVer=0], crd=false]
                Line 41120: [2020-11-09T04:11:52,072][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=2792, minorTopVer=0], force=false, evt=NODE_JOINED,
node=5c548b34-defa-4fcd-9bc7-364a4fbec8da]
                Line 41124: [2020-11-09T04:11:52,451][INFO
][exchange-worker-#344%EDIFCustomerCC%][time] Started exchange init
[topVer=AffinityTopologyVersion [topVer=2793, minorTopVer=0],
mvccCrd=MvccCoordinator [nodeId=08260e5f-ae8d-44f2-b10a-dc3490133ee8,
crdVer=1602946449957, topVer=AffinityTopologyVersion [topVer=6,
minorTopVer=0]], mvccCrdChange=false, crd=false, evt=NODE_JOINED,
evtNode=076584a1-154f-4e3c-8f45-6ee096a71868, customEvt=null,
allowMerge=true]
                Line 41125: [2020-11-09T04:11:52,453][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=2793,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=2793, minorTopVer=0],
err=null]
                Line 41126: [2020-11-09T04:11:52,458][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=30b55ea5-18a4-4c15-b45f-7fe420ac00bd,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=2793, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=076584a1-154f-4e3c-8f45-6ee096a71868, addrs=[0:0:0:0:0:0:0:1%lo,
AA.BB.CC.DD, 127.0.0.1, 2405:200:a10:fc05:a2d3:c1ff:fef2:25a0%eth0],
sockAddrs=[0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0,
2405:200:a10:fc05:a2d3:c1ff:fef2:25a0%eth0:0, AA.BB.CC.DD/AA.BB.CC.DD:0],
discPort=0, order=2793, intOrder=1528, lastExchangeTime=1604875311403,
loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=true], done=true],
topVer=AffinityTopologyVersion [topVer=2793, minorTopVer=0],
durationFromInit=10]
                Line 41127: [2020-11-09T04:11:52,458][INFO
][exchange-worker-#344%EDIFCustomerCC%][time] Finished exchange init
[topVer=AffinityTopologyVersion [topVer=2793, minorTopVer=0], crd=false]
                Line 41128: [2020-11-09T04:11:52,472][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=2793, minorTopVer=0], force=false, evt=NODE_JOINED,
node=076584a1-154f-4e3c-8f45-6ee096a71868]
                Line 41136: [2020-11-09T04:12:13,569][INFO
][exchange-worker-#344%EDIFCustomerCC%][time] Started exchange init
[topVer=AffinityTopologyVersion [topVer=2794, minorTopVer=0],
mvccCrd=MvccCoordinator [nodeId=08260e5f-ae8d-44f2-b10a-dc3490133ee8,
crdVer=1602946449957, topVer=AffinityTopologyVersion [topVer=6,
minorTopVer=0]], mvccCrdChange=false, crd=false, evt=NODE_JOINED,
evtNode=5ee5d2c6-76d6-4aba-a160-d084d3ad9e21, customEvt=null,
allowMerge=true]
                Line 41137: [2020-11-09T04:12:13,571][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=2794,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=2794, minorTopVer=0],
err=null]
                Line 41138: [2020-11-09T04:12:13,575][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=30b55ea5-18a4-4c15-b45f-7fe420ac00bd,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=2794, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=5ee5d2c6-76d6-4aba-a160-d084d3ad9e21, addrs=[0:0:0:0:0:0:0:1%lo,
AA.BB.CC.DD, 127.0.0.1], sockAddrs=[AA.BB.CC.DD/AA.BB.CC.DD:0,
0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0], discPort=0, order=2794, intOrder=1529,
lastExchangeTime=1604875332557, loc=false, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=true], done=true], topVer=AffinityTopologyVersion [topVer=2794,
minorTopVer=0], durationFromInit=0]
                Line 41139: [2020-11-09T04:12:13,576][INFO
][exchange-worker-#344%EDIFCustomerCC%][time] Finished exchange init
[topVer=AffinityTopologyVersion [topVer=2794, minorTopVer=0], crd=false]
                Line 41140: [2020-11-09T04:12:13,586][INFO
][exchange-worker-#344%EDIFCustomerCC%][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=2794, minorTopVer=0], force=false, evt=NODE_JOINED,
node=5ee5d2c6-76d6-4aba-a160-d084d3ad9e21]
                Line 41312: [2020-11-09T06:44:13,606][WARN
][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G] Thread
[name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=TIMED_WAITING,
blockCnt=16, waitCnt=2469710]
                Line 43302: Thread
[name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=RUNNABLE,
blockCnt=16, waitCnt=2469710]
                Line 47326: [2020-11-09T10:55:18,888][WARN
][sys-stripe-118-#119%EDIFCustomerCC%][G] Thread
[name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=TIMED_WAITING,
blockCnt=16, waitCnt=2469961]
                Line 49473: Thread
[name="exchange-worker-#344%EDIFCustomerCC%", id=391, state=TIMED_WAITING,
blockCnt=16, waitCnt=2469961]

Thanks and Regards,
Gangaiah



-----
Thanks and Regards,
Gangaiah
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Gangaiah Gundeboina Gangaiah Gundeboina
Reply | Threaded
Open this post in threaded view
|

Re: partition-exchanger system-critical thread blocked

Hi Ilya,

Please find attached entire log file.

Regards,
Gangaiah

18.zip <http://apache-ignite-users.70518.x6.nabble.com/file/t2396/18.zip>  



-----
Thanks and Regards,
Gangaiah
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: partition-exchanger system-critical thread blocked

Hello!

For the first 4 thread dump, the problem was in establishing communication connection to one of nodes:

[2020-11-09T02:31:41,105][WARN ][tcp-comm-worker-#1%EDIFCustomerCC%][TcpCommunicationSpi] Connect timed out (consider increasing 'failureDetectionTimeout' configuration property) [addr=/kub4.101:47100, failureDetectionTimeout=60000]
[2020-11-09T02:31:41,105][WARN ][tcp-comm-worker-#1%EDIFCustomerCC%][TcpCommunicationSpi] Failed to connect to a remote node (make sure that destination node is alive and operating system firewall is disabled on local and remote hosts) [addrs=[/kub4.101:47100, /127.0.0.1:47100]]


For the last 2, it's not obvious what happens there. This may be an occurrence of https://issues.apache.org/jira/browse/IGNITE-13540

Regards.
--
Ilya Kasnacheev


пт, 13 нояб. 2020 г. в 23:06, Gangaiah Gundeboina <[hidden email]>:
Hi Ilya,

Please find attached entire log file.

Regards,
Gangaiah

18.zip <http://apache-ignite-users.70518.x6.nabble.com/file/t2396/18.zip



-----
Thanks and Regards,
Gangaiah
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Gangaiah Gundeboina Gangaiah Gundeboina
Reply | Threaded
Open this post in threaded view
|

Re: partition-exchanger system-critical thread blocked

Hi Ilya,

Regarding exchange worker, the jira saying that 2.8.1 and 2.9 but we are
using 2.7.6 version. So it could be in 2.7.6 as well. But still I am not
understanding here is, there is no node join/left or cluster activation or
cache creation near to the error logging time. Don't know why PME triggered.


Regarding communication worker first four thread dump, if we see it's
blocked 110secs it's huge time. It's happening in production, it's should
not blocked this much time even we have given failure detection time out and
networktimeout 60 secs. I don't think there is network issue in prod, there
is something which blocked this one. We have around 300 thick clients
connected to the prod cluster.

Both are very important for us as it's in prod, any more pointers which
could have cause these issues.

Regards,
Gangaiah





-----
Thanks and Regards,
Gangaiah
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Gangaiah Gundeboina Gangaiah Gundeboina
Reply | Threaded
Open this post in threaded view
|

Re: partition-exchanger system-critical thread blocked

Hi Ilya,

Again we got same issue 'tcp-comm-worker-#1' blocked. We have observed one
thing whenever multiple clients joining to cluster at same time, the error
is printing. I could see there is only one thread(tcp-comm-worker), checked
from logs and also from visualvm.

From documentation also, could see there is only one thread per node below
are details. Is it same thread?

#######################################################
Initialization of exchange means adding future
GridDhtPartitionsExchangeFuture to queue on each node.

Put to this queue (GridCachePartitionExchangeManager.ExchangeWorker#futQ) is
done from the discovery thread.

'Exchange worker' thread manages this queue. Using only one exchange worker
thread per node provides a strict processing order for futures.
########################################################

And also I heard cluster will kick the clients where more
transactions/atomic updates happening while waiting for partition-exchange
to complete. Is it could be reason, could you please help us to find out the
cause and fix the issue.

Thanks and Regards,
Gangaiah



-----
Thanks and Regards,
Gangaiah
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: partition-exchanger system-critical thread blocked

Hello!

Can you provide stack trace from that thread?

No, this is a different thread than exchange worker.

I don't think that cluster is supposed to fail clients in this case. We have timeout for transactions waiting on PME now.

Regards,
--
Ilya Kasnacheev


сб, 12 дек. 2020 г. в 10:23, Gangaiah Gundeboina <[hidden email]>:
Hi Ilya,

Again we got same issue 'tcp-comm-worker-#1' blocked. We have observed one
thing whenever multiple clients joining to cluster at same time, the error
is printing. I could see there is only one thread(tcp-comm-worker), checked
from logs and also from visualvm.

From documentation also, could see there is only one thread per node below
are details. Is it same thread?

#######################################################
Initialization of exchange means adding future
GridDhtPartitionsExchangeFuture to queue on each node.

Put to this queue (GridCachePartitionExchangeManager.ExchangeWorker#futQ) is
done from the discovery thread.

'Exchange worker' thread manages this queue. Using only one exchange worker
thread per node provides a strict processing order for futures.
########################################################

And also I heard cluster will kick the clients where more
transactions/atomic updates happening while waiting for partition-exchange
to complete. Is it could be reason, could you please help us to find out the
cause and fix the issue.

Thanks and Regards,
Gangaiah



-----
Thanks and Regards,
Gangaiah
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Gangaiah Gundeboina Gangaiah Gundeboina
Reply | Threaded
Open this post in threaded view
|

Re: partition-exchanger system-critical thread blocked

Hi Ilya,

We have configured atomictyMode 'ATOMIC' for all caches it's not
transactional, below is our cache configuration for one of the caches.

############### Cache Configuration ################

<property name="name" value="DeviceDetailsCache" />

                    <property name="cacheMode" value="PARTITIONED" />

                    <property name="atomicityMode" value="ATOMIC" />

                    <property name="backups" value="1" />

                    <property name="statisticsEnabled" value="true"/>

 

I have searched with word 'tcp-comm-worker' in whole log file, could see
single thread tcp-comm-worker-#1

 And also attached whole log file, please check  ignite.log
<http://apache-ignite-users.70518.x6.nabble.com/file/t2396/ignite.log>  

You can refer logs from 2020-12-11 12:50:00 to 12:54:00, we can see the
tread blocked while multiple clients trying to connect to cluster.

#############################

Line 2351: [2020-12-07T02:22:17,350][WARN
][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1514, waitCnt=7411]

                Line 4507: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1514, waitCnt=7411]

                Line 50755: [2020-12-10T05:14:53,287][WARN
][sys-stripe-97-#98%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1556, waitCnt=7861]

                Line 54210: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1556, waitCnt=7861]

                Line 60084: [2020-12-10T14:30:33,360][WARN
][sys-stripe-33-#34%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 62467: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 69864: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 76870: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 84077: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 90968: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 97851: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 105066: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 113229: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 120851: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 124129: [2020-12-10T14:32:36,828][WARN
][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 128469: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 136378: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 144518: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 150128: [2020-12-10T14:34:40,071][WARN
][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 155898: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 164791: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 168214: [2020-12-10T14:35:43,175][WARN
][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 172241: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 175537: [2020-12-10T14:37:46,346][WARN
][sys-stripe-64-#65%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 179032: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 190529: [2020-12-11T12:52:19,275][WARN
][sys-stripe-111-#112%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8053]

                Line 194008: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8053]

                Line 197166: [2020-12-11T12:54:26,461][WARN
][sys-stripe-28-#29%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8053]

                Line 200639: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8053]

                Line 203939: [2020-12-11T12:56:31,000][WARN
][sys-stripe-92-#93%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8053]

                Line 207692: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8053]

                Line 211113: [2020-12-11T12:59:38,095][WARN
][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8054]

                Line 214772: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8054]

                Line 218112: [2020-12-11T13:01:41,252][WARN
][sys-stripe-28-#29%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8054]

                Line 221990: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8054]

                Line 225629: [2020-12-11T13:03:45,522][WARN
][sys-stripe-47-#48%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8054]

                Line 229948: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8054]

 

###########################################



-----
Thanks and Regards,
Gangaiah
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: partition-exchanger system-critical thread blocked

Hello!

Thread [name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE, blockCnt=1570, waitCnt=8053]
        at sun.nio.ch.Net.poll(Native Method)
        at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:954)
        - locked java.lang.Object@77988a56
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:110)
        - locked java.lang.Object@545b55af
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3299)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2987)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2870)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.access$6000(TcpCommunicationSpi.java:271)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:4489)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:4294)
        at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:120)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi$5.body(TcpCommunicationSpi.java:2237)
        at o.a.i.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)

This and other messages in the log point to connectivity problems between your nodes.

Regards,
--
Ilya Kasnacheev


пн, 14 дек. 2020 г. в 21:07, Gangaiah Gundeboina <[hidden email]>:
Hi Ilya,

We have configured atomictyMode 'ATOMIC' for all caches it's not
transactional, below is our cache configuration for one of the caches.

############### Cache Configuration ################

<property name="name" value="DeviceDetailsCache" />

                    <property name="cacheMode" value="PARTITIONED" />

                    <property name="atomicityMode" value="ATOMIC" />

                    <property name="backups" value="1" />

                    <property name="statisticsEnabled" value="true"/>



I have searched with word 'tcp-comm-worker' in whole log file, could see
single thread tcp-comm-worker-#1

 And also attached whole log file, please check  ignite.log
<http://apache-ignite-users.70518.x6.nabble.com/file/t2396/ignite.log

You can refer logs from 2020-12-11 12:50:00 to 12:54:00, we can see the
tread blocked while multiple clients trying to connect to cluster.

#############################

Line 2351: [2020-12-07T02:22:17,350][WARN
][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1514, waitCnt=7411]

                Line 4507: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1514, waitCnt=7411]

                Line 50755: [2020-12-10T05:14:53,287][WARN
][sys-stripe-97-#98%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1556, waitCnt=7861]

                Line 54210: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1556, waitCnt=7861]

                Line 60084: [2020-12-10T14:30:33,360][WARN
][sys-stripe-33-#34%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 62467: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 69864: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 76870: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 84077: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 90968: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 97851: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 105066: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 113229: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 120851: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 124129: [2020-12-10T14:32:36,828][WARN
][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 128469: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 136378: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 144518: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 150128: [2020-12-10T14:34:40,071][WARN
][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 155898: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 164791: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 168214: [2020-12-10T14:35:43,175][WARN
][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 172241: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 175537: [2020-12-10T14:37:46,346][WARN
][sys-stripe-64-#65%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 179032: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1561, waitCnt=7918]

                Line 190529: [2020-12-11T12:52:19,275][WARN
][sys-stripe-111-#112%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8053]

                Line 194008: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8053]

                Line 197166: [2020-12-11T12:54:26,461][WARN
][sys-stripe-28-#29%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8053]

                Line 200639: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8053]

                Line 203939: [2020-12-11T12:56:31,000][WARN
][sys-stripe-92-#93%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8053]

                Line 207692: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8053]

                Line 211113: [2020-12-11T12:59:38,095][WARN
][tcp-disco-msg-worker-#2%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8054]

                Line 214772: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8054]

                Line 218112: [2020-12-11T13:01:41,252][WARN
][sys-stripe-28-#29%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8054]

                Line 221990: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8054]

                Line 225629: [2020-12-11T13:03:45,522][WARN
][sys-stripe-47-#48%EDIFCustomerCC%][G] Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8054]

                Line 229948: Thread
[name="tcp-comm-worker-#1%EDIFCustomerCC%", id=369, state=RUNNABLE,
blockCnt=1570, waitCnt=8054]



###########################################



-----
Thanks and Regards,
Gangaiah
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Gangaiah Gundeboina Gangaiah Gundeboina
Reply | Threaded
Open this post in threaded view
|

Re: partition-exchanger system-critical thread blocked

Hi Ilya,

We are using 2.7.6 version, is it available in the version.

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="clientMode" value="true"/>
    <property name="communicationSpi">
        <bean
class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
            <property name="forceClientToServerConnections" value="true"/>
        </bean>
    </property>
</bean>

Thanks and Regards,
Gangaiah



-----
Thanks and Regards,
Gangaiah
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Gangaiah Gundeboina Gangaiah Gundeboina
Reply | Threaded
Open this post in threaded view
|

Re: partition-exchanger system-critical thread blocked

Hi Ilya,

We are using 2.7.6 version, is it available in the version.

And also, the clients which are running in kubernetes able to connect to the
cluster. Sometimes facing the issue whenever we restart the multiple
clients/adding multiple clients to cluster and suddenly sometimes the error
printing in logs but cluster is up and running. Sometimes even few clients
getting disconnected from cluster whenever you see the error in logs but not
all the times.





-----
Thanks and Regards,
Gangaiah
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: partition-exchanger system-critical thread blocked

Hello!

I think it is only available in 2.9.0.

Prior to 2.9.0, clients outside of kubernetes may not connect to k8s cluster.

Regards,
--
Ilya Kasnacheev


вт, 15 дек. 2020 г. в 14:55, Gangaiah Gundeboina <[hidden email]>:
Hi Ilya,

We are using 2.7.6 version, is it available in the version.

And also, the clients which are running in kubernetes able to connect to the
cluster. Sometimes facing the issue whenever we restart the multiple
clients/adding multiple clients to cluster and suddenly sometimes the error
printing in logs but cluster is up and running. Sometimes even few clients
getting disconnected from cluster whenever you see the error in logs but not
all the times.





-----
Thanks and Regards,
Gangaiah
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Gangaiah Gundeboina Gangaiah Gundeboina
Reply | Threaded
Open this post in threaded view
|

Re: partition-exchanger system-critical thread blocked

Hi Ilya,

Our server nodes not running in kubernetes, so we have simple ignite cluster
no relation to kubernetes.

But we have few clients which running inside kubernetes cluster and few
clients are out side kubernets, both are connecting to same ignite cluster.




-----
Thanks and Regards,
Gangaiah
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: partition-exchanger system-critical thread blocked

Hello!

I don't think running client nodes in kubernetes is supported. You can only use thin client inside k8s if your server nodes are outside.

Regards,
--
Ilya Kasnacheev


вт, 15 дек. 2020 г. в 20:25, Gangaiah Gundeboina <[hidden email]>:
Hi Ilya,

Our server nodes not running in kubernetes, so we have simple ignite cluster
no relation to kubernetes.

But we have few clients which running inside kubernetes cluster and few
clients are out side kubernets, both are connecting to same ignite cluster.




-----
Thanks and Regards,
Gangaiah
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/