Error while adding the node to the baseline topology

krkumar24061975@gmail.com

Hi guys - I am running into the following issue when trying to add a node to the baseline topology. It started only after we upgraded from 2.3 to 2.7.5. Any pointers would be appreciated.

[2019-10-22 10:31:42,441][WARN ][data-streamer-stripe-3-#52][PageMemoryImpl] Parking thread=data-streamer-stripe-3-#52 for timeout (ms)=771038
[2019-10-22 10:31:45,635][ERROR][tcp-disco-msg-worker-#2][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-30, blockedFor=95s]
[2019-10-22 10:31:45,635][WARN ][tcp-disco-msg-worker-#2][G] Thread [name="data-streamer-stripe-30-#79", id=110, state=TIMED_WAITING, blockCnt=0, waitCnt=36470]

[2019-10-22 10:31:45,637][ERROR][tcp-disco-msg-worker-#2][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-30, igniteInstanceName=null, finished=false, heartbeatTs=1571754609956]]]
class org.apache.ignite.IgniteException: GridWorker [name=data-streamer-stripe-30, igniteInstanceName=null, finished=false, heartbeatTs=1571754609956]

Thanks and regards,
KR Kumar
dmagda

Re: Error while adding the node to the baseline topology

Hi,

What is the application doing while you are changing the topology? Is the cluster under load?

Generally, we've added critical failure handlers in the latest version of Ignite, and the message you reported is printed by them:

-
Denis
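The handler part of these log lines can be read with the simplified model below. It is a hypothetical sketch, not Ignite's source: `FailureKind` and `HandlerModel` are made-up names whose constants mirror what the log prints, and the model assumes that a handler skips any failure whose type is listed in `ignoredFailureTypes`. Under that assumption, the `SYSTEM_WORKER_BLOCKED` failure in the log is reported but does not stop the node.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model: the constant names mirror the failure types printed
// in the log, but this enum is not Ignite's own FailureType class.
enum FailureKind {
    SEGMENTATION,
    SYSTEM_WORKER_TERMINATION,
    SYSTEM_WORKER_BLOCKED,
    CRITICAL_ERROR,
    SYSTEM_CRITICAL_OPERATION_TIMEOUT
}

// Model: stopsNode() returns false for types in the ignored set and true
// for every other type (assumed to trigger the stop/halt action).
class HandlerModel {
    private final Set<FailureKind> ignored;

    HandlerModel(Set<FailureKind> ignored) {
        // EnumSet.copyOf rejects an empty non-EnumSet collection, so guard it.
        this.ignored = ignored.isEmpty()
            ? EnumSet.noneOf(FailureKind.class)
            : EnumSet.copyOf(ignored);
    }

    /** True if this failure type would make the handler stop or halt the node. */
    boolean stopsNode(FailureKind kind) {
        return !ignored.contains(kind);
    }

    /** Mirrors ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]. */
    static HandlerModel asLogged() {
        return new HandlerModel(EnumSet.of(
            FailureKind.SYSTEM_WORKER_BLOCKED,
            FailureKind.SYSTEM_CRITICAL_OPERATION_TIMEOUT));
    }
}
```

With the logged settings, `HandlerModel.asLogged().stopsNode(FailureKind.SYSTEM_WORKER_BLOCKED)` returns `false`, while a type outside the ignored set, such as `CRITICAL_ERROR`, returns `true`.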


krkumar24061975@gmail.com

Re: Error while adding the node to the baseline topology

Hi - The application does two things: one thread writes 2 KB events to an
Ignite cache as key-value pairs, and another thread executes Ignite SQL
queries through Ignite JDBC connections. Throughput is anywhere between
25K and 40K events per second into the cache. We use a data streamer to
write to the key-value cache. The cluster has 4 nodes with 198 GB of RAM
and 48 cores.
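For a sense of scale, the quoted rate can be turned into a raw write volume. This is a back-of-the-envelope sketch, assuming "2kb" means 2,048 bytes per value and ignoring keys, indexes, and WAL records, so real disk traffic would be higher. If native persistence is in use (the baseline topology suggests it is), this is a lower bound on what the nodes have to write out. `WriteVolume` is a hypothetical helper name.

```java
// Back-of-the-envelope write volume for a given event rate.
// Assumption: every event value is 2 KiB (2,048 bytes); keys, indexes
// and WAL records are ignored, so real disk traffic is higher.
class WriteVolume {
    static final long EVENT_BYTES = 2048;

    /** Raw value bytes written per second. */
    static long bytesPerSecond(long eventsPerSecond) {
        return eventsPerSecond * EVENT_BYTES;
    }

    /** The same rate in MiB/s (1 MiB = 1,048,576 bytes). */
    static double mibPerSecond(long eventsPerSecond) {
        return bytesPerSecond(eventsPerSecond) / 1_048_576.0;
    }

    /** Volume accumulated over one hour, in GiB (1 GiB = 2^30 bytes). */
    static double gibPerHour(long eventsPerSecond) {
        return bytesPerSecond(eventsPerSecond) * 3600.0 / 1_073_741_824.0;
    }
}
```

At 25,000 events/s this gives 51,200,000 bytes/s (about 48.8 MiB/s, roughly 172 GiB per hour); at 40,000 events/s it gives 81,920,000 bytes/s (78.125 MiB/s, about 275 GiB per hour).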

We got a similar error again and here is the error description:

[2019-10-25 10:16:45,399][ERROR][disco-event-worker-#142][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=data-streamer-stripe-0, blockedFor=2032s]
[2019-10-25 10:16:45,399][WARN ][disco-event-worker-#142][G] Thread [name="data-streamer-stripe-0-#49", id=80, state=WAITING, blockCnt=7, waitCnt=5352642]

[2019-10-25 10:16:45,399][ERROR][disco-event-worker-#142][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=data-streamer-stripe-0, igniteInstanceName=null, finished=false, heartbeatTs=1572010973019]]]
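`heartbeatTs` appears to be the worker's last heartbeat in epoch milliseconds, so it can be cross-checked against `blockedFor`. The sketch below does that with `java.time`. The log does not state its time zone; UTC-4 is an assumption, chosen because it is the offset at which both reports in this thread line up. `HeartbeatGap` is a hypothetical helper name.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper: time between a worker's last heartbeat
// (heartbeatTs, epoch milliseconds) and the wall-clock time of the log line.
class HeartbeatGap {
    static Duration blockedFor(long heartbeatTsMillis,
                               LocalDateTime logTime,
                               ZoneOffset logOffset) {
        Instant lastHeartbeat = Instant.ofEpochMilli(heartbeatTsMillis);
        // Assumption: the log prints local time at the given offset.
        Instant loggedAt = logTime.toInstant(logOffset);
        return Duration.between(lastHeartbeat, loggedAt);
    }
}
```

With `ZoneOffset.ofHours(-4)`, the 2019-10-25 entry gives 2032.38 s (reported `blockedFor=2032s`), and the 2019-10-22 entry (`heartbeatTs=1571754609956`, logged at 10:31:45,635) gives 95.679 s (reported `blockedFor=95s`). In other words, the data-streamer stripe had not reported progress for over half an hour when the second error was logged.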

Thanks and regards,
KR Kumar



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
dmagda

Re: Error while adding the node to the baseline topology

Have you tried turning off the failure handling as described on the previously shared documentation page? It looks like some timeouts need to be tuned.

Denis
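Why tuning timeouts changes what gets reported can be shown with a generic watchdog: a worker counts as blocked once the time since its last heartbeat exceeds a threshold. This is a hypothetical illustration (`WatchdogSketch` is a made-up class and the thresholds are example values), not Ignite's implementation. Ignite's own timeout settings live on `IgniteConfiguration` (`setFailureDetectionTimeout` is one of them); the exact setting that governs blocked-worker detection in 2.7.5 should be checked in the documentation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, generic watchdog (not Ignite's implementation):
// a worker is "blocked" when now - lastHeartbeat > timeoutMillis.
class WatchdogSketch {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeartbeat = new LinkedHashMap<>();

    WatchdogSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called by a worker each time it makes progress. */
    void heartbeat(String worker, long nowMillis) {
        lastHeartbeat.put(worker, nowMillis);
    }

    /** Names of workers whose last heartbeat is older than the timeout, in registration order. */
    List<String> blockedWorkers(long nowMillis) {
        List<String> blocked = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                blocked.add(e.getKey());
            }
        }
        return blocked;
    }
}
```

With a 10-second threshold, a worker that has been silent for 95 seconds is flagged; with a 120-second threshold, the same worker is only flagged once the silence reaches something like the 2032 seconds reported earlier. A larger timeout changes what gets reported, not how long the thread actually waits.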


Stanislav Lukyanov

Re: Error while adding the node to the baseline topology

This message actually looks worrisome:
    [2019-10-22 10:31:42,441][WARN ][data-streamer-stripe-3-#52][PageMemoryImpl] Parking thread=data-streamer-stripe-3-#52 for timeout (ms)=771038

It means that Ignite's throttling algorithm has decided to put a thread to sleep for 771 seconds.
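For scale, the park time in that warning can be converted with `java.time.Duration`. `ParkTime` is a hypothetical helper name, and `toSecondsPart`/`toMillisPart` need Java 9 or later.

```java
import java.time.Duration;
import java.util.Locale;

// Hypothetical helper: formats a millisecond timeout as minutes and seconds.
class ParkTime {
    static String describe(long millis) {
        Duration d = Duration.ofMillis(millis);
        // toSecondsPart() and toMillisPart() are available from Java 9.
        return String.format(Locale.ROOT, "%d min %d.%03d s",
            d.toMinutes(), d.toSecondsPart(), d.toMillisPart());
    }
}
```

`ParkTime.describe(771038)` returns `"12 min 51.038 s"`, so the thread was asked to sleep for almost 13 minutes in a single park.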

Can you share your persistence configuration (DataStorageConfiguration or PersistentStoreConfiguration)?

Thanks,
Stan
