Triggering Rebalancing Programmatically get error while requesting

luongbd.hust

Hi all,

I am trying to set up a lifecycle handler that sets the baseline topology
automatically.
I registered the event listener and wrote the code as described in the
documentation:
https://apacheignite.readme.io/docs/baseline-topology

*My test case is as follows*
- Continuously send requests that write data to the cache
- Start the nodes listed in the IP finder

When the number of nodes increases from 2 to 3, the following error appears
in the console:

[09:39:03,020][SEVERE][tcp-disco-msg-worker-#2%TravelInventoryTesting%][G]
Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [threadName=grid-timeout-worker,
blockedFor=36s]
[09:39:03,020][SEVERE][tcp-disco-msg-worker-#2%TravelInventoryTesting%][]
Critical system error detected. Will be handled accordingly to configured
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]],
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
o.a.i.IgniteException: GridWorker [name=grid-timeout-worker,
igniteInstanceName=TravelInventoryTesting, finished=false,
heartbeatTs=1553481506244]]]
class org.apache.ignite.IgniteException: GridWorker
[name=grid-timeout-worker, igniteInstanceName=TravelInventoryTesting,
finished=false, heartbeatTs=1553481506244]
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
        at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
        at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.lambda$new$0(ServerImpl.java:2663)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7181)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
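
A quick check of the numbers in this log: `blockedFor=36s` appears to be the gap between the worker's last heartbeat (`heartbeatTs`, in epoch milliseconds) and the moment the log line was written. The sketch below redoes that arithmetic. It assumes the node's clock runs at UTC+7 and that the line was logged on the same local date as the heartbeat; neither is stated in the log, and the class and method names are made up for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class BlockedForCheck {
    /** Time between a worker's last heartbeat and the moment a log line was written. */
    public static Duration blockedFor(long heartbeatTs, LocalTime logTime, ZoneOffset zone) {
        Instant heartbeat = Instant.ofEpochMilli(heartbeatTs);
        // Assumption: the log line was written on the same local date as the heartbeat.
        ZonedDateTime logged = heartbeat.atZone(zone).with(logTime);
        return Duration.between(heartbeat, logged.toInstant());
    }

    public static void main(String[] args) {
        // heartbeatTs and the 09:39:03,020 timestamp come from the log; UTC+7 is an assumption.
        Duration d = blockedFor(1553481506244L,
            LocalTime.of(9, 39, 3, 20_000_000), ZoneOffset.ofHours(7));
        System.out.println(d.getSeconds() + "s"); // prints "36s" under the UTC+7 assumption
    }
}
```

Under those assumptions the heartbeat falls at 09:38:26.244 local time, 36.776 seconds before the log line, which matches the reported 36s.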

I have attached the logs of the nodes.

*Thanks and best regards*

logs.rar <http://apache-ignite-users.70518.x6.nabble.com/file/t2354/logs.rar>



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ilya.kasnacheev

Hello!

Have you tried disabling failure detection to see whether the errors go away?

Regards,
--
Ilya Kasnacheev


In reply to luongbd.hust <[hidden email]>, Mon, 25 Mar 2019 at 06:25

luongbd.hust

Thank you, Ilya Kasnacheev.
I tried what you suggested,
but nothing changed.
The cluster still does not serve requests from clients.
As I understand it, "Critical Failures Handling" cannot change the
errors that occur.
*Thanks and best regards*




ilya.kasnacheev

Hello!

Can you please re-run this case with "Critical Failures Handling" disabled, let it hang for some time, and then share the logs of that run?

In this case it is reacting to a timeout, not an error, so maybe there is no error in the first place. I can see a wait on the partition release future, but to understand its implications I need to see more logs.
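
For reference, a diagnostic run with critical failure handling effectively switched off might be configured as in the sketch below. It assumes Ignite 2.7 or later on the classpath; `NoOpFailureHandler` and `setSystemWorkerBlockedTimeout` are existing Ignite calls, while the class name, the method name, and the five-minute value are illustrative assumptions, not settings taken from this thread.

```java
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.NoOpFailureHandler;

public class DiagnosticConfig {
    /** Builds a configuration intended for a diagnostic run only, not for production. */
    public static IgniteConfiguration diagnosticConfig() {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Instance name as it appears in the log earlier in the thread.
        cfg.setIgniteInstanceName("TravelInventoryTesting");

        // Critical failures are ignored by this handler, so the node is never stopped or halted.
        cfg.setFailureHandler(new NoOpFailureHandler());

        // Assumption: a generous 5-minute limit before a system worker is reported as blocked.
        cfg.setSystemWorkerBlockedTimeout(5 * 60 * 1000L);

        return cfg;
    }
}
```

The configuration would then be passed to `Ignition.start(cfg)` on each node taking part in the test.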

Regards,
--
Ilya Kasnacheev


In reply to luongbd.hust <[hidden email]>, Mon, 25 Mar 2019 at 12:17

luongbd.hust

In reply to this post by ilya.kasnacheev
Hi Ilya,

I followed your instructions,
but nothing has changed.
I have attached the logs and the configuration used for the test.

disable-fail-handling.rar
<http://apache-ignite-users.70518.x6.nabble.com/file/t2354/disable-fail-handling.rar>  



ilya.kasnacheev

Hello!

Can you please collect thread dumps from all nodes (after waiting around a minute once the error appears)?
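
One way to capture such a dump from inside a node's JVM is the standard `java.lang.management` API. This is only a sketch: it has to run inside each node's process (for example from an application thread started alongside Ignite, which is an assumption about the setup), and the class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {
    /** Returns a plain-text dump of every live thread in the current JVM. */
    public static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Ask for lock details only where the JVM supports reporting them.
        ThreadInfo[] infos = mx.dumpAllThreads(
            mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            if (info == null)
                continue;
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            // Show which lock the thread is waiting for and who holds it, when known.
            if (info.getLockName() != null)
                out.append(" waiting on ").append(info.getLockName());
            if (info.getLockOwnerName() != null)
                out.append(" held by \"").append(info.getLockOwnerName()).append('"');
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace())
                out.append("        at ").append(frame).append('\n');
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Writing the returned string to a file about a minute after the error appears, once per node, would give dumps comparable to the ones requested here.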

Regards,
--
Ilya Kasnacheev


In reply to luongbd.hust <[hidden email]>, Tue, 26 Mar 2019 at 05:34

luongbd.hust

Thank you for your help.

I have attached logs that cover a longer period after the error occurred.

logs.rar
<http://apache-ignite-users.70518.x6.nabble.com/file/t2354/logs.rar>  



yakov

Ilya, have you had a chance to look into the thread dumps?

--Yakov


In reply to luongbd.hust <[hidden email]>, Wed, 27 Mar 2019 at 06:18

luongbd.hust

Yes.
I have spent a lot of time trying to understand the cause of the error,
including my company's working time, so I do not want it to be wasted
with the problem unsolved.
That is why I decided to ask the community for help.
With my limited ability, it is difficult for me to understand an open-source
project like this one.
I only understand the product at the application level.
Sorry for the trouble.
I still hope someone can help me solve this problem.
At the moment I have no way to solve it myself.



ilya.kasnacheev

Hello!

Unfortunately, it is hard to say what is going on without thread dumps. Can you collect them using the `jstack` utility?

I suspect you have some kind of deadlock.

There are suspicious things in your logs, but it is not completely clear what is happening here.
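
A deadlock suspicion of this kind can be partly checked from inside the JVM as well. The sketch below uses the standard `ThreadMXBean` deadlock detection; the class and method names are hypothetical, and it would have to run inside the affected node's process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class DeadlockCheck {
    /** Names of threads the JVM reports as deadlocked; empty if it finds none. */
    public static List<String> deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Object monitors are always covered; java.util.concurrent locks only if supported.
        long[] ids = mx.isSynchronizerUsageSupported()
            ? mx.findDeadlockedThreads()
            : mx.findMonitorDeadlockedThreads();

        List<String> names = new ArrayList<>();
        if (ids == null)
            return names; // null means no lock cycle was found

        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) // a thread may have exited in the meantime
                names.add(info.getThreadName());
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = deadlockedThreadNames();
        System.out.println(names.isEmpty()
            ? "No JVM-level deadlock found"
            : "Deadlocked threads: " + names);
    }
}
```

An empty result only rules out lock cycles within one JVM. A thread that waits indefinitely on a future, or on another node, is not reported by this check, so full thread dumps remain the more informative evidence.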

Regards,
--
Ilya Kasnacheev


In reply to luongbd.hust <[hidden email]>, Wed, 27 Mar 2019 at 10:43

luongbd.hust

Thanks, Ilya.
I am currently switching to another task.
I will try to come back to this issue soon.


