Blocked system-critical thread has been detected - After upgrade to 2.8.1

Manu Manu

Hi!

We have been working with Ignite 2.7.6 without incidents. Since we upgraded
to 2.8.1 (same machine, same resources), we are getting "Blocked
system-critical thread" errors and the Ignite server nodes stop responding.

We have noticed that after several hours (about 8 or 9) it recovers by
itself, but after some queries, countdown latch, queue and topic
creation stops working again.

We have tried modifying the number of threads and the timeouts, without success.

Any idea?

Thanks!!

logs-from-ignite-server-data-in-ignite-server-data-0-7.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/t547/logs-from-ignite-server-data-in-ignite-server-data-0-7.txt>  
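For anyone triaging a similar log, here is a minimal sketch of counting how often the message appears in a node log. It matches only the phrase from the subject line and assumes nothing else about the log line format. The class name `BlockedThreadScan` and the command-line usage are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

final class BlockedThreadScan {
    // Phrase copied from the subject of this thread; the rest of the
    // log line layout is deliberately not assumed.
    static final String MARKER = "Blocked system-critical thread has been detected";

    /** Returns how many of the given log lines contain the marker phrase. */
    static long countBlocked(List<String> lines) {
        return lines.stream().filter(line -> line.contains(MARKER)).count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BlockedThreadScan <log-file>");
            return;
        }
        // Reads the whole file as UTF-8; fine for a quick check on a modest log.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(countBlocked(lines) + " matching line(s)");
    }
}
```

Running it against each node's log gives a rough idea of which node reports the message and how often, which helps when deciding which logs to attach.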



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
akorensh akorensh

Re: Blocked system-critical thread has been detected - After upgrade to 2.8.1

Hi,
   Looks like the system is being slowed down for some reason.
   Many Ignite operations are taking a long time (>30 sec) when they should
   complete much more quickly.

   Are you able to go back to 2.7.6 on the exact same config/setup, run the
   system for a period of time, then go to 2.8.1 and reproduce these issues?

   Are you using a persistent store? If so, can you clean it and retry?
   Try disabling persistence to see if that helps; if it does, we need to
   diagnose why it is causing these issues.

   Can you describe your topology? How many clients, servers, thin clients, etc.?

   Can you simplify your topology to one server (and possibly one client) and
   retry to see where the bottleneck is? Also try reducing the load on the
   system to the minimum possible to see whether these problems are
   throughput/load dependent.

   If you are unable to resolve this but it is reproducible, then describe the
   steps/include a small reproducer and the logs (including GC logs) from all
   nodes.
https://apacheignite.readme.io/docs/jvm-and-system-tuning#detailed-garbage-collection-stats
Thanks, Alex
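As background to the "blocked thread" message and the long-running operations mentioned above, here is a rough sketch of the general heartbeat-watchdog idea: workers record a timestamp whenever they make progress, and a monitor flags any worker whose last timestamp is older than a timeout. This is a generic illustration under that assumption, not Ignite's internal code. The class name, worker names and the 30-second value are hypothetical and chosen only to mirror the figure quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Generic heartbeat watchdog; an illustration only, not Ignite's implementation. */
final class HeartbeatWatchdog {
    private final Map<String, Long> lastBeatMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    HeartbeatWatchdog(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** A worker calls this each time it makes progress. */
    void heartbeat(String worker, long nowMillis) {
        lastBeatMillis.put(worker, nowMillis);
    }

    /** Returns the workers whose last heartbeat is older than the timeout. */
    List<String> blockedWorkers(long nowMillis) {
        List<String> blocked = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastBeatMillis.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                blocked.add(e.getKey());
            }
        }
        return blocked;
    }
}
```

With a 30 000 ms timeout, a worker that last reported at time 0 is flagged at time 40 000, while one that reported at 25 000 is not. The point is that a slow disk, a long GC pause or a stuck lock can all produce the same symptom, which is why the logs and GC logs are needed to tell them apart.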





Manu Manu

Re: Blocked system-critical thread has been detected - After upgrade to 2.8.1

Hi Alex, thanks so much.

We have reduced the topology to the one pictured below (1 server node and 3 clients).

- 1 Ignite server node: IMDB with persistence enabled
- 3 Ignite client nodes: for SQL query, messaging (topic, queue) and
countdown latches.

All pluggable elements (TOPIC listener and QUEUE listener) are online.

This topology works perfectly with 2.7.6, but not with 2.8.1.

We have also found that the failure (blocked thread) occurs when the pluggable
modules are online (green lines and blocks) and we make only 1 request (not
under heavy load).

<http://apache-ignite-users.70518.x6.nabble.com/file/t547/arch_i.png>



akorensh akorensh

Re: Blocked system-critical thread has been detected - After upgrade to 2.8.1

Manu,
  Can you set up the lightest load/simplest topology possible and send the
  Ignite logs from all nodes (server and clients), including the GC logs?
  We will take a look.
 
https://apacheignite.readme.io/docs/jvm-and-system-tuning#detailed-garbage-collection-stats
Thanks, Alex
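As a quick first check before collecting full GC logs, the standard JMX beans expose cumulative GC counters. The sketch below prints them for the JVM it runs in, so it is only useful when executed inside the node's own process. It is not a substitute for the detailed GC logs described at the link above, and the class name `GcSnapshot` is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcSnapshot {
    /** Builds one line per collector: name, collection count, accumulated time. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              // Approximate accumulated collection time in milliseconds.
              .append(", totalTimeMs=").append(gc.getCollectionTime())
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Taking two snapshots some minutes apart and comparing the accumulated time gives a coarse sense of whether the JVM is spending a large share of its time in collection.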



Manu Manu

Re: Blocked system-critical thread has been detected - After upgrade to 2.8.1

Hi Alex

We can't share the GC logs as they contain sensitive data. We have solved it by
creating a new data cluster with persistence enabled and moving the data from
the problematic cluster to the new one.

As far as we can see, the problem seems to be in the checkpoint process: for
some unknown reason (maybe related to the migration from 2.7.6 to 2.8.1 and
the new changes in persistence management), the checkpoint thread is blocked.

Anyway, we will keep an eye on this topic.

Thank you very much, greetings!



