ignite cluster lock up

Mahesh Renduchintala

ignite cluster lock up

Hi,

We have 10 thick clients connected to an Ignite cluster (2 nodes, 16 threads each, plenty of RAM).

These clients are expected to stay connected indefinitely.

New thick clients keep coming in, run a few queries, and then disconnect.

All of this works fine for some time, typically a few hours.

Then, suddenly, Ignite gets into a lockup mode.

New clients cannot connect. Old clients (the 10 mentioned above) cannot fetch data.

The only way to get out of this lockup is to reboot those 10 clients one after the other.

When a random client among those 10 is rebooted, the lock goes away and everything works fine.

Attached are the logs.

ignite-f5ab4e98.0.zip (364K)
Mahesh Renduchintala

Re: ignite cluster lock up

Attached are the config files of the server and the client.




From: Mahesh Renduchintala
Sent: Friday, July 5, 2019 12:37 AM
To: [hidden email]
Subject: ignite cluster lock up
 

config.zip (4K)
ilya.kasnacheev

Re: ignite cluster lock up

Hello!

It looks to me like the node was out due to a long GC pause or something similar. Try increasing failureDetectionTimeout on the server nodes if you expect long pauses.

Regards,
--
Ilya Kasnacheev
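
For reference, a minimal sketch of what raising the timeout could look like when the node is configured from Java. This assumes programmatic configuration; the attached config files may well use Spring XML instead, where the same properties are set on the IgniteConfiguration bean. The class name and the 30-second and 60-second values are placeholders chosen for illustration, not figures taken from this thread.

```java
import org.apache.ignite.configuration.IgniteConfiguration;

final class TimeoutConfigSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Placeholder values (assumption): pick them from the longest pause you observe.
        // Timeout used for server nodes, in milliseconds.
        cfg.setFailureDetectionTimeout(30_000L);
        // Separate timeout applied to client nodes, in milliseconds.
        cfg.setClientFailureDetectionTimeout(60_000L);

        System.out.println("failureDetectionTimeout = " + cfg.getFailureDetectionTimeout());
        System.out.println("clientFailureDetectionTimeout = " + cfg.getClientFailureDetectionTimeout());

        // Passing cfg to Ignition.start(cfg) would start a node with these settings.
    }
}
```

A larger timeout trades faster failure detection for tolerance of long pauses, so a node that has really died will also take longer to be noticed.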


In reply to Mahesh Renduchintala <[hidden email]>, Fri, Jul 5, 2019 at 07:43.

Mahesh Renduchintala

Re: ignite cluster lock up

The long JVM pauses are probably due to the long time taken by GC.

The -Xmx parameter is 64GB for me.

Should I be using more aggressive parameters to free up the heap more quickly on the server node?


I am using the recommended JVM options from the Ignite website.

https://apacheignite.readme.io/docs/jvm-and-system-tuning#garbage-collection-tuning
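
Before changing collector flags, one way to check how much time a JVM actually spends in GC is to read the standard GarbageCollectorMXBean counters. The following is a minimal sketch using only the JDK. The class name GcTimeReport is made up for illustration, and it reports on the JVM it runs in, so it would have to run inside the server node's JVM (or be adapted to read the same beans over JMX) to say anything about the cluster.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcTimeReport {
    /**
     * Sums the cumulative collection time, in milliseconds, over all collectors.
     * A collector may report -1 when the value is undefined; those are skipped.
     */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long timeMs = gc.getCollectionTime();
            if (timeMs > 0) {
                total += timeMs;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector: name, number of collections, accumulated time.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                + ": count=" + gc.getCollectionCount()
                + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("Total GC time (ms): " + totalGcTimeMillis());
    }
}
```

These counters are cumulative since JVM start, so sampling them periodically and comparing the deltas against the times of the lockups would show whether the two line up.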



ilya.kasnacheev

Re: ignite cluster lock up

Hello!

64G is a massive amount of heap. You should definitely increase all timeouts if you have more than, let's say, 16G.

Full GCs have to happen sometime, and they will be long.

Regards,
--
Ilya Kasnacheev


In reply to Mahesh Renduchintala <[hidden email]>, Fri, Jul 5, 2019 at 16:16.


Mahesh Renduchintala

Re: ignite cluster lock up

We are now testing with increased failureDetectionTimeout values.


Even if a full GC is running, why are the Ignite system threads blocked?

Why aren't the Ignite system threads free to accept new connections?

Why exactly would rebooting a few of the previously connected nodes reset everything?


There could be something else as well.

ilya.kasnacheev

Re: ignite cluster lock up

Hello!

When a full GC is running, all threads are effectively blocked. This is why it's called a 'GC pause'.

Regards,
--
Ilya Kasnacheev
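
The effect described above can be observed from inside a JVM: a thread that asks to sleep for a fixed interval wakes up late when every application thread was stopped in the meantime. The following is a minimal JDK-only sketch of that idea. The class name PauseProbe, the 50 ms interval, and the 500 ms threshold are assumptions made for illustration, and this is not a copy of anything inside Ignite. A late wake-up can also come from OS scheduling or an overloaded host, so a hit is a hint rather than proof of a GC pause.

```java
import java.util.concurrent.TimeUnit;

final class PauseProbe {
    /** Returns how many milliseconds longer than requested the sleep took (never negative). */
    static long overshootMillis(long requestedMs, long elapsedNanos) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
        return Math.max(0L, elapsedMs - requestedMs);
    }

    /** Sleeps once for sleepMs and measures by how much the wake-up was late. */
    static long probeOnce(long sleepMs) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(sleepMs);
        return overshootMillis(sleepMs, System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        long sleepMs = 50;      // hypothetical probe interval
        long thresholdMs = 500; // hypothetical: report only wake-ups this late
        for (int i = 0; i < 20; i++) {
            long lateMs = probeOnce(sleepMs);
            if (lateMs > thresholdMs) {
                System.out.println("Possible JVM pause of about " + lateMs + " ms");
            }
        }
    }
}
```

If the reported pauses approach or exceed the configured failureDetectionTimeout, that would be consistent with the explanation given in this thread.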


In reply to Mahesh Renduchintala <[hidden email]>, Sat, Jul 6, 2019 at 12:33.