Node stopped. Ignite node is in invalid state due to a critical failure.

javadevmtl javadevmtl

Node stopped. Ignite node is in invalid state due to a critical failure.

Hi, I'm running Ignite 2.7.0.

I noticed that one of my nodes was down. It seems to have shut itself down because of this error: "Ignite node is in invalid state due to a critical failure."

I attached logs here: https://www.dropbox.com/s/82li1020a5ig4ty/ignite-failled.log?dl=0
ilya.kasnacheev ilya.kasnacheev

Re: Node stopped. Ignite node is in invalid state due to a critical failure.

Hello!

Well, it's pretty descriptive. The node was dropped from the topology because of long GC pauses.

Either find a way to shorten the GC pauses, or increase failureDetectionTimeout.
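
To make the trade-off concrete, here is a minimal sketch in plain Java with no Ignite dependency. It assumes the default failureDetectionTimeout of 10,000 ms for server nodes, and the pause durations are hypothetical values, not figures taken from the attached log. Roughly speaking, a node that stays unresponsive for longer than the timeout risks being dropped by the rest of the cluster. In a real deployment the timeout would be changed through IgniteConfiguration.setFailureDetectionTimeout(long) or the matching property in the Spring XML configuration.

```java
import java.util.Arrays;
import java.util.List;

final class PauseCheck {
    // Assumed default for server nodes in Ignite 2.x, in milliseconds.
    static final long DEFAULT_FAILURE_DETECTION_TIMEOUT_MS = 10_000L;

    /** Returns true if any single pause lasts longer than the timeout. */
    static boolean riskOfDrop(List<Long> pausesMs, long timeoutMs) {
        return pausesMs.stream().anyMatch(p -> p > timeoutMs);
    }

    /** Longest observed pause in milliseconds, or 0 if none were recorded. */
    static long longestPauseMs(List<Long> pausesMs) {
        return pausesMs.stream().mapToLong(Long::longValue).max().orElse(0L);
    }

    public static void main(String[] args) {
        // Hypothetical pause durations, for illustration only.
        List<Long> pauses = Arrays.asList(120L, 850L, 12_400L);
        System.out.println("Longest pause (ms): " + longestPauseMs(pauses));
        System.out.println("Exceeds default timeout: "
            + riskOfDrop(pauses, DEFAULT_FAILURE_DETECTION_TIMEOUT_MS));
        System.out.println("Exceeds 30 s timeout: " + riskOfDrop(pauses, 30_000L));
    }
}
```

Raising the timeout hides long pauses from failure detection but also delays the detection of nodes that are really dead, so shortening the pauses is usually the better first step.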

Regards,
--
Ilya Kasnacheev


On Wed, 28 Aug 2019 at 00:18, John Smith <[hidden email]> wrote:
Hi, I'm running Ignite 2.7.0.

I noticed that one of my nodes was down. It seems to have shut itself down because of this error: "Ignite node is in invalid state due to a critical failure."

I attached logs here: https://www.dropbox.com/s/82li1020a5ig4ty/ignite-failled.log?dl=0
javadevmtl javadevmtl

Re: Node stopped. Ignite node is in invalid state due to a critical failure.

I'm not doing anything fancy with the cache. I have 3 million records in a partitioned cache spread over 3 servers, and all I do is some puts and gets. Unless I have a bad config?

On Wed., Aug. 28, 2019, 6:32 a.m. Ilya Kasnacheev, <[hidden email]> wrote:
Hello!

Well, it's pretty descriptive. The node was dropped from the topology because of long GC pauses.

Either find a way to shorten the GC pauses, or increase failureDetectionTimeout.

Regards,
--
Ilya Kasnacheev


ilya.kasnacheev ilya.kasnacheev

Re: Node stopped. Ignite node is in invalid state due to a critical failure.

Hello!

It's hard to say what is happening here. Do you have a GC log? If not, please make sure to collect one.
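
Until a GC log is available, the JVM's own counters give a rough first impression. The sketch below uses only the standard java.lang.management API and has to run inside the JVM being inspected (the same beans are also reachable over JMX). It reports cumulative collection counts and times per collector, not individual pause lengths, so it complements a GC log rather than replacing it. The class name GcSummary is made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcSummary {
    /** Sum of the accumulated collection times reported by all collectors, in ms. */
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector: name, number of collections, accumulated time.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("Total GC time (ms): " + totalGcTimeMs());
    }
}
```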

Is anything else running in the same JVM as Ignite?

Regards,
--
Ilya Kasnacheev


On Wed, 28 Aug 2019 at 15:13, John Smith <[hidden email]> wrote:
I'm not doing anything fancy with the cache. I have 3 million records in a partitioned cache spread over 3 servers, and all I do is some puts and gets. Unless I have a bad config?

javadevmtl javadevmtl

Re: Node stopped. Ignite node is in invalid state due to a critical failure.

Hi, I have attached some details here: https://www.dropbox.com/s/etm61xeb9mghs9m/ignite-details.log?dl=0

How do I enable GC logs? I'm running the Debian package.

In summary:
1- Only Ignite is running on the host
2- Ignite is configured to use a 4GB heap
3- The host has 16GB in total
4- 10GB of off-heap memory is configured
5- The above is the same for all 3 hosts

On Wed, 28 Aug 2019 at 08:28, Ilya Kasnacheev <[hidden email]> wrote:
Hello!

It's hard to say what is happening here. Do you have a GC log? If not, please make sure to collect one.

Is anything else running in the same JVM as Ignite?

Regards,
--
Ilya Kasnacheev


ilya.kasnacheev ilya.kasnacheev

Re: Node stopped. Ignite node is in invalid state due to a critical failure.

Hello!

I don't see any logs at that link.

This setup may be problematic: a single Ignite process consumes 14 of the host's 16GB, so the system may decide to swap something out. I recommend decreasing the heap to 2GB if possible, which should also make GC faster.
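
The arithmetic behind this advice can be sketched in a few lines of plain Java. The host, heap, and off-heap sizes are the ones reported in this thread; the method name headroomGB is made up for illustration. The real footprint of the process is somewhat larger than heap plus off-heap (metaspace, thread stacks, and so on), so treat the result as an upper bound on what is left for the operating system.

```java
final class MemoryBudget {
    /** Memory left for the OS and everything else, in GB. */
    static long headroomGB(long hostGB, long heapGB, long offHeapGB) {
        return hostGB - (heapGB + offHeapGB);
    }

    public static void main(String[] args) {
        long host = 16;    // total RAM per host, as reported in the thread
        long offHeap = 10; // configured off-heap memory, as reported in the thread
        System.out.println("Headroom with a 4GB heap: " + headroomGB(host, 4, offHeap) + "GB");
        System.out.println("Headroom with a 2GB heap: " + headroomGB(host, 2, offHeap) + "GB");
    }
}
```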

I'm not sure how to enable GC logs when running a package.

Regards,
--
Ilya Kasnacheev


On Wed, 28 Aug 2019 at 17:21, John Smith <[hidden email]> wrote:
Hi, I have attached some details here: https://www.dropbox.com/s/etm61xeb9mghs9m/ignite-details.log?dl=0

How do I enable GC logs? I'm running the Debian package.

In summary:
1- Only Ignite is running on the host
2- Ignite is configured to use a 4GB heap
3- The host has 16GB in total
4- 10GB of off-heap memory is configured
5- The above is the same for all 3 hosts

javadevmtl javadevmtl

Re: Node stopped. Ignite node is in invalid state due to a critical failure.

The Dropbox link is here: https://www.dropbox.com/s/etm61xeb9mghs9m/ignite-details.log?dl=0

I didn't take any logs, just some Visor output, some Linux command output, and some info/stats that I pulled from the logs with cat, just to be sure I wasn't reading the wrong values.
Everything else, as far as I'm aware, is the default config.

Plus, a 4GB heap is nothing. Surely it shouldn't cause a huge delay?

On Wed, 28 Aug 2019 at 11:09, Ilya Kasnacheev <[hidden email]> wrote:
Hello!

I don't see any logs at that link.

This setup may be problematic: a single Ignite process consumes 14 of the host's 16GB, so the system may decide to swap something out. I recommend decreasing the heap to 2GB if possible, which should also make GC faster.

I'm not sure how to enable GC logs when running a package.

Regards,
--
Ilya Kasnacheev


ilya.kasnacheev ilya.kasnacheev

Re: Node stopped. Ignite node is in invalid state due to a critical failure.

Hello!

Well, my recommendation is to find a way to enable GC logs and collect regular logs as well, from all nodes.

Regards,
--
Ilya Kasnacheev


On Wed, 28 Aug 2019 at 18:18, John Smith <[hidden email]> wrote:
The Dropbox link is here: https://www.dropbox.com/s/etm61xeb9mghs9m/ignite-details.log?dl=0

I didn't take any logs, just some Visor output, some Linux command output, and some info/stats that I pulled from the logs with cat, just to be sure I wasn't reading the wrong values.
Everything else, as far as I'm aware, is the default config.

Plus, a 4GB heap is nothing. Surely it shouldn't cause a huge delay?

javadevmtl javadevmtl

Re: Node stopped. Ignite node is in invalid state due to a critical failure.

For GC logs, I think I can just change the ignite.sh script in /usr/share/apache-ignite/bin?
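
Whichever place the flags end up in, it is worth confirming that the running JVM actually received them. The sketch below is plain Java and checks the JVM's input arguments for the standard HotSpot GC logging options: -Xloggc:<file> and -XX:+PrintGCDetails on JDK 8, or -Xlog:gc* on JDK 9 and later. Whether editing ignite.sh is the right approach for the Debian package is not confirmed in this thread, and the class name GcLogFlagCheck is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class GcLogFlagCheck {
    /** True if any argument looks like a GC logging flag (JDK 8 or JDK 9+ style). */
    static boolean hasGcLogging(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xloggc:")           // JDK 8: write the GC log to a file
                || arg.equals("-XX:+PrintGCDetails") // JDK 8: detailed GC output
                || arg.startsWith("-Xlog:gc")) {     // JDK 9+: unified logging
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments the current JVM was started with (excluding the main class and its args).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("GC logging enabled: " + hasGcLogging(jvmArgs));
    }
}
```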

Right now I have a "heavy" load running and everything seems to be fine.

OK, I'll check whether it happens again.

On Wed, 28 Aug 2019 at 11:21, Ilya Kasnacheev <[hidden email]> wrote:
Hello!

Well, my recommendation is to find a way to enable GC logs and collect regular logs as well, from all nodes.

Regards,
--
Ilya Kasnacheev

