Sudden node failure on Ignite v2.7.5

ihalilaltun

Hi Igniters,

Recently (11.07.2019), we upgraded our Ignite version from 2.7.0 to
2.7.5. About 11 hours later, one of our nodes killed itself without any
notification. I am adding the details that I could get from the server and
the topology we use:

*Ignite version*: 2.7.5
*Cluster size*: 16
*Client size*: 22
*Cluster OS version*: CentOS 7
*Cluster Kernel version*: 4.4.185-1.el7.elrepo.x86_64
*Java version* :
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-b04)
OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)

By the way, this is a production environment running on VMs, and we have been
using this topology for almost 5 months. Our average throughput is ~5000 TPS
for the cluster. We persist 8 to 10 different object types in Ignite; some of
them are relatively big and some are just strings.

ignite.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t2515/ignite.zip>
gc.current
<http://apache-ignite-users.70518.x6.nabble.com/file/t2515/gc.current>
hs_err_pid18537.log
<http://apache-ignite-users.70518.x6.nabble.com/file/t2515/hs_err_pid18537.log>
 



-----
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Pavel Vinokurov

Re: Sudden node failure on Ignite v2.7.5

Hi,

It looks like the issue described in https://issues.apache.org/jira/browse/IGNITE-11953
There are two options:
1. I believe it will be fixed very soon, so you could create a build based on 2.7.5 with the fix cherry-picked.
2. You could remove the cache group property from the cache configuration.
In this case you would have to start a new cluster, and it would require much more heap memory on startup since there are about 650 caches.
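
For anyone reading along, option 2 might look roughly like the fragment below. This is only a sketch under assumptions: it assumes the caches are declared in a Spring XML file, and the names `exampleCache` and `exampleGroup` are placeholders, not taken from the attached ignite.zip. The cache group is set through the `groupName` property of `CacheConfiguration`, so removing the cache group means leaving that property out.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

  <bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="cacheConfiguration">
      <list>
        <!-- Hypothetical cache; "exampleCache" and "exampleGroup" are placeholder names. -->
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
          <property name="name" value="exampleCache"/>
          <!-- Option 2: leave out groupName so the cache is not placed in a cache group.
          <property name="groupName" value="exampleGroup"/>
          -->
        </bean>
      </list>
    </property>
  </bean>
</beans>
```

The same change would have to be made for every cache that currently sets a group, which is where the extra heap for roughly 650 standalone caches comes from.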

Thanks,
Pavel

On Sat, 13 Jul 2019 at 18:18, ihalilaltun <[hidden email]> wrote:
> Hi Igniters,
> [...]

ihalilaltun

Re: Sudden node failure on Ignite v2.7.5

Hi Pavel,

Thanks for your reply. Since the whole system runs in production, we
cannot apply the second solution.
Do you have an estimated time for the first solution/fix?

Thanks.



Ivan Pavlukhin

Re: Sudden node failure on Ignite v2.7.5

Hi,

It seems that the issue [1] was already fixed.

[1] https://issues.apache.org/jira/browse/IGNITE-11953

On Tue, 16 Jul 2019 at 09:30, ihalilaltun <[hidden email]> wrote:

> Hi Pavel,
>
> Thanks for your reply. Since the whole system runs in production, we
> cannot apply the second solution.
> Do you have an estimated time for the first solution/fix?
>
> Thanks.



--
Best regards,
Ivan Pavlukhin
ihalilaltun

Re: Sudden node failure on Ignite v2.7.5

Hi Ivan,

Thanks for the reply. I've checked the Jira issue and it says the fix will be
released in v2.8. When do you think v2.8 will be released?
I don't want to make a custom build with a cherry-pick :(



Ivan Pavlukhin

Re: Sudden node failure on Ignite v2.7.5

Unfortunately I do not know. Currently there is no release activity on 2.8.

On Fri, 19 Jul 2019 at 09:39, ihalilaltun <[hidden email]> wrote:

> Hi Ivan,
>
> Thanks for the reply. I've checked the Jira issue and it says the fix will be
> released in v2.8. When do you think v2.8 will be released?



dmagda

Re: Sudden node failure on Ignite v2.7.5

In reply to this post by Pavel Vinokurov
Majid, 

Send an email to this address to unsubscribe. We can't do that for you:

-
Denis


On Mon, Jul 15, 2019 at 1:42 AM Majid Salimi <[hidden email]> wrote:
unsubscribe me, please! I don't want to receive emails.

On Mon, Jul 15, 2019 at 1:06 PM Pavel Vinokurov <[hidden email]> wrote:
> Hi,
> [...]


--
Regards,
Majid Salimi Beni
M.Sc. Student of Computer Engineering,
Department of Computer Science and Engineering & IT
Shiraz University
dmagda

Fwd: Sudden node failure on Ignite v2.7.5

In reply to this post by ihalilaltun
I think we'll release the next version in October. GridGain might release it earlier in its community edition.

-
Denis


---------- Forwarded message ---------
From: ihalilaltun <[hidden email]>
Date: Thu, Jul 18, 2019 at 11:39 PM
Subject: Re: Sudden node failure on Ignite v2.7.5
To: <[hidden email]>


Hi Ivan,

Thanks for the reply. I've checked the Jira issue and it says the fix will be
released in v2.8. When do you think v2.8 will be released?