Getting error Node is out of topology (probably, due to short-time network problems)

BEELA GAYATRI

Getting error Node is out of topology (probably, due to short-time network problems)

Dear Team,

 

We have 16 Ignite worker nodes running as data grid nodes, and the application initially works fine. After a few hours or days we get the warning "Node is out of topology (probably, due to short-time network problems)", and a few nodes go down with a system-critical error; the cache is stopped on those nodes. The Ignite logs are attached.

Please suggest what the issue could be and how to resolve it.

 

Sent from Mail for Windows 10

 

=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain
confidential or privileged information. If you are
not the intended recipient, any dissemination, use,
review, distribution, printing or copying of the
information contained in this e-mail message
and/or attachments to it are strictly prohibited. If
you have received this communication in error,
please notify us by reply e-mail or telephone and
immediately and permanently delete the message
and any attachments. Thank you


error.txt (301K) Download Attachment
ibelyakov

Re: Getting error Node is out of topology (probably, due to short-time network problems)

Can you also provide the logs for the few minutes before the "Node is out of
topology" message?

Igor



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
BEELA GAYATRI

RE: Getting error Node is out of topology (probably, due to short-time network problems)

Hi Igor,

 

Please find attached the complete log of the node that reported "Node is out of topology" (the 16 nodes in use appear as XX.XX.XXX.node1 to XX.XX.XXX.node16 in the log).

 


 


In reply to ibelyakov's message sent Monday, November 16, 2020 8:00:33 PM.


ignite-e86691f0.0.log (762K) Download Attachment
ibelyakov

RE: Getting error Node is out of topology (probably, due to short-time network problems)

Hi,

In the provided log I see a "Blocked system-critical thread has been
detected" message, and the node was segmented because it was unable to
respond to another node. This is most probably caused by JVM pauses, possibly
related to GC.
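
A note for readers on what a "JVM pause" means in practice: one common way to detect such stalls is a watchdog that sleeps for a short fixed interval and measures how much longer than requested the sleep actually took. The sketch below illustrates that general technique with the standard library only. It is not Ignite's implementation, and the class name, the 50 ms interval, and the 500 ms threshold are assumptions chosen for illustration.

```java
final class PauseDetectorSketch {
    /** Extra time, in ms, that a sleep of precisionMs took beyond what was requested. */
    static long pauseMs(long elapsedMs, long precisionMs) {
        return Math.max(0L, elapsedMs - precisionMs);
    }

    /** True when the overshoot reaches the reporting threshold. */
    static boolean isLongPause(long elapsedMs, long precisionMs, long thresholdMs) {
        return pauseMs(elapsedMs, precisionMs) >= thresholdMs;
    }

    public static void main(String[] args) throws InterruptedException {
        final long precisionMs = 50L;   // assumed sampling interval
        final long thresholdMs = 500L;  // assumed reporting threshold
        long last = System.nanoTime();
        // Bounded loop so the sketch terminates; a real watchdog would run in a daemon thread.
        for (int i = 0; i < 20; i++) {
            Thread.sleep(precisionMs);
            long now = System.nanoTime();
            long elapsedMs = (now - last) / 1_000_000L;
            last = now;
            if (isLongPause(elapsedMs, precisionMs, thresholdMs)) {
                System.out.println("Possible JVM or OS pause: ~"
                    + pauseMs(elapsedMs, precisionMs) + " ms");
            }
        }
    }
}
```

A detector of this kind only reports that the process was stalled. It cannot tell whether the cause was GC, swapping, or the hypervisor, which is why GC logs are the next thing to check.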

Do you collect GC logs for the nodes?

You can find information on how to enable GC logs here:
https://ignite.apache.org/docs/latest/perf-and-troubleshooting/troubleshooting#detailed-gc-logs
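
Alongside GC logs, the JVM's standard management beans give a quick view of how much collection has happened in a running process. The following is a minimal sketch using only java.lang.management; the class and method names are illustrative. The reported values are accumulated collection times as exposed by the JVM, which may not equal stop-the-world pause durations, so this complements GC logs rather than replacing them.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcSnapshot {
    /** One line per collector: name, collection count, accumulated time in ms. */
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": count=" + gc.getCollectionCount()
                + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    /** Sum of accumulated collection time over all collectors, in ms. */
    static long totalGcTimeMs() {
        long total = 0L;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0L) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
        System.out.println("total GC time ms = " + totalGcTimeMs());
    }
}
```

Calling describe() periodically and comparing successive totals shows whether collection time jumps around the moment a node drops out.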

Igor



BEELA GAYATRI

RE: Getting error Node is out of topology (probably, due to short-time network problems)

Hi Igor,

 

As per your suggestion, we added the following JVM property and ran all 16 nodes:

-DIGNITE_JVM_PAUSE_DETECTOR_THRESHOLD=10000

Even so, one of the nodes went out of topology and its cache was stopped. Please find attached the GC log and the Ignite log for that node. Please suggest what can be done further.

 

 


 

In reply to Igor's message sent Monday, November 23, 2020 3:03 PM.


ignite-e14afbe9.0.log (1M) Download Attachment
GClog.txt (28K) Download Attachment
BEELA GAYATRI

RE: Getting error Node is out of topology (probably, due to short-time network problems)

Hi Team,

 

As suggested, we applied the changes below when running the nodes. We still get "Node is out of topology (probably, due to short-time network problems)". Also, the data in the cache is lost every time a cache node is stopped.

-Xms512M

-Xmx5G

-XX:+AlwaysPreTouch

-XX:+UseG1GC

-XX:+ScavengeBeforeFullGC

-XX:+DisableExplicitGC

-DIGNITE_JVM_PAUSE_DETECTOR_THRESHOLD=5000

 

We have 16 nodes acting as data grid and compute nodes; each server has 4 CPUs and 5 GB of RAM.

The configuration file, log files, and GC log files are attached.

We observe this behavior when the nodes have been idle (no cache operations and no computation) for some time, from a few hours to a few days. Please suggest.

 


 

Following up on my message sent Thursday, November 26, 2020 10:09 AM.

 


ignite-NODE6.log (1M) Download Attachment
GClog.txt (22K) Download Attachment
ignite-NODE8.log (1M) Download Attachment
ignite-worker-config.xml (31K) Download Attachment
ibelyakov

RE: Getting error Node is out of topology (probably, due to short-time network problems)

Hi,

I don't see anything suspicious in the provided GC logs.

Do you run Ignite nodes on VMs? If yes, do you have monitoring, and is it
possible to check CPU usage during the period when the issue happened?
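
If no monitoring is in place, a timestamped record of system load can be collected from inside a JVM and lined up against the Ignite log afterwards. A rough sketch with the standard java.lang.management API follows; the class name, the helper method, and the one-second sampling interval are assumptions for illustration. Note that getSystemLoadAverage() returns a negative value on platforms that do not provide a load average (Windows, for example).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.time.Instant;

final class CpuLoadSample {
    /** Load average divided by CPU count, or -1.0 when either input is unusable. */
    static double loadPerCpu(double loadAverage, int cpus) {
        if (loadAverage < 0 || cpus <= 0) {
            return -1.0;
        }
        return loadAverage / cpus;
    }

    public static void main(String[] args) throws InterruptedException {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int cpus = os.getAvailableProcessors();
        // Three samples only, so the sketch terminates; a real collector would run continuously.
        for (int i = 0; i < 3; i++) {
            double load = os.getSystemLoadAverage(); // negative when unavailable
            System.out.println(Instant.now() + " loadAvg=" + load
                + " perCpu=" + loadPerCpu(load, cpus));
            Thread.sleep(1000L);
        }
    }
}
```

On a VM this reflects only what the guest operating system sees; contention on the hypervisor host has to be checked with the host's own monitoring.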

Regards,
Igor



BEELA GAYATRI

RE: Getting error Node is out of topology (probably, due to short-time network problems)

Hi Igor,

 

We checked CPU and memory utilization on the servers where these nodes run, and CPU utilization is very low. We still see the issue: sometimes a node goes out of topology after JVM pauses, and sometimes it goes out of topology without any JVM pauses.

 


 

In reply to Igor's message sent Tuesday, December 15, 2020 9:19 PM.