Node Segmentation Error

BEELA GAYATRI

Node Segmentation Error

Dear Team,

We are running 16 Ignite nodes, and a few of them are going down with the error below. Please let us know the possible reasons, and how to resolve this when a node gets segmented and goes down.

Error:

Node is out of topology (probably, due to short-time network problems).

[23:16:27,554][WARNING][disco-event-worker-#40%MATCHERWORKER%][GridDiscoveryManager] Local node SEGMENTED: TcpDiscoveryNode [id=ad4f2ad9-7f42-4863-84e4-03b95c6a9d9d, addrs=[XX.XX.XXX.IP8, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, SERVER8/XX.XX.XXX.IP8:47500], discPort=47500, order=8, intOrder=8, lastExchangeTime=1609868787545, loc=true, ver=2.7.0#20181201-sha1:256ae401, isClient=false]
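
The lastExchangeTime field in this dump looks like a Unix epoch timestamp in milliseconds. Converting it makes it easier to line this event up with other nodes' logs. A minimal shell sketch, assuming GNU date (for the -d @SECONDS syntax); the function name epoch_ms_to_utc is hypothetical:

```shell
#!/usr/bin/env bash
# Convert an Ignite lastExchangeTime value into a readable UTC timestamp.
# Assumption: the value is Unix epoch time in milliseconds.
# Requires GNU date (for the "-d @SECONDS" syntax).
epoch_ms_to_utc() {
  local ms="$1"
  local secs=$(( ms / 1000 ))     # whole seconds since 1970-01-01 UTC
  local millis=$(( ms % 1000 ))   # leftover milliseconds
  printf '%s.%03d UTC\n' "$(date -u -d "@${secs}" '+%Y-%m-%d %H:%M:%S')" "$millis"
}

# lastExchangeTime of the segmented node, copied from the log line above
epoch_ms_to_utc 1609868787545
```

It prints 2021-01-05 17:46:27.545 UTC. At UTC+05:30 that is 23:16:27.545, 9 ms before the 23:16:27,554 SEGMENTED warning; the servers' time zone is not stated in the thread, so that alignment is an assumption. The same conversion works for the lastExchangeTime values in the other nodes' logs.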

 

Below are the JVM arguments we pass to the nodes:

JVMARGS="-Xms3G -Xmx3G -Xss5M -XX:-UseGCOverheadLimit
-XX:+AlwaysPreTouch
-XX:+UseG1GC
-XX:+ScavengeBeforeFullGC
-XX:+DisableExplicitGC
-XX:+PrintGCDetails
-XX:MaxGCPauseMillis=200
-Xloggc:/path/to/logs/GClog.txt
-Djava.net.preferIPv4Stack=true -Dserver --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/sun.nio.ch=ALL-UNNAMED --add-exports java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED --add-exports jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED"
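
Node 7's warning later in this thread names long GC pauses as the most probable cause, so it may be worth checking the GC log configured with -Xloggc above. The --add-exports flags imply JDK 9 or newer; assuming that, the legacy -Xloggc and -XX:+PrintGCDetails flags are mapped to unified logging, where each pause line is assumed to end with its duration in milliseconds (for example 45.678ms). A rough sketch; the function name long_gc_pauses and the default threshold are hypothetical:

```shell
#!/usr/bin/env bash
# Print GC pause lines longer than a threshold (in ms) from a JDK 9+ unified GC log.
# Assumed line shape (not copied from this cluster's logs):
#   [12.345s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 900M->300M(3072M) 45.678ms
long_gc_pauses() {
  local log_file="$1"
  local threshold_ms="${2:-1000}"   # hypothetical default: 1 second
  awk -v limit="$threshold_ms" '
    /Pause/ && $NF ~ /ms$/ {
      dur = $NF
      sub(/ms$/, "", dur)           # strip the unit, keep the number
      if (dur + 0 > limit + 0) print
    }' "$log_file"
}
```

For example, long_gc_pauses /path/to/logs/GClog.txt 500 would list every pause longer than half a second on that node. Pauses approaching the roughly 10 s currentTimeout seen in node 7's log are the ones worth correlating with the time the node was dropped.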

 

Please find the log attached.

Sent from Mail for Windows 10

=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain
confidential or privileged information. If you are
not the intended recipient, any dissemination, use,
review, distribution, printing or copying of the
information contained in this e-mail message
and/or attachments to it are strictly prohibited. If
you have received this communication in error,
please notify us by reply e-mail or telephone and
immediately and permanently delete the message
and any attachments. Thank you


Attachments: ignite-worker-config.xml (38K), ignite-ad4f2ad9.0.log (1M)

ilya.kasnacheev

Re: Node Segmentation Error

Hello!

Do you also have logs from other server nodes?

I don't see anything particularly suspicious in this log. Maybe there really were some short-term network problems?

Regards,
--
Ilya Kasnacheev


In reply to BEELA GAYATRI <[hidden email]>, Wed, 6 Jan 2021, 15:04.

BEELA GAYATRI

RE: Node Segmentation Error

Hi Ilya,

 

Please find attached the logs from all 16 nodes. Node 8 is the one that was stopped with the segmentation issue.

From: [hidden email]
Sent: Thursday, January 7, 2021 5:02 PM
To: [hidden email]
Subject: Re: Node Segmentation Error

 

Attachment: ignite_logs.zip (5M)

ilya.kasnacheev

Re: Node Segmentation Error

Hello!

It seems that node 8 was kicked out of the cluster by node 7 after a timeout:
[03:07:18,822][WARNING][tcp-disco-msg-worker-#2%MATCHERWORKER%][TcpDiscoverySpi] Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout [currentTimeout=9989, rmtAddr=/xx.xx.xxx.IP8:47500, rmtPort=47500]   
[03:07:18,876][WARNING][tcp-disco-msg-worker-#2%MATCHERWORKER%][TcpDiscoverySpi] Failed to send message to next node [msg=TcpDiscoveryStatusCheckMessage [creatorNode=TcpDiscoveryNode [id=cdec00c4-0ff8-4103-9dc6-335f1d148eef, addrs=[xx.xx.xxx.IP7, 127.0.0.1], sockAddrs=[SERVER_IP7/xx.xx.xxx.IP7:47500, /127.0.0.1:47500], discPort=47500, order=7, intOrder=7, lastExchangeTime=1609709828761, loc=true, ver=2.7.0#20181201-sha1:256ae401, isClient=false], failedNodeId=null, status=0, super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=130ab29a671-cdec00c4-0ff8-4103-9dc6-335f1d148eef, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=TcpDiscoveryNode [id=b4304f38-d28a-4cf7-8ca4-ab50d8189ff3, addrs=[xx.xx.xxx.IP8, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, /xx.xx.xxx.IP8:47500], discPort=47500, order=8, intOrder=8, lastExchangeTime=1609146411895, loc=false, ver=2.7.0#20181201-sha1:256ae401, isClient=false],errMsg=Failed to send message to next node [msg=TcpDiscoveryStatusCheckMessage [creatorNode=TcpDiscoveryNode [id=cdec00c4-0ff8-4103-9dc6-335f1d148eef, addrs=[xx.xx.xxx.IP7, 127.0.0.1], sockAddrs=[SERVER_IP7/xx.xx.xxx.IP7:47500, /127.0.0.1:47500], discPort=47500, order=7, intOrder=7, lastExchangeTime=1609709828761, loc=true, ver=2.7.0#20181201-sha1:256ae401, isClient=false], failedNodeId=null, status=0, super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=130ab29a671-cdec00c4-0ff8-4103-9dc6-335f1d148eef, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=ClusterNode [id=b4304f38-d28a-4cf7-8ca4-ab50d8189ff3, order=8, addr=[xx.xx.xxx.IP8, 127.0.0.1], daemon=false]]]
[03:07:18,876][INFO][tcp-disco-msg-worker-#2%MATCHERWORKER%][TcpDiscoverySpi] New next node [newNext=TcpDiscoveryNode [id=44f32796-5f72-4153-9b2a-ffe8dfde0947, addrs=[xx.xx.xxx.IP9, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, /xx.xx.xxx.IP9:47500], discPort=47500, order=9, intOrder=9, lastExchangeTime=1609146421080, loc=false, ver=2.7.0#20181201-sha1:256ae401, isClient=false]]
[03:07:18,885][WARNING][tcp-disco-msg-worker-#2%MATCHERWORKER%][TcpDiscoverySpi] Local node has detected failed nodes and started cluster-wide procedure. To speed up failure detection please see 'Failure Detection' section under javadoc for 'TcpDiscoverySpi'
[03:07:18,962][WARNING][disco-event-worker-#40%MATCHERWORKER%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=b4304f38-d28a-4cf7-8ca4-ab50d8189ff3, addrs=[xx.xx.xxx.IP8, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, /xx.xx.xxx.IP8:47500], discPort=47500, order=8, intOrder=8, lastExchangeTime=1609146411895, loc=false, ver=2.7.0#20181201-sha1:256ae401, isClient=false]
[03:07:18,989][INFO][disco-event-worker-#40%MATCHERWORKER%][GridDiscoveryManager] Topology snapshot [ver=23, locNode=cdec00c4, servers=15, clients=2, state=ACTIVE, CPUs=60, offheap=17.0GB, heap=48.0GB]

It's hard to say what caused this issue. Maybe there was indeed a short-lived network glitch.
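
Since logs from all 16 nodes are available, one way to narrow this down is to merge the discovery warnings from every log into a single timeline. If many nodes complain at the same moment, that may point at the network; if only node 8's neighbours complain, that may point at node 8 itself (for example a GC pause). A rough shell sketch: the function name discovery_timeline is hypothetical, the message fragments are copied from the warnings quoted in this thread, and it assumes every log line starts with [HH:MM:SS,mmm], all nodes log in the same time zone, and the incident falls within a single day:

```shell
#!/usr/bin/env bash
# Merge discovery-related warnings from several Ignite node logs into one
# time-ordered list: "<timestamp> TAB <log file> TAB <original line>".
discovery_timeline() {
  # Fragments of the warnings seen in this thread.
  local pattern='Local node SEGMENTED|Node FAILED|Timed out waiting for message delivery receipt|Failed to send message to next node'
  local f
  for f in "$@"; do
    # substr($0, 2, 12) extracts "HH:MM:SS,mmm" from the leading "[...]".
    grep -E "$pattern" "$f" | awk -v src="$f" '{ print substr($0, 2, 12) "\t" src "\t" $0 }'
  done | LC_ALL=C sort -k1,1   # byte-wise sort on the timestamp column
}
```

For example, discovery_timeline ignite_logs/*.log, run where ignite_logs.zip was extracted (the layout of that archive is an assumption), would show which nodes reported problems around 03:07:18 and in what order.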

Regards,
--
Ilya Kasnacheev


In reply to BEELA GAYATRI <[hidden email]>, Fri, 8 Jan 2021, 09:23.
