Dear Team,

We are running 16 Ignite nodes, and a few of them are going down with the error below. Please let us know the possible reasons, and what the solution is when a node gets segmented and goes down.

Error: Node is out of topology (probably, due to short-time network problems).

[23:16:27,554][WARNING][disco-event-worker-#40%MATCHERWORKER%][GridDiscoveryManager] Local node SEGMENTED: TcpDiscoveryNode [id=ad4f2ad9-7f42-4863-84e4-03b95c6a9d9d, addrs=[XX.XX.XXX.IP8, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, SERVER8/XX.XX.XXX.IP8:47500], discPort=47500, order=8, intOrder=8, lastExchangeTime=1609868787545, loc=true, ver=2.7.0#20181201-sha1:256ae401, isClient=false]

Below are the JVM args we are providing to the nodes:

JVMARGS="-Xms3G -Xmx3G -Xss5M -XX:-UseGCOverheadLimit -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+PrintGCDetails -XX:MaxGCPauseMillis=200 -Xloggc:/path/to/logs/GClog.txt Djava.net.preferIPv4Stack=true -Dserver --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/sun.nio.ch=ALL-UNNAMED --add-exports java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED --add-exports jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED"

Please find the log attached.
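For reference, what a node does once it decides it is segmented, and how quickly the cluster declares a node failed, are both controlled on IgniteConfiguration. Below is a minimal sketch assuming a programmatic configuration; the property names are standard Ignite 2.x APIs, but the values are illustrative and are not taken from the attached configuration.

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.plugin.segmentation.SegmentationPolicy;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

public class SegmentationConfigSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // What the local node does when it detects it has been segmented.
        // STOP is the default; RESTART_JVM only works when the node runs under a
        // wrapper that honors Ignite's restart exit code (e.g. ignite.sh), and NOOP
        // leaves the node running in its own segment.
        cfg.setSegmentationPolicy(SegmentationPolicy.STOP);

        // Single knob for failure detection (default 10 000 ms). Raising it makes
        // the cluster more tolerant of long GC pauses and short network hiccups,
        // at the cost of slower detection of genuinely dead nodes.
        cfg.setFailureDetectionTimeout(30_000);

        TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
        discoSpi.setLocalPort(47500); // matches the discPort seen in the log above
        cfg.setDiscoverySpi(discoSpi);

        Ignition.start(cfg);
    }
}
```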
Hello!

Do you also have logs from the other server nodes? I don't see anything particularly suspicious here. Maybe there were indeed some short-term network problems?

Regards,
--
Ilya Kasnacheev

Wed, Jan 6, 2021 at 15:04, BEELA GAYATRI <[hidden email]>:
Hi Ilya,

Please find attached the logs from all 16 nodes; node 8 is the one that was stopped with the segmentation issue.
Hello!

It seems that node 8 was kicked out of the cluster by node 7 after a timeout:

[03:07:18,822][WARNING][tcp-disco-msg-worker-#2%MATCHERWORKER%][TcpDiscoverySpi] Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout [currentTimeout=9989, rmtAddr=/xx.xx.xxx.IP8:47500, rmtPort=47500]

[03:07:18,876][WARNING][tcp-disco-msg-worker-#2%MATCHERWORKER%][TcpDiscoverySpi] Failed to send message to next node [msg=TcpDiscoveryStatusCheckMessage [creatorNode=TcpDiscoveryNode [id=cdec00c4-0ff8-4103-9dc6-335f1d148eef, addrs=[xx.xx.xxx.IP7, 127.0.0.1], sockAddrs=[SERVER_IP7/xx.xx.xxx.IP7:47500, /127.0.0.1:47500], discPort=47500, order=7, intOrder=7, lastExchangeTime=1609709828761, loc=true, ver=2.7.0#20181201-sha1:256ae401, isClient=false], failedNodeId=null, status=0, super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=130ab29a671-cdec00c4-0ff8-4103-9dc6-335f1d148eef, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=TcpDiscoveryNode [id=b4304f38-d28a-4cf7-8ca4-ab50d8189ff3, addrs=[xx.xx.xxx.IP8, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, /xx.xx.xxx.IP8:47500], discPort=47500, order=8, intOrder=8, lastExchangeTime=1609146411895, loc=false, ver=2.7.0#20181201-sha1:256ae401, isClient=false], errMsg=Failed to send message to next node [msg=TcpDiscoveryStatusCheckMessage [creatorNode=TcpDiscoveryNode [id=cdec00c4-0ff8-4103-9dc6-335f1d148eef, addrs=[xx.xx.xxx.IP7, 127.0.0.1], sockAddrs=[SERVER_IP7/xx.xx.xxx.IP7:47500, /127.0.0.1:47500], discPort=47500, order=7, intOrder=7, lastExchangeTime=1609709828761, loc=true, ver=2.7.0#20181201-sha1:256ae401, isClient=false], failedNodeId=null, status=0, super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=130ab29a671-cdec00c4-0ff8-4103-9dc6-335f1d148eef, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=ClusterNode [id=b4304f38-d28a-4cf7-8ca4-ab50d8189ff3, order=8, addr=[xx.xx.xxx.IP8, 127.0.0.1], daemon=false]]]

[03:07:18,876][INFO][tcp-disco-msg-worker-#2%MATCHERWORKER%][TcpDiscoverySpi] New next node [newNext=TcpDiscoveryNode [id=44f32796-5f72-4153-9b2a-ffe8dfde0947, addrs=[xx.xx.xxx.IP9, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, /xx.xx.xxx.IP9:47500], discPort=47500, order=9, intOrder=9, lastExchangeTime=1609146421080, loc=false, ver=2.7.0#20181201-sha1:256ae401, isClient=false]]

[03:07:18,885][WARNING][tcp-disco-msg-worker-#2%MATCHERWORKER%][TcpDiscoverySpi] Local node has detected failed nodes and started cluster-wide procedure. To speed up failure detection please see 'Failure Detection' section under javadoc for 'TcpDiscoverySpi'

[03:07:18,962][WARNING][disco-event-worker-#40%MATCHERWORKER%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=b4304f38-d28a-4cf7-8ca4-ab50d8189ff3, addrs=[xx.xx.xxx.IP8, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, /xx.xx.xxx.IP8:47500], discPort=47500, order=8, intOrder=8, lastExchangeTime=1609146411895, loc=false, ver=2.7.0#20181201-sha1:256ae401, isClient=false]

[03:07:18,989][INFO][disco-event-worker-#40%MATCHERWORKER%][GridDiscoveryManager] Topology snapshot [ver=23, locNode=cdec00c4, servers=15, clients=2, state=ACTIVE, CPUs=60, offheap=17.0GB, heap=48.0GB]

It's hard to say what caused this issue. Maybe there was indeed a short-lived network glitch.

Regards,
--
Ilya Kasnacheev

Fri, Jan 8, 2021 at 09:23, BEELA GAYATRI <[hidden email]>:
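The "Timed out waiting for message delivery receipt" warning above points at the relevant tuning: either raise the single failureDetectionTimeout, or set the fine-grained discovery timeouts ('ackTimeout', 'socketTimeout', 'reconnectCount') explicitly on TcpDiscoverySpi. A minimal sketch assuming a programmatic configuration; the values are illustrative, not recommendations for this particular cluster.

```java
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

public class DiscoveryTimeoutSketch {
    static IgniteConfiguration tunedConfig() {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Option 1: keep the single knob and simply raise it (default is 10_000 ms,
        // consistent with the currentTimeout=9989 seen in the warning above).
        cfg.setFailureDetectionTimeout(30_000);

        // Option 2: set the discovery timeouts explicitly. Note that as soon as any
        // of these are set, TcpDiscoverySpi stops using failureDetectionTimeout.
        TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
        discoSpi.setSocketTimeout(10_000);   // per socket-operation timeout
        discoSpi.setAckTimeout(10_000);      // wait for delivery receipt from the next node
        discoSpi.setMaxAckTimeout(60_000);   // cap for the "increased timeout" retries
        discoSpi.setReconnectCount(10);      // retries before the next node is considered failed
        cfg.setDiscoverySpi(discoSpi);

        return cfg;
    }
}
```

The same warning also names long GC pauses on the remote node as the most likely cause, so node 8's GC log around 03:07 is worth checking alongside any timeout change.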