BEELA GAYATRI |
Dear Team, We have 16 Ignite worker nodes running as data grid nodes, and the application works fine at first. After a few hours or days we get the warning "Node is out of topology (probably, due to short-time network problems)", and a few nodes go down with a System Critical error, with the cache stopped on those nodes. The Ignite logs are attached. Please suggest what the issue could be and how to resolve it. |
Can you also provide the logs for the few minutes before the "Node is out of
topology" message? Igor -- Sent from: http://apache-ignite-users.70518.x6.nabble.com/ |
BEELA GAYATRI |
Hi Igor, Please find attached the complete log of the node that reported "Node is out of topology". The 16 nodes in use appear in the log as XX.XX.XXX.node1 to XX.XX.XXX.node16. (In reply to: ibelyakov <[hidden email]>, Sent: Monday, November 16, 2020 8:00:33 PM, Subject: Re: Getting error Node is out of topology (probably, due to short-time network problems)) |
Hi,
According to the provided log, I see the "Blocked system-critical thread has been detected" message, and the node was segmented because it was unable to respond to another node. This is most probably caused by JVM pauses, possibly related to GC. Do you collect GC logs for the nodes? You can find information on how to enable GC logs here: https://ignite.apache.org/docs/latest/perf-and-troubleshooting/troubleshooting#detailed-gc-logs Igor |
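Besides the GC log files mentioned above, a JVM also exposes cumulative GC counters through the standard `java.lang.management` API, which can serve as a quick cross-check on how much time collectors have spent. The following is only a minimal sketch: the class name `GcSnapshot` and the method `totalGcTimeMillis` are hypothetical and not part of Ignite, and it assumes you are able to run code inside (or alongside) the node's JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not an Ignite class: prints the JVM's cumulative GC counters.
final class GcSnapshot {

    /** Sums the cumulative collection time (ms) over all collectors; undefined (-1) values are skipped. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long timeMs = gc.getCollectionTime(); // -1 if the collector does not report it
            if (timeMs > 0) {
                total += timeMs;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector (names depend on the GC in use, e.g. G1).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total GC time ms = " + totalGcTimeMillis());
    }
}
```

Sampling these counters periodically and comparing the deltas with the time a node left the topology may help confirm or rule out GC as the cause; it does not replace the detailed GC logs.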
BEELA GAYATRI |
Hi Igor, As per the suggestion below, we added the following JVM property and ran all 16 nodes: -DIGNITE_JVM_PAUSE_DETECTOR_THRESHOLD=10000. Even so, one of the nodes went out of topology and its cache was stopped. Please find attached the GC log and the Ignite log for it. Please suggest what can be done further. |
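To illustrate what a pause-detector threshold measures, here is a minimal sketch of the general technique: a thread sleeps for a short fixed interval and checks how much longer than expected it actually took. This is not Ignite's implementation; the class `PauseDetectorSketch` and its methods are hypothetical. On the assumption that the property above only sets the level at which a pause gets reported, raising it would change what appears in the log rather than prevent a node from being segmented.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a sleep-based pause detector; not Ignite code.
final class PauseDetectorSketch {

    /** Sleeps for sleepMillis and returns the extra time taken beyond that, in ms (never negative). */
    static long measurePauseMillis(long sleepMillis) {
        long start = System.nanoTime();
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status for the caller
        }
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return Math.max(0, elapsedMs - sleepMillis);
    }

    /** True when the measured pause is longer than the reporting threshold. */
    static boolean exceeds(long pauseMillis, long thresholdMillis) {
        return pauseMillis > thresholdMillis;
    }

    public static void main(String[] args) {
        long thresholdMs = 5000; // assumption: same value as one of the thresholds tried in this thread
        for (int i = 0; i < 20; i++) {
            long pause = measurePauseMillis(50);
            if (exceeds(pause, thresholdMs)) {
                System.out.println("Possible long JVM pause: " + pause + " ms");
            }
        }
        System.out.println("done");
    }
}
```

A stall of the whole process (GC, VM steal time, swapping) shows up as a large difference between the expected and the actual sleep time, which is why such a detector cannot tell the causes apart on its own.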
BEELA GAYATRI |
Hi Team, As suggested, we now run the nodes with the following options: -Xms512M -Xmx5G -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -DIGNITE_JVM_PAUSE_DETECTOR_THRESHOLD=5000. We still get "Node is out of topology (probably, due to short-time network problems)". In addition, the data in the cache is lost every time a cache node is stopped. We have 16 nodes acting as data grid/computation nodes; each server has 4 CPUs and 5 GB of RAM. We observe this behavior after the nodes have been idle (no cache operations, no computation) for some time, anywhere from a few hours to a few days. The configuration file, log files and GC log files are attached. Please suggest. |
Hi,
I don't see anything suspicious in the provided GC logs. Do you run the Ignite nodes on VMs? If so, do you have monitoring in place, and can you check the CPU usage during the period when the issue happened? Regards, Igor |
BEELA GAYATRI |
Hi Igor, We have observed the CPU and memory utilization on the servers where these nodes run, and the CPU utilization is very low. We still see the issue: sometimes a node goes out of topology after JVM pauses, and sometimes it goes out of topology without any JVM pause. |