Too long JVM pause out of nowhere leading into shutdowns of ignite-servers

VincentCE

Hello!

In our project we are currently using Ignite 2.8.1 with ZooKeeper discovery.
Over the last couple of days, some of our Ignite server nodes have shut
down unexpectedly.

Please find the logs below:

1) Why can such long JVM/GC pauses occur when, as far as I can tell, the
preceding metrics in the log give no indication of a problem?

2) We have the following timeouts set for the server nodes. Which of them
determines how a node is handled after such a long GC pause, and which
should we change to avoid a restart of the node?

Thanks in advance for your help!

Configs:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="peerClassLoadingEnabled" value="true" />
        <property name="failureDetectionTimeout" value="600000" />
        <property name="systemWorkerBlockedTimeout" value="600000" />
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi">
                <property name="zkConnectionString" value="${ZOOKEEPER_CONNECT}"/>
                <property name="sessionTimeout" value="30000"/>
                <property name="zkRootPath" value="/apacheIgnite"/>
                <property name="joinTimeout" value="10000"/>
            </bean>
        </property>
        <property name="communicationSpi">
            <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
                <property name="socketWriteTimeout" value="30000" />
            </bean>
        </property> ....
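
To put the configured timeouts side by side with the pauses reported in the
logs below, here is a minimal sketch in plain Java (no Ignite dependency).
The class name PauseVsTimeouts and its helper methods are made up for
illustration. The timeout values are copied from the config above, and the
regular expression matches the pause-detector warning as it appears in our
logs. The sketch only compares numbers; it makes no claim about which timeout
Ignite or ZooKeeper actually applies to a node that has already joined.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Ignite: compares logged JVM pauses
// against the timeouts from the Spring config in this post.
class PauseVsTimeouts {
    // \s+ between words tolerates the line wrap after "Possible" in the log.
    private static final Pattern PAUSE = Pattern.compile(
        "Possible\\s+too\\s+long\\s+JVM\\s+pause:\\s+(\\d+)\\s+milliseconds");

    /** Returns the pause length in ms if the text contains a pause warning. */
    public static OptionalLong parsePauseMillis(String logText) {
        Matcher m = PAUSE.matcher(logText);
        return m.find()
            ? OptionalLong.of(Long.parseLong(m.group(1)))
            : OptionalLong.empty();
    }

    /** Names of all timeouts whose value is less than or equal to the pause. */
    public static List<String> exceeded(long pauseMs, Map<String, Long> timeouts) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Long> e : timeouts.entrySet()) {
            if (e.getValue() <= pauseMs) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    /** Timeout values (ms) copied from the config in this post. */
    public static Map<String, Long> configuredTimeouts() {
        Map<String, Long> t = new LinkedHashMap<>();
        t.put("failureDetectionTimeout", 600_000L);
        t.put("systemWorkerBlockedTimeout", 600_000L);
        t.put("ZookeeperDiscoverySpi.sessionTimeout", 30_000L);
        t.put("ZookeeperDiscoverySpi.joinTimeout", 10_000L);
        t.put("TcpCommunicationSpi.socketWriteTimeout", 30_000L);
        return t;
    }

    public static void main(String[] args) {
        // The two pause warnings from the logs, including the line wrap.
        String[] warnings = {
            "[12:54:43,809][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible\n"
                + "too long JVM pause: 1042 milliseconds.",
            "[12:55:06,263][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible\n"
                + "too long JVM pause: 22404 milliseconds."
        };
        for (String w : warnings) {
            long pause = parsePauseMillis(w).orElseThrow();
            System.out.println(pause + " ms pause, timeouts not larger than it: "
                + exceeded(pause, configuredTimeouts()));
        }
    }
}
```

Numerically, the 22404 ms pause is longer only than joinTimeout (10000 ms).
sessionTimeout, socketWriteTimeout, failureDetectionTimeout and
systemWorkerBlockedTimeout are all larger than the pause, which is why the
shutdown surprises me.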

Logs:

[12:46:21,142][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:47:21,146][INFO][grid-timeout-worker-#35][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 20:56:18.016]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=20318MB, free=44.88%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:47:21,146][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:48:21,154][INFO][grid-timeout-worker-#35][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 20:57:18.025]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=13057MB, free=64.58%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:48:21,154][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:49:21,162][INFO][grid-timeout-worker-#35][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 20:58:18.029]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=8768MB, free=76.21%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=14, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:49:21,162][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:50:21,163][INFO][grid-timeout-worker-#35][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 20:59:18.031]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0.03%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=7632MB, free=79.3%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:50:21,163][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:51:21,168][INFO][grid-timeout-worker-#35][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 21:00:18.038]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=27712MB, free=24.82%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:51:21,168][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:52:21,174][INFO][grid-timeout-worker-#35][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 21:01:18.045]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=27118MB, free=26.44%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:52:21,174][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:53:21,183][INFO][grid-timeout-worker-#35][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 21:02:18.048]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=20510MB, free=44.36%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:53:21,183][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:54:21,186][INFO][grid-timeout-worker-#35][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 21:03:18.055]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=14928MB, free=59.51%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:54:21,186][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:54:43,809][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible
too long JVM pause: 1042 milliseconds.
[12:55:06,263][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible
too long JVM pause: 22404 milliseconds.
[12:55:07,081][INFO][zk-null-EventThread][ZookeeperClient] ZooKeeper client
state changed [prevState=Connected, newState=Disconnected]
[12:55:07,631][SEVERE][grid-nio-worker-tcp-comm-1-#37][TcpCommunicationSpi]
Failed to process selector key [ses=GridSelectorNioSessionImpl
[worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1,
bytesRcvd=1070545017136, bytesSent=76240610573, bytesRcvd0=864051,
bytesSent0=19236, select=true, super=GridWorker
[name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null, finished=false,
heartbeatTs=1604062506627, hashCode=1206603371, interrupted=false,
runner=grid-nio-worker-tcp-comm-1-#37]]],
writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
inRecovery=GridNioRecoveryDescriptor [acked=120544, resendCnt=0,
rcvCnt=115641, sentCnt=120546, reserved=true, lastAck=115616,
nodeLeft=false, node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], connected=false, connectCnt=198,
queueLimit=4096, reserveCnt=253, pairedConnections=false],
outRecovery=GridNioRecoveryDescriptor [acked=120544, resendCnt=0,
rcvCnt=115641, sentCnt=120546, reserved=true, lastAck=115616,
nodeLeft=false, node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], connected=false, connectCnt=198,
queueLimit=4096, reserveCnt=253, pairedConnections=false], closeSocket=true,
outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric@69a257d1,
super=GridNioSessionImpl [locAddr=/10.251.20.44:40114,
rmtAddr=/10.251.19.248:47100, createTime=1604058723468, closeTime=0,
bytesSent=17735247, bytesRcvd=1550895977, bytesSent0=19236,
bytesRcvd0=864051, sndSchedTime=1604058723468, lastSndTime=1604062506627,
lastRcvTime=1604062481469, readsPaused=false,
filterChain=FilterChain[filters=[GridNioCodecFilter
[parser=o.a.i.i.util.nio.GridDirectParser@3973847c, directMode=true],
GridConnectionBytesVerifyFilter], accepted=false, markedForClose=false]]]
java.io.IOException: Connection reset by peer
        at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at java.base/sun.nio.ch.SocketDispatcher.read(Unknown Source)
        at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
        at java.base/sun.nio.ch.IOUtil.read(Unknown Source)
        at java.base/sun.nio.ch.IOUtil.read(Unknown Source)
        at java.base/sun.nio.ch.SocketChannelImpl.read(Unknown Source)
        at
org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processRead(GridNioServer.java:1324)
        at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2449)
        at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2216)
        at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1857)
        at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.base/java.lang.Thread.run(Unknown Source)
[12:55:07,631][WARNING][grid-nio-worker-tcp-comm-1-#37][TcpCommunicationSpi]
Client disconnected abruptly due to network connection loss or because the
connection was left open on application shutdown. [cls=class
o.a.i.i.util.nio.GridNioException, msg=Connection reset by peer]
[12:55:08,215][SEVERE][grid-nio-worker-tcp-comm-0-#36][TcpCommunicationSpi]
Failed to read data from remote connection (will wait for 2000ms).
class org.apache.ignite.IgniteCheckedException: Failed to select events on
selector.
        at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2245)
        at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1857)
        at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.nio.channels.ClosedChannelException
        at
java.base/java.nio.channels.spi.AbstractSelectableChannel.register(Unknown
Source)
        at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2060)
        ... 3 more
[12:55:08,688][SEVERE][sys-#63][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
        at java.base/java.lang.Thread.run(Unknown Source)
[12:55:10,727][SEVERE][sys-#59][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
        at java.base/java.lang.Thread.run(Unknown Source)
[12:55:12,772][SEVERE][sys-#69][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
        at java.base/java.lang.Thread.run(Unknown Source)
[12:55:14,819][SEVERE][sys-#70][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
        at java.base/java.lang.Thread.run(Unknown Source)
[12:55:16,853][SEVERE][sys-#62][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
        at java.base/java.lang.Thread.run(Unknown Source)
[12:55:18,913][SEVERE][sys-#57][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
        at java.base/java.lang.Thread.run(Unknown Source)
[12:55:20,951][SEVERE][sys-#58][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
        at java.base/java.lang.Thread.run(Unknown Source)
[12:55:21,186][INFO][grid-timeout-worker-#35][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 21:04:18.056]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=23856MB, free=35.29%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:55:21,186][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:55:23,002][SEVERE][sys-#66][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
        at java.base/java.lang.Thread.run(Unknown Source)
[12:55:25,049][SEVERE][sys-#61][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
        at java.base/java.lang.Thread.run(Unknown Source)
[12:55:27,097][SEVERE][sys-#65][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
        at java.base/java.lang.Thread.run(Unknown Source)
[12:55:29,165][SEVERE][sys-#60][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
        at java.base/java.lang.Thread.run(Unknown Source)
[12:55:31,207][SEVERE][sys-#67][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
        at java.base/java.lang.Thread.run(Unknown Source)
[12:55:33,241][SEVERE][sys-#63][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
        at java.base/java.lang.Thread.run(Unknown Source)
[12:55:35,278][SEVERE][sys-#59][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
        at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
        at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
        at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
        at java.base/java.lang.Thread.run(Unknown Source)
[12:55:37,081][WARNING][zk-client-timer-null][ZookeeperClient] Failed to
establish ZooKeeper connection, close client [timeout=30000]
[12:55:37,082][WARNING][zk-client-timer-null][ZookeeperDiscoveryImpl]
Connection to Zookeeper server is lost, local node SEGMENTED.
[12:55:37,083][WARNING][disco-event-worker-#71][GridDiscoveryManager] Local
node SEGMENTED: ZookeeperClusterNode
[id=3f58f4f5-bb5a-4650-91f1-ebc3e3a40dac, addrs=[10.251.20.44, 127.0.0.1],
order=257, loc=true, client=false]
[12:55:37,107][SEVERE][disco-event-worker-#71][] Critical system error
detected. Will be handled accordingly to configured handler
[hnd=StopNodeFailureHandler [super=AbstractFailureHandler
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=SEGMENTATION, err=null]]
[12:55:37,114][WARNING][disco-event-worker-#71][CacheDiagnosticManager] Page
locks dump:

Thread=[name=data-streamer-stripe-0-#15, id=30], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-0-#15 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-1-#16, id=31], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-1-#16 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-10-#25, id=40], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-10-#25 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-11-#26, id=41], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-11-#26 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-12-#27, id=42], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-12-#27 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-13-#28, id=43], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-13-#28 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-2-#17, id=32], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-2-#17 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-3-#18, id=33], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-3-#18 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-4-#19, id=34], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-4-#19 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-5-#20, id=35], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-5-#20 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-6-#21, id=36], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-6-#21 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-7-#22, id=37], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-7-#22 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-8-#23, id=38], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-8-#23 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-9-#24, id=39], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-9-#24 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=exchange-worker-#72, id=119], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=exchange-worker-#72 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-#57, id=102], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#57 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#58, id=103], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#58 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#59, id=104], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#59 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#60, id=105], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#60 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#61, id=106], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#61 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#62, id=107], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#62 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#63, id=108], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#63 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#64, id=109], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#64 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#65, id=110], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#65 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#66, id=111], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#66 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#67, id=112], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#67 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#68, id=113], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#68 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#69, id=114], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#69 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#70, id=115], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#70 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-stripe-0-#1, id=16], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-0-#1 time=(1604062537107, 2020-10-30
12:55:37.107)


Thread=[name=sys-stripe-1-#2, id=17], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-1-#2 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-10-#11, id=26], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-10-#11 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-11-#12, id=27], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-11-#12 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-12-#13, id=28], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-12-#13 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-13-#14, id=29], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-13-#14 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-2-#3, id=18], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-2-#3 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-3-#4, id=19], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-3-#4 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-4-#5, id=20], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-4-#5 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-5-#6, id=21], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-5-#6 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-6-#7, id=22], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-6-#7 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-7-#8, id=23], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-7-#8 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-8-#9, id=24], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-8-#9 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-9-#10, id=25], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-9-#10 time=(1604062537108, 2020-10-30
12:55:37.108)



[12:55:37,115][SEVERE][disco-event-worker-#71][FailureProcessor] Ignite node
is in invalid state due to a critical failure.
[12:55:37,115][SEVERE][node-stopper][] Stopping local node on Ignite
failure: [failureCtx=FailureContext [type=SEGMENTATION, err=null]]
[12:55:37,118][INFO][node-stopper][GridTcpRestProtocol] Command protocol
successfully stopped: TCP binary
[12:55:37,126][INFO][node-stopper][GridJettyRestProtocol] Command protocol
successfully stopped: Jetty REST
[12:55:37,189][INFO][node-stopper][GridCacheProcessor] Stopped cache
[cacheName=ignite-sys-cache]


... 974 caches are stopped in this section ...


[12:55:37,519][INFO][node-stopper][GridCacheProcessor] Stopped cache
[cacheName=SomeCache]
[12:55:43,703][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,703][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,703][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,703][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgniteCacheService$IgniteComputeClusterGroupSizes,
alias=c.a.f.s.a.ignite.IgniteCacheService$IgniteComputeClusterGroupSizes]
[12:55:43,705][INFO][node-stopper][GridDeploymentLocalStore] Removed
undeployed class: GridDeployment [ts=1603209014896, depMode=SHARED,
clsLdr=jdk.internal.loader.ClassLoaders$AppClassLoader@6a2f6f80,
clsLdrId=5f324b64571-3f58f4f5-bb5a-4650-91f1-ebc3e3a40dac, userVer=0,
loc=true,
sampleClsName=org.apache.ignite.internal.processors.continuous.GridContinuousProcessor,
pendingUndeploy=false, undeployed=true, usage=0]
[12:55:43,711][INFO][node-stopper][IgniteKernal]

>>> +---------------------------------------------------------------------------------+
>>> Ignite ver. 2.8.1#20200521-sha1:864220966caa4157c4fee8a1bc85171623963604
>>> stopped OK
>>> +---------------------------------------------------------------------------------+
>>> Grid uptime: 9 days, 21:04:40.583





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
aealexsandrov

Re: Too long JVM pause out of nowhere leading into shutdowns of ignite-servers

Hi,

Long JVM pauses can lead to different problems, but in your case, I see some network problems that lead to a segmentation of some nodes:

Connection to Zookeeper server is lost, local node SEGMENTED.

Here is what you can do to avoid this problem:

1) You should find out the reason for the pause. A 22-second pause is huge and can cause some operations to fail. It might be a GC issue, but I assume you are running on VMs, so these might simply be VM pauses.
2) You can turn off MMAP:

IGNITE_WAL_MMAP=false

3) You can increase the failure detection and client failure detection timeouts:

https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/IgniteConfiguration.html#setFailureDetectionTimeout-long-
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/spi/IgniteSpiAdapter.html#clientFailureDetectionTimeout--

4) You can reduce the communication timeouts (because I see that your communication connections cannot be established):

<property name="communicationSpi">
    <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
        ...
        <property name="connectTimeout" value="5000"/>
        <property name="maxConnectTimeout" value="10000"/>
        ...
    </bean>
</property>

However, as I mentioned earlier, you must first figure out the reason for the pause of your virtual machine.
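
As a rough way to check whether the whole process is being frozen (by GC or by the hypervisor), a small probe thread can compare the requested sleep time with the time that actually elapsed. The sketch below is only an illustration and not Ignite code: the class name `PauseProbe`, the 50 ms interval, the 500 ms threshold, and the fixed number of probes are all assumptions chosen for the example.

```java
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; PauseProbe and its constants are hypothetical.
final class PauseProbe {
    /** How much longer than requested a sleep took, in ms (never negative). */
    static long overshootMillis(long expectedMs, long startNanos, long endNanos) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        return Math.max(0L, elapsedMs - expectedMs);
    }

    public static void main(String[] args) throws InterruptedException {
        final long intervalMs = 50;   // assumed probe interval
        final long thresholdMs = 500; // assumed reporting threshold
        final int probes = 20;        // bounded here; a real probe would loop until stopped

        for (int i = 0; i < probes; i++) {
            long start = System.nanoTime();
            Thread.sleep(intervalMs);
            long stall = overshootMillis(intervalMs, start, System.nanoTime());
            if (stall >= thresholdMs)
                System.out.println("Possible pause of " + stall + " ms");
        }
    }
}
```

Running such a probe in a separate small JVM on the same VM may help tell the two cases apart: if it reports a stall at the same moment as the Ignite node, the freeze is more likely at the VM or host level than inside the Ignite JVM's garbage collector.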

BR,
Andrei

10/30/2020 5:56 PM, VincentCE wrote:
Hello!

In our project we are currently using Ignite 2.8.1 with ZooKeeper discovery.
During the last couple of days we have been facing shutdowns of some of our
ignite-server nodes.

Please find the logs below:

1) Why can such long JVM/GC pauses occur although, IMHO, the previous metrics in
the log do not indicate them?

2) We have the following timeouts set for the server-nodes. Which of them
would influence the handling after such long gc-pauses in order to avoid a
restart of the node?

Thanks in advance for your help!

Configs:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="peerClassLoadingEnabled" value="true" />
        <property name="failureDetectionTimeout" value="600000" />
        <property name="systemWorkerBlockedTimeout" value="600000" />
        <property name="discoverySpi">
            <bean
class="org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi">
                <property name="zkConnectionString"
value="${ZOOKEEPER_CONNECT}"/>
                <property name="sessionTimeout" value="30000"/>
                <property name="zkRootPath" value="/apacheIgnite"/>
                <property name="joinTimeout" value="10000"/>
            </bean>
        </property>
        <property name="communicationSpi">
            <bean
class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
                <property name="socketWriteTimeout" value="30000" />
            </bean>
        </property> ....

LOGs:

[12:46:21,142][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:47:21,146][INFO][grid-timeout-worker-#35][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 20:56:18.016]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=20318MB, free=44.88%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:47:21,146][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:48:21,154][INFO][grid-timeout-worker-#35][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 20:57:18.025]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=13057MB, free=64.58%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:48:21,154][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:49:21,162][INFO][grid-timeout-worker-#35][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 20:58:18.029]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=8768MB, free=76.21%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=14, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:49:21,162][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:50:21,163][INFO][grid-timeout-worker-#35][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 20:59:18.031]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0.03%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=7632MB, free=79.3%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:50:21,163][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:51:21,168][INFO][grid-timeout-worker-#35][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 21:00:18.038]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=27712MB, free=24.82%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:51:21,168][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:52:21,174][INFO][grid-timeout-worker-#35][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 21:01:18.045]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=27118MB, free=26.44%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:52:21,174][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:53:21,183][INFO][grid-timeout-worker-#35][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 21:02:18.048]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=20510MB, free=44.36%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:53:21,183][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:54:21,186][INFO][grid-timeout-worker-#35][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 21:03:18.055]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=14928MB, free=59.51%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:54:21,186][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:54:43,809][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible
too long JVM pause: 1042 milliseconds.
[12:55:06,263][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible
too long JVM pause: 22404 milliseconds.
[12:55:07,081][INFO][zk-null-EventThread][ZookeeperClient] ZooKeeper client
state changed [prevState=Connected, newState=Disconnected]
[12:55:07,631][SEVERE][grid-nio-worker-tcp-comm-1-#37][TcpCommunicationSpi]
Failed to process selector key [ses=GridSelectorNioSessionImpl
[worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1,
bytesRcvd=1070545017136, bytesSent=76240610573, bytesRcvd0=864051,
bytesSent0=19236, select=true, super=GridWorker
[name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null, finished=false,
heartbeatTs=1604062506627, hashCode=1206603371, interrupted=false,
runner=grid-nio-worker-tcp-comm-1-#37]]],
writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768],
inRecovery=GridNioRecoveryDescriptor [acked=120544, resendCnt=0,
rcvCnt=115641, sentCnt=120546, reserved=true, lastAck=115616,
nodeLeft=false, node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], connected=false, connectCnt=198,
queueLimit=4096, reserveCnt=253, pairedConnections=false],
outRecovery=GridNioRecoveryDescriptor [acked=120544, resendCnt=0,
rcvCnt=115641, sentCnt=120546, reserved=true, lastAck=115616,
nodeLeft=false, node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], connected=false, connectCnt=198,
queueLimit=4096, reserveCnt=253, pairedConnections=false], closeSocket=true,
outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric@69a257d1,
super=GridNioSessionImpl [locAddr=/10.251.20.44:40114,
rmtAddr=/10.251.19.248:47100, createTime=1604058723468, closeTime=0,
bytesSent=17735247, bytesRcvd=1550895977, bytesSent0=19236,
bytesRcvd0=864051, sndSchedTime=1604058723468, lastSndTime=1604062506627,
lastRcvTime=1604062481469, readsPaused=false,
filterChain=FilterChain[filters=[GridNioCodecFilter
[parser=o.a.i.i.util.nio.GridDirectParser@3973847c, directMode=true],
GridConnectionBytesVerifyFilter], accepted=false, markedForClose=false]]]
java.io.IOException: Connection reset by peer
	at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
	at java.base/sun.nio.ch.SocketDispatcher.read(Unknown Source)
	at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
	at java.base/sun.nio.ch.IOUtil.read(Unknown Source)
	at java.base/sun.nio.ch.IOUtil.read(Unknown Source)
	at java.base/sun.nio.ch.SocketChannelImpl.read(Unknown Source)
	at
org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processRead(GridNioServer.java:1324)
	at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2449)
	at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2216)
	at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1857)
	at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
	at java.base/java.lang.Thread.run(Unknown Source)
[12:55:07,631][WARNING][grid-nio-worker-tcp-comm-1-#37][TcpCommunicationSpi]
Client disconnected abruptly due to network connection loss or because the
connection was left open on application shutdown. [cls=class
o.a.i.i.util.nio.GridNioException, msg=Connection reset by peer]
[12:55:08,215][SEVERE][grid-nio-worker-tcp-comm-0-#36][TcpCommunicationSpi]
Failed to read data from remote connection (will wait for 2000ms).
class org.apache.ignite.IgniteCheckedException: Failed to select events on
selector.
	at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2245)
	at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1857)
	at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.nio.channels.ClosedChannelException
	at
java.base/java.nio.channels.spi.AbstractSelectableChannel.register(Unknown
Source)
	at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2060)
	... 3 more
[12:55:08,688][SEVERE][sys-#63][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
	at java.base/java.lang.Thread.run(Unknown Source)
[12:55:10,727][SEVERE][sys-#59][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
	at java.base/java.lang.Thread.run(Unknown Source)
[12:55:12,772][SEVERE][sys-#69][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
	at java.base/java.lang.Thread.run(Unknown Source)
[12:55:14,819][SEVERE][sys-#70][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
	at java.base/java.lang.Thread.run(Unknown Source)
[12:55:16,853][SEVERE][sys-#62][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
	at java.base/java.lang.Thread.run(Unknown Source)
[12:55:18,913][SEVERE][sys-#57][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
	at java.base/java.lang.Thread.run(Unknown Source)
[12:55:20,951][SEVERE][sys-#58][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
	at java.base/java.lang.Thread.run(Unknown Source)
[12:55:21,186][INFO][grid-timeout-worker-#35][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3f58f4f5, uptime=9 days, 21:04:18.056]
    ^-- H/N/C [hosts=96, nodes=96, CPUs=1082]
    ^-- CPU [cur=-100%, avg=-100%, GC=0%]
    ^-- PageMemory [pages=16626106]
    ^-- Heap [used=23856MB, free=35.29%, comm=36864MB]
    ^-- Off-heap [used=65326MB, free=9.12%, comm=71760MB]
    ^--   sysMemPlc region [used=0MB, free=99.21%, comm=40MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^--   Default_Region region [used=65325MB, free=8.87%, comm=71680MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=14, qSize=0]
[12:55:21,186][INFO][grid-timeout-worker-#35][IgniteKernal] FreeList
[name=Default_Region##FreeList, buckets=256, dataPages=287347,
reusePages=3169711]
[12:55:23,002][SEVERE][sys-#66][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
	at java.base/java.lang.Thread.run(Unknown Source)
[12:55:25,049][SEVERE][sys-#61][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
	at java.base/java.lang.Thread.run(Unknown Source)
[12:55:27,097][SEVERE][sys-#65][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
	at java.base/java.lang.Thread.run(Unknown Source)
[12:55:29,165][SEVERE][sys-#60][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
	at java.base/java.lang.Thread.run(Unknown Source)
[12:55:31,207][SEVERE][sys-#67][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
	at java.base/java.lang.Thread.run(Unknown Source)
[12:55:33,241][SEVERE][sys-#63][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
	at java.base/java.lang.Thread.run(Unknown Source)
[12:55:35,278][SEVERE][sys-#59][TcpCommunicationSpi] Failed to send message
to remote node [node=ZookeeperClusterNode
[id=0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5, addrs=[127.0.0.1, 10.251.19.248],
order=117, loc=false, client=false], msg=GridIoMessage [plc=2,
topic=TOPIC_METRICS, topicOrd=29, ordered=false, timeout=0,
skipOnTimeout=false, msg=ClusterMetricsUpdateMessage []]]
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Remote node does not observe current node in topology :
0123dd90-265e-4bbb-a4e8-ec9d33dd0ce5
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3622)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
	at
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
	at
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.updateMetrics(ClusterProcessor.java:509)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor.access$2200(ClusterProcessor.java:85)
	at
org.apache.ignite.internal.processors.cluster.ClusterProcessor$MetricsUpdateTimeoutObject.run(ClusterProcessor.java:788)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
	at java.base/java.lang.Thread.run(Unknown Source)
[12:55:37,081][WARNING][zk-client-timer-null][ZookeeperClient] Failed to
establish ZooKeeper connection, close client [timeout=30000]
[12:55:37,082][WARNING][zk-client-timer-null][ZookeeperDiscoveryImpl]
Connection to Zookeeper server is lost, local node SEGMENTED.
[12:55:37,083][WARNING][disco-event-worker-#71][GridDiscoveryManager] Local
node SEGMENTED: ZookeeperClusterNode
[id=3f58f4f5-bb5a-4650-91f1-ebc3e3a40dac, addrs=[10.251.20.44, 127.0.0.1],
order=257, loc=true, client=false]
[12:55:37,107][SEVERE][disco-event-worker-#71][] Critical system error
detected. Will be handled accordingly to configured handler
[hnd=StopNodeFailureHandler [super=AbstractFailureHandler
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=SEGMENTATION, err=null]]
[12:55:37,114][WARNING][disco-event-worker-#71][CacheDiagnosticManager] Page
locks dump:

Thread=[name=data-streamer-stripe-0-#15, id=30], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-0-#15 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-1-#16, id=31], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-1-#16 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-10-#25, id=40], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-10-#25 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-11-#26, id=41], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-11-#26 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-12-#27, id=42], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-12-#27 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-13-#28, id=43], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-13-#28 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-2-#17, id=32], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-2-#17 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-3-#18, id=33], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-3-#18 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-4-#19, id=34], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-4-#19 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-5-#20, id=35], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-5-#20 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-6-#21, id=36], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-6-#21 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-7-#22, id=37], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-7-#22 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-8-#23, id=38], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-8-#23 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=data-streamer-stripe-9-#24, id=39], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-9-#24 time=(1604062537108,
2020-10-30 12:55:37.108)


Thread=[name=exchange-worker-#72, id=119], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=exchange-worker-#72 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-#57, id=102], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#57 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#58, id=103], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#58 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#59, id=104], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#59 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#60, id=105], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#60 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#61, id=106], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#61 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#62, id=107], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#62 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#63, id=108], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#63 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#64, id=109], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#64 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#65, id=110], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#65 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#66, id=111], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#66 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#67, id=112], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#67 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#68, id=113], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#68 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#69, id=114], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#69 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-#70, id=115], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=sys-#70 time=(1604062537108, 2020-10-30 12:55:37.108)


Thread=[name=sys-stripe-0-#1, id=16], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-0-#1 time=(1604062537107, 2020-10-30
12:55:37.107)


Thread=[name=sys-stripe-1-#2, id=17], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-1-#2 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-10-#11, id=26], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-10-#11 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-11-#12, id=27], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-11-#12 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-12-#13, id=28], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-12-#13 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-13-#14, id=29], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-13-#14 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-2-#3, id=18], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-2-#3 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-3-#4, id=19], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-3-#4 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-4-#5, id=20], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-4-#5 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-5-#6, id=21], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-5-#6 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-6-#7, id=22], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-6-#7 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-7-#8, id=23], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-7-#8 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-8-#9, id=24], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-8-#9 time=(1604062537108, 2020-10-30
12:55:37.108)


Thread=[name=sys-stripe-9-#10, id=25], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-9-#10 time=(1604062537108, 2020-10-30
12:55:37.108)



[12:55:37,115][SEVERE][disco-event-worker-#71][FailureProcessor] Ignite node
is in invalid state due to a critical failure.
[12:55:37,115][SEVERE][node-stopper][] Stopping local node on Ignite
failure: [failureCtx=FailureContext [type=SEGMENTATION, err=null]]
[12:55:37,118][INFO][node-stopper][GridTcpRestProtocol] Command protocol
successfully stopped: TCP binary
[12:55:37,126][INFO][node-stopper][GridJettyRestProtocol] Command protocol
successfully stopped: Jetty REST
[12:55:37,189][INFO][node-stopper][GridCacheProcessor] Stopped cache
[cacheName=ignite-sys-cache]
 ... 974 caches are stopped in this section ...
[12:55:37,519][INFO][node-stopper][GridCacheProcessor] Stopped cache
[cacheName=CVAR1-RE]
[12:55:43,703][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,703][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,703][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,703][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePositionResultComputation,
alias=c.a.f.s.a.ignite.IgnitePositionResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgnitePartialResultComputation,
alias=c.a.f.s.a.ignite.IgnitePartialResultComputation]
[12:55:43,704][INFO][node-stopper][GridDeploymentPerVersionStore] Class was
undeployed in SHARED or CONTINUOUS mode [cls=class
c.a.f.s.a.ignite.IgniteCacheService$IgniteComputeClusterGroupSizes,
alias=c.a.f.s.a.ignite.IgniteCacheService$IgniteComputeClusterGroupSizes]
[12:55:43,705][INFO][node-stopper][GridDeploymentLocalStore] Removed
undeployed class: GridDeployment [ts=1603209014896, depMode=SHARED,
clsLdr=jdk.internal.loader.ClassLoaders$AppClassLoader@6a2f6f80,
clsLdrId=5f324b64571-3f58f4f5-bb5a-4650-91f1-ebc3e3a40dac, userVer=0,
loc=true,
sampleClsName=org.apache.ignite.internal.processors.continuous.GridContinuousProcessor,
pendingUndeploy=false, undeployed=true, usage=0]
[12:55:43,711][INFO][node-stopper][IgniteKernal] 

+---------------------------------------------------------------------------------+
Ignite ver. 2.8.1#20200521-sha1:864220966caa4157c4fee8a1bc85171623963604
stopped OK
+---------------------------------------------------------------------------------+
Grid uptime: 9 days, 21:04:40.583




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
VincentCE

Re: Too long JVM pause out of nowhere leading into shutdowns of ignite-servers

Hi aealexsandrov,

thanks a lot for your answer!

I have questions regarding your points in 3) and 4):

3) As you can see in the configs I posted, we have set quite a large
failureDetectionTimeout=600000. So I guess increasing it even further would
not help us much here. Am I right?

4) Why would decreasing these timeout settings be useful here? It seems that
connection establishment is retried until, after 30s, "Failed to establish
ZooKeeper connection, close client [timeout=30000]" finally appears. The
source code indicates that the timeout used here comes from the
sessionTimeout in the ZookeeperDiscoverySpi configuration, which is indeed
30000ms in our case. -> Are you saying that applying "<property
name="connectTimeout" value="5000"/> <property name="maxConnectTimeout"
value="10000"/>" would solve the root problem with establishing the
connection?



VincentCE

Re: Too long JVM pause out of nowhere leading into shutdowns of ignite-servers

Hi aealexsandrov and fellow Igniters,

I would really appreciate some answers to my follow-up questions, in
particular to 4).

Thanks a lot!



ilya.kasnacheev

Re: Too long JVM pause out of nowhere leading into shutdowns of ignite-servers

Hello!

I'm afraid you're mostly on your own when it comes to ZooKeeper discovery. The recommendations usually apply to TCP/IP Discovery.

For 3), I think it is correct to assume that the ZooKeeper timeout (probably configurable separately) is the culprit here, not the failure detection timeout.
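
To make the ZooKeeper timeout point concrete, here is a minimal sketch of the discovery section from the original post with only sessionTimeout changed. The value 120000 is an arbitrary placeholder, not a recommendation, and whether a longer session timeout actually prevents the segmentation in this setup is an assumption that would need testing. Keep in mind that a ZooKeeper server negotiates the session timeout and by default caps it at 20 x tickTime, so the server-side maxSessionTimeout may have to be raised as well for a larger value to take effect.

```xml
<!-- Sketch only: same discovery SPI as in the original post, sessionTimeout raised. -->
<property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi">
        <property name="zkConnectionString" value="${ZOOKEEPER_CONNECT}"/>
        <!-- Was 30000 in the original post; 120000 is a placeholder value. -->
        <property name="sessionTimeout" value="120000"/>
        <property name="zkRootPath" value="/apacheIgnite"/>
        <property name="joinTimeout" value="10000"/>
    </bean>
</property>
```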

Regards,
--
Ilya Kasnacheev


On Mon, 9 Nov 2020 at 17:02, VincentCE <[hidden email]> wrote:
Hi aealexsandrov and fellow Igniters,

I would really appreciate some answers to my follow-up questions, in
particular to 4).

Thanks a lot!


