TcpDiscoverySpi worker thread failed with assertion error

Ropugg

Sorry for reusing the topic of
http://apache-ignite-users.70518.x6.nabble.com/TcpDiscoverySpi-worker-thread-failed-with-assertion-error-td14554.html
I hit exactly the same issue, even though it was reported as fixed in
https://issues.apache.org/jira/browse/IGNITE-5562

We have more than 6 clusters, each with four or six nodes, and we have never
seen this issue in any other cluster. It has occurred only on this node, twice
in the past three weeks. The node crashed suddenly.
The logs show almost no load, and the other nodes in this cluster
work well.

Could someone give me feedback on how to avoid this issue?

----------------------------

>>> +----------------------------------------------------------------------+
>>> Ignite ver. 2.7.0#20181130-sha1:256ae4012cb143b4855b598b740a6f3499ead4db
>>> +----------------------------------------------------------------------+
>>> OS name: Linux 2.6.32-754.23.1.el6.x86_64 amd64
>>> CPU(s): 6
>>> Heap: 22.0GB
>>> VM name: [hidden email]
>>> Ignite instance name: prod-ignite-18w.an.xx.xxx.com_47600_prod-ignite-19w.an.xx.xxx.com_47600
>>> Local node [ID=831BB843-E190-48B1-B828-BBC9A4407B47, order=2, clientMode=false]
>>> Local node addresses: [prod-ignite-19w.xx.xxx.com/xx.xxx.xxx.xxx, /127.0.0.1]
>>> Local ports: TCP:10800 TCP:11211 TCP:47100 TCP:47600

--------------------------
[2020-03-04 03:12:09,649][INFO ][grid-timeout-worker-#23%prod-ignite-18w.xx.xxx.com_47600_prod-ignite-19w.xx.xxx.com_47600%][tan_47600] Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=831bb843, name=prod-ignite-18w.xx.xxx.com_47600_prod-ignite-19w.xx.xxx.com_47600, uptime=5 days, 04:06:39.768]
    ^-- H/N/C [hosts=6, nodes=6, CPUs=20]
    ^-- CPU [cur=0.47%, avg=1.11%, GC=0%]
    ^-- PageMemory [pages=585588]
    ^-- Heap [used=10009MB, free=55.57%, comm=22528MB]
    ^-- Off-heap [used=2300MB, free=81.58%, comm=2742MB]
    ^--   sysMemPlc region [used=0MB, free=99.17%, comm=40MB]
    ^--   CACHE_NODE_xG_Region region [used=2300MB, free=81.28%, comm=2662MB]
    ^--   TxLog region [used=0MB, free=100%, comm=40MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=32, qSize=0]
[2020-03-04 03:12:09,649][INFO ][grid-timeout-worker-#23%prod-ignite-18w.xx.xxx.com_47600_prod-ignite-19w.xx.xxx.com_47600%][tan_47600] FreeList [name=prod-ignite-18w.xx.xxx.com_47600_prod-ignite-19w.xx.xxx.com_47600, buckets=256, dataPages=1, reusePages=0]
[2020-03-04 03:12:09,649][INFO ][grid-timeout-worker-#23%prod-ignite-18w.xx.xxx.com_47600_prod-ignite-19w.xx.xxx.com_47600%][tan_47600] FreeList [name=prod-ignite-18w.xx.xxx.com_47600_prod-ignite-19w.xx.xxx.com_47600, buckets=256, dataPages=365506, reusePages=2926]
[2020-03-04 03:12:48,906][ERROR][tcp-disco-msg-worker-#2%prod-ignite-18w.xx.xxx.com_47600_prod-ignite-19w.xx.xxx.com_47600%][TcpDiscoverySpi] TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
java.lang.AssertionError: -2977
        at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryStatistics.onMessageSent(TcpDiscoveryStatistics.java:317)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.sendMessageAcrossRing(ServerImpl.java:3301)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMetricsUpdateMessage(ServerImpl.java:5305)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2828)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2611)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7188)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
[2020-03-04 03:12:48,936][INFO ][node-stop-thread][GridTcpRestProtocol] Command protocol successfully stopped: TCP binary
[2020-03-04 03:12:48,938][ERROR][tcp-disco-msg-worker-#2%prod-ignite-18w.xx.xxx.com_47600_prod-ignite-19w.xx.xxx.com_47600%][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.AssertionError: -2977]]
java.lang.AssertionError: -2977
        at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryStatistics.onMessageSent(TcpDiscoveryStatistics.java:317)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.sendMessageAcrossRing(ServerImpl.java:3301)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMetricsUpdateMessage(ServerImpl.java:5305)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2828)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2611)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7188)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2700)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7119)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
[2020-03-04 03:12:48,938][INFO ][node-stop-thread][GridServiceProcessor] Shutting down distributed service [name=cacheQueryService, execId8=65a391f9]
[2020-03-04 03:12:48,941][WARN ][tcp-disco-msg-worker-#2%prod-ignite-18w.xx.xxx.com_47600_prod-ignite-19w.xx.xxx.com_47600%][FailureProcessor] No deadlocked threads detected.
[2020-03-04 03:12:52,315][WARN ][jvm-pause-detector-worker][tan_47600] Possible too long JVM pause: 3334 milliseconds.
[2020-03-04 03:12:52,342][WARN ][tcp-disco-msg-worker-#2%prod-ignite-18w.xx.xxx.com_47600_prod-ignite-19w.xx.xxx.com_47600%][FailureProcessor] Thread dump at 2020/03/04 03:12:52 PST
Thread [name="node-stop-thread", id=59122, state=RUNNABLE, blockCnt=4, waitCnt=5]
        at o.a.i.i.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:464)
        at o.a.i.i.processors.cache.GridCachePartitionExchangeManager.onKernalStop0(GridCachePartitionExchangeManager.java:777)
        at o.a.i.i.processors.cache.GridCacheSharedManagerAdapter.onKernalStop(GridCacheSharedManagerAdapter.java:120)
        at o.a.i.i.processors.cache.GridCacheProcessor.onKernalStop(GridCacheProcessor.java:1114)
        at o.a.i.i.IgniteKernal.stop0(IgniteKernal.java:2280)
        at o.a.i.i.IgniteKernal.stop(IgniteKernal.java:2228)
        at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2612)
        - locked o.a.i.i.IgnitionEx$IgniteNamedInstance@624fb95
        at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2575)
        at o.a.i.i.IgnitionEx.stop(IgnitionEx.java:379)
        at o.a.i.spi.discovery.tcp.ServerImpl$RingMessageWorker$1.run(ServerImpl.java:2719)
        at java.lang.Thread.run(Thread.java:748)
------------------------
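
For readers following the trace: the number in `java.lang.AssertionError: -2977` is the detail value of the failed assertion, raised in `TcpDiscoveryStatistics.onMessageSent`. One plausible reading, which is an assumption and not something this thread confirms, is that the value is an elapsed time in milliseconds computed from two wall-clock readings, which goes negative if the clock is stepped back between them. The sketch below uses only the JDK; the class name, method names, and timestamps are hypothetical and are not Ignite code.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, not Ignite code: how an elapsed-time value can go negative.
public class ElapsedTimeSketch {
    /** Difference of two wall-clock readings; negative if the clock was set back in between. */
    static long elapsedWallClock(long startMillis, long endMillis) {
        return endMillis - startMillis;
    }

    /** Defensive variant that never reports a negative duration. */
    static long elapsedClamped(long startMillis, long endMillis) {
        return Math.max(0L, endMillis - startMillis);
    }

    /** Elapsed time from System.nanoTime(), which is unaffected by wall-clock adjustments. */
    static long elapsedMonotonicMillis(long startNanos) {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    public static void main(String[] args) {
        long start = 1_583_320_368_906L; // made-up wall-clock reading in milliseconds
        long end = start - 2977L;        // simulated clock stepped back by 2977 ms

        System.out.println(elapsedWallClock(start, end)); // prints -2977
        System.out.println(elapsedClamped(start, end));   // prints 0

        long t0 = System.nanoTime();
        System.out.println(elapsedMonotonicMillis(t0) >= 0); // prints true
    }
}
```

If this reading applies, checking for clock adjustments on the affected host (for example NTP steps) around the failure time may be worthwhile. Also note that an `assert` statement only throws when the JVM runs with assertions enabled (`-ea`).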



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Mikhail

Re: TcpDiscoverySpi worker thread failed with assertion error

Hi,

The issue was fixed in https://issues.apache.org/jira/browse/IGNITE-11952

Please try the latest version, 2.8.0.

Thanks,
Mike.



Ropugg

Re: TcpDiscoverySpi worker thread failed with assertion error

Thanks, Mike!
I will upgrade and verify it.
