Can't write to Ignite cluster

Ray

Can't write to Ignite cluster

I'm running a 3-node Ignite 2.4 cluster.
After some time, I can no longer write to this cluster using the data
streamer.
Here's a log snippet.

[2018-07-12T09:31:05,839][ERROR][srvc-deploy-#179][GridServiceProcessor]
Error when executing service: null
 org.apache.ignite.IgniteException: Failed to resolve nodes topology
[cacheGrp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=17315,
minorTopVer=0], history=[AffinityTopologyVersion [topVer=19039,
minorTopVer=0], AffinityTopologyVersion [topVer=19040, minorTopVer=0],
AffinityTopologyVersion [topVer=19041, minorTopVer=0],
AffinityTopologyVersion [topVer=19042, minorTopVer=0],
AffinityTopologyVersion [topVer=19043, minorTopVer=0],
AffinityTopologyVersion [topVer=19044, minorTopVer=0],
AffinityTopologyVersion [topVer=19045, minorTopVer=0],
AffinityTopologyVersion [topVer=19046, minorTopVer=0],
AffinityTopologyVersion [topVer=19047, minorTopVer=0],
AffinityTopologyVersion [topVer=19048, minorTopVer=0],
AffinityTopologyVersion [topVer=19049, minorTopVer=0],
AffinityTopologyVersion [topVer=19050, minorTopVer=0],
AffinityTopologyVersion [topVer=19051, minorTopVer=0],
AffinityTopologyVersion [topVer=19052, minorTopVer=0],
AffinityTopologyVersion [topVer=19053, minorTopVer=0],
AffinityTopologyVersion [topVer=19054, minorTopVer=0],
AffinityTopologyVersion [topVer=19055, minorTopVer=0],
AffinityTopologyVersion [topVer=19056, minorTopVer=0],
AffinityTopologyVersion [topVer=19057, minorTopVer=0],
AffinityTopologyVersion [topVer=19058, minorTopVer=0],
AffinityTopologyVersion [topVer=19059, minorTopVer=0],
AffinityTopologyVersion [topVer=19060, minorTopVer=0],
AffinityTopologyVersion [topVer=19061, minorTopVer=0],
AffinityTopologyVersion [topVer=19062, minorTopVer=0],
AffinityTopologyVersion [topVer=19063, minorTopVer=0],
AffinityTopologyVersion [topVer=19064, minorTopVer=0],
AffinityTopologyVersion [topVer=19065, minorTopVer=0],
AffinityTopologyVersion [topVer=19066, minorTopVer=0],
AffinityTopologyVersion [topVer=19067, minorTopVer=0],
AffinityTopologyVersion [topVer=19068, minorTopVer=0],
AffinityTopologyVersion [topVer=19069, minorTopVer=0],
AffinityTopologyVersion [topVer=19070, minorTopVer=0],
AffinityTopologyVersion [topVer=19071, minorTopVer=0],
AffinityTopologyVersion [topVer=19072, minorTopVer=0],
AffinityTopologyVersion [topVer=19073, minorTopVer=0],
AffinityTopologyVersion [topVer=19074, minorTopVer=0],
AffinityTopologyVersion [topVer=19075, minorTopVer=0],
AffinityTopologyVersion [topVer=19076, minorTopVer=0],
AffinityTopologyVersion [topVer=19077, minorTopVer=0],
AffinityTopologyVersion [topVer=19078, minorTopVer=0],
AffinityTopologyVersion [topVer=19079, minorTopVer=0],
AffinityTopologyVersion [topVer=19080, minorTopVer=0],
AffinityTopologyVersion [topVer=19081, minorTopVer=0],
AffinityTopologyVersion [topVer=19082, minorTopVer=0],
AffinityTopologyVersion [topVer=19083, minorTopVer=0],
AffinityTopologyVersion [topVer=19084, minorTopVer=0],
AffinityTopologyVersion [topVer=19085, minorTopVer=0],
AffinityTopologyVersion [topVer=19086, minorTopVer=0],
AffinityTopologyVersion [topVer=19087, minorTopVer=0],
AffinityTopologyVersion [topVer=19088, minorTopVer=0],
AffinityTopologyVersion [topVer=19089, minorTopVer=0],
AffinityTopologyVersion [topVer=19090, minorTopVer=0],
AffinityTopologyVersion [topVer=19091, minorTopVer=0],
AffinityTopologyVersion [topVer=19092, minorTopVer=0],
AffinityTopologyVersion [topVer=19093, minorTopVer=0],
AffinityTopologyVersion [topVer=19094, minorTopVer=0],
AffinityTopologyVersion [topVer=19095, minorTopVer=0],
AffinityTopologyVersion [topVer=19096, minorTopVer=0],
AffinityTopologyVersion [topVer=19097, minorTopVer=0],
AffinityTopologyVersion [topVer=19098, minorTopVer=0],
AffinityTopologyVersion [topVer=19099, minorTopVer=0],
AffinityTopologyVersion [topVer=19100, minorTopVer=0],
AffinityTopologyVersion [topVer=19101, minorTopVer=0],
AffinityTopologyVersion [topVer=19102, minorTopVer=0],
AffinityTopologyVersion [topVer=19103, minorTopVer=0],
AffinityTopologyVersion [topVer=19104, minorTopVer=0],
AffinityTopologyVersion [topVer=19105, minorTopVer=0],
AffinityTopologyVersion [topVer=19106, minorTopVer=0],
AffinityTopologyVersion [topVer=19107, minorTopVer=0],
AffinityTopologyVersion [topVer=19108, minorTopVer=0],
AffinityTopologyVersion [topVer=19109, minorTopVer=0],
AffinityTopologyVersion [topVer=19110, minorTopVer=0],
AffinityTopologyVersion [topVer=19111, minorTopVer=0],
AffinityTopologyVersion [topVer=19112, minorTopVer=0],
AffinityTopologyVersion [topVer=19113, minorTopVer=0],
AffinityTopologyVersion [topVer=19114, minorTopVer=0],
AffinityTopologyVersion [topVer=19115, minorTopVer=0],
AffinityTopologyVersion [topVer=19116, minorTopVer=0],
AffinityTopologyVersion [topVer=19117, minorTopVer=0],
AffinityTopologyVersion [topVer=19118, minorTopVer=0],
AffinityTopologyVersion [topVer=19119, minorTopVer=0],
AffinityTopologyVersion [topVer=19120, minorTopVer=0],
AffinityTopologyVersion [topVer=19121, minorTopVer=0],
AffinityTopologyVersion [topVer=19122, minorTopVer=0],
AffinityTopologyVersion [topVer=19123, minorTopVer=0],
AffinityTopologyVersion [topVer=19124, minorTopVer=0],
AffinityTopologyVersion [topVer=19125, minorTopVer=0],
AffinityTopologyVersion [topVer=19126, minorTopVer=0],
AffinityTopologyVersion [topVer=19127, minorTopVer=0],
AffinityTopologyVersion [topVer=19128, minorTopVer=0],
AffinityTopologyVersion [topVer=19129, minorTopVer=0],
AffinityTopologyVersion [topVer=19130, minorTopVer=0],
AffinityTopologyVersion [topVer=19131, minorTopVer=0],
AffinityTopologyVersion [topVer=19132, minorTopVer=0],
AffinityTopologyVersion [topVer=19133, minorTopVer=0],
AffinityTopologyVersion [topVer=19134, minorTopVer=0],
AffinityTopologyVersion [topVer=19135, minorTopVer=0],
AffinityTopologyVersion [topVer=19136, minorTopVer=0],
AffinityTopologyVersion [topVer=19137, minorTopVer=0],
AffinityTopologyVersion [topVer=19138, minorTopVer=0],
AffinityTopologyVersion [topVer=19139, minorTopVer=0],
AffinityTopologyVersion [topVer=19140, minorTopVer=0],
AffinityTopologyVersion [topVer=19141, minorTopVer=0],
AffinityTopologyVersion [topVer=19142, minorTopVer=0],
AffinityTopologyVersion [topVer=19143, minorTopVer=0],
AffinityTopologyVersion [topVer=19144, minorTopVer=0],
AffinityTopologyVersion [topVer=19145, minorTopVer=0],
AffinityTopologyVersion [topVer=19146, minorTopVer=0],
AffinityTopologyVersion [topVer=19147, minorTopVer=0],
AffinityTopologyVersion [topVer=19148, minorTopVer=0],
AffinityTopologyVersion [topVer=19149, minorTopVer=0],
AffinityTopologyVersion [topVer=19150, minorTopV
        at
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.resolveDiscoCache(GridDiscoveryManager.java:2001)
~[ignite-core-2.4.0.jar:2.4.0]
        at
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.cacheGroupAffinityNodes(GridDiscoveryManager.java:1898)
~[ignite-core-2.4.0.jar:2.4.0]
        at
org.apache.ignite.internal.processors.cache.GridCacheUtils.affinityNodes(GridCacheUtils.java:459)
~[ignite-core-2.4.0.jar:2.4.0]
        at
org.apache.ignite.internal.processors.cache.query.GridCacheQueryAdapter.nodes(GridCacheQueryAdapter.java:603)
~[ignite-core-2.4.0.jar:2.4.0]
        at
org.apache.ignite.internal.processors.cache.query.GridCacheQueryAdapter.nodes(GridCacheQueryAdapter.java:575)
~[ignite-core-2.4.0.jar:2.4.0]
        at
org.apache.ignite.internal.processors.cache.query.GridCacheQueryAdapter.executeScanQuery(GridCacheQueryAdapter.java:522)
~[ignite-core-2.4.0.jar:2.4.0]
        at
org.apache.ignite.internal.processors.service.GridServiceProcessor.serviceEntries(GridServiceProcessor.java:1525)
~[ignite-core-2.4.0.jar:2.4.0]
        at
org.apache.ignite.internal.processors.service.GridServiceProcessor.access$1800(GridServiceProcessor.java:124)
~[ignite-core-2.4.0.jar:2.4.0]
        at
org.apache.ignite.internal.processors.service.GridServiceProcessor$TopologyListener$1.run0(GridServiceProcessor.java:1767)
~[ignite-core-2.4.0.jar:2.4.0]
        at
org.apache.ignite.internal.processors.service.GridServiceProcessor$DepRunnable.run(GridServiceProcessor.java:2008)
[ignite-core-2.4.0.jar:2.4.0]
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_161]
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_161]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
[2018-07-12T09:31:06,023][INFO ][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP
discovery spawning a new thread for connection [rmtAddr=/10.252.218.225,
rmtPort=32686]
[2018-07-12T09:31:06,023][INFO
][tcp-disco-sock-reader-#26287][TcpDiscoverySpi] Started serving remote node
connection [rmtAddr=/10.252.218.225:32686, rmtPort=32686]
[2018-07-12T09:31:06,160][INFO
][tcp-disco-sock-reader-#26283][TcpDiscoverySpi] Finished serving remote
node connection [rmtAddr=/10.252.218.225:45189, rmtPort=45189
[2018-07-12T09:31:06,764][WARN ][exchange-worker-#162][diagnostic] Failed to
wait for partition map exchange [topVer=AffinityTopologyVersion
[topVer=17316, minorTopVer=0], node=03c0f866-38f5-4354-aabb-3e96f1fa17d4].
Dumping pending objects that might be the cause:
[2018-07-12T09:31:16,764][WARN ][exchange-worker-#162][diagnostic] Failed to
wait for partition map exchange [topVer=AffinityTopologyVersion
[topVer=17316, minorTopVer=0], node=03c0f866-38f5-4354-aabb-3e96f1fa17d4].
Dumping pending objects that might be the cause:
[2018-07-12T09:31:22,982][INFO ][grid-timeout-worker-#119][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=03c0f866, uptime=54:15:15.636]
    ^-- H/N/C [hosts=3, nodes=3, CPUs=168]
    ^-- CPU [cur=0.03%, avg=0.02%, GC=0%]
    ^-- PageMemory [pages=10756]
    ^-- Heap [used=1815MB, free=98.62%, comm=131072MB]
    ^-- Non heap [used=461MB, free=-1%, comm=599MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=12, qSize=0]
    ^-- System thread pool [active=0, idle=9, qSize=0]
[2018-07-12T09:31:26,764][WARN ][exchange-worker-#162][diagnostic] Failed to
wait for partition map exchange [topVer=AffinityTopologyVersion
[topVer=17316, minorTopVer=0], node=03c0f866-38f5-4354-aabb-3e96f1fa17d4].
Dumping pending objects that might be the cause:

And the cluster configuration is:
   <bean id="grid.cfg"
class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="peerClassLoadingEnabled" value="true"/>
        <property name="dataStorageConfiguration">
            <bean
class="org.apache.ignite.configuration.DataStorageConfiguration">
            <property name="defaultDataRegionConfiguration">
                <bean
class="org.apache.ignite.configuration.DataRegionConfiguration">
                    <property name="name" value="default_Region"/>
                    <property name="initialSize" value="#{64L * 1024 * 1024
* 1024}"/>
                    <property name="maxSize" value="#{128L * 1024 * 1024 *
1024}"/>
                    <property name="persistenceEnabled" value="true"/>
                    <property name="checkpointPageBufferSize" value="#{8L *
1024 * 1024 * 1024}"/>
                </bean>
            </property>
            <property name="walMode" value="BACKGROUND"/>
            <property name="walFlushFrequency" value="5000"/>
            <property name="checkpointFrequency" value="60000"/>
            </bean>
        </property>
        <property name="discoverySpi">
                <bean
class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                    <property name="localPort" value="49500"/>
                    <property name="ipFinder">
                        <bean
class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                            <property name="addresses">
                                <list>
                                    <value>node1:49500</value>
                                    <value>node2</value>
                                    <value>node3:49500</value>
                                </list>
                            </property>
                        </bean>
                    </property>
                </bean>
            </property>
        <property name="gridLogger">
            <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
                <constructor-arg type="java.lang.String"
value="config/ignite-log4j2.xml"/>
            </bean>
        </property>
    </bean>
</beans>

I also observed that the topology version is growing very fast, which is
strange because no new data is being written into the cluster.
I can query the cluster, but I can't write any new data into it.





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Ray

Re: Can't write to Ignite cluster

Here are the full logs and thread dumps for the three nodes and for the
client used to ingest data.

node1.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/node1.zip>  
node2.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/node2.zip>  
node3.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/node3.zip>  
client_log.client_log
<http://apache-ignite-users.70518.x6.nabble.com/file/t1346/client_log.client_log>  

I can only query the Ignite cluster using sqlline.
When I try to launch a Java client, it fails.
Its log is similar to the client_log I attached.
These two messages are printed again and again:
18/07/13 01:33:18 WARN cache.GridCachePartitionExchangeManager: Failed to
wait for initial partition map exchange. Possible reasons are:
  ^-- Transactions in deadlock.
  ^-- Long running transactions (ignore if this is the case).
  ^-- Unreleased explicit locks.
18/07/13 01:33:18 WARN internal.diagnostic: Failed to wait for partition map
exchange [topVer=AffinityTopologyVersion [topVer=25308, minorTopVer=0],
node=3c164ab8-0cf1-4451-8bfe-0c415ac932cd]. Dumping pending objects that
might be the cause:

Can anybody advise why the topology version keeps increasing, so I can do
some preliminary research?
From my prior experience with Ignite, the topology version shouldn't
increase when no data is being ingested into the cluster.
 




Pavel Vinokurov

Re: Can't write to Ignite cluster

Hi,

Such a topology version increase looks pretty strange.
Could you show how you launch the cluster and how you use the data streamer?

Thanks,
Pavel


Ray

Re: Can't write to Ignite cluster

Hi Pavel,

I started Ignite with this command:
/usr/bin/java -Xmx128g -Xms128g -XX:+UseG1GC -XX:+PrintAdaptiveSizePolicy
-XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+AlwaysPreTouch
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
-Xloggc:/spare/ignite/log/ignitegc-2018_07_14-02_38.log -DIGNITE_QUIET=true
-DIGNITE_SUCCESS_FILE=/spare/ignite/work/ignite_success_c6a52b2c-db0d-4eb7-a0e6-64df5ec615c4
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=18999
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=node1
-DIGNITE_HOME=/opt/apache-ignite
-DIGNITE_PROG_NAME=/opt/apache-ignite/bin/ignite.sh -cp
/opt/apache-ignite/libs/*:/opt/apache-ignite/libs/ignite-indexing/*:/opt/apache-ignite/libs/ignite-log4j2/*:/opt/apache-ignite/libs/ignite-rest-http/*:/opt/apache-ignite/libs/ignite-spring/*:/opt/apache-ignite/libs/licenses/*
org.apache.ignite.startup.cmdline.CommandLineStartup
/opt/apache-ignite/config/persistent-config.xml

I use the Spark DataFrame API to ingest data into Ignite.



Pavel Vinokurov

Re: Can't write to Ignite cluster

Ray,

As I can see from the following log entry:
18/07/13 01:33:18 WARN cache.GridCachePartitionExchangeManager: >>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=25309, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=0caad476-3652-453f-8fc8-e8880c12eea9, addrs=[10.252.218.225, 127.0.0.1], sockAddrs=[/127.0.0.1:49501, rpbt1ign003.webex.com/10.252.218.225:49501], discPort=49501, order=25309, intOrder=12657, lastExchangeTime=1531445583607, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], done=false]

A new node from 10.252.218.225 joined the cluster with discovery port 49501.
This means that port 49500 was busy and 49501 was chosen according to the TcpDiscoverySpi#locPortRange parameter, but this port isn't included in TcpDiscoveryVmIpFinder#addresses.
Please check that you start only one node per host, and/or configure the locPortRange and addresses parameters accordingly.
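
The fallback described above (49500 is busy, so the next port in the range is taken) can be checked directly on the host in question. Below is a minimal Java sketch that uses only the standard library. It is not Ignite's own port-selection code: the names PortProbe, isFree and firstFreePort are made up for illustration, and the scan width of 10 ports is an arbitrary assumption, not the configured locPortRange.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Illustrative port check; hypothetical helper, not part of Ignite. */
class PortProbe {
    /** Returns true if a listening socket can be bound to the port right now. */
    static boolean isFree(int port) {
        try (ServerSocket ignored = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            // Typically a BindException: another process already holds the port.
            return false;
        }
    }

    /** Returns the first free port in [basePort, basePort + count), or -1 if none is free. */
    static int firstFreePort(int basePort, int count) {
        for (int port = basePort; port < basePort + count; port++) {
            if (isFree(port)) {
                return port;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 49500 is the localPort from the posted configuration; 10 is an assumed scan width.
        int chosen = firstFreePort(49500, 10);
        System.out.println(chosen == 49500
            ? "49500 is free"
            : "49500 is busy; next free port: " + chosen);
    }
}
```

If this reports 49500 as busy on a host where only one Ignite node is supposed to run, something else on that host (possibly a second node) is already holding the discovery port.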

Thanks,
Pavel





Ray

Re: Can't write to Ignite cluster

Hello Pavel,

I have found out why the topology version keeps increasing.
A colleague built a custom Ignite monitoring system that fetches metrics
from Ignite Visor.
Every minute, this monitoring system launches a Visor client that connects
to the cluster, fetches all the metrics from Visor, and then shuts down.
This behavior causes the topology version to increase. (It is a bad way to
do monitoring.)
This is also why you're seeing the NODE_JOINED log.
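
To put rough numbers on this: every node join and every node leave bumps the topology version by one, so each connect-and-disconnect cycle costs two versions. The Java sketch below compares that with the figures from the first post's log (topVer=17316 in the exchange warning, uptime=54:15:15.636 in the metrics dump). It assumes the topology version started near zero when that node started, which the logs do not confirm, and the class and method names are made up for illustration.

```java
import java.time.Duration;

/** Back-of-the-envelope topology-version arithmetic; hypothetical helper, not Ignite code. */
class TopologyVersionRate {
    /** Version bumps per minute if short-lived nodes connect this many times per minute. */
    static double expectedBumpsPerMinute(double connectsPerMinute) {
        // One bump for the join, one for the leave.
        return 2.0 * connectsPerMinute;
    }

    /** Average bumps per minute for a topology version reached after the given uptime. */
    static double observedBumpsPerMinute(long topVer, Duration uptime) {
        return topVer / (uptime.toMillis() / 60_000.0);
    }

    public static void main(String[] args) {
        // Figures taken from the first post: topVer=17316, uptime=54:15:15.636.
        Duration uptime = Duration.ofHours(54).plusMinutes(15).plusSeconds(15).plusMillis(636);
        System.out.printf("observed: %.2f bumps/min%n", observedBumpsPerMinute(17316, uptime));
        System.out.printf("one connect per minute: %.2f bumps/min%n", expectedBumpsPerMinute(1));
    }
}
```

Under that assumption the observed average is a little over five bumps per minute, which is more than a single once-a-minute connection would produce. The thread does not say how many monitor instances were running or how many nodes each run started, so the difference is left unexplained here.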

I've now stopped this monitoring system and restarted the cluster.
The issue seems to be fixed.

But we should consider whether it is expected behavior that launching a
Visor client that connects to the cluster every minute freezes the cluster.
From what I observed, connecting a Visor client to the cluster is not a
cheap operation: it usually takes more than 5 seconds to finish partition
map exchange and the other steps before Visor is connected.
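
One way to collect metrics without joining the topology at all is to read them over JMX, since the startup command posted earlier in this thread already enables remote JMX on port 18999 without authentication. The following is a minimal Java sketch using only the standard javax.management API. The class and method names are made up for illustration, and the "org.apache:*" name pattern is an assumption about where Ignite registers its MBeans that should be verified against the actual node (for example with jconsole).

```java
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Lists MBeans of a remote JVM over JMX; hypothetical helper, not part of Ignite. */
class JmxMetricsProbe {
    /** Standard RMI-based JMX URL for a JVM started with -Dcom.sun.management.jmxremote.port. */
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** Returns the sorted names of all MBeans on the remote JVM that match the pattern. */
    static Set<String> listMBeans(String host, int port, String pattern) throws Exception {
        JMXServiceURL url = new JMXServiceURL(jmxUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            Set<String> names = new TreeSet<>();
            for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
                names.add(name.getCanonicalName());
            }
            return names;
        }
    }

    public static void main(String[] args) throws Exception {
        // Host and port come from the startup command in this thread;
        // the "org.apache:*" pattern is an assumption, not a confirmed Ignite detail.
        listMBeans("node1", 18999, "org.apache:*").forEach(System.out::println);
    }
}
```

A JMX connection talks to the JVM rather than to Ignite discovery, so polling this way does not start a node and should not trigger a partition map exchange.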

Thanks for your help, Pavel.


