Failed to reinitialize local partitions (rebalancing will be stopped)

ashishb888

I have three server nodes. Two of them have the data-node attribute and the
remaining one has the worker-node attribute. Persistence is enabled on the
data nodes. When I try to activate the cluster, I get the exception below on
the worker node:

2019-12-27 18:22:58.181 ERROR 178084 --- [nge-worker-#103]
.c.d.d.p.GridDhtPartitionsExchangeFuture : Failed to reinitialize local
partitions (rebalancing will be stopped): GridDhtPartitionExchangeId
[topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1],
discoEvt=DiscoveryCustomEvent [customMsg=ChangeGlobalStateMessage
[id=cab3a674f61-ba691b6c-49a7-4148-ac76-1cdb353595b6,
reqId=28e7c9d2-de6e-4e44-9394-142d1d4743aa,
initiatingNodeId=e75abdeb-3a9b-4217-8596-dfb764d14a8e, activate=true,
baselineTopology=BaselineTopology [id=0, branchingHash=-539144375,
branchingType='New BaselineTopology',
baselineNodes=[4b809903-efc7-45f6-986e-584095bb96c5,
127.0.0.1,172.17.241.80,172.17.5.36:42502,
141cd0a2-e30f-441c-a54f-0be995ff1a41]], forceChangeBaselineTopology=false,
timestamp=1577451177862], affTopVer=AffinityTopologyVersion [topVer=3,
minorTopVer=1], super=DiscoveryEvent [evtNode=TcpDiscoveryNode
[id=e75abdeb-3a9b-4217-8596-dfb764d14a8e, addrs=[127.0.0.1, 172.17.241.80,
172.17.5.36], sockAddrs=[hdpdev6/172.17.5.36:42500, /127.0.0.1:42500,
hdpdev6_oob.nseroot.com/172.17.241.80:42500], discPort=42500, order=1,
intOrder=1, lastExchangeTime=1577451154397, loc=false,
ver=2.7.6#20190911-sha1:21f7ca41, isClient=false], topVer=3,
nodeId8=53126272, msg=null, type=DISCOVERY_CUSTOM_EVT,
tstamp=1577451177925]], nodeId=e75abdeb, evt=DISCOVERY_CUSTOM_EVT]

java.lang.NullPointerException: null
        at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.<init>(GridDhtLocalPartition.java:224)
        at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.getOrCreatePartition(GridDhtPartitionTopologyImpl.java:853)
        at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.initPartitions(GridDhtPartitionTopologyImpl.java:406)
        at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.beforeExchange(GridDhtPartitionTopologyImpl.java:585)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1473)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:809)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2681)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2553)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)


OrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler
[ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteCheckedException:
null]]

org.apache.ignite.IgniteCheckedException: null
        at org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7432)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.resolve(GridFutureAdapter.java:261)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:209)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:160)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2709)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2553)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: null
        at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.<init>(GridDhtLocalPartition.java:224)
        at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.getOrCreatePartition(GridDhtPartitionTopologyImpl.java:853)
        at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.initPartitions(GridDhtPartitionTopologyImpl.java:406)
        at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.beforeExchange(GridDhtPartitionTopologyImpl.java:585)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1473)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:809)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2681)
        ... 3 common frames omitted




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
akurbanov

Re: Failed to reinitialize local partitions (rebalancing will be stopped)

Hello,

Is it possible to provide the full log?

Best regards,
Anton



ashishb888

Re: Failed to reinitialize local partitions (rebalancing will be stopped)

ashishb888

Re: Failed to reinitialize local partitions (rebalancing will be stopped)

Did anyone see the logs?



ilya.kasnacheev

Re: Failed to reinitialize local partitions (rebalancing will be stopped)

Hello!

That's weird! I think that ctx.wal() is null, perhaps because of some mismatch between persistent and non-persistent regions.

Can you throw together a reproducer project for this issue? I'll surely check it.

Regards,
--
Ilya Kasnacheev


Thu, 2 Jan 2020 at 09:06, ashishb888 <[hidden email]> wrote:
Did anyone see the logs?
ashishb888

Re: Failed to reinitialize local partitions (rebalancing will be stopped)

ilya.kasnacheev

Re: Failed to reinitialize local partitions (rebalancing will be stopped)

Hello!

What are the steps to reproduce?

I tried starting 2 data nodes, 1 worker node and 1 client, then activated the cluster with control.sh. I didn't see any errors.

Have you tried clearing persistence dirs before a run?

Regards,
--
Ilya Kasnacheev


ashishb888

Re: Failed to reinitialize local partitions (rebalancing will be stopped)

Yes, I cleared the directories.

Case 1: An exception occurs on the worker node and the node gets killed
Steps:
    - start data node
    - start worker node
    - start data node
    - activate the cluster from a client node (or with the control script)


Case 2: All three nodes end up in the baseline topology instead of only the
2 data nodes
Steps:
    - start worker node
    - start data node
    - start data node
    - activate the cluster from a client node (or with the control script)
ilya.kasnacheev

Re: Failed to reinitialize local partitions (rebalancing will be stopped)

Hello!

Yes, I think this is a new issue. Can you please file a ticket in the Apache Ignite JIRA? I'll add some development details.

As a workaround, please try to stick to a sequence that is known to work (for example: add the data nodes, activate, then add the worker node, and do not add it to the baseline).

Regards,
--
Ilya Kasnacheev


Mon, 13 Jan 2020 at 14:26, ashishb888 <[hidden email]> wrote:
Yes, I cleared the directories. [...]
ashishb888

Re: Failed to reinitialize local partitions (rebalancing will be stopped)

Hello Ilya,

Okay, will file a ticket.

BR
Ashish



ilya.kasnacheev

Re: Failed to reinitialize local partitions (rebalancing will be stopped)

Hello!

Actually, it turns out you don't need to file a ticket. Instead, you should fix your data region configuration, as shown when your reproducer is run against 2.8 builds:
Caused by: org.apache.ignite.spi.IgniteSpiException: Failed to join node (Incompatible data region configuration [region=Data_Region, locNodeId=fd66182a-89a6-4443-8f5e-2fba7d4ad36b, isPersistenceEnabled=true, rmtNodeId=16825cee-57cf-454e-94ce-46daad4f6b93, isPersistenceEnabled=false])

It turns out that a cluster must not contain data regions with the same name but different persistence flag settings.
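
To make the rule concrete, here is a minimal sketch in plain Java. It is not Ignite code and does not reproduce Ignite's actual join validation; RegionSetting and RegionCompatibilityCheck are hypothetical names introduced only for illustration. It models each node's data region as a (node, region name, persistence flag) triple and reports any region name that appears with different persistence flags, which is the situation the 2.8 error message describes for Data_Region.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for one node's data region settings (not an Ignite class). */
final class RegionSetting {
    final String nodeName;
    final String regionName;
    final boolean persistenceEnabled;

    RegionSetting(String nodeName, String regionName, boolean persistenceEnabled) {
        this.nodeName = nodeName;
        this.regionName = regionName;
        this.persistenceEnabled = persistenceEnabled;
    }
}

/** Illustrative check only: same region name must carry the same persistence flag on every node. */
final class RegionCompatibilityCheck {
    /**
     * Returns one message for every setting whose persistence flag differs from
     * the first node that declared the same region name.
     */
    static List<String> findConflicts(List<RegionSetting> settings) {
        Map<String, RegionSetting> firstSeen = new HashMap<>();
        List<String> conflicts = new ArrayList<>();
        for (RegionSetting s : settings) {
            // putIfAbsent returns the earlier entry for this region name, or null if there was none.
            RegionSetting first = firstSeen.putIfAbsent(s.regionName, s);
            if (first != null && first.persistenceEnabled != s.persistenceEnabled) {
                conflicts.add("region=" + s.regionName
                    + ": " + first.nodeName + " isPersistenceEnabled=" + first.persistenceEnabled
                    + ", " + s.nodeName + " isPersistenceEnabled=" + s.persistenceEnabled);
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        // Assumed layout mirroring the thread: two persistent data nodes, one in-memory worker node.
        List<RegionSetting> cluster = Arrays.asList(
            new RegionSetting("data-node-1", "Data_Region", true),
            new RegionSetting("data-node-2", "Data_Region", true),
            new RegionSetting("worker-node", "Data_Region", false));
        for (String conflict : findConflicts(cluster)) {
            System.out.println(conflict);
        }
    }
}
```

Under this model, the layout from the thread yields one conflict for the worker node. Giving the worker node's in-memory region a different name, or using the same persistence flag everywhere, would leave the list empty.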


Regards,
--
Ilya Kasnacheev


Wed, 15 Jan 2020 at 09:51, ashishb888 <[hidden email]> wrote:
Hello Ilya,

Okay, will file a ticket.

BR
Ashish