Block until partition map exchange is complete

ssansoy

Hi Ignite users,

I have 3 nodes running, with a cache with the following configuration:

cacheConfiguration.setCacheMode(CacheMode.PARTITIONED);
cacheConfiguration.setBackups(1);
cacheConfiguration.setRebalanceMode(CacheRebalanceMode.SYNC);
cacheConfiguration.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
cacheConfiguration.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);

I.e. a partitioned cache with 1 backup - so if one of the three nodes goes
down, all the data is still available across the remaining 2 nodes.

I also have some custom code that runs on the current "leader". E.g. the
server code runs some tasks if it is the leader node - defined as being the
"oldest node".
The code running on each server registers a listener for

{EventType.EVT_NODE_SEGMENTED,
EventType.EVT_NODE_FAILED,EventType.EVT_NODE_LEFT}

And if it discovers that it is now the new leader, the tasks restart on the
new "oldest node".

This works fine. The issue I am having is that one of these tasks that runs
on the leader needs to issue a cache query to do some work.

I am finding that if one of my three nodes drops off, when one of the
remaining two nodes becomes the leader and resumes the work, the records it
gets back from the cache are incomplete. E.g. there may be 400 entries in
the cache, but when node 1 drops off and node 2 takes over, it only sees
250, or some other number less than 400. A little later, this does correctly
return to 400 - I expect because the exchange process has completed behind
the scenes and the node now has all the data it needs.

I am a little surprised by this, however, because I am using
CacheRebalanceMode.SYNC, and the docs here:
https://apacheignite.readme.io/docs/rebalancing say that

"This means that any call to cache public API will be blocked until
rebalancing is finished."

E.g. if I call ignite.cache("MYCACHE").size() (a public cache method), then
this should not return an incomplete number, but rather block until the
underlying rebalance has completed and only then return 400.

Does anyone have any pointers to what I might be doing wrong here? Thanks!



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ilya.kasnacheev

Re: Block until partition map exchange is complete

Hello!

Do you issue your cache operations from event listener thread? This might be unsafe and also not return the expected results. Event listeners are invoked from internal threads.

Consider submitting a task to the public pool from the event listener and then returning immediately. I would expect that the task will then run once the rebalance is already under way.
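A minimal sketch of this handoff in plain Java (Ignite is left out entirely; `onNodeLeft` stands in for the localListen predicate and the single-thread pool for whichever executor the work is handed to - both names are mine, not Ignite API):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ListenerHandoff {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    private final CountDownLatch done = new CountDownLatch(1);

    // Stand-in for the localListen predicate: do NOT run cache
    // operations here; hand them to another thread and return fast.
    public boolean onNodeLeft() {
        pool.submit(() -> {
            // ... leader check and ScanQuery would go here ...
            done.countDown();
        });
        return true; // keep the listener registered
    }

    // Wait for the handed-off work to finish, then release the pool.
    public boolean awaitWork(long seconds) throws InterruptedException {
        boolean ok = done.await(seconds, TimeUnit.SECONDS);
        pool.shutdown();
        return ok;
    }
}
```

The point is only that the listener callback returns immediately while the cache work runs on a thread Ignite does not own.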

Regards,
--
Ilya Kasnacheev


Fri, Jul 3, 2020 at 11:17, ssansoy <[hidden email]>:
> [quoted message trimmed]
ssansoy

Re: Block until partition map exchange is complete

Hi Ilya, thanks for the quick help!
Within the localListen, I am adding a task to an executor - so the cache
operations happen on a different thread. However, is the key thing here that
the localListen handler method needs to have returned?
E.g. the localListen may not have fully completed by the time the task on
the executor has started - so perhaps there is a transaction still open
somewhere by the time the cache operation occurs?



ilya.kasnacheev

Re: Block until partition map exchange is complete

Hello!

Yes, you need to return from event listener as soon as you can.

Regards,
--
Ilya Kasnacheev


Fri, Jul 3, 2020 at 12:03, ssansoy <[hidden email]>:
> [quoted message trimmed]
ssansoy

Re: Block until partition map exchange is complete

Thanks - the issue I have now is how I can confirm that the localListen has
returned before executing my code.
E.g. in the localListen I can set a flag, and then the localListen returns -
but the thread that detects this flag and runs the task could still be
scheduled to run before the localListen has returned.
Is there a callback I can register which is triggered after the localListen
returns, so I can guarantee I am executing in the correct order (e.g. after
whatever needs to be committed has been committed)?



ilya.kasnacheev

Re: Block until partition map exchange is complete

Hello!

Can you throw together a reproducer project that shows this behavior? I would check it.

Regards,
--
Ilya Kasnacheev


Fri, Jul 3, 2020 at 13:14, ssansoy <[hidden email]>:
> [quoted message trimmed]
ssansoy

Re: Block until partition map exchange is complete

Hi, the following setup should reproduce the issue:

A server class starts up a server node with the config in my original mail
(e.g. 3 servers, partitioned with 1 backup). In that class, at the end, do
something like:

ignite.events(ignite.cluster().forServers()).localListen(ignitePredicate,
            EventType.EVT_NODE_SEGMENTED, EventType.EVT_NODE_FAILED,
            EventType.EVT_NODE_LEFT);

in that ignitePredicate - "if the current node is the oldest node in the
cluster", then do a ScanQuery on some cache MYCACHE and print out the
records.

The first server to start up will print out all the records.
If you kill the first server, the next oldest server will perform the scan
query upon receiving the NODE_LEFT event and will not print out all the
records (because the localListen runs before the exchange has happened).

Is that enough information?

The issue has gone away now that I have updated my ignitePredicate to merely
set a flag if this node is the oldest, and added a separate scheduled
executor to periodically check that flag and, if it is true, do the scan
query. This seems to work - probably because there is a sufficient delay
before performing the scan query. My worry is that we could get unlucky with
scheduling, and the scan query could still occur after the flag is set but
before the localListen has returned. E.g. ideally it would seem sensible to
either support @IgniteAsyncCallback in this localListen (which hopefully
takes care of the ordering) or to have a callback that can be executed after
the localListen has returned (if that is indeed the cause of the issue here).
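The flag-and-poll workaround above can be sketched in plain Java (Ignite left out; `markLeader` stands in for what the ignitePredicate does, and the `Runnable` for the ScanQuery work - and, as noted, nothing below guarantees the listener has fully returned before the work runs, only that a polling delay usually intervenes):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class LeaderPoller {
    private final AtomicBoolean isLeader = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final CountDownLatch ran = new CountDownLatch(1);

    // Called from the (hypothetical) ignitePredicate: only flip a flag,
    // never touch the cache on the listener thread.
    public void markLeader() {
        isLeader.set(true);
    }

    // Poll the flag every 100 ms; run the scan-query work on this
    // scheduler thread, away from the event listener thread.
    public void start(Runnable scanQueryWork) {
        scheduler.scheduleAtFixedRate(() -> {
            if (isLeader.compareAndSet(true, false)) {
                scanQueryWork.run();
                ran.countDown();
            }
        }, 0, 100, TimeUnit.MILLISECONDS);
    }

    // Wait for one run of the work, then release the scheduler.
    public boolean awaitRun(long seconds) throws InterruptedException {
        boolean ok = ran.await(seconds, TimeUnit.SECONDS);
        scheduler.shutdown();
        return ok;
    }
}
```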





ssansoy

Re: Block until partition map exchange is complete

By the way, just referring back to the original question - is there such a
callback that can be used to wait for the partition exchange to complete, in
any version of Ignite? We are using Ignite 2.7.6 (which I acknowledge is
slightly behind - but we are planning to upgrade).




ilya.kasnacheev

Re: Block until partition map exchange is complete

Hello!

I'm not actually sure. Do you have a reproducer where you see a decreased count() result? What is your PartitionLossPolicy? Have you tried tweaking it?

I can see a method in our tests for doing that, and it is very raw: it checks every cache to make sure that all partitions are OWNING. This is org.apache.ignite.testframework.junits.common.GridCommonAbstractTest#awaitPartitionMapExchange(boolean, boolean, java.util.Collection<org.apache.ignite.cluster.ClusterNode>, boolean, java.util.Set<java.lang.String>)
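A crude userland stand-in (my own sketch, not an Ignite API) is to poll something like cache.size() until it stops changing for a few consecutive reads before trusting query results - a heuristic only, with no hard guarantee that the exchange has actually finished:

```java
import java.util.function.IntSupplier;

public class StableSize {
    /**
     * Polls sizeSupplier until it returns the same value for
     * stableChecks consecutive reads, then returns that value.
     * Heuristic only: a size can be momentarily stable while
     * rebalancing is still in progress.
     */
    public static int awaitStable(IntSupplier sizeSupplier, int stableChecks,
                                  long pauseMillis) throws InterruptedException {
        int last = sizeSupplier.getAsInt();
        int same = 1;
        while (same < stableChecks) {
            Thread.sleep(pauseMillis);
            int cur = sizeSupplier.getAsInt();
            if (cur == last) {
                same++;
            } else {
                last = cur;
                same = 1;
            }
        }
        return last;
    }
}
```

The supplier would be something like `() -> ignite.cache("MYCACHE").size()` in the scenario from this thread.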

Regards,
--
Ilya Kasnacheev


Wed, Jul 15, 2020 at 12:03, ssansoy <[hidden email]>:
> [quoted message trimmed]
ssansoy

Re: Block until partition map exchange is complete

Hi, could the behaviour I have observed be captured by this bug:

https://issues.apache.org/jira/browse/IGNITE-9841

"Note, ScanQuery exhibits the same behavior - returns partial results when
some partitions are lost.  Not sure if solution would be related or needs to
be tracked and fixed under a separate ticket."





ilya.kasnacheev

Re: Block until partition map exchange is complete

Hello!

It is supposed to be fixed in 2.8. Did you check that?

Thanks.
--
Ilya Kasnacheev


Wed, Jul 22, 2020 at 12:24, ssansoy <[hidden email]>:
> [quoted message trimmed]