How do I know the cache rebalance is finished?

Andrew

Hi,

With Ignite v1.5 I was using the REBALANCE_STOPPED event to find out which primary partitions are assigned to the local node.
Because the REBALANCE_STOPPED event fires several times when a server node fails,
my code checks the discovery event type of each received event, since I want it to run only once per node failure or join.

It worked well, but after upgrading to Ignite v1.6 my code no longer works correctly.
I found that the partitions are not yet rebalanced when the event listener receives a REBALANCE_STOPPED event.

Here is my code.
How do I fix my code? I need your advice.

-------------------------------------------------------------------------------------------------------------
Ignite ignite = IgniteUtil.getIgnite();

IgnitePredicate<EventAdapter> locLsnr = new IgnitePredicate<EventAdapter>() {
    @Override
    public boolean apply(EventAdapter evt) {
        switch (evt.type()) {
        case EventType.EVT_CACHE_REBALANCE_STOPPED:
            if (evt instanceof CacheRebalancingEvent) {
                CacheRebalancingEvent cacheEvt = (CacheRebalancingEvent) evt;
                Boolean isClient = cacheEvt.discoveryNode().attribute(IgniteNodeAttributes.ATTR_CLIENT_MODE);

                // React only to server-node joins, leaves and failures that concern the message cache.
                if (isClient != null && !isClient
                        && cacheEvt.cacheName().equals(CacheConfig.messageCache().getName())
                        && (cacheEvt.discoveryEventType() == EventType.EVT_NODE_JOINED
                                || cacheEvt.discoveryEventType() == EventType.EVT_NODE_LEFT
                                || cacheEvt.discoveryEventType() == EventType.EVT_NODE_FAILED)) {

                    // Rebalance finished (assumed): read the local primary partitions.
                    Affinity aff = ignite.affinity(CacheConfig.messageCache().getName());
                    int[] partitions = aff.primaryPartitions(ignite.cluster().localNode());
                }
            }
            break;
        default:
            break;
        }

        // Returning true keeps the listener registered.
        return true;
    }
};

ignite.events().localListen(locLsnr, EventType.EVT_CACHE_REBALANCE_STOPPED);
-------------------------------------------------------------------------------------------------------------

Sincerely,
Andrew.

vkulichenko

Hi,

There were some changes in affinity assignment that could cause the difference in behavior, but rebalancing is actually a separate process, and there was never a guarantee that partitions would be assigned before EVT_CACHE_REBALANCE_STOPPED is fired.

Can you provide more details about your use case? What are you trying to achieve?

-Val

Andrew

Thanks, Val.

I want each server node to execute the cache data only for its local primary partitions.
When a node fails or joins, the local primary partitions start changing.
Each server node pauses processing during the cache rebalance. Once the rebalance is finished, it checks which local primary partitions changed, starts new threads for the newly added partitions, and sweeps out the objects related to the partitions it lost.

So I would like to detect the moment at which the cache rebalance has finished.
Is there any way for me to achieve that?
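
The bookkeeping described above (which partitions were gained, which were lost) can be sketched as follows. This is only a minimal sketch, assuming the node keeps the previous result of `aff.primaryPartitions(...)` and compares it with the current one. `PartitionDiff`, `added` and `removed` are hypothetical names, and the code uses only the JDK; it does not call Ignite and does not solve the question of when to take the second snapshot.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper: compares two snapshots of the local primary partitions.
class PartitionDiff {
    // Partitions present in 'current' but not in 'previous' (newly owned).
    static Set<Integer> added(int[] previous, int[] current) {
        Set<Integer> result = toSet(current);
        result.removeAll(toSet(previous));
        return result;
    }

    // Partitions present in 'previous' but not in 'current' (no longer owned).
    static Set<Integer> removed(int[] previous, int[] current) {
        Set<Integer> result = toSet(previous);
        result.removeAll(toSet(current));
        return result;
    }

    private static Set<Integer> toSet(int[] partitions) {
        return Arrays.stream(partitions).boxed().collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        int[] before = {0, 3, 7};
        int[] after = {0, 7, 9, 12};
        System.out.println("start workers for " + added(before, after));
        System.out.println("clean up state for " + removed(before, after));
    }
}
```

Worker threads would be started for the partitions returned by `added` and the per-partition state dropped for those returned by `removed`.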

Sincerely,
Andrew.

vkulichenko

Hi Andrew,

Can you clarify what you mean by "execute the cache data"?

We're currently working on an enhancement to the compute grid that will let you execute computations with the affinityRun and affinityCall methods, with a guarantee that the partition is locked and the data is always local. See [1] for details. Is this something that could help you?

[1] https://issues.apache.org/jira/browse/IGNITE-2310

-Val

Andrew

Hi, Val.

By "execute the cache data" I mean that my application processes data that it gets from the grid cache.
I had considered using the compute grid for my application, but it was not a suitable solution in my case.
Of course I can look at it again, since Ignite keeps improving, but that takes time: I would have to redesign my whole application.

I assumed the 'EVT_CACHE_REBALANCE_STOPPED' event means that all partitions have been rebalanced.
If I am right, could you make the event fire after the rebalance is finished?
If I am wrong, could you add another event, something like 'EVT_CACHE_REBALANCE_FINISHED'?
If neither is acceptable,
I just want to know how I can detect that the cache rebalance is finished.

Sincerely,
Andrew.

P.S. I'm not a native English speaker and I'm unfamiliar with your culture, so I'm not sure whether I have written rudely.
If I did, that was not my intention.

vkulichenko

Hi Andrew,

Can you please describe your use case in more detail? If you just get the data from cache, why do you care about rebalancing? This is a background process and Ignite guarantees consistency even if you read or write data concurrently with this process.

-Val

Andrew

OK.

I'm building a system that detects faults in continuous sensor data messages.

An Ignite client node named 'Distributor' receives messages from senders and distributes them to Ignite server nodes using Ignite (ordered) messaging. Each sender has its own ID (an Integer) and attaches it to every message it sends. I use the sender's ID as the partition number in the grid cache.
I use each server node's UUID as a messaging topic, and the Distributor chooses the topic for a message by checking which server node holds the primary partition (= sender's ID).

When a server node receives messages, it queues them for parsing and execution, so if a server node fails, its queued messages are lost.
My customer requires the system to guarantee that no message is lost, so the Distributor also puts the messages into the grid cache using IgniteDataStreamer as a backup.
If a node fails, a cache rebalance occurs. Each server node then checks its local primary partitions again to see which partitions were added, and recovers the lost messages by reading them from the grid cache, only for the newly added local primary partitions. At the same time, the Distributor detects that the server nodes' primary partitions have changed and remaps the destination topics for each sender's ID (= partition number).

Actually, reading from the grid cache is needed only during fail-over.
I just need the server nodes to detect which partitions were added as local primary partitions, so that they can get messages from the grid cache by affinity (partition number), and the Distributor to map each sender's ID to the right topic.
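
The Distributor's remapping step described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual code: it assumes the Distributor can obtain a snapshot of "server node UUID -> primary partitions" (for example by calling `primaryPartitions` for each server node), and `TopicRouter`, `remap` and `topicFor` are hypothetical names. Only the JDK is used.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical routing table: sender ID (= partition number) -> topic (= server node UUID).
class TopicRouter {
    private volatile Map<Integer, UUID> topicByPartition = new HashMap<>();

    // Rebuilds the whole table from a snapshot of "node -> primary partitions"
    // and swaps it in with a single write, so readers never see a half-built table.
    void remap(Map<UUID, int[]> primaryPartitionsByNode) {
        Map<Integer, UUID> next = new HashMap<>();
        for (Map.Entry<UUID, int[]> e : primaryPartitionsByNode.entrySet()) {
            for (int partition : e.getValue()) {
                next.put(partition, e.getKey());
            }
        }
        topicByPartition = next;
    }

    // Returns the topic for a sender, or null if no node in the snapshot owns that partition.
    UUID topicFor(int senderId) {
        return topicByPartition.get(senderId);
    }
}
```

Rebuilding the table from scratch on every topology change keeps the logic simple; how and when the snapshot is taken is exactly the open question of this thread.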

That is my story.
Anyway, the thing I want to ask you is why the Ignite event 'EVT_CACHE_REBALANCE_STOPPED' is triggered although the actual rebalance is not finished yet. What does this event stand for?
Other events such as 'EVT_TASK_FINISHED', 'EVT_CACHE_OBJECT_PUT', and 'EVT_CACHE_OBJECT_REMOVED' work as their names suggest.

Sincerely,
Andrew.

Andrew

Hi, igniters.

I've been waiting for an answer, but there has been no reply.
Is there anybody here who can answer my question?

Sincerely,
Andrew.

Denis Magda

Hi Andrew,

Thanks for the reminder. A message can occasionally be “missed” because community members don’t have time to answer every question.

For your use case, I would implement it in the following way.

1) Put all the messages into a special cache (let’s call it the “messages” cache);

2) Each server node implements the “onAfterPut” method of CacheInterceptor [1];

3) When a new message arrives at the ‘Distributor’, it puts the message into the “messages” cache;

4) After the new message is inserted, CacheInterceptor.onAfterPut is triggered on the primary and the backup. Only the primary node moves the message to its internal queue for further processing; the backup node ignores the update;

5) When it is time to process the message, the primary node changes the message’s state to “processing” in the cache and sets its own UUID in the message’s “processingNodeUUID” field (also stored in the cache). Only after that does the primary start processing the message;

6) When the message has been processed, the primary deletes it from the cache;

7) If the primary fails somewhere between 5) and 6), the rest of the nodes handle the EVT_NODE_LEFT or EVT_NODE_FAILED event and execute an SQL query like “SELECT * FROM messages WHERE state != processing and processingNodeUUID != failedNodeUUID”. Each node then iterates over the result set and processes only those messages for which it is primary according to the new topology version.

Finally, to answer your additional question:

> The thing I want to ask you is why the Ignite event
> 'EVT_CACHE_REBALANCE_STOPPED' is triggered although the actual rebalance is
> not finished yet. What does this event stand for?

This event is fired by each server node on which data is being rebalanced. After a server node preloads all the partitions that are assigned to it, it fires this event. The main point is that every server node that preloads partitions as part of the rebalancing process fires this event.

Denis

[1] https://ignite.apache.org/releases/1.6.0/javadoc/org/apache/ignite/cache/CacheInterceptor.html

Andrew

Hi, Denis.

Thanks for your suggestion.
But I have already tried that approach.
The cache API's put method cannot deliver the required performance, and DataStreamer cannot keep messages ordered, so I chose IgniteMessaging (one of the igniters recommended it to me a couple of months ago).
Anyway, regarding step 7 of your suggestion: when should I trigger the SQL query like “SELECT * FROM messages WHERE state != processing and processingNodeUUID != failedNodeUUID”?
I used the 'EVT_CACHE_REBALANCE_STOPPED' event as the trigger to check the primary partitions;
it worked well on Ignite v1.5 but no longer does with Ignite v1.6.
That's why I raised this issue.

Is there any workaround to be notified when the rebalance is really finished?
There is a log line like
"[10:27:05,721][INFO ][sys-#24%null%][GridDhtPartitionDemander] <ON_HEAP_CACHE> Completed (final) rebalancing [cache=ON_HEAP_CACHE, fromNode=bdd66bca-ad0d-498a-9837-205cba15a91e, topology=AffinityTopologyVersion [topVer=11, minorTopVer=0], time=15 ms]"
in the Ignite log.
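
For illustration only, the fields of the log line quoted above could be extracted with a regular expression as sketched below. The pattern is derived solely from that one quoted line; the log format is internal to Ignite and may differ between versions, so this is a brittle stopgap rather than a supported notification mechanism. `RebalanceLogLine` and `parse` are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the "Completed (final) rebalancing" log line quoted in this post.
class RebalanceLogLine {
    private static final Pattern COMPLETED = Pattern.compile(
        "Completed \\(final\\) rebalancing \\[cache=([^,\\]]+), fromNode=([0-9a-fA-F-]+), "
            + "topology=AffinityTopologyVersion \\[topVer=(\\d+), minorTopVer=(\\d+)\\]");

    final String cache;
    final String fromNode;
    final long topVer;
    final int minorTopVer;

    private RebalanceLogLine(String cache, String fromNode, long topVer, int minorTopVer) {
        this.cache = cache;
        this.fromNode = fromNode;
        this.topVer = topVer;
        this.minorTopVer = minorTopVer;
    }

    // Returns the parsed fields, or empty if the line is not a "Completed (final) rebalancing" line.
    static Optional<RebalanceLogLine> parse(String line) {
        Matcher m = COMPLETED.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new RebalanceLogLine(
            m.group(1), m.group(2), Long.parseLong(m.group(3)), Integer.parseInt(m.group(4))));
    }
}
```

A match only says that this line was written for the given cache and topology version; it says nothing by itself about which partitions the affinity API reports at that moment.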


Sincerely,
Andrew.

Kristian Rosenvold

Could this be https://issues.apache.org/jira/browse/IGNITE-3305 ?

Kristian

Denis Magda

In reply to this post by Andrew
Hi Andrew,

> The cache API's put method cannot deliver the required performance, and DataStreamer
> cannot keep messages ordered, so I chose IgniteMessaging (one of the
> igniters recommended it to me a couple of months ago).

You should keep in mind that IgniteMessaging.sendOrdered is not synchronized, meaning that if you call this method from several threads on the same node, the order is undefined. Also, on the receiver side messages are ordered per sender: if you send ordered messages on the same topic but from different nodes, the messages will be ordered by sender on the receiver side.
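
One possible way to get a well-defined send order when several threads produce messages is to funnel every send through a single worker thread. The sketch below is an assumption-laden illustration, not something prescribed by Ignite: `SerializedSender` is a hypothetical name, and the `Consumer` passed in stands in for whatever actually performs the send (for example a call to `sendOrdered`). Only the JDK is used.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical wrapper: all sends are executed by one thread, in the order they were enqueued.
class SerializedSender<T> implements AutoCloseable {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<T> delegate; // stand-in for the real send call

    SerializedSender(Consumer<T> delegate) {
        this.delegate = delegate;
    }

    // May be called from any thread; the single worker performs the actual send.
    void send(T msg) {
        worker.execute(() -> delegate.accept(msg));
    }

    // Stops accepting new messages and waits up to 10 seconds for queued sends to finish.
    @Override
    public void close() {
        worker.shutdown();
        try {
            worker.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

This gives one total order (the enqueue order) and preserves each producer thread's own order; it does not decide which of two racing producer threads goes first.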

> Anyway, regarding step 7 of your suggestion: when should I trigger the SQL query
> like “SELECT * FROM messages WHERE state != processing and
> processingNodeUUID != failedNodeUUID”?

You should trigger it when the topology changes. Every node should subscribe to EventType.EVT_NODE_FAILED and EventType.EVT_NODE_LEFT and execute the query when one of these events happens.

> I used the 'EVT_CACHE_REBALANCE_STOPPED' event as the trigger to check the primary
> partitions;
> it worked well on Ignite v1.5 but no longer does with Ignite v1.6.
> That's why I raised this issue.

Could you please elaborate on what you mean by ‘it worked well’? As I understand from your previous reply, this event is currently triggered before the rebalancing happens. Could you prepare a test/reproducer for us so that we can take a look at what exactly happens on your side?


Denis

Andrew

In reply to this post by Kristian Rosenvold
Thank you, Kristian.

I think this issue is related to my problem.
Thanks again for the useful information.

Sincerely,
Andrew.

Andrew

In reply to this post by Denis Magda
Hi, Denis.

Here is some sample code.
Whenever the 'EVT_CACHE_REBALANCE_STOPPED' event is received, the server nodes print their primary partitions to the console.
You can also print the current primary partitions to the console by pressing the Enter key.
In this sample, the wrong partitions (I mean the list of old partitions) are printed only when a new node joins.
But in my project, it also happens when a node leaves.
It was tested with the 'example-default.xml' configuration file.

RebalanceTest.java

Sincerely,
Andrew.

Andrew

Hello, Denis.

Did you check my sample code?

I also have another question.
While testing with my sample code above, I noticed that
when a new node joins, Ignite does not finish the rebalance until my listener has finished handling the received event.
For example,
if I add a sleep to the sample test code I attached before, as shown below, Ignite does not complete rebalancing (more exactly, getting the affinity returns the old partition information) until the event listener thread wakes up.

I expected the event listener thread to run independently of the rebalance process.
So I'm confused now.
Am I thinking about this the wrong way?
I need some advice.


=====================================
        if (isClient != null && !isClient
                && cacheEvt.cacheName().equals(CACHE_NAME)
                && (cacheEvt.discoveryEventType() == EventType.EVT_NODE_JOINED
                        || cacheEvt.discoveryEventType() == EventType.EVT_NODE_LEFT
                        || cacheEvt.discoveryEventType() == EventType.EVT_NODE_FAILED)) {

                // Rebalance has been finished?
                System.out.println("Rebalance stopped.");

                // Added for this test: block the listener thread for 10 seconds.
                try {
                        Thread.sleep(10000);
                } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                }

                Affinity aff = ignite.affinity(CACHE_NAME);
                int[] partitions = aff.primaryPartitions(ignite.cluster().localNode());
                System.out.println("Current partitions : " + Arrays.toString(partitions));
        }

===================================

Sincerely,
Andrew

vdpyatkov

Hello, Andrew.

You are right.

You can track the issue in the JIRA ticket.

--
Vladislav Pyatkov

Kamal

Bumping this thread. I have a similar requirement. What was the outcome of this thread?

My cache rebalance mode is synchronous. How can I know when the cache rebalance is complete?

-- Kamal

vkulichenko

Kamal,

From what I can see, the ticket is still open, but it is assigned. I recommend asking Semen on the dev@ list about its status.

-Val

Humphrey

In reply to this post by vdpyatkov
Bumping this topic again: the ticket is still open (after almost 4 years).
Is there any progress or priority on this ticket, or a workaround?
https://issues.apache.org/jira/browse/IGNITE-3362

vdpyatkov

Hi,
I think it is not a priority issue, because in general the behavior is correct. The EVT_CACHE_REBALANCE_STOPPED event is received when all data has been loaded to a node, but the affinity switch happens only after all caches have been rebalanced.
First, why do you need to know when the affinity changes after a rebalance? From my point of view, rebalancing is a process that does not affect user load.
Second, you can wait for all caches that are rebalancing and then be sure that all data was transferred.

In the log you can see messages like:

Rebalancing scheduled [order=[ignite-sys-cache, ON_HEAP_CACHE], top=AffinityTopologyVersion [topVer=2, minorTopVer=0], rebalanceId=1, evt=NODE_JOINED, node=8138d15d-1606-4eb1-8359-d5637d500002]

This means that ignite-sys-cache will be rebalanced first and ON_HEAP_CACHE after it.

Then each rebalance future completes:

Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=ignite-sys-cache] ...

At this point the code receives the rebalance-stopped event for ignite-sys-cache.

Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=ON_HEAP_CACHE] ...

At this point the rebalance has stopped on ON_HEAP_CACHE.

After that you will see a topology switch to a new minor version:

Started exchange init [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=1]
...
Completed partition exchange [localNode=8138d15d-1606-4eb1-8359-d5637d500002, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=1]...

Only after this exchange has completed can you see the new primary partitions on the joined node.
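
The suggestion to wait for all caches that are rebalancing can be sketched as below. This is a minimal sketch under assumptions: the set of cache names being rebalanced is known up front (for example `ignite-sys-cache` and `ON_HEAP_CACHE` from the log above), and `RebalanceTracker`, `onRebalanceStopped` and `allStopped` are hypothetical names. It uses only the JDK; the call from an EVT_CACHE_REBALANCE_STOPPED listener is left to the application.

```java
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker: completes once every expected cache has reported "rebalance stopped".
class RebalanceTracker {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final CompletableFuture<Void> allStopped = new CompletableFuture<>();

    RebalanceTracker(Collection<String> cachesBeingRebalanced) {
        pending.addAll(cachesBeingRebalanced);
        if (pending.isEmpty()) {
            allStopped.complete(null);
        }
    }

    // Intended to be called from the rebalance-stopped listener with the event's cache name.
    // Unknown or repeated cache names are ignored.
    void onRebalanceStopped(String cacheName) {
        if (pending.remove(cacheName) && pending.isEmpty()) {
            allStopped.complete(null); // completing twice is harmless
        }
    }

    // Completes when the last expected cache has reported in.
    CompletableFuture<Void> allStopped() {
        return allStopped;
    }
}
```

As described above, the affinity switch still happens in the following partition exchange (minorTopVer=1 in the log), so completion of this future only indicates that the data was transferred, not that the new primary partitions are already visible.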

That is what happens now.
I really don’t know how to change this behavior to make it more convenient for users.
If you still have a use case that needs to know the exact moment of the affinity switch, could you move this discussion to the developer list?
I hope developers can help us.
