State of initially started cache with CacheRebalanceMode.SYNC?

Kristian Rosenvold

The Javadoc on CacheRebalanceMode.SYNC seems to indicate that the
cache should block until rebalancing is complete.
When I run the code below, the assert statement fails unless I add the
explicit call to cache.rebalance().get().

Am I doing something wrong?

Kristian


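// Configure a replicated cache that should rebalance synchronously.
CacheConfiguration config = new CacheConfiguration();
....
config.setCacheMode(CacheMode.REPLICATED);
config.setRebalanceMode(CacheRebalanceMode.SYNC);
// Start (or join) the cache dynamically on this node.
final IgniteCache cache = ignite.getOrCreateCache(config);
// The assertion below only passes when this explicit wait is uncommented:
// cache.rebalance().get();
assertThat(cache.localSize(CachePeekMode.ALL)).isEqualTo(knownRemoteSize);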

Denis Magda

Hi Kristian,

This property means that a node that is starting up, and to which part of the cache data is being rebalanced, won’t be considered for any cache-related operations until rebalancing has finished.

In my understanding, such a node won’t be considered for cache operations like put, get, etc. until all the data is fully rebalanced onto it. However, this doesn’t prevent you from getting a cache reference on this node and asking for the current cache size.


Denis

> On Jun 9, 2016, at 3:28 PM, Kristian Rosenvold <[hidden email]> wrote:
>
> The javadoc on CacheRebalanceMode.SYNC seems to indicate that the
> cache should block until rebalancing is complete.
> When I run the code below, the assert statement fails unless I add the
> explicit call to cache.rebalance().get().
>
> Am I doing something wrong ?
>
> Kristian
>
>
> CacheConfiguration config = new CacheConfiguration();
> ....
> config.setCacheMode(CacheMode.REPLICATED);
> config.setRebalanceMode(CacheRebalanceMode.SYNC);
> final IgniteCache cache = ignite.getOrCreateCache(config);
> // cache.rebalance().get();
> assertThat(cache.localSize(CachePeekMode.ALL)).isEqualTo(knownRemoteSize);

Kristian Rosenvold

2016-06-13 9:14 GMT+02:00 Denis Magda <[hidden email]>:
> This property means that a node that is being started and where a part of cache data is being rebalanced won’t be considered for any cache related operations until the rebalancing has finished.
>
> In my understanding such a node won’t be considered for cache operations like put, get, etc. until all the data is fully rebalanced on it. However this doesn’t prevent from getting a cache reference on this node and ask for current cache size.

Unfortunately, it does not seem to work that way. I was also
considering that it might only apply to put/get operations, so I tried
adding a bogus "get" for a non-existent entry to see if it would then
exhibit any kind of blocking behaviour (and hence make my assert pass
100% of the time). This does not seem to be the case either. Running
an explicit rebalance seems to do the trick. This would appear to be a
bug unless I misunderstand something.

Kristian

Denis Magda

Kristian,

How many nodes do you have in the cluster? If there are more than two server nodes, there shouldn’t be any blocking, because while rebalancing is happening on one node the other nodes can accept and fulfill cache-related operations. The main point is that the cluster won’t get stuck while data is being rebalanced on some node.


Denis

> On Jun 13, 2016, at 10:51 AM, Kristian Rosenvold <[hidden email]> wrote:
>
> Unfortunately it does not seem to work that way. I was also
> considering that it might only apply to put/get operations, so I tried
> adding a bogus "get" for a non-existing member to see if it would then
> exhibit any kind of blocking behaviour (and hence make my assert pass
> 100% of the time). This does not seem to be the case either. Running
> an explicit rebalance seems to do the trick. This would appear to be a
> bug unless I misunderstand something.
>
> Kristian

Kristian Rosenvold

This is a replicated cache and I see the unexpected behaviour with 2
nodes. All I'm trying to do is to make sure the newly started server
is not processing requests before its cache is fully populated. It
seems to me you're saying the "get" request will actually be served by
the other node before rebalancing is complete?

Kristian


2016-06-13 9:55 GMT+02:00 Denis Magda <[hidden email]>:

> Kristian,
>
> How many nodes do you have in the cluster? If there are more than two server nodes then there shouldn’t be any blocking because while rebalancing is happening on one node the other node can accept and fulfill cache related operations. The main point is that the cluster won’t stuck until data is being rebalanced on some node.
>
> —
> Denis

Denis Magda

Yes, the newly started node won’t be considered for cache-related operations until the rebalancing has finished. The rest of the nodes will process all cache-related operations as if there were no new node at all.


Denis

> On Jun 13, 2016, at 10:59 AM, Kristian Rosenvold <[hidden email]> wrote:
>
> This is a replicated cache and I see the unexpected behaviour with 2
> nodes. All I'm trying to do is to make sure the newly started server
> is not processing requests before its cache is fully populated. It
> seems to me you're saying the "get" request will actually be served by
> the other node before rebalancing is complete ?
>
> Kristian
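
The routing behaviour Denis describes can be pictured with a small plain-Java model. This is only a sketch under assumptions: `RoutingSketch`, `Node`, `rebalanced`, and `route` are hypothetical names invented for illustration, and none of this is Ignite code or a statement about Ignite's internals. A request is handed to the first node whose data is complete, so a node that is still rebalancing is skipped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class RoutingSketch {
    /** Hypothetical node: a name plus a flag saying whether its data is complete. */
    static final class Node {
        final String name;
        final boolean rebalanced;

        Node(String name, boolean rebalanced) {
            this.name = name;
            this.rebalanced = rebalanced;
        }
    }

    /** Picks the first node that has finished rebalancing, if there is one. */
    static Optional<Node> route(List<Node> nodes) {
        return nodes.stream().filter(n -> n.rebalanced).findFirst();
    }

    public static void main(String[] args) {
        List<Node> cluster = Arrays.asList(
            new Node("new-node", false),  // still receiving data
            new Node("old-node", true));  // fully populated
        // The request goes to old-node because new-node is not ready yet.
        System.out.println(route(cluster).map(n -> n.name).orElse("none"));
    }
}
```

In this model, a read issued on the new node would still return the right answer, because it is answered by the node that already holds the data.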

Kristian Rosenvold

2016-06-13 10:02 GMT+02:00 Denis Magda <[hidden email]>:
> Yes, the newly started node won’t be considered for cache related operations until the rebalancing has finished. The rest of the nodes will be processing all the cache related operations like there is no new node at all.

Sweet! I'm not really sure whether I missed this in the documentation or
whether it needs to be added.

Thanks a lot,

Kristian

alexey.goncharuk

Kristian,

I am a little bit confused by the example you provided in your first e-mail. From the code I see that you create a cache dynamically by calling getOrCreateCache, and the next line asserts that the cache size is equal to knownRemoteSize. This does not make sense to me, because cache creation is a distributed operation and the cache is created on all nodes at once. So either the cache was created by this call and its size is equal to zero, or it was created prior to this call and the cache size must be the same on all nodes in SYNC mode.

More specifically, SYNC rebalance mode means that Ignition.start() and all public Cache API calls will block until rebalancing for that cache has finished.

--AG
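
The blocking semantics Alexey describes for SYNC mode can be modelled in plain Java. This is a sketch under assumptions, not Ignite code: `SyncGateSketch`, `ToyCache`, `Mode`, and the latch that holds back the data transfer are all hypothetical and exist only to make the timing deterministic. In the model, a public call in SYNC mode waits for the background transfer to finish, while the same call in ASYNC mode answers from whatever has arrived so far.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

final class SyncGateSketch {
    enum Mode { SYNC, ASYNC }

    /** Toy cache: data arrives in the background once the latch is released. */
    static final class ToyCache {
        private final Map<Integer, String> local = new ConcurrentHashMap<>();
        private final CompletableFuture<Void> rebalanced;
        private final Mode mode;

        ToyCache(Mode mode, Map<Integer, String> remote, CountDownLatch transferMayStart) {
            this.mode = mode;
            this.rebalanced = CompletableFuture.runAsync(() -> {
                try {
                    transferMayStart.await(); // hold the transfer until released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
                local.putAll(remote);
            });
        }

        /** SYNC waits for the transfer to finish; ASYNC answers immediately. */
        int size() {
            if (mode == Mode.SYNC) {
                rebalanced.join();
            }
            return local.size();
        }
    }

    /** ASYNC: the size is read while the transfer is still held back, so it is 0. */
    static int asyncSizeBeforeTransfer(Map<Integer, String> remote) {
        CountDownLatch gate = new CountDownLatch(1);
        ToyCache cache = new ToyCache(Mode.ASYNC, remote, gate);
        int seen = cache.size();
        gate.countDown(); // let the background task finish
        return seen;
    }

    /** SYNC: size() blocks until every remote entry has been copied. */
    static int syncSize(Map<Integer, String> remote) {
        CountDownLatch gate = new CountDownLatch(1);
        ToyCache cache = new ToyCache(Mode.SYNC, remote, gate);
        gate.countDown(); // the transfer now runs concurrently with the caller
        return cache.size();
    }

    public static void main(String[] args) {
        Map<Integer, String> remote = new ConcurrentHashMap<>();
        for (int i = 0; i < 1_000; i++) {
            remote.put(i, "v" + i);
        }
        System.out.println("ASYNC before transfer: " + asyncSizeBeforeTransfer(remote)); // prints 0
        System.out.println("SYNC: " + syncSize(remote)); // prints 1000
    }
}
```

The failing assertion from the first e-mail corresponds to the ASYNC path of this model: the size is read before the transfer has completed.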

Kristian Rosenvold

Alexey,

we were discussing what was happening in the 10-20 seconds while the
cache was being replicated, to find out if any inconsistencies could
occur in this window. So I started a first node with a known number of
elements, say 1 million. The test case I showed in my first e-mail was
then started as a second replicated node, also in SYNC mode. The
assertion failed miserably, and a "get" operation did not block either.
But I did not check what Denis said, that a "get" of an element
existing on the other node would simply be satisfied by the remote; I
only did a get for a non-existent element. And the size grew in the
10-20 seconds after starting the cache until it reached 1 million.

> So either a cache was created by this call and its size is equal to zero, or it was created prior to this call and cache size must be the same on all nodes in SYNC mode.

This is the latter case, but it does not appear to behave the way you
describe. So the second node is basically *started* when the first
node is up and running with its 1 million cache entries. The only way I
could ensure the consistency (I think) you're describing was by doing
an explicit call to cache.rebalance().get() on the new node.

Kristian


2016-06-13 20:03 GMT+02:00 Alexey Goncharuk <[hidden email]>:

> Kristian,
>
> I am a little bit confused by the example you provided in your first e-mail.
> From the code I see that you create a cache dynamically by calling
> getOrCreateCache, and the next line asserts that cache size is equal to a
> knownRemoteCacheSize. This does not make sense to me because cache creation
> is a distributed operation and it is created on all nodes at once. So either
> a cache was created by this call and it's size is equal to zero, or it was
> created prior to this call and cache size must be the same on all nodes in
> SYNC mode.
>
> More specifically, SYNC rebalance mode means that Ignition.start() and all
> public Cache API calls will be blocked until after rebalancing for such a
> cache is finished.
>
> --AG

alexey.goncharuk

Kristian,

Got it, thank you for reporting this. Your expectations are correct: there is an issue with the handling of SYNC rebalance mode for dynamically started caches. If your cache were configured in IgniteConfiguration, you would get the correct size without calling rebalance().get().

I created an issue [1] with a fix and am currently waiting for CI results. If all is OK, the fix will be merged to master shortly.

[1] https://issues.apache.org/jira/browse/IGNITE-3305

--AG

2016-06-13 11:38 GMT-07:00 Kristian Rosenvold <[hidden email]>:
> Alexey,
>
> we were discussing what was happening in the 10-20 seconds while the
> cache was being replicated, to find out if any inconsistencies could
> occur in this window. So I started a first node with a known number of
> elements, say 1 million. The testcase I showed in the first code was
> then started as a second replicated node, also with SYNC mode. The
> assertion failed miserably, and neither did a "get" operation block.
> But I did not check what Denis said, that a "get" of an element
> existing in the other node would simply be satisfied by the remote; I
> only did a get for a non-existent element. And the size grew in the
> 10-20 seconds after starting the cache until it reached 1 million.
>
>> So either a cache was created by this call and its size is equal to zero, or it was created prior to this call and cache size must be the same on all nodes in SYNC mode.
>
> This is the latter case, but it does not appear to behave the way you
> describe. So the second node is basically *started* when the first
> node is up and running with its 1 million cache nodes. The only way I
> could ensure the consistency (I think) you're describing was by doing
> an explicit call to cache.rebalance().get() on the new node.
>
> Kristian


Kristian Rosenvold

Great stuff! I looked at your branch, and the fix and the test cases
look really neat :)

This community rocks!

Kristian


2016-06-14 2:57 GMT+02:00 Alexey Goncharuk <[hidden email]>:

> Kristian,
>
> Got it, thank you for reporting this. Your expectations are correct, there
> is an issue with handling of SYNC rebalance mode for dynamically started
> cache. If your cache was set in IgniteConfiguration, you would get correct
> size without calling rebalance().get();
>
> I created an issue [1] with a fix and currently waiting for CI results. If
> all ok, the fix will be merged to master shortly.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-3305
>
> --AG

alexey.goncharuk

Kristian,

Just letting you know: I've merged the fix into the master branch.