Exception handling for asynchronous backup

李玉珏@163

Exception handling for asynchronous backup

Hi community,

When CacheWriteSynchronizationMode is asynchronous and an asynchronous
write to a backup fails, leaving the primary and backup data inconsistent,
how is that inconsistency handled afterwards?

Denis Mekhanikov

Re: Exception handling for asynchronous backup

Hi!

If the cache is transactional, then no inconsistencies are possible, since two-phase commit guarantees that all nodes hold the same version of each data record.

With an atomic cache, a primary node failure can indeed leave different copies of the same partition inconsistent with each other.
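
As a rough illustration of how copies can diverge, here is a toy Python model. It is not Ignite's actual replication protocol: the `atomic_put` function and its `fail_after` parameter are hypothetical, and each "node" is just a dict. It only shows the general shape of the problem, where a primary applies a write and fails before every backup has received it.

```python
def atomic_put(primary, backups, key, value, fail_after=None):
    """Apply a write on the primary, then forward it to each backup in turn.

    `fail_after` is a hypothetical knob for this sketch: the primary
    "crashes" after forwarding the update to that many backups.
    Returns True if every backup received the update, False otherwise.
    """
    primary[key] = value
    for sent, backup in enumerate(backups):
        if fail_after is not None and sent >= fail_after:
            # The primary is gone; the remaining backups never see the update.
            return False
        backup[key] = value
    return True


primary = {"k": "old"}
backup_1 = {"k": "old"}
backup_2 = {"k": "old"}

# The primary fails after reaching only the first backup.
completed = atomic_put(primary, [backup_1, backup_2], "k", "new", fail_after=1)
```

After this run, `backup_1` holds the new value while `backup_2` still holds the old one, so the two surviving copies disagree.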

There is a tool called idle_verify that can validate the consistency of data between nodes: https://apacheignite-tools.readme.io/docs/control-script#section-verification-of-partition-checksums
You can run it to find copies of the same partition that differ in state. After that, either restarting the problematic node or iterating through all entries in the affected partitions and putting them again will restore consistency.
If persistence is enabled, you will need to remove the problematic partitions from disk. If you leave one copy that you believe is valid, it will be rebalanced to the other nodes when they are started again.
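
To make the check-and-repair idea concrete, here is a minimal Python sketch of the logic. It is a toy in-memory model, not Ignite's implementation or API: the node names and the `partition_hash`, `has_conflict` and `reput_from` helpers are all hypothetical, and the real idle_verify computes its checksums in its own way.

```python
import hashlib


def partition_hash(entries):
    """Order-independent checksum of one partition copy (hypothetical helper)."""
    digest = hashlib.sha256()
    for key in sorted(entries):
        digest.update(repr((key, entries[key])).encode("utf-8"))
    return digest.hexdigest()


def has_conflict(copies):
    """Return True if the copies of a partition differ in state.

    `copies` maps a node name to that node's copy of the partition (a dict).
    """
    return len({partition_hash(entries) for entries in copies.values()}) > 1


def reput_from(copies, trusted_node):
    """Iterate over the trusted copy and set every entry again on the others.

    Like a plain re-put, this does not delete keys that exist only in a
    stale copy.
    """
    trusted = copies[trusted_node]
    for node, entries in copies.items():
        if node != trusted_node:
            for key, value in trusted.items():
                entries[key] = value


copies = {
    "node-a": {"k1": "v1", "k2": "v2-new"},
    "node-b": {"k1": "v1", "k2": "v2-old"},
}

conflict_before = has_conflict(copies)      # the copies disagree on k2
reput_from(copies, trusted_node="node-a")   # node-a is assumed to be valid
conflict_after = has_conflict(copies)
```

Which copy to trust is a decision the operator has to make; the sketch simply takes it as an input.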

Denis
On 30 Aug 2019, 04:42 +0300, liyuj <[hidden email]>, wrote:
Hi community,

In the case of CacheWriteSynchronizationMode being asynchronous, if the
asynchronous writing of data fails, leading to inconsistency between
primary and backup data, what is the subsequent processing?

李玉珏@163

Re: Exception handling for asynchronous backup

Thank you very much for your reply!

In reply to Denis Mekhanikov's message of 2019/8/30, 3:15 PM.