Intermittent "Partition states validation has failed for group" issues

gupabhi


In my otherwise stably running grid (on 2.7.5) I sometimes see an intermittent GridDhtPartitionsExchangeFuture warning. The warning occurs periodically and then goes away after some time. I couldn't find any documentation or other threads about this warning and its implications.
* What is the trigger for this warning?
* What are the implications?
* Is there any recommendation around fixing this issue?




2019-10-21 16:09:44.378 [WARN ] [sys-#26240] GridDhtPartitionsExchangeFuture - Partition states validation has failed for group: mainCache. Partitions cache sizes are inconsistent for Part 0: [id-dgcasp-ob-398-csp-drp-ny-1=43417 id-dgcasp-ob-080-csp-drp-ny-1=43416 ] Part 1: [id-dgcasp-ob-080-csp-drp-ny-1=43720 id-dgcasp-ob-471-csp-drp-ny-1=43724 ] Part 2: [id-dgcasp-ob-762-csp-drp-ny-1=43388 id-dgcasp-ob-471-csp-drp-ny-1=43376 ] Part 3: [id-dgcasp-ob-775-csp-drp-ny-1=43488 id-dgcasp-ob-403-csp-drp-ny-1=43484 ] Part 4: [id-dgcasp-ob-080-csp-drp-ny-1=43338 id-dgcasp-ob-471-csp-drp-ny-1=43339 ] Part 5: [id-dgcasp-ob-398-csp-drp-ny-1=43105 id-dgcasp-ob-471-csp-drp-ny-1=43106 ] Part 7: [id-dgcasp-ob-775-csp-drp-ny-1=43151 id-dgcasp-ob-762-csp-drp-ny-1=43157 ] Part 8: [id-dgcasp-ob-398-csp-drp-ny-1=42975 id-dgcasp-ob-471-csp-drp-ny-1=42976 ] Part 10: [id-dgcasp-ob-775-csp-drp-ny-1=43033 id-dgcasp-ob-471-csp-drp-ny-1=43036 ] Part 11: [id-dgcasp-ob-762-csp-drp-ny-1=43303 id-dgcasp-ob-471-csp-drp-ny-1=43299 ] Part 12: [id-dgcasp-ob-398-csp-drp-ny-1=43262 id-dgcasp-ob-471-csp-drp-ny-1=43265 ] Part 13: [id-dgcasp-ob-762-csp-drp-ny-1=43123 id-dgcasp-ob-471-csp-drp-ny-1=43120 ] Part 15: [id-dgcasp-ob-775-csp-drp-ny-1=43412 id-dgcasp-ob-398-csp-drp-ny-1=43413 ] Part 16: [id-dgcasp-ob-471-csp-drp-ny-1=43934 id-dgcasp-ob-403-csp-drp-ny-1=43933 ] Part 20: [id-dgcasp-ob-080-csp-drp-ny-1=43146 id-dgcasp-ob-471-csp-drp-ny-1=43148 ] Part 21: [id-dgcasp-ob-762-csp-drp-ny-1=43196 id-dgcasp-ob-080-csp-drp-ny-1=43197 ] Part 22: [id-dgcasp-ob-398-csp-drp-ny-1=43233 id-dgcasp-ob-762-csp-drp-ny-1=43234 ] Part 23: [id-dgcasp-ob-398-csp-drp-ny-1=43127 id-dgcasp-ob-471-csp-drp-ny-1=43128 ] Part 24: [id-dgcasp-ob-775-csp-drp-ny-1=43144 id-dgcasp-ob-398-csp-drp-ny-1=43142 ] ... TRUNCATED
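For readers trying to make sense of a warning like the one above, here is a minimal sketch that pulls the per-partition sizes out of the message text and reports how far apart the copies are. It assumes only the `Part N: [node=size node=size ]` layout visible in the log line; the function names are made up for illustration and are not part of Ignite.

```python
import re

# Matches one "Part N: [node=size node=size ]" group from the warning text.
PART_RE = re.compile(r"Part (\d+): \[([^\]]*)\]")


def parse_partition_sizes(message):
    """Return {partition_id: {node_id: size}} for every partition listed."""
    result = {}
    for part_id, body in PART_RE.findall(message):
        sizes = {}
        for pair in body.split():
            # Split on the last "=" so the node id is kept whole.
            node, _, size = pair.rpartition("=")
            sizes[node] = int(size)
        result[int(part_id)] = sizes
    return result


def size_spread(sizes_by_partition):
    """Return {partition_id: largest size minus smallest size}."""
    return {
        part: max(sizes.values()) - min(sizes.values())
        for part, sizes in sizes_by_partition.items()
    }


# First three partitions copied from the warning in this post.
sample = (
    "Partitions cache sizes are inconsistent for "
    "Part 0: [id-dgcasp-ob-398-csp-drp-ny-1=43417 id-dgcasp-ob-080-csp-drp-ny-1=43416 ] "
    "Part 1: [id-dgcasp-ob-080-csp-drp-ny-1=43720 id-dgcasp-ob-471-csp-drp-ny-1=43724 ] "
    "Part 2: [id-dgcasp-ob-762-csp-drp-ny-1=43388 id-dgcasp-ob-471-csp-drp-ny-1=43376 ]"
)
parsed = parse_partition_sizes(sample)
spread = size_spread(parsed)
print(spread)  # {0: 1, 1: 4, 2: 12}
```

In the excerpt the two copies of each partition differ by only a handful of entries out of roughly 43,000.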


Thanks,
Abhishek

ilya.kasnacheev

Re: Intermittent "Partition states validation has failed for group" issues

Hello!

I think this means that backup/primary contents are inconsistent.

The implication is that in case of node failure there will be data inconsistency (or maybe it's already there).

The recommendation is to a) check logs for any oddities/exceptions, and b) maybe remove problematic partitions' files from persistence and/or restart problematic nodes.
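One possible way to narrow down which nodes might deserve a closer look is to count how often each node id shows up in the list of inconsistent partitions. The sketch below does only that counting; it assumes the `Part N: [node=size ...]` layout from the warning, and a high count is a hint for where to read logs first, not proof that a node is at fault. The function name is hypothetical.

```python
import re
from collections import Counter

# One "Part N: [ ... ]" group, capturing the text between the brackets.
PART_RE = re.compile(r"Part \d+: \[([^\]]*)\]")
# One "node=size" pair inside a group.
PAIR_RE = re.compile(r"(\S+)=(\d+)")


def count_node_appearances(message):
    """Count how many inconsistent partitions mention each node id."""
    counts = Counter()
    for body in PART_RE.findall(message):
        for node, _size in PAIR_RE.findall(body):
            counts[node] += 1
    return counts


# First three partitions copied from the warning in the original post.
sample = (
    "Part 0: [id-dgcasp-ob-398-csp-drp-ny-1=43417 id-dgcasp-ob-080-csp-drp-ny-1=43416 ] "
    "Part 1: [id-dgcasp-ob-080-csp-drp-ny-1=43720 id-dgcasp-ob-471-csp-drp-ny-1=43724 ] "
    "Part 2: [id-dgcasp-ob-762-csp-drp-ny-1=43388 id-dgcasp-ob-471-csp-drp-ny-1=43376 ]"
)
counts = count_node_appearances(sample)
for node, seen in counts.most_common():
    print(node, seen)
```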

Regards,
--
Ilya Kasnacheev


In reply to Abhishek Gupta (BLOOMBERG/ 731 LEX) <[hidden email]>, Mon, 21 Oct 2019, 23:17.

gupabhi

Re: Intermittent "Partition states validation has failed for group" issues

Thanks, Ilya. The thing is, I've seen these exceptions without any errors occurring before them. Also, I'm not using persistence, and I've seen this happen on multiple nodes at the same time. If I bounce multiple nodes, I would lose data (since I have only 1 backup). Is there anything else I could do?


-Abhishek


In reply to [hidden email], 10/28/19 12:47:23.


ilya.kasnacheev

Re: Intermittent "Partition states validation has failed for group" issues

Hello!

You could try a rolling restart of multiple nodes, one node at a time, waiting for the data to rebalance after each restart, to avoid data loss.
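The restart-then-wait loop could be sketched roughly as follows. How a node is actually restarted and how rebalance completion is detected depend on the deployment and are not covered in this thread, so both are passed in as caller-supplied callables; `restart_node`, `is_rebalanced`, and the node names are hypothetical placeholders, not Ignite APIs.

```python
import time


def rolling_restart(nodes, restart_node, is_rebalanced,
                    poll_seconds=5.0, timeout_seconds=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Restart nodes one at a time, waiting for rebalance after each.

    restart_node(node) and is_rebalanced() are supplied by the caller.
    Raises TimeoutError if rebalancing does not finish in time, so the
    next node is never restarted while data is still moving.
    """
    restarted = []
    for node in nodes:
        restart_node(node)
        deadline = clock() + timeout_seconds
        while not is_rebalanced():
            if clock() >= deadline:
                raise TimeoutError("rebalance did not finish after restarting " + node)
            sleep(poll_seconds)
        restarted.append(node)
    return restarted


# Demo with fakes: each restart needs two polls before rebalance completes.
events = []
state = {"polls_left": 0}


def fake_restart(node):
    events.append(("restart", node))
    state["polls_left"] = 2


def fake_is_rebalanced():
    if state["polls_left"] > 0:
        state["polls_left"] -= 1
        return False
    return True


done = rolling_restart(
    ["node-a", "node-b"], fake_restart, fake_is_rebalanced,
    poll_seconds=0.0, sleep=lambda s: events.append(("wait", s)),
)
print(done)
```

Stopping on a timeout rather than moving on matters with a single backup: restarting a second node before the first has finished rebalancing is the case that risks losing data.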

Regards,
--
Ilya Kasnacheev


In reply to Abhishek Gupta (BLOOMBERG/ 919 3RD A) <[hidden email]>, Mon, 28 Oct 2019, 21:51.