Re: Ignite Segmentation Behaviour

dmagda

Hi Samuel,

With the current behavior, the segments will not rejoin automatically. Once the network is recovered from a network partitioning event, you need to restart all the nodes of one of the segments. Those nodes will join the other nodes and the cluster will become fully operational.

Let me know if you have any other questions or need guidance with this.

-
Denis


On Fri, Sep 11, 2020 at 7:38 AM Samuel Ueltschi <[hidden email]> wrote:

Hi

 

I've been testing Ignite (2.8.1) and its behaviour under network segmentation.

According to the docs, Ignite nodes should be able to detect network segmentation and apply the configured SegmentationPolicy.

However, the segmentation handling didn't trigger as I would have expected.

For my tests, I set up three cluster nodes c1, c2 and c3 running in Docker containers, all competing for a shared IgniteLock instance in a loop.

Then I used iptables in container c2 to drop all incoming and outgoing packets on that node.

After a few seconds I got the following events:

c1:
- EVT_NODE_FAILED for c2

c2:
- EVT_NODE_FAILED for c1
- EVT_NODE_FAILED for c3

c3:
- EVT_NODE_FAILED for c2

 

Then I reset the iptables rules, expecting that c2 would rejoin the cluster and detect segmentation.

However, this didn't happen; c2 just kept running as a second standalone cluster instance.

Only after restarting c2 did it rejoin the cluster.

 

Eventually I was able to trigger the EVT_NODE_SEGMENTED event by pausing the c2 container for 1 minute. After resuming, c2 detected the segmentation and ran the segmentation policy as expected.

 

Is this behaviour correct? Shouldn't the Ignite cluster be able to recover from the first scenario?

During a network segmentation no packets would be able to move between nodes, so the iptables approach should be realistic in my opinion.

Maybe I have some wrong assumptions about network segmentation, so any feedback would be greatly appreciated.

 

Cheers Sam

 

--
Software Engineer
BSI Business Systems Integration AG
Erlachstrasse 16B, CH-3012 Bern
Telefon +41 31 850 12 06

www.bsi-software.com

 

Mikhail

Make sure you first stop all the nodes in one segment and only then start them again; a rolling restart might not fix cluster segmentation.


Mikhail

BTW, you can try ZooKeeper discovery. I think it's the easier way to resolve the split-brain problem: https://www.gridgain.com/docs/latest/developers-guide/clustering/zookeeper-discovery

sue, in reply to dmagda

Hi Denis,

Thank you for your reply.

Restarting all the nodes in the partitioned segment would work for my
use case.

Is there a way to detect such a scenario with TCP/IP Discovery mode in
Ignite?
In my test I didn't get any EVT_NODE_SEGMENTED events, only EVT_NODE_FAILED.
So the individual cluster nodes would not be able to distinguish between
failed nodes and network segmentation.

@Mikhail: Thank you for the ZooKeeper tip, I'll check that out.

Cheers Sam



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

ilya.kasnacheev

Hello!

After all, it is only you who can decide whether your cluster has segmented or not. The traditional solution is "the largest one wins", but if the parts of the cluster can't communicate, it becomes undecidable.

Regards,
--
Ilya Kasnacheev
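
The "largest one wins" rule can be sketched in a few lines of plain Java. This is only an illustration, not Ignite code: the class `SegmentQuorum`, its `Verdict` enum and the `decide` method are hypothetical names, and the sketch assumes that the expected size of the full cluster is fixed and known to every node in advance. Without that assumption a segment cannot tell whether the missing nodes have failed or are merely unreachable.

```java
/** Hypothetical helper (not part of Ignite): majority-based segment check. */
final class SegmentQuorum {
    /** Outcome of the check for the segment the local node belongs to. */
    enum Verdict { VALID, INVALID, UNDECIDABLE }

    private SegmentQuorum() {
        // Static helper only.
    }

    /**
     * @param expectedClusterSize number of nodes in the full cluster (assumed to be known up front)
     * @param visibleNodes        nodes the local node can still see, including itself
     */
    static Verdict decide(int expectedClusterSize, int visibleNodes) {
        if (expectedClusterSize <= 0 || visibleNodes < 0 || visibleNodes > expectedClusterSize) {
            throw new IllegalArgumentException("inconsistent cluster sizes");
        }
        int twiceVisible = 2 * visibleNodes;
        if (twiceVisible > expectedClusterSize) {
            return Verdict.VALID;       // strict majority: this segment wins
        }
        if (twiceVisible < expectedClusterSize) {
            return Verdict.INVALID;     // strict minority: this segment should give up
        }
        return Verdict.UNDECIDABLE;     // exact half: no segment can claim to be the largest
    }
}
```

Applied to the three-node test from the first message, c1 and c3 each see two of three nodes and would count as the valid segment, while c2 sees only itself and would not. With an even split, for example two of four nodes, neither side can decide on its own.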


sue

Hi,

Thanks for the clarification! That's what I expected.
I looked into https://github.com/luqmanahmad/ignite-plugins and decided to
implement my own SegmentationResolver to deal with these cases.

Cheers Sam
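
One possible shape for such a custom check, as a minimal sketch in plain Java: the node treats its segment as valid only if it can open a TCP connection to at least one "witness" address, such as a gateway or a shared service. Everything here is an assumption for illustration. `ReachabilitySegmentCheck` and its method are hypothetical names, the class deliberately does not implement Ignite's SegmentationResolver interface, and neither the linked plugin project nor the resolver actually written is shown in this thread.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reachability check; not Ignite's SegmentationResolver interface. */
final class ReachabilitySegmentCheck {
    private final List<InetSocketAddress> witnesses;
    private final int timeoutMillis;

    /**
     * @param witnesses     addresses that only the "main" segment is assumed to reach
     * @param timeoutMillis connect timeout per witness, in milliseconds
     */
    ReachabilitySegmentCheck(List<InetSocketAddress> witnesses, int timeoutMillis) {
        this.witnesses = new ArrayList<>(witnesses);
        this.timeoutMillis = timeoutMillis;
    }

    /** Returns true if at least one witness accepts a TCP connection within the timeout. */
    boolean isValidSegment() {
        for (InetSocketAddress witness : witnesses) {
            try (Socket socket = new Socket()) {
                socket.connect(witness, timeoutMillis);
                return true;
            } catch (IOException e) {
                // This witness is unreachable; try the next one.
            }
        }
        // No witness reachable (or none configured): treat the segment as invalid.
        return false;
    }
}
```

In the iptables test from the first message, c2 drops all packets and could not reach any witness, so a check like this would flag c2 as the segment to restart, while c1 and c3 would keep running. The choice of witnesses matters: if both segments can reach one, both will consider themselves valid.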


