nodes in the baseline topology is going to OFFLINE state

shivakumar

nodes in the baseline topology is going to OFFLINE state

Hi all,
I have an Ignite deployment on Kubernetes and I wanted to restart all nodes,
so I am using the "kill -k" command from the visor shell.
After I run this command, all nodes restart. Once all nodes have rejoined
the topology, a few nodes sometimes go into the OFFLINE state [even though
the nodes are up and running], and it looks like this is causing a
split-brain or split-cluster scenario.


[ignite@ignite-shiva-visor-68cb697b5-qbccr bin]$ ./control.sh --user ignite --password ignite --host ignite-service-br --baseline
Control utility [ver. 2.7.6#19700101-sha1:DEV]
2019 Copyright(C) Apache Software Foundation
User: ignite
--------------------------------------------------------------------------------
Cluster state: inactive
Current topology version: 1

Baseline nodes:
    ConsistentID=253094b4-877b-45ae-ad06-07e639befffc, STATE=ONLINE
    ConsistentID=862e324b-f3d1-4198-92d0-0d1d2c4a2f88, STATE=OFFLINE
    ConsistentID=86b5f451-ac4f-4479-9f6e-2db6ab5d11e7, STATE=OFFLINE
--------------------------------------------------------------------------------
Number of baseline nodes: 3

Other nodes not found.
[ignite@ignite-shiva-visor-68cb697b5-qbccr bin]$ ./control.sh --user ignite --password ignite --host ignite-service-br --baseline
Control utility [ver. 2.7.6#19700101-sha1:DEV]
2019 Copyright(C) Apache Software Foundation
User: ignite
--------------------------------------------------------------------------------
Cluster state: inactive
Current topology version: 2

Baseline nodes:
    ConsistentID=253094b4-877b-45ae-ad06-07e639befffc, STATE=OFFLINE
    ConsistentID=862e324b-f3d1-4198-92d0-0d1d2c4a2f88, STATE=ONLINE
    ConsistentID=86b5f451-ac4f-4479-9f6e-2db6ab5d11e7, STATE=ONLINE
--------------------------------------------------------------------------------
Number of baseline nodes: 3

Other nodes not found.
[ignite@ignite-shiva-visor-68cb697b5-qbccr bin]$ ./control.sh --user ignite --password ignite --host ignite-service-br --baseline
Control utility [ver. 2.7.6#19700101-sha1:DEV]
2019 Copyright(C) Apache Software Foundation
User: ignite
--------------------------------------------------------------------------------
Cluster state: inactive
Current topology version: 2

Baseline nodes:
    ConsistentID=253094b4-877b-45ae-ad06-07e639befffc, STATE=OFFLINE
    ConsistentID=862e324b-f3d1-4198-92d0-0d1d2c4a2f88, STATE=ONLINE
    ConsistentID=86b5f451-ac4f-4479-9f6e-2db6ab5d11e7, STATE=ONLINE
--------------------------------------------------------------------------------
Number of baseline nodes: 3

Other nodes not found.
[ignite@ignite-shiva-visor-68cb697b5-qbccr bin]$ ./control.sh --user ignite --password ignite --host ignite-service-br --baseline
Control utility [ver. 2.7.6#19700101-sha1:DEV]
2019 Copyright(C) Apache Software Foundation
User: ignite
--------------------------------------------------------------------------------
Cluster state: inactive
Current topology version: 2

Baseline nodes:
    ConsistentID=253094b4-877b-45ae-ad06-07e639befffc, STATE=OFFLINE
    ConsistentID=862e324b-f3d1-4198-92d0-0d1d2c4a2f88, STATE=ONLINE
    ConsistentID=86b5f451-ac4f-4479-9f6e-2db6ab5d11e7, STATE=ONLINE
--------------------------------------------------------------------------------
Number of baseline nodes: 3

Other nodes not found.
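
To compare snapshots like these without reading them by eye, a small Python sketch can parse the `control.sh --baseline` text and list which consistent IDs are ONLINE in each one. This is not part of Ignite: the helper names `parse_baseline` and `online_ids` are made up for illustration, and the regular expression assumes the exact line layout shown above. For the first two snapshots the ONLINE sets do not overlap, which is consistent with the split-cluster suspicion but does not prove it.

```python
import re

# Assumes baseline lines look like "ConsistentID=<id>, STATE=<state>",
# as in the control.sh output pasted in this post.
NODE_RE = re.compile(r"ConsistentID=([^,\s]+),\s*STATE=(\w+)")

def parse_baseline(output):
    """Map each baseline node's consistent ID to its reported state."""
    return {node_id: state for node_id, state in NODE_RE.findall(output)}

def online_ids(nodes):
    """Return the set of consistent IDs whose state is ONLINE."""
    return {node_id for node_id, state in nodes.items() if state == "ONLINE"}

# First snapshot from the post (topology version 1).
FIRST = """
Baseline nodes:
    ConsistentID=253094b4-877b-45ae-ad06-07e639befffc, STATE=ONLINE
    ConsistentID=862e324b-f3d1-4198-92d0-0d1d2c4a2f88, STATE=OFFLINE
    ConsistentID=86b5f451-ac4f-4479-9f6e-2db6ab5d11e7, STATE=OFFLINE
"""

# Second snapshot from the post (topology version 2).
SECOND = """
Baseline nodes:
    ConsistentID=253094b4-877b-45ae-ad06-07e639befffc, STATE=OFFLINE
    ConsistentID=862e324b-f3d1-4198-92d0-0d1d2c4a2f88, STATE=ONLINE
    ConsistentID=86b5f451-ac4f-4479-9f6e-2db6ab5d11e7, STATE=ONLINE
"""

first_online = online_ids(parse_baseline(FIRST))
second_online = online_ids(parse_baseline(SECOND))

print(sorted(first_online))
print(sorted(second_online))
# True when no node is ONLINE in both snapshots.
print(first_online.isdisjoint(second_online))
```

Each request goes through the same service host, so non-overlapping ONLINE sets may mean the requests are landing on nodes that see different topologies.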





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ilya.kasnacheev

Re: nodes in the baseline topology is going to OFFLINE state

Hello!

Can you provide logs from the nodes that the cluster considers OFFLINE?

Please note that it is advised to start one node first and then start all the others once the first one is up, as opposed to starting them all at the same moment.

Regards,
--
Ilya Kasnacheev


Thu, 17 Oct 2019 at 17:54, shivakumar <[hidden email]>:

shivakumar

Re: nodes in the baseline topology is going to OFFLINE state

Hi Ilya Kasnacheev,
Is there any other way to gracefully shut down or restart the entire
cluster?

regards,
shiva



ilya.kasnacheev

Re: nodes in the baseline topology is going to OFFLINE state

Hello!

If the cluster is persistent, you can deactivate it and then restart it.

Regards,
--
Ilya Kasnacheev


Fri, 18 Oct 2019 at 09:51, shivakumar <[hidden email]>:

shivakumar

Re: nodes in the baseline topology is going to OFFLINE state

Hi Ilya,
My goal is to deactivate the cluster, not to restart it! There is an issue with deactivating the cluster in my deployment, so I am resorting to a restart.

I have the Ignite deployment on Kubernetes, and during deactivation the entire cluster, and even the deactivation request itself (REST or control.sh), hangs. I have a few applications that are connected to this Ignite cluster over JDBC; they run queries and also insert records into many tables in parallel. If I issue a deactivate request at this time, it hangs for more than 25 minutes. My impression is that the many clients with established TCP connections and running queries are causing the cluster to hang, so I am thinking of restarting the cluster so that I can proceed with deactivation easily once the restart is done.
Any suggestions are appreciated.

Regards,
Shiva


On Fri, 18 Oct, 2019, 6:37 PM Ilya Kasnacheev, <[hidden email]> wrote:

ilya.kasnacheev

Re: nodes in the baseline topology is going to OFFLINE state

Hello!

There is currently no supported way to gracefully restart a cluster. You will have to stop all nodes, start them again, and then activate the cluster (or auto-activate).
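
The sequence described here (stop all nodes, start them again, then activate) could be scripted roughly as in the Python sketch below. It is a sketch under assumptions, not a supported procedure: `restart_nodes` is a hypothetical callback, since how nodes are stopped and started depends on the deployment, and `control_cmd` and `restart_cluster` are illustrative names. The connection options and `--baseline` are the ones used earlier in this thread; `--activate` is the control script's activation command.

```python
import subprocess

CONTROL = "./control.sh"  # script path as invoked earlier in the thread

def control_cmd(action, host="ignite-service-br", user="ignite", password="ignite"):
    """Build the argument list for one control.sh call.

    The defaults mirror the invocation shown in the thread; adjust for your cluster.
    """
    return [CONTROL, "--user", user, "--password", password, "--host", host, action]

def restart_cluster(restart_nodes, run=subprocess.run):
    """Stop and start all nodes, then inspect the baseline and activate.

    restart_nodes is a hypothetical, deployment-specific callback that must
    return only once all nodes are back up. run is injectable so the
    sequence can be exercised without a real cluster.
    """
    restart_nodes()
    # Print the baseline so OFFLINE nodes are visible before activation.
    run(control_cmd("--baseline"), check=True)
    # Activate the cluster once the nodes have rejoined.
    run(control_cmd("--activate"), check=True)
```

A dry run that only records the order of steps can be done by passing a fake `run`, for example `restart_cluster(lambda: None, run=lambda cmd, check: print(cmd))`.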

Regards,
--
Ilya Kasnacheev


Fri, 18 Oct 2019 at 16:42, Shiva Kumar <[hidden email]>: