slow node discovering on Kubernetes

mvolkomorov

slow node discovering on Kubernetes

Hi Ignite team,

We're running an Ignite cluster with 10 server nodes on Kubernetes.
With an empty Ignite configuration we can't start more than 5 nodes in a reasonable time.

Trying to deploy 10 nodes with our empty config leads to strange discovery problems caused by "IgniteSpiException: Node with the same ID was found".

After increasing AckTimeout to 10000 and switching to G1GC, the cluster started, but startup still takes a long time and we still get TcpDiscoverySpi errors.
I checked connections and port availability between the nodes and found nothing suspicious.

I've attached a log from one of the 10 nodes during deployment.
Our Ignite config is default-config.xml with only TcpDiscoveryKubernetesIpFinder added.
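
A configuration of the kind described above might look roughly like the following sketch. It is an illustration, not the actual file from this deployment: it assumes the Spring XML layout of the stock default-config.xml, leaves the IP finder on its default settings, and assumes the increased ackTimeout mentioned above is set on TcpDiscoverySpi.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">
    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <!-- Assumed placement of the increased acknowledgement timeout, in milliseconds. -->
                <property name="ackTimeout" value="10000"/>
                <property name="ipFinder">
                    <!-- No properties set, so the IP finder runs with its defaults. -->
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder"/>
                </property>
            </bean>
        </property>
    </bean>
</beans>
```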

Could you take a look and give us some suggestions on how to reduce the deployment time?

Thanks,
Maxim

Attachment: ignite-10-nodes-long-start.log (752K)
Alexandr Shapkin

RE: slow node discovering on Kubernetes

Hello Maxim,

Could you please share the current state of this issue? Have you managed to resolve it, or does it still exist?

How are your pods configured in terms of resource usage?

From: [hidden email]
Sent: Thursday, March 11, 2021 11:44 PM
To: [hidden email]
Subject: slow node discovering on Kubernetes

Alex Shapkin
mvolkomorov

RE: slow node discovering on Kubernetes

Hello Alexandr,

The problem is still present. We deployed the same 10 nodes on Google Kubernetes and
got a normal startup time.
For now we have not defined any limits or requests; Ignite is the only
deployment on the Kubernetes cluster.
We use the flannel network plugin (host-gw). Are there any recommendations for
the network plugin?
Do you have any idea how to estimate network or hardware performance to find
possible bottlenecks?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Alexandr Shapkin

RE: slow node discovering on Kubernetes

Hi Maxim,

Not really. I haven't used the flannel plugin, and I'm not sure about network recommendations either, beyond the basics: avoid stretching the cluster across multiple availability zones, and apply some persistence tuning, such as disabling MMAP for the WAL (IGNITE_WAL_MMAP=false).

> The problem still actual, we deployed same 10 nodes on google kubernetes and
> got a normal time.

Do you mean that switching to GKE made it work, or was that the initial setup and nothing has changed since then?

> For now we did not define any limits or requests, our ignite is the only
> deployment on the kubernetes.

I was asking because, if I remember correctly, there can be issues when pods have insufficient resources, but unfortunately I have no concrete numbers. In any case, default GKE instances should work fine with a default cluster.

> Do you have any idea how to estimate network or hardware performance to find
> possible bottlenecks?

I think enabling DEBUG logs for discovery might help to show what is really happening in the grid.

<category name="org.apache.ignite.spi.discovery">
    <level value="DEBUG"/>
</category>
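
The snippet above uses the Log4j 1.x XML format. If the nodes log through Log4j 2 instead (an assumption, for example via the ignite-log4j2 module), a roughly equivalent logger entry inside the <Loggers> section of the Log4j 2 configuration would be:

```xml
<!-- Assumed Log4j 2 equivalent: DEBUG output for the discovery SPI package. -->
<Logger name="org.apache.ignite.spi.discovery" level="DEBUG"/>
```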

 

By the way, is it a persistent cluster or a pure in-memory one?

From: [hidden email]
Sent: Friday, April 9, 2021 10:26 AM
To: [hidden email]
Subject: RE: slow node discovering on Kubernetes

Alex Shapkin