Ignite failing to start with linkerd?

jbmassicotte

Ignite failing to start with linkerd?

Hello team,

We use linkerd (linkerd.io) to provide inter-pod SSL encryption in our Azure
Kubernetes cluster, as required by our organization. When we enabled linkerd
in our namespace, we observed that the Ignite pods were crashing at startup,
restarting, and then succeeding in joining the grid on the second attempt.
Once connected, all is well.

We suspect the connection failure is related to
TcpDiscoveryKubernetesIpFinder, which is responsible for communicating with
the Kubernetes API and retrieving the grid nodes' IPs. With linkerd enabled,
all outbound traffic from a grid pod goes out via a linkerd proxy, then on
to the destination (the API in this case). Since linkerd is not enabled at
the destination, traffic should pass through unaffected by the proxy. But
obviously, something is not quite right.
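For context, here is a minimal sketch of the kind of discovery configuration
we use (the namespace and service names below are placeholders, not our
actual values):

```xml
<!-- Sketch of a TcpDiscoveryKubernetesIpFinder setup; namespace and
     service names are placeholders. -->
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
      <property name="ipFinder">
        <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder">
          <property name="namespace" value="my-namespace"/>
          <property name="serviceName" value="ignite"/>
        </bean>
      </property>
    </bean>
  </property>
</bean>
```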

Here is a log from an impacted pod we were able to capture:

[2020-09-14 18:22:09,045][ERROR][main][IgniteKernal] Got exception while
starting (will rollback startup routine).
class org.apache.ignite.IgniteException: Unable to establish secure
connection. Was remote cluster configured with SSL?
[rmtAddr=/10.244.6.100:47500, errMsg="Remote host terminated the handshake"]
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.sendMessageDirectly(ServerImpl.java:1487)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.sendJoinRequestMessage(ServerImpl.java:1220)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1032)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:427)
        at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2099)
        at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:299)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:943)
        at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1960)
        at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1276)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2045)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1703)
        at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1117)
        at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1035)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:921)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:820)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:690)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:659)
        at org.apache.ignite.Ignition.start(Ignition.java:346)
        at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:300)

As an FYI, we do not use linkerd to encrypt grid-node-to-grid-node
connections; linkerd only encrypts HTTP traffic. In our solution, linkerd
is used for the HTTP traffic between the frontend NGINX pods and the backend
Ignite pods.

So my questions are:
* Does anybody have experience running Ignite with linkerd in a Kubernetes
cluster, and if so, have you observed this problem?
* What may cause the connection failure?
* What may be a fix for it?




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
akorensh

Re: Ignite failing to start with linkerd?

Hi,
  The Kubernetes IP finder does the equivalent of
kubectl get endpoints <your service>
  and then tries to discover nodes based on the results.

  see:
https://github.com/apache/ignite/blob/513afe4dabbaa1c2853a76ff02e58f4a7db01076/modules/kubernetes/src/main/java/org/apache/ignite/spi/discovery/tcp/ipfinder/kubernetes/TcpDiscoveryKubernetesIpFinder.java#L139


   I would suggest debugging the relevant services to make sure that the
endpoints are consistent from run to run and reflect the relevant pods.

  The stack trace you posted suggests that the discovery message itself is
being intercepted and modified in some way on the wire.

   I would simplify the scenario to the bare minimum, one pod and one
external consumer, and then monitor all network traffic to see what happens
during each connection attempt.


  see:   https://apacheignite.readme.io/docs/ignite-service
           https://apacheignite.readme.io/docs/microsoft-azure-deployment

   Also take a look at externalTrafficPolicy to see whether it makes a
difference in your config, as Kubernetes can mask the source IPs, and in
conjunction with linkerd this might affect your app.
 
https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip
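   If the proxy does turn out to be rewriting the discovery handshake, one
workaround worth testing is to tell linkerd to bypass Ignite's non-HTTP
ports via pod annotations. A sketch (this assumes Ignite's default discovery
port 47500 and communication port 47100 -- adjust to whatever your
configuration actually uses):

```yaml
# Sketch: exclude Ignite's raw-TCP ports from the linkerd proxy so the
# discovery handshake is not intercepted. Port numbers assume Ignite
# defaults; the Deployment name is a placeholder.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ignite
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
        config.linkerd.io/skip-outbound-ports: "47500,47100"
        config.linkerd.io/skip-inbound-ports: "47500,47100"
    # ... rest of the pod spec unchanged ...
```

   That keeps linkerd in place for the HTTP traffic you actually want it to
encrypt while letting the discovery traffic pass through untouched.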


Thanks, Alex
