We use linkerd (linkerd.io) to provide inter-pod SSL encryption in our Azure
Kubernetes cluster, as required by our organization. When we enabled linkerd
in our namespace, we observed that the ignite pods crashed at startup,
restarted, and then connected to the grid successfully on the second
attempt. Once connected, all is well.
We suspect the connection failure is related to
TcpDiscoveryKubernetesIpFinder, which is responsible for communicating with
the Kubernetes API and retrieving the grid nodes' IPs. With linkerd enabled,
all outbound traffic from a grid pod goes out via a linkerd proxy, then out
to the destination (the API in this case). Since linkerd is not enabled at
the destination, traffic should go out unaffected by the proxy. But
obviously, something is not quite right.
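
To check the API lookup in isolation from inside a meshed pod, a minimal
Python sketch of the same kind of request may help. It assumes the IP finder
reads the Endpoints object of the grid's Kubernetes service, and it uses
assumed defaults (service "ignite", namespace "default", the in-cluster master
URL, and the mounted service account token). The function names are
hypothetical and are not part of Ignite.

```python
import json
import ssl
import urllib.request

# Assumed defaults; adjust to match your TcpDiscoveryKubernetesIpFinder settings.
MASTER_URL = "https://kubernetes.default.svc.cluster.local:443"
NAMESPACE = "default"
SERVICE = "ignite"
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def endpoints_url(master_url, namespace, service):
    """Build the URL of the Endpoints object for the given service."""
    base = master_url.rstrip("/")
    return f"{base}/api/v1/namespaces/{namespace}/endpoints/{service}"


def extract_ips(endpoints):
    """Collect pod IPs from a parsed Kubernetes Endpoints object."""
    ips = []
    for subset in endpoints.get("subsets") or []:
        for addr in subset.get("addresses") or []:
            ips.append(addr["ip"])
    return ips


def fetch_grid_ips(master_url=MASTER_URL, namespace=NAMESPACE, service=SERVICE):
    """Query the Kubernetes API with the pod's service account credentials."""
    with open(f"{SA_DIR}/token") as f:
        token = f.read().strip()
    ctx = ssl.create_default_context(cafile=f"{SA_DIR}/ca.crt")
    req = urllib.request.Request(
        endpoints_url(master_url, namespace, service),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return extract_ips(json.load(resp))
```

Running fetch_grid_ips() inside a grid pod, with and without linkerd
injected, would show whether the API call itself is affected by the proxy.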
Here is a log from an impacted pod we were able to capture:
[2020-09-14 18:22:09,045][ERROR][main][IgniteKernal] Got exception while
starting (will rollback startup routine).
class org.apache.ignite.IgniteException: Unable to establish secure
connection. Was remote cluster configured with SSL?
[rmtAddr=/10.244.6.100:47500, errMsg="Remote host terminated the handshake"]
As an FYI, we do not use linkerd to encrypt grid-node-to-grid-node
connections; linkerd only encrypts HTTPS traffic. In our solution, linkerd
is used for the HTTP traffic between the frontend NGINX pods and the backend
So my questions are:
* does anybody have experience using ignite with linkerd in a Kubernetes
cluster, and if so, have you observed this problem?
* what may cause the connection failure?
* what may be a fix to the above problem?
Thank you for your detailed suggestion. Ultimately I did not have to do
deep debugging. I consulted with the linkerd crew and they suggested a
linkerd config that restricted the linkerd encryption to outgoing port 8080,
which is the port used between our client app and the grid, leaving the
grid-to-k8s-API connection unaltered. We are no longer seeing the mentioned
failures, and the grid startup is much faster.
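
For anyone wanting to try something similar, here is a minimal Python sketch
of how such a restriction could be expressed. It assumes the restriction is
done with linkerd's config.linkerd.io/skip-outbound-ports pod annotation and
that the installed linkerd version accepts port ranges in that annotation.
This is an assumption, not necessarily the exact config the linkerd crew
suggested, and the helper names are hypothetical.

```python
def skip_ranges(meshed_ports, lo=1, hi=65535):
    """Return a comma-separated list of ports/ranges covering lo..hi
    except the ports that should stay behind the proxy."""
    parts = []
    start = lo
    for port in sorted(set(meshed_ports)):
        if port > start:
            end = port - 1
            parts.append(str(start) if start == end else f"{start}-{end}")
        start = port + 1
    if start <= hi:
        parts.append(str(start) if start == hi else f"{start}-{hi}")
    return ",".join(parts)


def linkerd_annotations(meshed_outbound_ports):
    """Build pod-template annotations that keep only the given outbound
    ports on the linkerd proxy (assumed annotation usage, see above)."""
    return {
        "linkerd.io/inject": "enabled",
        "config.linkerd.io/skip-outbound-ports": skip_ranges(meshed_outbound_ports),
    }


# Keep only outbound port 8080 meshed; everything else bypasses the proxy.
annotations = linkerd_annotations([8080])
```

The resulting dictionary would go under the pod template's
metadata.annotations, so that outbound traffic to the Kubernetes API and
between grid nodes is not routed through the proxy.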