Background: We are running an Ignite 2.8.1 cluster with 3 server nodes,
one persistent client, and one or more ad hoc clients.
Problem: We ssh'ed onto one of the nodes and ran Visor there to quickly
gather cache stats. Visor hung indefinitely, and the Ignite process on one
of the 3 nodes exited. We killed Visor with kill -9 and then attempted to
restart the failed Ignite process.
The first attempt failed with the error "Node with the same ID was found
in node IDs history or existing node in topology has the same ID". We
waited and tried again, and this time the node joined without any problem.
To verify that the cluster was healthy, we decided to stop that Ignite
process again and restart it, just to confirm that things were back to
normal.
This put us in a situation where every single attempt to start the node fails with
"Caused by: class org.apache.ignite.spi.IgniteSpiException: Node with the
same ID was found in node IDs history or existing node in topology has the
same ID (fix configuration and restart local node)"
We removed this node from the baseline. Then we deleted its work directory
and attempted to restart, and saw the same problem. We then destroyed the
machine entirely and created a new machine with a fresh install of Ignite.
The Ignite process on that new machine fails to start with the same error.
We are now in a state where we can't join any new nodes to the cluster at
all: every attempt, even from a brand-new machine, reports the same error.
How can we repair our cluster to get rid of this error and get a new node to