Client reconnect problems

Mario Ivankovits

Hi list!

I started using Ignite (1.1.0-incubating) for a network message bus with one server node and several client nodes that use TcpClientDiscoverySpi.

On first startup, it does not matter in which order I start my Ignite server or client: each waits for the other, as expected, until the grid is up. But if I kill the server and restart it, the client fails to reconnect to it.

I have a simple test for this.
Start the class TestIgniteClient once with the program argument "s" and once with "c", so that two instances are running. You should then see the message "I am here" flowing from the server to the client.
Once you kill the server process ("s") and restart it, the client logs many exceptions and does not reconnect.

Am I doing something wrong, or should I file a JIRA issue for this?

Thanks for your help.


===exception===
SEVERE: Failed to refresh partition map [oldest=00000000-0000-0001-0000-000000000001, rmts=[], loc=445a949f-dd26-4998-8c1c-4faa05ceed81]
class org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid or TCP connection cannot be established due to firewall issues) [node=TcpDiscoveryNode [id=00000000-0000-0001-0000-000000000001, addrs=[10.0.0.102, 0:0:0:0:0:0:0:1, 127.0.0.1], sockAddrs=[/10.0.0.102:8025, /0:0:0:0:0:0:0:1:8025, /127.0.0.1:8025], discPort=8025, order=1, intOrder=1, loc=false, ver=1.1.0#20150520-sha1:6da491f4, isClient=false], topic=TOPIC_CACHE, msg=GridDhtPartitionsSingleMessage [parts={-2100569601=GridDhtPartitionMap [nodeId=445a949f-dd26-4998-8c1c-4faa05ceed81, updateSeq=4, size=0], 689859866=GridDhtPartitionMap [nodeId=445a949f-dd26-4998-8c1c-4faa05ceed81, updateSeq=4, size=0], 1325947219=GridDhtPartitionMap [nodeId=445a949f-dd26-4998-8c1c-4faa05ceed81, updateSeq=4, size=0]}, super=GridDhtPartitionsAbstractMessage [exchId=null, lastVer=GridCacheVersion [topVer=0, nodeOrderDrId=0, globalTime=0, order=1434179630276], super=GridCacheMessage [msgId=4, depInfo=null, err=null, skipPrepare=false]]], policy=SYSTEM_POOL]
        at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:952)
        at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1016)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:389)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.sendLocalPartitions(GridCachePartitionExchangeManager.java:664)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.refreshPartitions(GridCachePartitionExchangeManager.java:579)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.refreshPartitions(GridCachePartitionExchangeManager.java:603)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1700(GridCachePartitionExchangeManager.java:57)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:967)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:108)
        at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote node: TcpDiscoveryNode [id=00000000-0000-0001-0000-000000000001, addrs=[10.0.0.102, 0:0:0:0:0:0:0:1, 127.0.0.1], sockAddrs=[/10.0.0.102:8025, /0:0:0:0:0:0:0:1:8025, /127.0.0.1:8025], discPort=8025, order=1, intOrder=1, loc=false, ver=1.1.0#20150520-sha1:6da491f4, isClient=false]
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1574)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:138)
        at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:949)
        ... 9 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each GridComputeTask and GridCacheTransaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=00000000-0000-0001-0000-000000000001, addrs=[/0:0:0:0:0:0:0:1:47100, /127.0.0.1:47100, /10.0.0.102:47100]]
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:1842)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:1671)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:1612)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.access$4000(TcpCommunicationSpi.java:140)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$RecoveryWorker.body(TcpCommunicationSpi.java:2452)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /0:0:0:0:0:0:0:1:47100
                at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:1847)
                ... 5 more
        Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read remote node recovery handshake (connection closed).
                at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:1971)
                at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:1751)
                ... 5 more
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /127.0.0.1:47100
                at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:1847)
                ... 5 more
        Caused by: java.net.SocketTimeoutException
                at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
                at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:1749)
                ... 5 more
        Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /10.0.0.102:47100
                at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:1847)
                ... 5 more
        Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read remote node recovery handshake (connection closed).
                at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:1971)
                at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:1751)
                ... 5 more
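Every failing connection in this trace targets port 47100 on one of the server's addresses, while the test program only configures discovery on port 8025; 47100 is the default local port of TcpCommunicationSpi, which the program leaves unconfigured. Two of the three attempts fail with "connection closed" during the recovery handshake rather than at connect time, so something was apparently listening; only the 127.0.0.1 attempt timed out. A plain TCP probe cannot diagnose handshake problems, but it can at least tell "nothing listening" apart from "listening" after the server restarts. The sketch below uses only the JDK; `PortProbe` and `isReachable` are hypothetical helpers, not part of Ignite.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe
{
    /**
     * Hypothetical diagnostic helper: returns true if a TCP connection to host:port
     * can be opened within timeoutMs, false otherwise.
     */
    static boolean isReachable(String host, int port, int timeoutMs)
    {
        try (Socket socket = new Socket())
        {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        }
        catch (IOException e)
        {
            // Connection refused, connect timeout, or unknown/unreachable host.
            return false;
        }
    }

    public static void main(String[] args)
    {
        // 8025: discovery port set in the test program; 47100: communication port seen in the trace.
        for (int port : new int[] {8025, 47100})
        {
            System.out.println("127.0.0.1:" + port + " reachable: " + isReachable("127.0.0.1", port, 1_000));
        }
    }
}
```

Running it on the client machine right after restarting the server shows whether either port is accepting connections at all.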

===test program===


import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteMessaging;
import org.apache.ignite.IgniteSystemProperties;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.DiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.TcpClientDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

import java.util.Arrays;
import java.util.UUID;

public class TestIgniteClient
{
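    // Fixed ID: the server rejoins with the same node ID every time it is restarted
    private static final UUID serverNodeId = new UUID(1, 1);
    // Random ID: every client start gets a new node ID
    private static final UUID clientNodeId = UUID.randomUUID();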

    public static void main(String[] args)
    {
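        if (args.length == 0)
        {
            System.err.println("usage: TestIgniteClient s|c");
            return;
        }
        // "c" starts a client node; any other argument starts the server node
        if ("c".equalsIgnoreCase(args[0]))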
        {
            System.err.println("starting client: " + clientNodeId);

            Ignition.setClientMode(true);

            IgniteConfiguration clientConfig = createConfig(false);
            Ignite igniteClient = Ignition.start(clientConfig);

            IgniteMessaging messagingClient = igniteClient.message();
            messagingClient.localListen("test", (uuid, msg) ->
            {
                System.err.println(msg);
                return true;
            });
        }
        else
        {
            System.err.println("starting server: " + serverNodeId);

            IgniteConfiguration serverConfig = createConfig(true);

            Ignite igniteServer = Ignition.start(serverConfig);

            IgniteMessaging messagingServer = igniteServer.message();
            while (true)
            {
                messagingServer.sendOrdered("test", "I am here - " + System.currentTimeMillis(), 10_000);
                try
                {
                    Thread.sleep(1000);
                }
                catch (InterruptedException e)
                {
                    // restore the interrupt flag and stop sending
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
    }

    public static IgniteConfiguration createConfig(boolean server)
    {
        System.setProperty(IgniteSystemProperties.IGNITE_PERFORMANCE_SUGGESTIONS_DISABLED, "true");
        System.setProperty(IgniteSystemProperties.IGNITE_UPDATE_NOTIFIER, "false");

        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setShared(false);
        ipFinder.setAddresses(Arrays.asList("127.0.0.1:8025"));

        DiscoverySpi discovery;
        UUID nodeId;
        if (server)
        {
            nodeId = serverNodeId;

            TcpDiscoverySpi discoveryImpl = new TcpDiscoverySpi();
            discoveryImpl.setLocalPort(8025);
            discoveryImpl.setLocalAddress("0.0.0.0");
            discoveryImpl.setSocketTimeout(5_000); //ms
            discoveryImpl.setNetworkTimeout(5_000); //ms
            discoveryImpl.setIpFinder(ipFinder);

            discovery = discoveryImpl;
        }
        else
        {
            nodeId = clientNodeId;

            TcpClientDiscoverySpi discoveryImpl = new TcpClientDiscoverySpi();
            discoveryImpl.setLocalAddress("0.0.0.0");
            discoveryImpl.setSocketTimeout(5_000); //ms
            discoveryImpl.setNetworkTimeout(5_000); //ms
            discoveryImpl.setIpFinder(ipFinder);

            discovery = discoveryImpl;
        }

        IgniteConfiguration gridConfiguration = new IgniteConfiguration()
                .setGridName("grid")
                .setNodeId(nodeId)
                .setMetricsLogFrequency(0)
                .setDiscoverySpi(discovery);

        return gridConfiguration;
    }
}
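The node ID that appears throughout the exception, 00000000-0000-0001-0000-000000000001, is simply the string form of the fixed `serverNodeId = new UUID(1, 1)` above, so the node the client cannot reach is the restarted server, which comes back under the same node ID as before. The stand-alone check below (the class name `NodeIdCheck` is illustrative, not part of the original test) confirms that mapping using only `java.util.UUID`.

```java
import java.util.UUID;

public class NodeIdCheck
{
    // Same construction as serverNodeId in TestIgniteClient.
    static final UUID SERVER_NODE_ID = new UUID(1, 1);

    public static void main(String[] args)
    {
        // Prints 00000000-0000-0001-0000-000000000001, the ID reported as "oldest" and "nodeId" in the trace.
        System.out.println(SERVER_NODE_ID);
    }
}
```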



Regards,
Mario
yakov

Re: Client reconnect problems

Mario,

Although client discovery (TcpClientDiscoverySpi) is in the code, it has never been announced. It will be dropped in the upcoming release, and client mode will instead be available with the ordinary TCP discovery (TcpDiscoverySpi).

As for reconnection, you raised a very good question. It is currently being developed as part of a new API and will be available soon.
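Until a built-in reconnect API is available, one application-level workaround is to retry the connection with exponential backoff. This is an assumption, not something suggested in this thread, and it has not been checked against 1.1.0. The sketch below is library-agnostic: `connect` is a hypothetical stand-in for whatever re-establishes the node (for example, stopping the stale client node and calling `Ignition.start(...)` again), and `ReconnectLoop`/`connectWithBackoff` are illustrative names, not Ignite APIs.

```java
import java.util.function.Supplier;

public class ReconnectLoop
{
    /**
     * Calls connect until it returns a non-null result or maxAttempts is reached,
     * doubling the delay after each failure (capped at maxDelayMs).
     * Returns null if every attempt failed.
     */
    static <T> T connectWithBackoff(Supplier<T> connect, int maxAttempts, long initialDelayMs, long maxDelayMs)
            throws InterruptedException
    {
        long delay = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                T result = connect.get();
                if (result != null)
                    return result;
            }
            catch (RuntimeException e)
            {
                System.err.println("attempt " + attempt + " failed: " + e.getMessage());
            }
            if (attempt < maxAttempts)
            {
                Thread.sleep(delay);
                delay = Math.min(delay * 2, maxDelayMs);
            }
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException
    {
        // Simulated connection that only succeeds on the third try.
        int[] calls = {0};
        String conn = connectWithBackoff(() ->
        {
            if (++calls[0] < 3)
                throw new IllegalStateException("server not up yet");
            return "connected";
        }, 5, 10, 100);
        System.out.println(conn + " after " + calls[0] + " attempts");
    }
}
```

Running `main` prints `connected after 3 attempts`. Capping the delay keeps a long server outage from pushing retries arbitrarily far apart.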

For now, please continue with TcpDiscoverySpi.

--Yakov

In reply to Mario Ivankovits (2015-06-13 10:25 GMT+03:00).