Ignite Cluster Communication with SSH Tunnels

pgarg pgarg

Ignite Cluster Communication with SSH Tunnels

asked by jake waffle

My team is interested in having our Ignite cluster communicate on the loopback through ssh tunnels. We want the cluster to think that each node exists on the loopback essentially, but in reality the ssh tunnels will forward the packets to various machines.

So far, it seems that Ignite will first talk through the ssh tunnels for establishing connections, but then Ignite will start sending packets directly to each other. The first node's discovery spi is set to the default (multicast) and the second (joining) node uses a static ip discovery that points to the ssh tunnel on the loopback.

Is there a way to get this to work through Ignite configurations? Is the AddressResolver a possible option? I know that Ignite supports SSL, but we have our reasons for doing things this way.

-----
This post is migrated from now discontinued Apache Ignite forum at
http://apacheignite.readme.io/v1.0/discuss
pgarg pgarg

Re: Ignite Cluster Communication with SSH Tunnels

commented by jake waffle

I just tried using the IgniteConfiguration's setLocalHost() method to set the local host for each node to 127.0.0.1. That almost worked out. The joining node was able to try and communicate through the ssh tunnel on 127.0.0.1:47500. However the existing node was replying to the host and port that the ssh tunnel was sending the message from (not localhost and not port 47501.)

If there was a way to make the existing node send the acknowledgement to the ssh tunnel on its loopback, then this would all work out perfectly.

pgarg pgarg

Re: Ignite Cluster Communication with SSH Tunnels

commented by dmitriy setrakyan

I am actually not sure, but I raised a question on the dev list. Let's see what other community members think.

http://apache-ignite-developers.2346864.n4.nabble.com/configuring-communication-and-discovery-through-SSH-channels-td347.html

pgarg pgarg

Re: Ignite Cluster Communication with SSH Tunnels

commented by alexey goncharuk

Currently there is no direct way in Ignite to configure this scenario. There may be a workaround, but I cannot quite picture the configuration of your cluster. How many nodes do you have, and how are the SSH tunnels set up? Your scenario basically means that you need an SSH tunnel between each pair of nodes in the topology, in both directions, because communication uses point-to-point connections.
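
To give a sense of scale for the point above, here is a minimal Java sketch of the arithmetic, assuming one SSH forward per direction for every pair of nodes. The class and method names are made up for illustration and are not part of Ignite.

```java
// Hypothetical helper, not part of Ignite: counts SSH forwards for a full mesh.
final class TunnelCount {
    private TunnelCount() {}

    /** Directed tunnels needed so that each of the given nodes can reach every other node. */
    static long directedTunnels(int nodes) {
        if (nodes < 0) throw new IllegalArgumentException("nodes must be >= 0");
        // Each node needs one outgoing forward to each of the other (nodes - 1) nodes.
        return (long) nodes * Math.max(0, nodes - 1);
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 5, 50}) {
            System.out.println(n + " nodes -> " + directedTunnels(n) + " tunnels");
        }
    }
}
```

Under that assumption a 50-node topology would need 2450 forwards, which is why the number of nodes matters here.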

Krzysztof Krzysztof

Re: Ignite Cluster Communication with SSH Tunnels

I have a simpler scenario in mind:
just one client connecting to the cluster of say 50 servers via ssh-tunnel.

I would be ready to define 50 tunnels forwarding to the 50 servers of the cluster, but that does not seem to be supported.

Discovery works through a single tunnel (static IP discovery on the client against the multicast-based cluster, plus an AddressResolver), but TcpCommunicationSpi does not, because discovery returns the list of the cluster's internal addresses, which are of course private. For example, a node from the returned list is defined like this:

TcpDiscoveryNode [id=e8744d31-ce77-43d6-b3de-09dfa22516dc, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.168.22], sockAddrs=[0:0:0:0:0:0:0:1%lo:48500, /127.0.0.1:48500, /192.168.168.22:48500], discPort=48500, order=2, intOrder=2, lastExchangeTime=1475454891191, loc=false, ver=1.7.0#19700101-sha1:00000000, isClient=false]

and the 192.168.168.22 address is used. I would imagine that an AddressResolver could be used with TcpCommunicationSpi to translate 192.168.168.22:COMMS_PORT into my localhost:SSH_TUNNELED_PORT_FOR_192.168.168.22.

Is that possible?

Am I missing something? Wouldn't the same problem always arise with NATed or firewalled addresses, except that the router/firewall address would be returned instead of localhost?
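
The translation being asked for can be sketched in plain Java, independent of Ignite. This only illustrates the desired mapping and is not how Ignite resolves addresses; the class name TunnelMap, the subnet, and the local port numbering are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not an Ignite API: maps private "host:port"
// endpoints to the local ends of per-server SSH tunnels.
final class TunnelMap {
    private final Map<String, String> forwards = new HashMap<>();

    /** Registers a forward: traffic for privateAddr should go to localAddr. */
    void add(String privateAddr, String localAddr) {
        forwards.put(privateAddr, localAddr);
    }

    /** Returns the tunnel endpoint for an address, or the address itself if unmapped. */
    String translate(String addr) {
        return forwards.getOrDefault(addr, addr);
    }

    public static void main(String[] args) {
        TunnelMap map = new TunnelMap();
        // Assumed layout: server 192.168.168.N is reachable at localhost:(47100 + N).
        for (int n = 1; n <= 50; n++) {
            map.add("192.168.168." + n + ":47100", "localhost:" + (47100 + n));
        }
        System.out.println(map.translate("192.168.168.22:47100")); // mapped to a tunnel
        System.out.println(map.translate("10.0.0.1:47100"));       // left unchanged
    }
}
```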

Thanks
Krzysztof
vkulichenko vkulichenko

Re: Ignite Cluster Communication with SSH Tunnels

Hi Krzysztof,

This issue was fixed some time ago, so in 1.8 Ignite will publish public addresses in the IP finder. You can try the nightly build [1] in the meantime.

[1] https://cwiki.apache.org/confluence/display/IGNITE/Nightly+Builds

-Val
Krzysztof Krzysztof

Re: Ignite Cluster Communication with SSH Tunnels

Thanks for the hint!
I was trying to use BasicAddressResolver from 1.7.1, but it was never
called for the list returned by the discovery.
I reckon the whole cluster must be on the same version? With 1.8 on the
client only, even discovery does not work; it just stays in an endless
loop.
I will upgrade the full cluster (not only the client outside the cluster) to 1.8.

I noticed also some different behaviour with log4j module in 1.8 -
should it be configured differently?


For completeness, here is the trimmed sample code I am trying to run:
<java>
// Imports needed by the statements below.
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.BasicAddressResolver;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.logger.log4j.Log4JLogger;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

System.setProperty("IGNITE_QUIET", "false");
Ignition.setClientMode(true);

TcpDiscoverySpi spi = new TcpDiscoverySpi();
TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();

// Set initial IP addresses.
// Note that you can optionally specify a port or a port range.
// This points to one of the cluster nodes via the SSH tunnel. It works in 1.7
// (the node list is returned), but not with a 1.8 client against the 1.7 cluster.
ipFinder.setAddresses(Arrays.asList("localhost:48500"));
spi.setIpFinder(ipFinder);
spi.setLocalAddress("localhost");

// Map each internal cluster address to the local end of its SSH tunnel.
Map<String, String> resolverAddresses = new HashMap<String, String>();
for (int i = 0; i < 55; i++) {
    String orig = String.format("192.168.168.%d:47100", 1 + i); // I would expect these IPs to be returned
    String dest = String.format("localhost:%03d", 47101 + i);
    resolverAddresses.put(orig, dest);
}

BasicAddressResolver basicResolver = null;
try {
    basicResolver = new BasicAddressResolver(resolverAddresses);
} catch (UnknownHostException e) {
    e.printStackTrace();
}

spi.setAddressResolver(basicResolver);

IgniteConfiguration cfg = new IgniteConfiguration();

// Override default discovery SPI.
cfg.setDiscoverySpi(spi);
// Setting the resolver here does not change anything (spi and commSpi have it set too).
cfg.setAddressResolver(basicResolver);
cfg.setFailureDetectionTimeout(15000);

// Explicitly configure TCP communication SPI by changing local port number for
// the nodes from the first cluster.
TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
commSpi.setLocalPort(47099); // exposed via reverse channel -R:47099:...
commSpi.setLocalPortRange(1);
commSpi.setLocalAddress("localhost");
commSpi.setAddressResolver(basicResolver);
commSpi.setSharedMemoryPort(-1);

// Overriding communication SPI.
cfg.setCommunicationSpi(commSpi);

// Logging: this has an effect in 1.7 but not in 1.8.
cfg.setGridLogger(new Log4JLogger("config/ignite-log4j.xml"));

// Start Ignite node.
Ignite ignite = Ignition.start(cfg);

CacheConfiguration cacheCfg = new CacheConfiguration("run_cache");
cacheCfg.setCacheMode(CacheMode.PARTITIONED);
cacheCfg.setBackups(0);
</java>

Any hints?

Cheers

vkulichenko vkulichenko

Re: Ignite Cluster Communication with SSH Tunnels

All nodes should run on the same version.

What is exactly different in log4j behavior?

-Val
Krzysztof Krzysztof

Re: Ignite Cluster Communication with SSH Tunnels

On 1.8 the problem stays the same (I had switched off port 48500 by mistake before; it is on again now): the node connects and is listed as part of the cluster, but discovery sends a list of nodes with internal addresses and they are not translated on the local node. So even though the node is listed as part of the cluster, exceptions appear saying that the client cannot connect to any of the nodes from the sent list...

Now it fails on

   ctx.io().send(node, new GridDhtAffinityAssignmentRequest(key.get1(), key.get2()),

at GridDhtAssignmentFetchFuture:185, trying to connect to port 48500 of the first node from the availableNodes list. Why is it connecting to the discovery port, even though it has already fetched the cluster info?

<log>

[02:25:57,195][INFO ][main][IgniteKernal] Non-loopback local IPs:
192.168.0.102, fe80:0:0:0:6e40:8ff:fe91:c0a4%en0,
fe80:0:0:0:9485:8fff:fec1:9de8%awdl0

[02:25:57,195][INFO ][main][IgniteKernal] Enabled local MACs:
6C400891C0A4, 96858FC19DE8

[02:25:57,213][INFO ][main][IgnitePluginProcessor] Configured plugins:

[02:25:57,213][INFO ][main][IgnitePluginProcessor]   ^-- None

[02:25:57,213][INFO ][main][IgnitePluginProcessor]

[02:25:59,804][INFO ][main][TcpCommunicationSpi] Successfully bound to
TCP port [port=47099, locHost=localhost/127.0.0.1]

[02:26:00,451][WARN ][main][NoopCheckpointSpi] Checkpoints are
disabled (to enable configure any GridCheckpointSpi implementation)

[02:26:00,486][WARN ][main][GridCollisionManager] Collision resolution
is disabled (all jobs will be activated upon arrival).

[02:26:00,489][WARN ][main][NoopSwapSpaceSpi] Swap space is disabled.
To enable use FileSwapSpaceSpi.

[02:26:00,491][INFO ][main][IgniteKernal] Security status
[authentication=off, tls/ssl=off]

[02:26:00,776][INFO ][main][GridTcpRestProtocol] Command protocol
successfully started [name=TCP binary, host=0.0.0.0/0.0.0.0,
port=11211]

[02:26:06,180][INFO ][main][GridCacheProcessor] Started cache
[name=ignite-sys-cache, mode=REPLICATED]

[02:26:06,194][INFO ][main][GridCacheProcessor] Started cache
[name=ignite-atomics-sys-cache, mode=PARTITIONED]

[02:26:06,214][INFO ][main][GridCacheProcessor] Started cache
[name=ignite-marshaller-sys-cache, mode=REPLICATED]

[02:26:22,657][WARN ][exchange-worker-#46%null%][TcpCommunicationSpi]
Connect timed out (consider increasing 'failureDetectionTimeout'
configuration property) [addr=/192.168.168.8:47100,
failureDetectionTimeout=15000]

[02:26:22,660][WARN ][exchange-worker-#46%null%][TcpCommunicationSpi]
Connect timed out (consider increasing 'failureDetectionTimeout'
configuration property) [addr=/127.0.0.1:47100,
failureDetectionTimeout=15000]

[02:26:22,661][WARN ][exchange-worker-#46%null%][TcpCommunicationSpi]
Failed to connect to a remote node (make sure that destination node is
alive and operating system firewall is disabled on local and remote
hosts) [addrs=[/192.168.168.8:47100, /127.0.0.1:47100,
0:0:0:0:0:0:0:1%lo:47100]]

[02:26:36,223][WARN ][main][GridCachePartitionExchangeManager] Failed
to wait for initial partition map exchange. Possible reasons are:

  ^-- Transactions in deadlock.

  ^-- Long running transactions (ignore if this is the case).

  ^-- Unreleased explicit locks.

[02:26:54,741][ERROR][exchange-worker-#46%null%][GridDhtAssignmentFetchFuture] Failed to request affinity assignment from remote node (will continue to another node): TcpDiscoveryNode [id=337d89f0-c14a-4b5d-8d33-b995331dcf63, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.168.8], sockAddrs=[0:0:0:0:0:0:0:1%lo:48500, /127.0.0.1:48500, /192.168.168.8:48500], discPort=48500, order=218, intOrder=110, lastExchangeTime=1475627164474, loc=false, ver=1.8.0#20161004-sha1:a370bad1, isClient=false]

class org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid or TCP connection cannot be established due to firewall issues) [node=TcpDiscoveryNode [id=337d89f0-c14a-4b5d-8d33-b995331dcf63, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.168.8], sockAddrs=[0:0:0:0:0:0:0:1%lo:48500, /127.0.0.1:48500, /192.168.168.8:48500], discPort=48500, order=218, intOrder=110, lastExchangeTime=1475627164474, loc=false, ver=1.8.0#20161004-sha1:a370bad1, isClient=false], topic=TOPIC_CACHE, msg=GridDhtAffinityAssignmentRequest [topVer=AffinityTopologyVersion [topVer=276, minorTopVer=0]], policy=4]

at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1309)

at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1373)

at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:841)

at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtAssignmentFetchFuture.requestFromNextNode(GridDhtAssignmentFetchFuture.java:185)

at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtAssignmentFetchFuture.init(GridDhtAssignmentFetchFuture.java:107)

at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.fetchAffinityOnJoin(CacheAffinitySharedManager.java:953)

at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onClientEvent(CacheAffinitySharedManager.java:639)

at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClientNodeEvent(GridDhtPartitionsExchangeFuture.java:619)

at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:464)

at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1447)

at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)

at java.lang.Thread.run(Thread.java:745)

Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote node: TcpDiscoveryNode [id=337d89f0-c14a-4b5d-8d33-b995331dcf63, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.168.8], sockAddrs=[0:0:0:0:0:0:0:1%lo:48500, /127.0.0.1:48500, /192.168.168.8:48500], discPort=48500, order=218, intOrder=110, lastExchangeTime=1475627164474, loc=false, ver=1.8.0#20161004-sha1:a370bad1, isClient=false]

at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2013)

at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1951)

at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1304)

</log>


What am I missing?

In 1.8 I am unable to get any DEBUG messages with the config that
worked with 1.7, with both log4j and ignite-log4j from 1.8 on the
classpath.

Cheers
Krzysztof

Krzysztof Krzysztof

Re: Ignite Cluster Communication with SSH Tunnels

In reply to this post by vkulichenko
A correction to my interpretation of the problem: it is not discovery that
fails now, but the (system?) cache bootstrapping. How should the
AddressResolver be used, then?

Thanks

vkulichenko vkulichenko

Re: Ignite Cluster Communication with SSH Tunnels

Hi,

Try to set the address resolver on the IgniteConfiguration instead of TcpDiscoverySpi. It looks like discovery works, but further communication doesn't.

-Val
Krzysztof Krzysztof

Re: Ignite Cluster Communication with SSH Tunnels

Thanks for the hint, but in the snippet I sent there's already:

// Override default discovery SPI.
cfg.setDiscoverySpi(spi);
// Setting the resolver here does not change anything (spi and commSpi have it set too).
cfg.setAddressResolver(basicResolver);

Or do you mean something else?

Cheers

vkulichenko vkulichenko

Re: Ignite Cluster Communication with SSH Tunnels

You set 47099 as a communication port, but I don't see how it's mapped in the resolver. I think this is the reason.

-Val
Krzysztof Krzysztof

Re: Ignite Cluster Communication with SSH Tunnels

I also added mappings for port 48500 so that discovery could use them, but for whatever reason neither SPI uses the resolver for remote addresses. The AddressResolver is only applied to the client's own address, and none of the cluster addresses get mapped.

The only place resolver is called seems to be this one in TcpDiscoverySpi or TcpCommunicationSpi:

<java>
// Collect the local host's own addresses.
IgniteBiTuple<Collection<String>, Collection<String>> addrs = U.resolveLocalAddresses(locHost);

// Only these local addresses are passed through the configured resolver.
Collection<InetSocketAddress> extAddrs = addrRslvr == null ? null :
    U.resolveAddresses(addrRslvr, F.flat(Arrays.asList(addrs.get1(), addrs.get2())), boundTcpPort);
</java>

but this only resolves local addresses. So I think I am missing something fundamental: I would have thought the resolver's purpose is to map addresses from the cluster, which are behind the NAT/SSH tunnel.
What am I missing?
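
One way to read the snippet above, sketched in plain Java with no Ignite classes: the resolver is applied by each node to its own addresses before they are published, so a mapping configured only on the client never touches the server addresses it receives. This is a simplified model under that assumption; PublishModel, publish and the addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Simplified model, not Ignite code: each node runs the resolver over its OWN
// addresses and publishes the result; peers only see what was published.
final class PublishModel {
    private PublishModel() {}

    /** Addresses a node advertises: its local ones plus any resolver mappings for them. */
    static List<String> publish(List<String> localAddrs, Map<String, String> resolver) {
        List<String> published = new ArrayList<>(localAddrs);
        for (String addr : localAddrs) {
            String ext = resolver.get(addr);
            if (ext != null) published.add(ext);
        }
        return published;
    }

    public static void main(String[] args) {
        List<String> serverLocal = Collections.singletonList("192.168.168.5:47100");
        Map<String, String> mapping =
            Collections.singletonMap("192.168.168.5:47100", "localhost:47105");

        // Mapping known only to the client: the server publishes nothing extra.
        System.out.println(publish(serverLocal, Collections.<String, String>emptyMap()));
        // Mapping configured on the server: the tunnel endpoint is advertised too.
        System.out.println(publish(serverLocal, mapping));
    }
}
```

If that reading is right, the mapping would have to be configured on the server nodes rather than on the client, but I have not confirmed this.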


The log does not mention any problems with port 47099:
<log>

[01:21:39,674][INFO ][main][GridCacheProcessor] Started cache [name=ignite-atomics-sys-cache, mode=PARTITIONED]

[01:21:39,690][INFO ][main][GridCacheProcessor] Started cache [name=ignite-marshaller-sys-cache, mode=REPLICATED]

[01:21:42,540][WARN ][exchange-worker-#46%null%][TcpCommunicationSpi] Connect timed out (consider increasing 'failureDetectionTimeout' configuration property) [addr=/192.168.168.5:47100, failureDetectionTimeout=15000]

[01:21:42,541][WARN ][exchange-worker-#46%null%][TcpCommunicationSpi] Connect timed out (consider increasing 'failureDetectionTimeout' configuration property) [addr=/127.0.0.1:47100, failureDetectionTimeout=15000]

[01:21:42,541][WARN ][exchange-worker-#46%null%][TcpCommunicationSpi] Failed to connect to a remote node (make sure that destination node is alive and operating system firewall is disabled on local and remote hosts) [addrs=[/192.168.168.5:47100, /127.0.0.1:47100, 0:0:0:0:0:0:0:1%lo:47100]]

[01:21:44,594][ERROR][exchange-worker-#46%null%][GridDhtAssignmentFetchFuture] Failed to request affinity assignment from remote node (will continue to another node): TcpDiscoveryNode [id=8148bc4e-6348-4a7a-95ed-14e403b9c615, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.168.5], sockAddrs=[/192.168.168.5:48500, 0:0:0:0:0:0:0:1%lo:48500, /127.0.0.1:48500], discPort=48500, order=1, intOrder=1, lastExchangeTime=1475796099172, loc=false, ver=1.8.0#20161004-sha1:a370bad1, isClient=false]

class org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid or TCP connection cannot be established due to firewall issues) [node=TcpDiscoveryNode [id=8148bc4e-6348-4a7a-95ed-14e403b9c615, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.168.5], sockAddrs=[/192.168.168.5:48500, 0:0:0:0:0:0:0:1%lo:48500, /127.0.0.1:48500], discPort=48500, order=1, intOrder=1, lastExchangeTime=1475796099172, loc=false, ver=1.8.0#20161004-sha1:a370bad1, isClient=false], topic=TOPIC_CACHE, msg=GridDhtAffinityAssignmentRequest [topVer=AffinityTopologyVersion [topVer=67, minorTopVer=0]], policy=4]

at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1309)

at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1373)

at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:841)

at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtAssignmentFetchFuture.requestFromNextNode(GridDhtAssignmentFetchFuture.java:185)

at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtAssignmentFetchFuture.init(GridDhtAssignmentFetchFuture.java:107)

at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.fetchAffinityOnJoin(CacheAffinitySharedManager.java:953)

at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onClientEvent(CacheAffinitySharedManager.java:639)

at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClientNodeEvent(GridDhtPartitionsExchangeFuture.java:619)

at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:464)

at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1447)

at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)

at java.lang.Thread.run(Thread.java:745)

Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote node: TcpDiscoveryNode [id=8148bc4e-6348-4a7a-95ed-14e403b9c615, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.168.5], sockAddrs=[/192.168.168.5:48500, 0:0:0:0:0:0:0:1%lo:48500, /127.0.0.1:48500], discPort=48500, order=1, intOrder=1, lastExchangeTime=1475796099172, loc=false, ver=1.8.0#20161004-sha1:a370bad1, isClient=false]

at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2013)

at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1951)

at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1304)

... 11 more

Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each ComputeTask and cache Transaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=8148bc4e-6348-4a7a-95ed-14e403b9c615, addrs=[/192.168.168.5:47100, /127.0.0.1:47100, 0:0:0:0:0:0:0:1%lo:47100]]

at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2519)

at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2157)

at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2051)

at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1985)

... 13 more

Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /192.168.168.5:47100

at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2521)

... 16 more

Caused by: java.net.ConnectException: Connection refused

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)

at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)

at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2380)

... 16 more

Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /127.0.0.1:47100

at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2521)

... 16 more

Caused by: java.net.ConnectException: Connection refused

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)

at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)

at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2380)

... 16 more

Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: 0:0:0:0:0:0:0:1%lo:47100

at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2521)

... 16 more

Caused by: java.net.UnknownHostException

at sun.nio.ch.Net.translateException(Net.java:155)

at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:127)

at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2380)

... 16 more

[01:22:09,700][WARN ][main][GridCachePartitionExchangeManager] Failed to wait for initial partition map exchange. Possible reasons are:

  ^-- Transactions in deadlock.

  ^-- Long running transactions (ignore if this is the case).

...
</log>

I would appreciate clarification on this.
Krzysztof

On Fri, Oct 7, 2016 at 12:58 AM, vkulichenko [via Apache Ignite Users] <[hidden email]> wrote:
You set 47099 as a communication port, but I don't see how it's mapped in the resolver. I think this is the reason.

-Val



Krzysztof

Re: Ignite Cluster Communication with SSH Tunnels

We would really like to use Ignite for our project (http://gaia.esac.esa.int/documentation/GDR1/Data_analysis/sec_cu7var/sec_cu7introduction.html, which is part of http://sci.esa.int/gaia/), but this is a blocker.

Do you think we could get a client behind NAT working?

Best regards,
Krzysztof

Krzysztof

Re: Ignite Cluster Communication with SSH Tunnels

Hello,

We would appreciate confirmation that GridDhtAssignmentFetchFuture::requestFromNextNode() makes it impossible to have a client communicate via NAT, even with AddressResolver, or a pointer to where we made a mistake in the code or logic.
Judging by the code, GridDhtAssignmentFetchFuture has no notion of AddressResolver at all and will always try to contact the returned nodes from behind the NAT.

Is there any way to make it work? We must decide whether to use Ignite in our project, and this seems to be a blocker.

Best Regards

Denis Magda

Re: Ignite Cluster Communication with SSH Tunnels

Hi Krzysztof,

AddressResolver can definitely be used when you need to connect nodes that are behind NAT to the other nodes.

In an AddressResolver configuration, each node simply lists a mapping of its private addresses to its external addresses. When a node then joins the cluster, it sends its mapped addresses to the rest of the cluster so that every other node can connect to it through TcpCommunicationSpi.

As I understand it, there is a misconfiguration at the level of AddressResolver, TcpCommunicationSpi, or TcpDiscoverySpi. Please double-check that:
  1. Every cluster node provides a private->public mapping of its own IP addresses in the AddressResolver.
  2. The TcpDiscoverySpi of every node contains only public addresses.
  3. All the ports used by both TcpDiscoverySpi and TcpCommunicationSpi are open.
If none of this helps, please share the full configuration you are using.
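
To make the first check concrete, here is a minimal stand-alone sketch in plain Java. It does not call the Ignite API; it only models a private->public mapping as a Map and reports which local endpoints have no entry. The class and method names (AddressMappingCheck, unmapped) and the external address 203.0.113.10 are hypothetical. Port 47500 is the discovery port seen earlier in the thread and 47099 is the communication port mentioned above. Treating a bare-host key as covering every port on that host is an assumption, modeled on how BasicAddressResolver is understood to accept either host or host:port entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-alone sketch: plain JDK collections only, no Ignite classes involved.
class AddressMappingCheck {

    /**
     * Returns the local "host:port" endpoints that the mapping does not cover.
     * An endpoint counts as covered if the exact "host:port" key or the bare
     * host key is present in the mapping (assumed rule, see the lead-in).
     */
    static List<String> unmapped(Map<String, String> mapping, List<String> localEndpoints) {
        List<String> missing = new ArrayList<>();
        for (String endpoint : localEndpoints) {
            // Split at the last colon so that IPv6-style hosts keep their inner colons.
            int idx = endpoint.lastIndexOf(':');
            String host = idx < 0 ? endpoint : endpoint.substring(0, idx);
            if (!mapping.containsKey(endpoint) && !mapping.containsKey(host)) {
                missing.add(endpoint);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical private -> public mapping; 203.0.113.10 is a placeholder address.
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("192.168.168.5:47500", "203.0.113.10:47500"); // discovery port only

        // Endpoints the node is assumed to bind: discovery (47500) and communication (47099).
        List<String> local = Arrays.asList("192.168.168.5:47500", "192.168.168.5:47099");

        System.out.println("Unmapped endpoints: " + unmapped(mapping, local));
    }
}
```

With this placeholder mapping, only the communication endpoint on port 47099 is reported as unmapped, which is the kind of gap worth ruling out before looking deeper into the code.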

--
Denis


On Tue, Oct 11, 2016 at 4:27 PM, Krzysztof <[hidden email]> wrote:
Hello,

We would appreciate confirmation that
GridDhtAssignmentFetchFuture::requestFromNextNode() makes it impossible to have
a client communicate via NAT, even with AddressResolver, or a pointer
to where we made a mistake in the code or logic.
Judging by the code, GridDhtAssignmentFetchFuture has no notion of
AddressResolver at all and will always try to contact the returned nodes from
behind the NAT.

Is there any way to make it work? We must decide whether to use Ignite
in our project, and this seems to be a blocker.

Best Regards


