Nodes running on different operating systems

Graham Bull

Nodes running on different operating systems

I assume that it's a supported scenario to have an Ignite setup consisting of nodes running on different operating systems?

For instance, I've got a Windows host machine, and two virtual Linux guests. All three machines are running the same version of Apache Ignite (1.6.0) and the same version of Java (1.8.0_91).

I start the two Ignite instances on Linux, and they detect each other and stay running.
"[12:47:45] Topology snapshot [ver=2, servers=2, clients=0, CPUs=4, heap=2.0GB]"

I then start the instance on Windows. It starts and hangs for a while on:
"[12:47:50] Security status [authentication=off, tls/ssl=off]"

Then both Linux instances respond with:
"[12:48:13] Topology snapshot [ver=3, servers=1, clients=0, CPUs=2, heap=1.0GB]"

One of them then exits:
"[12:48:23] Ignite node stopped OK [uptime=00:00:39:924]"

The other stays running, as does the Windows instance:
"[12:48:23] Topology snapshot [ver=4, servers=2, clients=0, CPUs=10, heap=2.0GB]"

A few minutes later the second Linux instance exits:
"[12:55:06] Ignite node stopped OK [uptime=00:07:21:069]"

The Windows instance stays running:
"[12:55:06] Topology snapshot [ver=5, servers=1, clients=0, CPUs=8, heap=1.0GB]"

(I'm using the binary download on all machines. I haven't used Ignite before now; I spent a few hours looking at the 1.5.0-final release last week, then saw the 1.6.0 release today.)

Thanks in advance.
vkulichenko

Re: Nodes running on different operating systems

Yes, this is supported. It looks like either your configuration is wrong or the nodes stop for some external reason.

Can you attach your configuration file and the whole logs from all the nodes?

-Val
Graham Bull

Re: Nodes running on different operating systems

Hi Val,

I'm actually using the default configuration, just running bin/ignite.sh (or bin/ignite.bat on Windows) without any arguments. I'll look at creating and using configuration files instead.

Thanks,

Graham



On 23 May 2016 at 13:55, vkulichenko <[hidden email]> wrote:
> Yes, this is supported. Looks like your configuration is wrong or nodes stop
> for some external reason.
>
> Can you attach your configuration file and the whole logs from all the
> nodes?
>
> -Val



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Nodes-running-on-different-operating-systems-tp5098p5111.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

vkulichenko

Re: Nodes running on different operating systems

Graham,

The default config uses multicast for discovery. Can you try a static IP configuration [1] and see whether the issue still reproduces?

[1] https://apacheignite.readme.io/docs/cluster-config#static-ip-based-discovery

-Val
Graham Bull

Re: Nodes running on different operating systems

Thanks for the suggestion, but unfortunately it makes no difference.

All three nodes are now using the same configuration, except that I've put each machine's local IP address at the top of the list:

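<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd">
  <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="discoverySpi">
      <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="ipFinder">
          <!-- Static list of node addresses, used instead of multicast discovery -->
          <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
            <property name="addresses">
              <list>
                <value>192.168.56.1</value> <!--windows/host-->
                <value>192.168.56.101</value> <!--linux1-->
                <value>192.168.56.102</value> <!--linux2-->
              </list>
            </property>
          </bean>
        </property>
      </bean>
    </property>
  </bean>
</beans>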

I've noticed something interesting. If I start the Windows node first followed by just one of the Linux nodes, then the Linux node doesn't seem to be able to maintain a stable connection, and repeatedly connects then disconnects:

[10:00:32] Topology snapshot [ver=1, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:01:00] Topology snapshot [ver=3, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:01:41] Topology snapshot [ver=7, servers=2, clients=0, CPUs=10, heap=2.0GB]
[10:01:41] Topology snapshot [ver=7, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:02:21] Topology snapshot [ver=11, servers=2, clients=0, CPUs=10, heap=2.0GB]
[10:02:21] Topology snapshot [ver=11, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:02:42] Topology snapshot [ver=13, servers=2, clients=0, CPUs=10, heap=2.0GB]
[10:02:42] Topology snapshot [ver=13, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:06:25] Topology snapshot [ver=35, servers=2, clients=0, CPUs=10, heap=2.0GB]
[10:06:25] Topology snapshot [ver=35, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:07:46] Topology snapshot [ver=43, servers=2, clients=0, CPUs=10, heap=2.0GB]
[10:07:46] Topology snapshot [ver=43, servers=1, clients=0, CPUs=8, heap=1.0GB]
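
The join-then-drop pattern in the console output above can be counted mechanically. The following is a minimal sketch, assuming console lines in the "Topology snapshot [ver=..., servers=...]" format shown in this post; `TopologyFlapCounter` and its methods are hypothetical helper names, not part of Ignite.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Ignite): summarises "Topology snapshot" console lines. */
final class TopologyFlapCounter {
    // Matches e.g.: [10:01:41] Topology snapshot [ver=7, servers=2, clients=0, CPUs=10, heap=2.0GB]
    private static final Pattern SNAPSHOT =
        Pattern.compile("Topology snapshot \\[ver=\\d+, servers=(\\d+),");

    /** Returns the logged server counts in order; lines that are not snapshots are skipped. */
    static List<Integer> serverCounts(List<String> lines) {
        List<Integer> counts = new ArrayList<>();
        for (String line : lines) {
            Matcher m = SNAPSHOT.matcher(line);
            if (m.find()) {
                counts.add(Integer.parseInt(m.group(1)));
            }
        }
        return counts;
    }

    /** Counts how many times the server count fell compared with the previous snapshot. */
    static int countDrops(List<Integer> counts) {
        int drops = 0;
        for (int i = 1; i < counts.size(); i++) {
            if (counts.get(i) < counts.get(i - 1)) {
                drops++;
            }
        }
        return drops;
    }

    public static void main(String[] args) {
        // Sample lines copied from the console output in this post.
        List<String> lines = Arrays.asList(
            "[10:00:32] Topology snapshot [ver=1, servers=1, clients=0, CPUs=8, heap=1.0GB]",
            "[10:01:41] Topology snapshot [ver=7, servers=2, clients=0, CPUs=10, heap=2.0GB]",
            "[10:01:41] Topology snapshot [ver=7, servers=1, clients=0, CPUs=8, heap=1.0GB]",
            "[10:02:21] Topology snapshot [ver=11, servers=2, clients=0, CPUs=10, heap=2.0GB]",
            "[10:02:21] Topology snapshot [ver=11, servers=1, clients=0, CPUs=8, heap=1.0GB]");
        List<Integer> counts = serverCounts(lines);
        System.out.println("server counts: " + counts + ", drops: " + countDrops(counts));
    }
}
```

Each drop that shares a timestamp with the preceding join corresponds to a node being added and then failed almost immediately.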

This is from the log (happens every 20 seconds):

[10:07:46,035][INFO][disco-event-worker-#46%null%][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=a5982ff4-a30e-479d-b4c4-d2f18880d100, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.2.15, 127.0.0.1, 192.168.56.102], sockAddrs=[/192.168.56.102:47500, /0:0:0:0:0:0:0:1%lo:47500, /10.0.2.15:47500, /10.0.2.15:47500, /127.0.0.1:47500, /192.168.56.102:47500], discPort=47500, order=42, intOrder=22, lastExchangeTime=1464080845973, loc=false, ver=1.6.0#20160518-sha1:0b22c45b, isClient=false]

[10:07:46,035][INFO][disco-event-worker-#46%null%][GridDiscoveryManager] Topology snapshot [ver=43, servers=2, clients=0, CPUs=10, heap=2.0GB]

[10:07:46,036][WARNING][disco-event-worker-#46%null%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=a5982ff4-a30e-479d-b4c4-d2f18880d100, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.2.15, 127.0.0.1, 192.168.56.102], sockAddrs=[/192.168.56.102:47500, /0:0:0:0:0:0:0:1%lo:47500, /10.0.2.15:47500, /10.0.2.15:47500, /127.0.0.1:47500, /192.168.56.102:47500], discPort=47500, order=42, intOrder=22, lastExchangeTime=1464080845973, loc=false, ver=1.6.0#20160518-sha1:0b22c45b, isClient=false]

[10:07:46,036][INFO][disco-event-worker-#46%null%][GridDiscoveryManager] Topology snapshot [ver=43, servers=1, clients=0, CPUs=8, heap=1.0GB]

[10:07:46,043][INFO][exchange-worker-#49%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=42, minorTopVer=0], evt=NODE_JOINED, node=a5982ff4-a30e-479d-b4c4-d2f18880d100]

[10:07:46,049][INFO][exchange-worker-#49%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=43, minorTopVer=0], evt=NODE_FAILED, node=a5982ff4-a30e-479d-b4c4-d2f18880d100]

[10:07:56,298][WARNING][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout. Current timeout: 9760.
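
The log entries above show the Linux node advertising several addresses (10.0.2.15, 127.0.0.1 and 192.168.56.102, plus the IPv6 loopback). To see which addresses a JVM finds on a given machine, something like the following could be run on each host. It is only a sketch using the standard `java.net` API; `LocalAddressLister` is a hypothetical name, and the output does not by itself show which address Ignite ends up using.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical diagnostic (not part of Ignite): lists addresses bound to active interfaces. */
final class LocalAddressLister {
    /** Returns entries of the form "interfaceName -> address" for every interface that is up. */
    static List<String> localAddresses() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> interfaces = NetworkInterface.getNetworkInterfaces();
        if (interfaces == null) {
            return result; // some platforms report no interfaces at all
        }
        for (NetworkInterface nic : Collections.list(interfaces)) {
            if (!nic.isUp()) {
                continue; // skip interfaces that are down
            }
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                result.add(nic.getName() + " -> " + addr.getHostAddress());
            }
        }
        return result;
    }

    public static void main(String[] args) throws SocketException {
        for (String entry : localAddresses()) {
            System.out.println(entry);
        }
    }
}
```

Comparing the output on the Windows host and on each Linux guest shows which of the advertised addresses belong to a network that the other machines share.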

Thanks,

Graham


On 23 May 2016 at 16:00, vkulichenko <[hidden email]> wrote:
> Graham,
>
> Default config means that multicast is used for discovery. Can you try
> static IP configuration [1] and see if the issue is reproduced?
>
> [1] https://apacheignite.readme.io/docs/cluster-config#static-ip-based-discovery
>
> -Val




vkulichenko

Re: Nodes running on different operating systems

Hi Graham,

Note that all nodes in the topology have to be able to connect to each other in both directions. It looks like one of your nodes can accept connections but can't create them (or the other way around). Does your Windows box perhaps have a firewall enabled?
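
One way to test the two-way requirement is to attempt a plain TCP connection to each peer's discovery port from every machine in turn. The following is a minimal sketch, assuming the three addresses from the configuration posted earlier and the discovery port 47500 that appears in the log (`discPort=47500`); `DiscoveryPortCheck` is a hypothetical name, not an Ignite class, and the 2000 ms timeout is an arbitrary choice.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical diagnostic (not part of Ignite): checks whether a TCP port accepts connections. */
final class DiscoveryPortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs milliseconds. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or unreachable
        }
    }

    public static void main(String[] args) {
        // Addresses taken from the static IP configuration posted in this thread.
        String[] hosts = {"192.168.56.1", "192.168.56.101", "192.168.56.102"};
        int discoveryPort = 47500; // discPort value shown in the node log
        for (String host : hosts) {
            System.out.println(host + ":" + discoveryPort
                + " reachable=" + canConnect(host, discoveryPort, 2000));
        }
    }
}
```

Running it on all three machines while the Ignite nodes are up gives a reachability matrix; a `false` in only one direction would match the accept-but-not-create symptom. Ignite also uses a separate communication port (47100 by default), which can be checked the same way.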

-Val
Graham Bull

Re: Nodes running on different operating systems

I've disabled the firewall on each of the three machines, but unfortunately it makes no difference :(

I need to leave this for now, but should return to it at some point later on.

Thanks for your help Val,

Graham



On 25 May 2016 at 12:57, vkulichenko <[hidden email]> wrote:
> Hi Graham,
>
> Note that all nodes in topology have to be able to connect to each other in
> both directions. Looks like one of your nodes can accept connections, but
> can't create them (or other way around). Probably your Windows box has a
> firewall enabled?
>
> -Val




vkulichenko

Re: Nodes running on different operating systems

Hi Graham,

The issue sounds really weird and most likely there is some kind of misconfiguration. Please let me know once you're able to get back to this.

-Val