How to do address resolution?

javadevmtl

How to do address resolution?

I'm asking this in a separate question so people can search for it if they ever come across this...

My server nodes are started with the configuration below, and I also connect the client the same way.

                  <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                      <property name="addresses">
                          <list>
                            <value>foo:47500</value>
...
                          </list>
                      </property>
                  </bean>

In my client code I used the basic address resolver (BasicAddressResolver), and I put the following entry in its map:

"{internalHostIP}:47500", "{externalHostIp}:{externalPort}"

igniteConfig.setAddressResolver(addrResolver);
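For reference, a sketch of how the same mapping might be declared in the Spring XML config instead of code. BasicAddressResolver's constructor takes a map of internal to external addresses; the addresses and ports below are placeholders, not values from this thread:

```xml
<property name="addressResolver">
    <bean class="org.apache.ignite.configuration.BasicAddressResolver">
        <constructor-arg>
            <map>
                <!-- internal address:port -> externally reachable address:port (placeholder values) -->
                <entry key="10.0.0.5:47500" value="203.0.113.10:2388"/>
            </map>
        </constructor-arg>
    </bean>
</property>
```

If I read the docs correctly, a key without a port (e.g. just "10.0.0.5") maps every port on that host, while a host:port key maps only that one socket address.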

QUESTIONS
___________________

1- Is port 47500 used for discovery only?
2- Is port 47100 used for actual communication with the nodes?
3- In my container environment I have only mapped 47100; do I also need to map 47500 for the TCP Discovery SPI?
4- When I connect with Visor and try to look at details for the client node, it blocks. I'm assuming that's because Visor cannot connect back to the client at 47100?
See logs below.

LOGS
___________________

When I look at the client logs I get...

IgniteConfiguration [
igniteInstanceName=xxxxxx,
...
discoSpi=TcpDiscoverySpi [
  addrRslvr=null, <--- Do I need to set the resolver here too???
...
  commSpi=TcpCommunicationSpi [
...
    locAddr=null,
    locHost=null,
    locPort=47100,
    addrRslvr=null, <--- Do I need to set the resolver here too???
...
    ],
...
    addrRslvr=BasicAddressResolver [
      inetAddrMap={},
      inetSockAddrMap={/internalIp:47100=/externalIp:2389} <---- 
    ],
...
    clientMode=true,
...


ilya.kasnacheev

Re: How to do address resolution?

Hello!

For thick clients, you need both 47100 and 47500, in both directions (perhaps for 47500 only client -> server is sufficient, but for 47100, both are needed).

For thin clients, 10800 is enough. For control.sh, 11211.
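To make container port mappings predictable, those defaults can also be pinned explicitly in the node configuration. A sketch in the same Spring XML style used above, with the default port numbers mentioned in this thread:

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
            <!-- discovery: the port servers and thick clients use to join -->
            <property name="localPort" value="47500"/>
        </bean>
    </property>
    <property name="communicationSpi">
        <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
            <!-- communication: node-to-node (and Visor-to-node) traffic -->
            <property name="localPort" value="47100"/>
        </bean>
    </property>
    <property name="clientConnectorConfiguration">
        <bean class="org.apache.ignite.configuration.ClientConnectorConfiguration">
            <!-- thin clients (JDBC/ODBC/thin protocol) -->
            <property name="port" value="10800"/>
        </bean>
    </property>
</bean>
```

Note that Ignite may still scan a range above each local port (controlled by the corresponding localPortRange settings), so the container mapping should cover whatever range the node actually binds.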

Regards,
--
Ilya Kasnacheev




javadevmtl

Re: How to do address resolution?

Also, I think this applies to Visor as well?

When I do the top or node commands, I can see the thick client. But when I look at detailed statistics for that particular thick client, it freezes "indefinitely". Regular statistics seem OK.



ilya.kasnacheev

Re: How to do address resolution?

Hello!

This usually means there's no connectivity between node and Visor.

Regards,
--
Ilya Kasnacheev


...


javadevmtl

Re: How to do address resolution?

How though?

1- Entered node command
2- Got list of nodes, including thick clients
3- Selected thick client
4- Entered Y for detailed statistics
5- Snapshot details displayed
6- Data region stats frozen

I think address resolution is the issue here as well. I need to confirm, because after I fixed the resolver as per your solution, Visor no longer freezes at step 6 above.

...


ilya.kasnacheev

Re: How to do address resolution?

Hello!

Try collecting thread dump from Visor as it freezes.

Regards,
--
Ilya Kasnacheev


...


javadevmtl

Re: How to do address resolution?

How?

...


ilya.kasnacheev

Re: How to do address resolution?

Hello!

The easiest way is to run jstack <process id of visor>.

Regards,
--
Ilya Kasnacheev


...


javadevmtl

Re: How to do address resolution?

Ok.

I am able to reproduce the "issue", assuming we don't have a misunderstanding and are talking about the same thing...

My thick client runs inside a container on a closed network, NOT bridged and NOT host. I added a flag to my application that allows it to add the address resolver to the config.

1- If I disable address resolution, connect with Visor to the cluster, and try to print detailed statistics for that particular client, Visor freezes indefinitely at the data region snapshot.
Ctrl-C doesn't kill Visor either; it's just stuck. The same thing happens when running the cache command: it just freezes indefinitely.

I attached the jstack output to the email but it is also here: https://www.dropbox.com/s/wujcee1gd87gk6o/jstack.out?dl=0

2- If I enable address resolution for the thick client, then all the commands work OK. I also see an "Accepted incoming communication connection" log in the client.
ilya.kasnacheev

Re: How to do address resolution?

Hello!

I can see the following in the thread dump:
"main" #1 prio=5 os_prio=0 tid=0x00007f02c400d800 nid=0x1e43 runnable [0x00007f02cad1e000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.Net.poll(Native Method)
at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:951)
- locked <0x00000000ec066048> (a java.lang.Object)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:121)
- locked <0x00000000ec066038> (a java.lang.Object)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3299)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2987)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2870)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2713)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2672)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1656)
at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1731)
at org.apache.ignite.internal.processors.task.GridTaskWorker.sendRequest(GridTaskWorker.java:1436)
at org.apache.ignite.internal.processors.task.GridTaskWorker.processMappedJobs(GridTaskWorker.java:666)
at org.apache.ignite.internal.processors.task.GridTaskWorker.body(GridTaskWorker.java:538)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at org.apache.ignite.internal.processors.task.GridTaskProcessor.startTask(GridTaskProcessor.java:764)
at org.apache.ignite.internal.processors.task.GridTaskProcessor.execute(GridTaskProcessor.java:392)
at org.apache.ignite.internal.IgniteComputeImpl.executeAsync0(IgniteComputeImpl.java:528)
at org.apache.ignite.internal.IgniteComputeImpl.execute(IgniteComputeImpl.java:498)
at org.apache.ignite.visor.visor$.execute(visor.scala:1800)

It seems that Visor is trying to connect to client node via Communication, and it fails because the network connection is filtered out.

Regards,
--
Ilya Kasnacheev




javadevmtl

Re: How to do address resolution?

Ok so, is this expected behaviour? From a user perspective this seems like a bug.

Visor is supposed to be used as a way to monitor...

So if, as a user, we enter a command and it just freezes indefinitely, it seems unfriendly.

In another thread the team mentioned that they are working on something that does not require the protocol to communicate back to a thick client. So I'm wondering if this is related as well...

On Tue., Jun. 30, 2020, 6:58 a.m. Ilya Kasnacheev, <[hidden email]> wrote:
Hello!

I can see the following in the thread dump:
"main" #1 prio=5 os_prio=0 tid=0x00007f02c400d800 nid=0x1e43 runnable [0x00007f02cad1e000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.Net.poll(Native Method)
at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:951)
- locked <0x00000000ec066048> (a java.lang.Object)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:121)
- locked <0x00000000ec066038> (a java.lang.Object)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3299)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2987)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2870)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2713)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2672)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1656)
at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1731)
at org.apache.ignite.internal.processors.task.GridTaskWorker.sendRequest(GridTaskWorker.java:1436)
at org.apache.ignite.internal.processors.task.GridTaskWorker.processMappedJobs(GridTaskWorker.java:666)
at org.apache.ignite.internal.processors.task.GridTaskWorker.body(GridTaskWorker.java:538)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at org.apache.ignite.internal.processors.task.GridTaskProcessor.startTask(GridTaskProcessor.java:764)
at org.apache.ignite.internal.processors.task.GridTaskProcessor.execute(GridTaskProcessor.java:392)
at org.apache.ignite.internal.IgniteComputeImpl.executeAsync0(IgniteComputeImpl.java:528)
at org.apache.ignite.internal.IgniteComputeImpl.execute(IgniteComputeImpl.java:498)
at org.apache.ignite.visor.visor$.execute(visor.scala:1800)

It seems that Visor is trying to connect to the client node via the Communication SPI, and it fails because the network connection is filtered out.

Regards,
--
Ilya Kasnacheev


Mon, 29 Jun 2020 at 23:47, John Smith <[hidden email]>:
Ok.

I am able to reproduce the "issue" unless we have a misunderstanding and we are talking about the same thing...

My thick client runs inside a container in a closed network NOT bridged and NOT host. I added a flag to my application that allows it to add the address resolver to the config.

1- If I disable address resolution, connect with Visor to the cluster, and try to print detailed statistics for that particular client, Visor freezes indefinitely at the data region snapshot.
Ctrl-C doesn't kill Visor either; it's just stuck. This also happens when running the cache command: it just freezes indefinitely.

I attached the jstack output to the email but it is also here: https://www.dropbox.com/s/wujcee1gd87gk6o/jstack.out?dl=0

2- If I enable address resolution for the thick client then all the commands work ok. I also see an "Accepted incoming communication connection" log in the client.
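For anyone who lands here later: the same internal-to-external mapping discussed in this thread can also be expressed as a Spring bean instead of programmatically. This is a sketch with placeholder addresses:

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Map the client's internal communication address to the
         externally reachable address published by the container. -->
    <property name="addressResolver">
        <bean class="org.apache.ignite.configuration.BasicAddressResolver">
            <constructor-arg>
                <map>
                    <!-- internalIp:47100 -> externalIp:externalPort (placeholders) -->
                    <entry key="10.0.0.5:47100" value="203.0.113.7:2389"/>
                </map>
            </constructor-arg>
        </bean>
    </property>
</bean>
```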

On Mon, 29 Jun 2020 at 15:30, Ilya Kasnacheev <[hidden email]> wrote:
Hello!

The easiest way is jstack <process id of visor>

Regards,
--
Ilya Kasnacheev


Mon, 29 Jun 2020 at 20:20, John Smith <[hidden email]>:
How?

On Mon, 29 Jun 2020 at 12:03, Ilya Kasnacheev <[hidden email]> wrote:
Hello!

Try collecting thread dump from Visor as it freezes.

Regards,
--
Ilya Kasnacheev


Mon, 29 Jun 2020 at 18:11, John Smith <[hidden email]>:
How though?

1- Entered node command
2- Got list of nodes, including thick clients
3- Selected thick client
4- Entered Y for detailed statistics
5- Snapshot details displayed
6- Data region stats frozen



javadevmtl

Re: How to do address resolution?

So this is what I gathered from this experience.

When running commands on Visor's console, Visor will attempt to connect to the thick client.

For example if you type the "node" command and attempt to get detailed statistics for a specific thick client, Visor will pause on the data region stats until it can connect.

Furthermore, if you have multiple thick clients that Visor has not connected to yet and you call a more global command like "cache", that command will also pause until a connection has been made to all thick clients.

1- Whether this is good behaviour is up for debate, especially when a thick client is listed in the topology/nodes but cannot be reached and Visor hangs indefinitely.
2- I'm not sure whether this behaviour affects the server nodes in any way, if they ever attempt to open a connection to a thick client and the protocol freezes just like in #1 above.



stephendarlington

Re: How to do address resolution?

It's not that Visor connects to a thick client; it *is* a thick client. There are some odd implementation details (it's written in Scala and uses "daemon mode"), but it becomes part of the cluster, so the same "rules" apply as to any other thick client. Connections to other nodes are opened on demand, so the pause is likely Visor trying to open a CommunicationSpi connection to one of the other nodes.

I agree that this is not necessarily intuitive for an administrative tool, but it's what we have until all the functionality can be provided by a thin client or the REST API.





dmagda

Re: How to do address resolution?

Hi John,

As Stephen mentioned, Visor connects to the cluster in a way similar to server nodes and thick clients. It's connected as a daemon node that is filtered out from metrics and other public APIs. That's why you don't see Visor being reported in the cluster topology metrics along with servers or thick clients: https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/IgniteConfiguration.html#setDaemon-boolean-

As a daemon node, Visor uses the same networking protocols to join the cluster and communicate with other cluster members:
  • Discovery SPI - Like any server node or thick client, Visor joins the cluster by connecting to one of the server nodes, using the IP finder set in your IgniteConfiguration. Once Visor joins, it collects information about the cluster topology and displays these basic metrics in the terminal window. Visor receives the topology information through the server node it used to join the cluster, and that same server node updates Visor on any topology changes.
  • Communication SPI - Whenever Visor needs to get metrics from a specific server or thick client, it opens a direct TCP/IP connection to that node. In your case, it failed to reach some clients and hung. Hanging is not the right way to handle this type of issue, and I've opened a ticket to address it: https://issues.apache.org/jira/browse/IGNITE-13201
Given these implementation specifics, I can recommend one of the following:
  • List all the thick clients in the AddressResolver configuration. This is required; I hope the explanation above makes it clear why.
  • Or, run Visor from inside the private network. You would need to ssh to one of your machines, but with this you don't need to deal with AddressResolvers.
  • Or, use contemporary tools for Ignite cluster monitoring. Ignite supports JMX and OpenCensus protocols that allow you to consume metrics with tools like Zabbix or Prometheus. You deploy the tool inside your private network so that it can collect metrics from the cluster, and open a single port for those who will observe the metrics via the tool's user interface. If you need both monitoring and *management* capabilities, then have a look at GridGain Control Center.
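For example, exposing a server node's metrics over JMX is usually just a matter of JVM options. A sketch (the port number is a placeholder, and these flags disable authentication and SSL, so only use them on a trusted private network):

```shell
# Hypothetical JVM options appended for an Ignite server node to expose
# metrics over remote JMX (placeholder port; no auth/SSL for brevity).
JVM_OPTS="$JVM_OPTS \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=49112 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"
```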
-
Denis


On Wed, Jul 1, 2020 at 8:39 AM John Smith <[hidden email]> wrote:
So this is what I gathered from this experience.

When running commands on Visor's console, Visor will attempt to connect to the thick client.

For example if you type the "node" command and attempt to get detailed statistics for a specific thick client, Visor will pause on the data region stats until it can connect.

Furthermore if you have multiple thick clients and Visor has not connected to some of them yet and you call a more global command like "cache", this command will also pause until a connection has been made to all thick clients.

1- Whether this is good behaviour or not is up for debate. Especially the part when a thick client is listed in the topology/nodes but cannot be reached and visor hangs indefinitely.
2- Not sure if this behaviour in any way affects the server node if they ever attempt to open a connection to a thick client and the protocol somehow freezes just like #1 above.

On Tue, 30 Jun 2020 at 09:54, John Smith <[hidden email]> wrote:
Ok so. Is this expected behaviour? From user perspective this seems like a bug.

Visor is supposed to be used as a way to monitor...

So if as a user we enter a command and it just freezes indefinently it just seems unfriendly.

In another thread the the team mentioned that they are working on something that does not require the protocol to communicate back to a thick client. So wondering if this is in a way related as well...

On Tue., Jun. 30, 2020, 6:58 a.m. Ilya Kasnacheev, <[hidden email]> wrote:
Hello!

I can see the following in the thread dump:
"main" #1 prio=5 os_prio=0 tid=0x00007f02c400d800 nid=0x1e43 runnable [0x00007f02cad1e000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.Net.poll(Native Method)
at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:951)
- locked <0x00000000ec066048> (a java.lang.Object)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:121)
- locked <0x00000000ec066038> (a java.lang.Object)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3299)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2987)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2870)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2713)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2672)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1656)
at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1731)
at org.apache.ignite.internal.processors.task.GridTaskWorker.sendRequest(GridTaskWorker.java:1436)
at org.apache.ignite.internal.processors.task.GridTaskWorker.processMappedJobs(GridTaskWorker.java:666)
at org.apache.ignite.internal.processors.task.GridTaskWorker.body(GridTaskWorker.java:538)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at org.apache.ignite.internal.processors.task.GridTaskProcessor.startTask(GridTaskProcessor.java:764)
at org.apache.ignite.internal.processors.task.GridTaskProcessor.execute(GridTaskProcessor.java:392)
at org.apache.ignite.internal.IgniteComputeImpl.executeAsync0(IgniteComputeImpl.java:528)
at org.apache.ignite.internal.IgniteComputeImpl.execute(IgniteComputeImpl.java:498)
at org.apache.ignite.visor.visor$.execute(visor.scala:1800)

It seems that Visor is trying to connect to client node via Communication, and it fails because the network connection is filtered out.

Regards,
--
Ilya Kasnacheev


пн, 29 июн. 2020 г. в 23:47, John Smith <[hidden email]>:
Ok.

I am able to reproduce the "issue" unless we have a misunderstanding and we are talking about the same thing...

My thick client runs inside a container in a closed network NOT bridged and NOT host. I added a flag to my application that allows it to add the address resolver to the config.

1- If I disable address resolution and I connect with visor to the cluster and try to print detailed statistics for that particular client, visor freezes indefinitely at the Data Region Snapshot. 
Control C doesn't kill the visor either. It just stuck. This also happens when running the cache command. Just freezes indefinitely.

I attached the jstack output to the email but it is also here: https://www.dropbox.com/s/wujcee1gd87gk6o/jstack.out?dl=0

2- If I enable address resolution for the thick client then all the commands work ok. I also see an "Accepted incoming communication connection" log in the client.









On Mon, 29 Jun 2020 at 15:30, Ilya Kasnacheev <[hidden email]> wrote:
Hello!

The easiest way is jstack <process id of visor>

Regards,
--
Ilya Kasnacheev


пн, 29 июн. 2020 г. в 20:20, John Smith <[hidden email]>:
How?

On Mon, 29 Jun 2020 at 12:03, Ilya Kasnacheev <[hidden email]> wrote:
Hello!

Try collecting thread dump from Visor as it freezes.

Regards,
--
Ilya Kasnacheev


пн, 29 июн. 2020 г. в 18:11, John Smith <[hidden email]>:
How though?

1- Entered node command
2- Got list of nodes, including thick clients
3- Selected thick client
4- Entered Y for detailed statistics
5- Snapshot details displayed
6- Data region stats frozen

I think the address resolution is the fix for this as well, but I need to confirm. After I fixed the resolver as per your solution, Visor no longer freezes at step 6 above.

On Mon, 29 Jun 2020 at 10:54, Ilya Kasnacheev <[hidden email]> wrote:
Hello!

This usually means there's no connectivity between node and Visor.

Regards,
--
Ilya Kasnacheev


Mon, 29 Jun 2020 at 17:01, John Smith <[hidden email]>:
Also, I think this is needed for Visor as well?

When I run the top or node commands, I can see the thick client. But when I look at detailed statistics for that particular thick client, it freezes "indefinitely". Regular statistics seem OK.

On Mon, 29 Jun 2020 at 08:08, Ilya Kasnacheev <[hidden email]> wrote:
Hello!

For thick clients, you need both 47100 and 47500, both directions (perhaps for 47500 only client -> server is sufficient, but for 47100, both are needed).

For thin clients, 10800 is enough. For control.sh, 11211.

Regards,
--
Ilya Kasnacheev




javadevmtl javadevmtl

Re: How to do address resolution?

Hi, yes I figured that Visor is just another thick client.

By using the address resolver on my thick client applications inside the container, everything works fine and Visor also connects properly (no need to add all the client configs everywhere).

As stated, it just adds a tiny delay when Visor needs to connect to the other clients. And of course there is the "issue" where it fully blocks because it can't reach the client, even though it knows the client is there.

I don't know if I'm the only one using a mixed environment. But you guys also mentioned in my other thread that you are working on a feature that doesn't require connecting to the client when it's running inside a container.

Anyway, thanks for creating an issue. Also, I'm wondering if any docs should be updated for containers, because I only found the BasicAddressResolver javadoc by chance.
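
For anyone searching later, this is roughly what the resolver wiring looks like in the client's Spring XML (the addresses below are placeholders, not real values):

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="addressResolver">
        <bean class="org.apache.ignite.configuration.BasicAddressResolver">
            <constructor-arg>
                <map>
                    <!-- internal (container) address -> external (host) address -->
                    <entry key="10.0.0.5:47100" value="203.0.113.10:2389"/>
                </map>
            </constructor-arg>
        </bean>
    </property>
</bean>
```

The same map can be built programmatically and passed to igniteConfig.setAddressResolver(...), as shown earlier in the thread.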

On Wed., Jul. 1, 2020, 12:51 p.m. Denis Magda, <[hidden email]> wrote:
Hi John,

As Stephen mentioned, Visor connects to the cluster in a way similar to server nodes and thick clients. It's connected as a daemon node that is filtered out from metrics and other public APIs. That's why you don't see Visor being reported in the cluster topology metrics along with servers or thick clients: https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/IgniteConfiguration.html#setDaemon-boolean-

As a daemon node, Visor uses the same networking protocols to join the cluster and communicate with other cluster members:
  • Discovery SPI - As any server node or a thick client, Visor will join the cluster by connecting to one of the server nodes. It will use an IP Finder that you set in your IgniteConfiguration file. Once Visor joins the cluster, it will collect information about the cluster topology and display these basic metrics to you in a terminal window. Visor receives this information about the cluster topology through the server node used to join the cluster. The same server node will update Visor on any topology changes.
  • Communication SPI - Whenever Visor needs to get metrics from a specific server or thick client, it will open a direct TCP/IP connection to that server/client. In your case, it failed to reach some clients and hung. Hanging is not the right way of handling this type of issue, and I've opened a ticket to address it: https://issues.apache.org/jira/browse/IGNITE-13201
Given these implementation specifics, I can recommend one of the following:
  • List all the thick clients in the AddressResolver configuration. This is required. Hope my explanation above makes things clear for you.
  • Or, run Visor from inside the private network. You would need to ssh to one of your machines. With this, you don't need to deal with AddressResolvers.
  • Or, use contemporary tools for Ignite cluster monitoring. Ignite supports JMX and OpenCensus protocols that allow you to consume metrics from tools like Zabbix or Prometheus. You deploy a tool inside of your private network so that it can collect metrics from the cluster and open a single port number for those who will observe the metrics via a tool's user interface. If you need both monitoring and *management* capabilities, then have a look at GridGain Control Center.
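To make the two port roles above concrete, here is a minimal server-side sketch with both SPIs and their ports spelled out (the values shown are the defaults, included only for clarity; you normally don't need to set them):

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Discovery SPI: joining the cluster and tracking topology (default port 47500) -->
    <property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
            <property name="localPort" value="47500"/>
        </bean>
    </property>
    <!-- Communication SPI: direct node-to-node connections, e.g. for metrics (default port 47100) -->
    <property name="communicationSpi">
        <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
            <property name="localPort" value="47100"/>
        </bean>
    </property>
</bean>
```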
-
Denis


On Wed, Jul 1, 2020 at 8:39 AM John Smith <[hidden email]> wrote:
So this is what I gathered from this experience.

When running commands on Visor's console, Visor will attempt to connect to the thick client.

For example, if you type the "node" command and ask for detailed statistics for a specific thick client, Visor will pause on the data region stats until it can connect.

Furthermore, if you have multiple thick clients that Visor has not yet connected to and you call a more global command like "cache", that command will also pause until a connection has been made to all thick clients.

1- Whether this is good behaviour or not is up for debate, especially the part where a thick client is listed in the topology/nodes but cannot be reached and Visor hangs indefinitely.
2- Not sure if this behaviour in any way affects the server nodes, if they ever attempt to open a connection to a thick client and the protocol somehow freezes just like in #1 above.

On Tue, 30 Jun 2020 at 09:54, John Smith <[hidden email]> wrote:
OK, so is this expected behaviour? From a user perspective this seems like a bug.

Visor is supposed to be used as a way to monitor...

So if, as a user, we enter a command and it just freezes indefinitely, it seems unfriendly.

In another thread the team mentioned that they are working on something that does not require the protocol to communicate back to a thick client. So I'm wondering if this is in a way related as well...

dmagda dmagda

Re: How to do address resolution?

But you guys also mentioned in my other thread that you are working on a feature that doesn't require connecting to the client when it's running inside a container.

What is the thread you're referring to? Visor will always connect to the clients regardless of your deployment configuration.

 Anyway, thanks for creating an issue. Also, I'm wondering if any docs should be updated for containers, because I only found the BasicAddressResolver javadoc by chance.

You're always welcome. Could you point out the documentation you used to configure the AddressResolver? Agreed, we need to document or blog about best practices.
  
-
Denis



javadevmtl javadevmtl

Re: How to do address resolution?

It's the "what does all partition owners have left mean?" thread.

There is mention of improving the protocol so that other nodes don't need to connect to clients running inside containers... It links to another thread indicating that there may be a PR to add a flag of some sort that marks the client as "virtualized" or something like that...


And nothing is mentioned elsewhere in the official docs.

On Wed., Jul. 1, 2020, 2:22 p.m. Denis Magda, <[hidden email]> wrote:
But you guys also mentioned in my other thread that you are working on a feature that doesn't require connecting to the client when it's running inside a container.

What is the tread you're referring to? Visor always will be connecting to the clients regardless of your deployment configuration. 

 Anyways thanks for creating an issue and as well just wondering if any docs should be updated for containers because I found the BasicAddresResolver java doc by chance.

You're always welcome. Could you point out the documentation you used to configure the AdressResolver? Agree, we need to document or blog about best practices.
  
-
Denis


On Wed, Jul 1, 2020 at 10:49 AM John Smith <[hidden email]> wrote:
Hi, yes I figured that visor is just another thick client. 

By using address resolver on my thick client applications inside container everything works fine and visor also connects properly (no need to add all client configs everywhere).

As stated it just adds tiny delay when visor needs to connect to the other clients. And of course the "issue" when it fully blocks because it can't reach the client even though it knows the client is there.

I dunno if I'm the only one who is using mixed environment. But you guys also mentioned in my other thread that you are working on a feature that doesn't require connecting to the client when it's running inside a container.

Anyways thanks for creating an issue and as well just wondering if any docs should be updated for containers because I found the BasicAddresResolver java doc by chance.

On Wed., Jul. 1, 2020, 12:51 p.m. Denis Magda, <[hidden email]> wrote:
Hi John,

As Stephen mentioned, Visor connects to the cluster in a way similar to server nodes and thick clients. It's connected as a daemon node that is filtered out from metrics and other public APIs. That's why you don't see Visor being reported in the cluster topology metrics along with servers or thick clients: https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/IgniteConfiguration.html#setDaemon-boolean-

As a daemon node, Visor uses the same networking protocols to join the cluster and communicate with other cluster members:
  • Discovery SPI - As any server node or a thick client, Visor will join the cluster by connecting to one of the server nodes. It will use an IP Finder that you set in your IgniteConfiguration file. Once Visor joins the cluster, it will collect information about the cluster topology and display these basic metrics to you in a terminal window. Visor receives this information about the cluster topology through the server node used to join the cluster. The same server node will update Visor on any topology changes.
  • Communication SPI - Whenever Visor needs to get metrics from a specific server or thick client, it will open a direct TCP/IP connection with the server/client. In your case, it failed to reach out to some clients and hung. The hanging is not the right way of handling this type of issues and I've opened a ticket to address this: https://issues.apache.org/jira/browse/IGNITE-13201
Considering this implementation specificities, I can recommend you do one of the following:
  • List all the thick clients in the AddressResolver configuration. This is required. Hope my explanation above makes things clear for you.
  • Or, run Visor from inside the private network. You would need to ssh to one of your machines. With this, you don't need to deal with AddressResolvers.
  • Or, use contemporary tools for Ignite cluster monitoring. Ignite supports JMX and OpenCensus protocols that allow you to consume metrics from tools like Zabbix or Prometheus. You deploy a tool inside of your private network so that it can collect metrics from the cluster and open a single port number for those who will observe the metrics via a tool's user interface. If you need both monitoring and *management* capabilities, then have a look at GridGain Control Center.
-
Denis


On Wed, Jul 1, 2020 at 8:39 AM John Smith <[hidden email]> wrote:
So this is what I gathered from this experience.

When running commands on Visor's console, Visor will attempt to connect to the thick client.

For example if you type the "node" command and attempt to get detailed statistics for a specific thick client, Visor will pause on the data region stats until it can connect.

Furthermore if you have multiple thick clients and Visor has not connected to some of them yet and you call a more global command like "cache", this command will also pause until a connection has been made to all thick clients.

1- Whether this is good behaviour or not is up for debate. Especially the part when a thick client is listed in the topology/nodes but cannot be reached and visor hangs indefinitely.
2- Not sure if this behaviour in any way affects the server node if they ever attempt to open a connection to a thick client and the protocol somehow freezes just like #1 above.

On Tue, 30 Jun 2020 at 09:54, John Smith <[hidden email]> wrote:
Ok so. Is this expected behaviour? From user perspective this seems like a bug.

Visor is supposed to be used as a way to monitor...

So if, as a user, we enter a command and it just freezes indefinitely, it seems unfriendly.

In another thread the team mentioned that they are working on something that does not require the protocol to communicate back to a thick client. So I'm wondering if this is related as well...

On Tue., Jun. 30, 2020, 6:58 a.m. Ilya Kasnacheev, <[hidden email]> wrote:
Hello!

I can see the following in the thread dump:
"main" #1 prio=5 os_prio=0 tid=0x00007f02c400d800 nid=0x1e43 runnable [0x00007f02cad1e000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.Net.poll(Native Method)
at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:951)
- locked <0x00000000ec066048> (a java.lang.Object)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:121)
- locked <0x00000000ec066038> (a java.lang.Object)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3299)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2987)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2870)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2713)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2672)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1656)
at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1731)
at org.apache.ignite.internal.processors.task.GridTaskWorker.sendRequest(GridTaskWorker.java:1436)
at org.apache.ignite.internal.processors.task.GridTaskWorker.processMappedJobs(GridTaskWorker.java:666)
at org.apache.ignite.internal.processors.task.GridTaskWorker.body(GridTaskWorker.java:538)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at org.apache.ignite.internal.processors.task.GridTaskProcessor.startTask(GridTaskProcessor.java:764)
at org.apache.ignite.internal.processors.task.GridTaskProcessor.execute(GridTaskProcessor.java:392)
at org.apache.ignite.internal.IgniteComputeImpl.executeAsync0(IgniteComputeImpl.java:528)
at org.apache.ignite.internal.IgniteComputeImpl.execute(IgniteComputeImpl.java:498)
at org.apache.ignite.visor.visor$.execute(visor.scala:1800)

It seems that Visor is trying to connect to the client node via Communication, and it fails because the network connection is filtered out.

Regards,
--
Ilya Kasnacheev


пн, 29 июн. 2020 г. в 23:47, John Smith <[hidden email]>:
Ok.

I am able to reproduce the "issue" unless we have a misunderstanding and we are talking about the same thing...

My thick client runs inside a container in a closed network NOT bridged and NOT host. I added a flag to my application that allows it to add the address resolver to the config.

1- If I disable address resolution and I connect with Visor to the cluster and try to print detailed statistics for that particular client, Visor freezes indefinitely at the Data Region Snapshot.
Ctrl-C doesn't kill Visor either; it's just stuck. This also happens when running the cache command: it just freezes indefinitely.

I attached the jstack output to the email but it is also here: https://www.dropbox.com/s/wujcee1gd87gk6o/jstack.out?dl=0

2- If I enable address resolution for the thick client then all the commands work ok. I also see an "Accepted incoming communication connection" log in the client.

On Mon, 29 Jun 2020 at 15:30, Ilya Kasnacheev <[hidden email]> wrote:
Hello!

The easiest way is jstack <process id of visor>

Regards,
--
Ilya Kasnacheev


пн, 29 июн. 2020 г. в 20:20, John Smith <[hidden email]>:
How?

On Mon, 29 Jun 2020 at 12:03, Ilya Kasnacheev <[hidden email]> wrote:
Hello!

Try collecting thread dump from Visor as it freezes.

Regards,
--
Ilya Kasnacheev


пн, 29 июн. 2020 г. в 18:11, John Smith <[hidden email]>:
How though?

1- Entered node command
2- Got list of nodes, including thick clients
3- Selected thick client
4- Entered Y for detailed statistics
5- Snapshot details displayed
6- Data region stats frozen

I think the address resolution is working for this as well, but I need to confirm, because I fixed the resolver as per your solution and Visor no longer freezes on #6 above.

On Mon, 29 Jun 2020 at 10:54, Ilya Kasnacheev <[hidden email]> wrote:
Hello!

This usually means there's no connectivity between node and Visor.

Regards,
--
Ilya Kasnacheev


пн, 29 июн. 2020 г. в 17:01, John Smith <[hidden email]>:
Also I think for Visor as well?

When I do top or node commands, I can see the thick client. But when I look at detailed statistics for that particular thick client, it freezes "indefinitely". Regular statistics seem ok.

On Mon, 29 Jun 2020 at 08:08, Ilya Kasnacheev <[hidden email]> wrote:
Hello!

For thick clients, you need both 47100 and 47500, both directions (perhaps for 47500 only client -> server is sufficient, but for 47100, both are needed).

For thin clients, 10800 is enough. For control.sh, 11211.
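As a side note for readers who land here with thin clients: a thin client only ever opens an outgoing connection to 10800, so nothing needs to connect back to it and no address resolver is required for it. A minimal sketch (assumes ignite-core on the classpath; the host name and cache name are placeholders):

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.client.ClientCache;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;

public class ThinClientSketch {
    public static void main(String[] args) throws Exception {
        ClientConfiguration cfg = new ClientConfiguration()
            .setAddresses("ignite-server:10800"); // placeholder host

        // The client connects out to 10800; the cluster never connects back.
        try (IgniteClient client = Ignition.startClient(cfg)) {
            ClientCache<Integer, String> cache = client.getOrCreateCache("demo");
            cache.put(1, "value");
        }
    }
}
```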

Regards,
--
Ilya Kasnacheev


пт, 26 июн. 2020 г. в 22:06, John Smith <[hidden email]>:
I'm asking in a separate question so people can search for it if they ever come across this...

My server nodes are started as follows, and I also connect the client the same way.

                  <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                      <property name="addresses">
                          <list>
                            <value>foo:47500</value>
...
                          </list>
                      </property>
                  </bean>

In my client code I used the BasicAddressResolver

And I put in the map

"{internalHostIP}:47500", "{externalHostIp}:{externalPort}"

igniteConfig.setAddressResolver(addrResolver);
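Conceptually, a map-based resolver like this is just a lookup from internal socket addresses to external ones, falling back to the original address when no mapping exists. A dependency-free illustration of that idea (this is NOT the Ignite class, just a toy stand-in; the addresses are made up):

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a map-based address resolver: rewrites an internal
// "host:port" to its externally reachable "host:port" when a mapping
// exists, and returns the address unchanged otherwise.
public class ResolverSketch {
    private final Map<String, String> mappings = new HashMap<>();

    public void map(String internal, String external) {
        mappings.put(internal, external);
    }

    public String resolve(String address) {
        return mappings.getOrDefault(address, address);
    }
}
```

With a mapping registered for "10.0.0.5:47500", resolving that address yields the external one, while an unmapped address comes back untouched.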

QUESTIONS
___________________

1- Port 47500 is used for discovery only?
2- Port 47100 is used for actual communications to the nodes?
3- In my container environment I have only mapped 47100, do I also need to map for 47500 for the Tcp Discovery SPI?
4- When I connect with Visor and I try to look at details for the client node it blocks. I'm assuming that's because visor cannot connect back to the client at 47100?
See logs below.

LOGS
___________________

When I look at the client logs I get...

IgniteConfiguration [
igniteInstanceName=xxxxxx,
...
discoSpi=TcpDiscoverySpi [
  addrRslvr=null, <--- Do I need to use BasicAddressResolver here???
...
  commSpi=TcpCommunicationSpi [
...
    locAddr=null,
    locHost=null,
    locPort=47100,
    addrRslvr=null, <--- Do I need to use BasicAddressResolver here???
...
    ],
...
    addrRslvr=BasicAddressResolver [
      inetAddrMap={},
      inetSockAddrMap={/internalIp:47100=/externalIp:2389} <---- 
    ],
...
    clientMode=true,
...


javadevmtl javadevmtl
Reply | Threaded
Open this post in threaded view
|

Re: How to do address resolution?

Sorry, I mixed up the threads; it's the one that asks if server nodes connect back to thick clients, and it was you who mentioned the new feature...

On Wed., Jul. 1, 2020, 4:03 p.m. John Smith, <[hidden email]> wrote:
If you look for the "what does all partition owners have left mean?" thread.

There is a mention of improving the protocol so that other nodes don't need to connect to clients running inside containers... It links to another thread indicating that there may be a PR to add a flag of some sort to mark the client as "virtualized" or something like that...


And nothing is mentioned elsewhere in the official docs.

On Wed., Jul. 1, 2020, 2:22 p.m. Denis Magda, <[hidden email]> wrote:
But you guys also mentioned in my other thread that you are working on a feature that doesn't require connecting to the client when it's running inside a container.

What is the thread you're referring to? Visor will always connect to the clients regardless of your deployment configuration.

 Anyway, thanks for creating an issue. I'm also wondering whether any docs should be updated for containers, because I found the BasicAddressResolver javadoc by chance.

You're always welcome. Could you point out the documentation you used to configure the AddressResolver? Agree, we need to document or blog about best practices.
  
-
Denis


On Wed, Jul 1, 2020 at 10:49 AM John Smith <[hidden email]> wrote:
Hi, yes I figured that visor is just another thick client. 

By using address resolver on my thick client applications inside container everything works fine and visor also connects properly (no need to add all client configs everywhere).

As stated it just adds tiny delay when visor needs to connect to the other clients. And of course the "issue" when it fully blocks because it can't reach the client even though it knows the client is there.

I don't know if I'm the only one using a mixed environment. But you guys also mentioned in my other thread that you are working on a feature that doesn't require connecting to the client when it's running inside a container.

Anyway, thanks for creating an issue. I'm also wondering whether any docs should be updated for containers, because I found the BasicAddressResolver javadoc by chance.

On Wed., Jul. 1, 2020, 12:51 p.m. Denis Magda, <[hidden email]> wrote:
Hi John,

As Stephen mentioned, Visor connects to the cluster in a way similar to server nodes and thick clients. It's connected as a daemon node that is filtered out from metrics and other public APIs. That's why you don't see Visor being reported in the cluster topology metrics along with servers or thick clients: https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/IgniteConfiguration.html#setDaemon-boolean-

As a daemon node, Visor uses the same networking protocols to join the cluster and communicate with other cluster members:
  • Discovery SPI - As any server node or a thick client, Visor will join the cluster by connecting to one of the server nodes. It will use an IP Finder that you set in your IgniteConfiguration file. Once Visor joins the cluster, it will collect information about the cluster topology and display these basic metrics to you in a terminal window. Visor receives this information about the cluster topology through the server node used to join the cluster. The same server node will update Visor on any topology changes.
  • Communication SPI - Whenever Visor needs to get metrics from a specific server or thick client, it will open a direct TCP/IP connection with the server/client. In your case, it failed to reach out to some clients and hung. The hanging is not the right way of handling this type of issue, and I've opened a ticket to address this: https://issues.apache.org/jira/browse/IGNITE-13201
Considering these implementation specifics, I can recommend one of the following:
  • List all the thick clients in the AddressResolver configuration. This is required for Visor to reach them. Hope my explanation above makes things clear for you.
  • Or, run Visor from inside the private network. You would need to ssh to one of your machines. With this, you don't need to deal with AddressResolvers.
  • Or, use contemporary tools for Ignite cluster monitoring. Ignite supports JMX and OpenCensus protocols that allow you to consume metrics from tools like Zabbix or Prometheus. You deploy a tool inside of your private network so that it can collect metrics from the cluster and open a single port number for those who will observe the metrics via a tool's user interface. If you need both monitoring and *management* capabilities, then have a look at GridGain Control Center.
-
Denis




dmagda dmagda
Reply | Threaded
Open this post in threaded view
|

Re: How to do address resolution?

Thanks, John. That connectivity improvement fixes situations where a server needs to open a connection to a client but fails. The client will open the connection instead, after getting a special message via the discovery networking layer. It won't improve the communication between Visor and clients.

We'll document the address resolver in the future. Thanks for the pointers.

Denis


Humphrey Humphrey
Reply | Threaded
Open this post in threaded view
|

Re: How to do address resolution?

Not sure if this will help, but I've also had issues with Visor hanging the cluster.

When I changed the configuration to client mode (the default is server mode), it solved my problem. Might be good to give it a try.
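For reference, the mode switch described above is a single property on the configuration; whether it resolves a particular hang is situational. A sketch (assumes ignite-core on the classpath):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ClientModeSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setClientMode(true); // default is false, i.e. the node starts as a server

        try (Ignite ignite = Ignition.start(cfg)) {
            // The node joins the topology as a thick client, not a server.
        }
    }
}
```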

Humphrey



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/