Ignite Visor Cache command hangs indefinitely.

javadevmtl

Ignite Visor Cache command hangs indefinitely.

Hi, running 2.7.0

- I have a 4-node cluster and it seems to be running ok.
- I have clients connecting and doing what they need to do.
- The clients are set as client = true (roughly as in the sketch below).
- The clients are also connecting from various parts of the network.
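For context, this is roughly how the clients are configured. A minimal sketch, assuming the standard Ignite 2.x API; the host names are placeholders for our actual discovery addresses:

import java.util.Arrays;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class ClientNodeSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Thick client: joins the discovery topology but stores no data.
        cfg.setClientMode(true);

        // Static IP finder pointing at the server nodes (placeholder hosts).
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Arrays.asList(
            "ignite-server-1:47500..47509",
            "ignite-server-2:47500..47509"));

        TcpDiscoverySpi disco = new TcpDiscoverySpi();
        disco.setIpFinder(ipFinder);
        cfg.setDiscoverySpi(disco);

        // The client joins the same topology as the servers.
        Ignite ignite = Ignition.start(cfg);
    }
}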

The problem with the Ignite Visor cache command is that if Visor cannot reach a specific client node, it just seems to hang indefinitely.

Choose node number ('c' to cancel) [0]: c
visor> cache

It just stays like that; no errors printed, nothing...
javadevmtl

Re: Ignite Visor Cache command hangs indefinitely.

Sorry, pressed enter too quickly...

So basically I'm 100% sure that if the Visor cache command cannot reach the client node, it just stays there not doing anything.

javadevmtl

Re: Ignite Visor Cache command hangs indefinitely.

I think it should at least time out and show stats for the nodes it could reach. I don't see why it's dependent on client nodes.

javadevmtl

Re: Ignite Visor Cache command hangs indefinitely.

Hi, any thoughts on this?

ilya.kasnacheev

Re: Ignite Visor Cache command hangs indefinitely.

Hello!

I think that Visor will talk to all nodes when trying to run the cache command, and if it can't reach the client nodes the operation will never finish.

Regards,
--
Ilya Kasnacheev


javadevmtl

Re: Ignite Visor Cache command hangs indefinitely.

Correct. Should it not at least time out and show what it has available? Basically we have a central cluster, and various clients connect to it from different networks: Docker containers, for example.

We make sure that the clients are client nodes only and we avoid creating any caches on clients.

ilya.kasnacheev

Re: Ignite Visor Cache command hangs indefinitely.

Hello!

As a rule, a faulty thick client can destabilize a cluster. Ignite's architecture assumes that all clients are collocated, i.e. that the network between any two nodes (including clients) is reliable, fast and low-latency.

It is not recommended to connect thick clients from different networks. Use thin clients where possible.
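For illustration, this is what a thin client looks like — a minimal sketch against the standard thin client API (available since Ignite 2.5); the host and cache name are placeholders. A thin client opens a plain socket to a server node and never joins the discovery topology, so tools like Visor never need to reach it:

import org.apache.ignite.Ignition;
import org.apache.ignite.client.ClientCache;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;

public class ThinClientSketch {
    public static void main(String[] args) throws Exception {
        // 10800 is the default thin client port; the host is a placeholder.
        ClientConfiguration clientCfg = new ClientConfiguration()
            .setAddresses("ignite-server-1:10800");

        try (IgniteClient client = Ignition.startClient(clientCfg)) {
            ClientCache<Integer, String> cache = client.getOrCreateCache("myCache");
            cache.put(1, "value");
            System.out.println(cache.get(1));
        }
    }
}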

You can file a ticket in the Apache Ignite JIRA regarding the Visor behavior if you like.

Regards,
--
Ilya Kasnacheev


javadevmtl

Re: Ignite Visor Cache command hangs indefinitely.

Ok thanks

javadevmtl

Re: Ignite Visor Cache command hangs indefinitely.

The clients are in the same low-latency network, but they are running inside a container network, while Ignite is running on its own cluster. So from that standpoint they all see each other...

ilya.kasnacheev

Re: Ignite Visor Cache command hangs indefinitely.

Hello!

Please enable verbose logging and share logs from the Visor, client and server nodes, so that we can check that.

There should be messages related to connection attempts.

Regards,
--
Ilya Kasnacheev


javadevmtl

Re: Ignite Visor Cache command hangs indefinitely.

Hi, it's 100% that.

I'm just stating that my applications run inside a container network while Ignite is installed on its own VMs. The networks see each other and this works. Also, Visor can connect, no problems.
It's only when, for example, we have a dev machine connect over WiFi: even though a full mesh cluster is created, Visor cannot reach that node.
Or what if a badly configured client connects and causes this issue?

All I'm saying is, if Ignite Visor is THE TOOL to debug and check cluster state etc., it's a bit odd that it hangs forever if it cannot reach a specific client. I think that Visor/the protocol should know that it's a CLIENT ONLY node and not try to get stats from it. What do you think?



ilya.kasnacheev

Re: Ignite Visor Cache command hangs indefinitely.

Hello!

Visor is not the tool to debug a cluster; control.sh probably is.
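For example (a couple of commands from the 2.7-era control script, which lives in IGNITE_HOME/bin — run ./control.sh --help to see what your version supports; it connects to a server node over a client connection instead of joining the topology as a daemon node):

# show the cluster's baseline topology (server nodes only)
./control.sh --baseline

# activate the cluster if it is inactive
./control.sh --activate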

Visor is a node in the topology (a daemon node, but still), and as such it is subject to the same limitations as any other node.

Regards,
--
Ilya Kasnacheev


javadevmtl

Re: Ignite Visor Cache command hangs indefinitely.

Ok, but Visor is used to get info on caches etc., so it just hangs on clients it cannot reach. Maybe it should have a timeout if it can't reach the specific node? Or does it have one, but it's super high?
Or, if it knows it's a client node, handle it differently?


Denis Magda

Re: Ignite Visor Cache command hangs indefinitely.

John,

Sure, you’re right that Visor is the tool for management and monitoring. Not sure that Ilya’s statement makes practical sense.

Looping in our Visor experts. Alexey, Yury, could you please check out the issue?

Denis

javadevmtl

Re: Ignite Visor Cache command hangs indefinitely.

Correct, and this is a purely practical issue. I can even imagine a scenario where you have a cluster and, for compliance reasons, Visor is running in a demilitarized zone.

And all I'm saying is that the Visor cache command, or any command for that matter, should not hang waiting to connect to specific clients.

It should maybe time out, indicate so, and at least show the info it has. Or maybe just give us the server nodes/info if available.

That's where I would like your opinion on it.


Vasiliy Sisko

Re: Ignite Visor Cache command hangs indefinitely.

Hello @javadevmtl

I failed to reproduce your problem.
In case of any error in the cache command, Visor CMD shows the message "No caches found".
Please provide logs from the Visor, server and client nodes after the command hangs.



javadevmtl

Re: Ignite Visor Cache command hangs indefinitely.

Ok, where do I look for the Visor logs when it hangs? And it's not a "no caches" issue; the cluster works great. It's when Visor cannot reach a specific client node.

javadevmtl

Re: Ignite Visor Cache command hangs indefinitely.

Actually, this happened when the WiFi node connected. But it never happened before...

[14:51:46,660][INFO][exchange-worker-#43%xxxxxx%][GridDhtPartitionsExchangeFuture] Completed partition exchange [localNode=e9e9f4b9-b249-4a4d-87ee-fc97097ad9ee, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=59, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=45516c37-5ee0-4046-a13a-9573607d25aa, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, MY_WIFI_IP, MY_WIFI_IP], sockAddrs=[/MY_WIFI_IP:0, /0:0:0:0:0:0:0:1:0, /127.0.0.1:0, /MY_WIFI_IP:0], discPort=0, order=59, intOrder=32, lastExchangeTime=1561042306599, loc=false, ver=2.7.0#20181130-sha1:256ae401, isClient=true], done=true], topVer=AffinityTopologyVersion [topVer=59, minorTopVer=0], durationFromInit=0]
[14:51:46,660][INFO][exchange-worker-#43%xxxxxx%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=59, minorTopVer=0], crd=true]
[14:51:46,662][INFO][exchange-worker-#43%xxxxxx%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=59, minorTopVer=0], force=false, evt=NODE_JOINED, node=45516c37-5ee0-4046-a13a-9573607d25aa]
[14:51:47,123][INFO][grid-nio-worker-tcp-comm-2-#26%xxxxxx%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/xxx.xxx.xxx.69:47100, rmtAddr=/MY_WIFI_IP:62249]
[14:51:59,428][INFO][db-checkpoint-thread-#1068%xxxxxx%][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=56e2ea25-7273-49ab-81ac-0fdbc5945626, startPtr=FileWALPointer [idx=137, fileOff=45790479, len=17995], checkpointLockWait=0ms, checkpointLockHoldTime=12ms, walCpRecordFsyncDuration=3ms, pages=242, reason='timeout']
[14:51:59,544][INFO][db-checkpoint-thread-#1068%xxxxxx%][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=56e2ea25-7273-49ab-81ac-0fdbc5945626, pages=242, markPos=FileWALPointer [idx=137, fileOff=45790479, len=17995], walSegmentsCleared=0, walSegmentsCovered=[], markDuration=23ms, pagesWrite=14ms, fsync=101ms, total=138ms]
[14:52:45,827][INFO][tcp-disco-msg-worker-#2%xxxxxx%][TcpDiscoverySpi] Local node seems to be disconnected from topology (failure detection timeout is reached) [failureDetectionTimeout=10000, connCheckInterval=500]
[14:52:45,847][SEVERE][ttl-cleanup-worker-#1652%xxxxxx%][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=tcp-disco-msg-worker, blockedFor=39s]
[14:52:45,859][INFO][tcp-disco-sock-reader-#36%xxxxxx%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/xxx.xxx.xxx.76:56861, rmtPort=56861
[14:52:45,864][WARNING][ttl-cleanup-worker-#1652%xxxxxx%][G] Thread [name="tcp-disco-msg-worker-#2%xxxxxx%", id=83, state=RUNNABLE, blockCnt=6, waitCnt=24621465]

[14:52:45,875][SEVERE][ttl-cleanup-worker-#1652%xxxxxx%][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-disco-msg-worker, igniteInstanceName=xxxxxx, finished=false, heartbeatTs=1561042326687]]]
class org.apache.ignite.IgniteException: GridWorker [name=tcp-disco-msg-worker, igniteInstanceName=xxxxxx, finished=false, heartbeatTs=1561042326687]
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1831)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1826)
        at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:233)
        at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
        at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:151)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)

        [14:52:47,974][WARNING][jvm-pause-detector-worker][IgniteKernal%xxxxxx] Possible too long JVM pause: 2047 milliseconds.
        [14:52:47,994][INFO][tcp-disco-srvr-#3%xxxxxx%][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/xxx.xxx.xxx.72, rmtPort=37607]
        [14:52:47,994][INFO][tcp-disco-srvr-#3%xxxxxx%][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/xxx.xxx.xxx.72, rmtPort=37607]
        [14:52:47,996][INFO][tcp-disco-sock-reader-#37%xxxxxx%][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/xxx.xxx.xxx.72:37607, rmtPort=37607]
        [14:52:48,005][WARNING][ttl-cleanup-worker-#1652%xxxxxx%][FailureProcessor] Thread dump at 2019/06/20 14:52:47 UTC
        Thread [name="sys-#25624%xxxxxx%", id=33109, state=TIMED_WAITING, blockCnt=0, waitCnt=1]
            Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3a9414a4, ownerName=null, ownerId=-1]
                at sun.misc.Unsafe.park(Native Method)
                at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
                at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
                at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                at java.lang.Thread.run(Thread.java:748)

        Thread [name="Thread-6972", id=33108, state=TIMED_WAITING, blockCnt=0, waitCnt=17]
            Lock [object=java.util.concurrent.SynchronousQueue$TransferStack@62bdd75c, ownerName=null, ownerId=-1]
                at sun.misc.Unsafe.park(Native Method)
                at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
                at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
                at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
                at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
                at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                at java.lang.Thread.run(Thread.java:748)
        

ilya.kasnacheev

Re: Ignite Visor Cache command hangs indefinitely.

Hello!

It is recommended to turn off failure detection since its default config is not very convenient. Maybe it is also fixed in 2.7.5.

This just means some operation took longer than expected and Ignite panicked.
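To make that concrete, a sketch of the knobs involved, assuming the Ignite 2.7 public API; the timeout values are illustrative only, and whether to ignore blocked-worker events is a judgment call:

import java.util.Collections;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.FailureType;
import org.apache.ignite.failure.StopNodeOrHaltFailureHandler;

public class FailureHandlingSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Give slow links more slack before a node is suspected dead
        // (defaults: 10_000 ms for servers, 30_000 ms for clients).
        cfg.setFailureDetectionTimeout(30_000);
        cfg.setClientFailureDetectionTimeout(60_000);

        // Keep the default stop-or-halt handler, but do not treat a blocked
        // system worker (the SYSTEM_WORKER_BLOCKED error in the log above)
        // as fatal.
        StopNodeOrHaltFailureHandler hnd = new StopNodeOrHaltFailureHandler();
        hnd.setIgnoredFailureTypes(
            Collections.singleton(FailureType.SYSTEM_WORKER_BLOCKED));
        cfg.setFailureHandler(hnd);

        // ... start the node with this cfg as usual.
    }
}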

Regards,

javadevmtl

Re: Ignite Visor Cache command hangs indefinitely.

How to turn it off?

Also, I think I know what may have been the Visor issue: I was connecting to the cluster without specifying ports 47500..47509. But once I added that, it seems more stable. I can even see the WiFi node and everything (see the config snippet below).
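For reference, this is the shape of the change, in the Spring XML config that the nodes (and Visor) are pointed at — a sketch with placeholder hosts; the ..47509 suffix tells Ignite to try each discovery port in the range:

<property name="discoverySpi">
  <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
    <property name="ipFinder">
      <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
        <property name="addresses">
          <list>
            <!-- placeholder hosts; without the port range only the base port is tried -->
            <value>ignite-server-1:47500..47509</value>
            <value>ignite-server-2:47500..47509</value>
          </list>
        </property>
      </bean>
    </property>
  </bean>
</property>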

