Failed to read data from remote connection

wangsan

Failed to read data from remote connection

When a client (C++ node) restarts multiple times, the server and the other clients throw this exception:

 ERROR o.a.i.s.c.tcp.TcpCommunicationSpi  - Failed to read data from remote connection (will wait for 2000ms).
org.apache.ignite.IgniteCheckedException: Failed to select events on selector.
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2135)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1764)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException: null
        at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:197)
        at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:1958)
        ... 3 common frames omitted



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Igor Sapego

Re: Failed to read data from remote connection

Can you explain your case in more detail? I don't quite
understand what the problem is.

Best Regards,
Igor


On Tue, Nov 27, 2018 at 1:27 PM wangsan <[hidden email]> wrote:

wangsan

Re: Failed to read data from remote connection

I have a cluster with ZooKeeper discovery, e.g. a Java server node s1, a Java client
node jc1 and a C++ client node cpp1.
Sometimes when cpp1 restarts, s1 and jc1 throw this exception many
times:

     Failed to process selector key [ses=GridSelectorNioSessionImpl

And cpp1 logs many messages like these:
    Established outgoing communication connection [locAddr=/0:0:0:0:0:0:0:1:35153, rmtAddr=/0:0:0:0:0:0:0:1%lo:47107]
    [TcpCommunicationSpi] Closing NIO session because of unhandled exception

The detailed log is here: errorlog.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1807/errorlog.zip>

wangsan

Re: Failed to read data from remote connection

Since I restart the C++ client many times concurrently, maybe some node paths in the ZooKeeper cluster (Ignite)
have already been closed.
From the C++ client logs, I can see that ZooKeeper discovery first watches 0000000044, but that node has already been closed:
watchPath=f78ec20a-5458-47b2-86e9-7b7ed0ee4227:0e508bf8-521f-4898-9b83-fc216b35601c:81|0000000044
About 30 seconds later: Received communication error resolve request
Then it watches another path, 0000000042:
watchPath=5cb0efb1-0d1b-4b54-a8b7-ac3414e7735f:23fb17f7-cdbd-4cee-991a-46041bb0fa26:81|0000000042
I don't know why it then logs "Start check connection process".

Igor Sapego

Re: Failed to read data from remote connection

Do you shut down the C++ node properly before killing the process?

Do these exceptions impact the cluster's functionality in any way?

Best Regards,
Igor


On Wed, Nov 28, 2018 at 8:53 AM wangsan <[hidden email]> wrote:

wangsan

Re: Failed to read data from remote connection

Do you shut down the C++ node properly before killing the process?
    Yeah, the C++ node was killed with kill -9, not SIGHUP. That was the wrong
operation; I will use plain kill instead.
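Why the signal matters: the JVM runs its shutdown hooks on normal exit and when it receives SIGTERM (plain `kill`) or SIGINT, but SIGKILL (`kill -9`) ends the process at once, so no cleanup code runs and the other nodes have to detect the loss on their own. The sketch below is a generic Java illustration, not Ignite's own shutdown logic; `GracefulStop`, `registerStopHook` and the printed stop action are hypothetical names, and in a real node the action would be whatever call stops that node.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class GracefulStop {
    /**
     * Registers a JVM shutdown hook that runs stopAction at most once.
     * The hook fires on normal JVM exit and on SIGTERM/SIGINT, never on SIGKILL (kill -9).
     */
    public static Thread registerStopHook(Runnable stopAction) {
        AtomicBoolean done = new AtomicBoolean(false);
        Thread hook = new Thread(() -> {
            // Guard so the action cannot run twice, e.g. if it is also invoked manually.
            if (done.compareAndSet(false, true)) {
                stopAction.run();
            }
        }, "node-stop-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        // Hypothetical stop action; a real node would put its own stop call here.
        registerStopHook(() -> System.out.println("stopping node gracefully"));
        System.out.println("node started; main returns, so the JVM exits normally and runs the hook");
    }
}
```

Sending `kill <pid>` to a process that registered such a hook runs the stop action; `kill -9 <pid>` skips it entirely.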

Do these exceptions impact the cluster's functionality in any way?
    I am not sure about the exceptions. My cluster crashes with an OOM
(could not create native thread), even though ulimit and the config show that max
user processes is very large (64k). There are about 20 nodes in Ignite, and I
don't know why the cluster uses so many threads. Could these exceptions
trigger too many sockets (threads)?
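To see which threads actually pile up on a node, one option is to group the live threads by name from inside the JVM with the standard `ThreadMXBean` (running `jstack <pid>` gives the same information from outside). This is a generic diagnostic sketch: `ThreadCensus` and its name-prefix heuristic (strip trailing digits, `-`, `#`, `_` and spaces) are assumptions for illustration, not a description of Ignite's thread naming.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

public class ThreadCensus {
    /** Heuristic: drop trailing digits and separators so "worker-12" and "worker-13" group together. */
    public static String prefixOf(String name) {
        String p = name.replaceAll("[-#_\\s\\d]+$", "");
        return p.isEmpty() ? name : p;
    }

    /** Counts live threads per name prefix, sorted by prefix. */
    public static Map<String, Integer> countByPrefix() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            counts.merge(prefixOf(info.getThreadName()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        System.out.println("live=" + mx.getThreadCount() + " peak=" + mx.getPeakThreadCount());
        countByPrefix().forEach((prefix, n) -> System.out.println(n + "\t" + prefix));
    }
}
```

A single prefix with a count that keeps climbing between samples points at the component creating the threads.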

Thanks

wangsan

Re: Failed to read data from remote connection

This post was updated on .
In reply to this post by Igor Sapego
Now the cluster has 100+ nodes. When 'Start check connection process'
happens, some nodes throw an OOM with Direct buffer memory (Java NIO).
When connections are checked, many NIO sockets are created. Is that what causes the OOM?

How can the OOM be fixed other than with MaxDirectMemorySize and DisableExplicitGC?

Thanks.
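One way to check whether direct memory really climbs during 'Start check connection process' is to sample the JVM's `direct` buffer pool through the standard `BufferPoolMXBean`; that pool tracks the same allocations that `-XX:MaxDirectMemorySize` caps. A minimal sketch, assuming it runs inside (or is adapted into) the affected JVM; `DirectMemoryMonitor` and its methods are hypothetical names.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectMemoryMonitor {
    /** Finds the JVM's "direct" buffer pool, or null if the JVM does not report one. */
    private static BufferPoolMXBean directPool() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool;
            }
        }
        return null;
    }

    /** Bytes currently used by direct buffers, or -1 if not reported. */
    public static long directBytesUsed() {
        BufferPoolMXBean pool = directPool();
        return pool == null ? -1 : pool.getMemoryUsed();
    }

    /** Number of live direct buffers, or -1 if not reported. */
    public static long directBufferCount() {
        BufferPoolMXBean pool = directPool();
        return pool == null ? -1 : pool.getCount();
    }

    public static void main(String[] args) {
        System.out.println("before: " + directBytesUsed() + " bytes in " + directBufferCount() + " buffers");
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MiB outside the Java heap
        System.out.println("after:  " + directBytesUsed() + " bytes, new buffer capacity " + buf.capacity());
    }
}
```

Logging these two numbers periodically on a node, and comparing them before and after a connection check, would show whether direct memory tracks the number of connections.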

Stanislav Lukyanov

RE: Failed to read data from remote connection

“OOME: Direct buffer memory” means that MaxDirectMemorySize is too small.

Set a larger MaxDirectMemorySize value.

Stan
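A small standalone way to reproduce the error described above: keep allocating direct buffers under a deliberately low limit, and the JVM eventually throws `OutOfMemoryError` with a message about direct buffer memory (the exact wording differs between JDK versions). `DirectLimitDemo` and `allocateUpTo` are hypothetical names for this sketch; only `ByteBuffer.allocateDirect` and the `-XX:MaxDirectMemorySize` flag are standard.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class DirectLimitDemo {
    /**
     * Allocates direct buffers of chunkBytes each, keeping them reachable, until either
     * maxChunks have been allocated or the JVM throws OutOfMemoryError.
     * Returns how many chunks were allocated.
     */
    public static int allocateUpTo(int chunkBytes, int maxChunks) {
        List<ByteBuffer> held = new ArrayList<>();
        try {
            while (held.size() < maxChunks) {
                held.add(ByteBuffer.allocateDirect(chunkBytes));
            }
        } catch (OutOfMemoryError e) {
            // Under a small -XX:MaxDirectMemorySize this is the "Direct buffer memory" error.
            System.out.println("OOME after " + held.size() + " chunks: " + e.getMessage());
        }
        return held.size();
    }

    public static void main(String[] args) {
        // Run e.g. with: java -XX:MaxDirectMemorySize=16m DirectLimitDemo
        int n = allocateUpTo(1 << 20, 64);
        System.out.println("allocated " + n + " x 1 MiB direct buffers");
    }
}
```

When the limit is reached, the JDK calls `System.gc()` and retries before giving up, so that direct buffers which are no longer referenced can be freed. With `-XX:+DisableExplicitGC` that call does nothing, which is one reason combining that flag with a tight `MaxDirectMemorySize` tends to make this error more likely.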

From: [hidden email]
Sent: 18 December 2018 5:08
To: [hidden email]
Subject: Re: Failed to read data from remote connection

Now the cluster has 100+ nodes. When 'Start check connection process' happens, some nodes throw an OOM with Direct buffer memory (Java NIO). When connections are checked, many NIO sockets are created. Is that what causes the OOM?

How can the OOM be fixed other than with a larger Xmx?

Thanks.

wangsan

RE: Failed to read data from remote connection

Yes, I can set a larger MaxDirectMemorySize.
But I am afraid that as the number of nodes grows, the direct memory needed
will grow with it.

Stanislav Lukyanov

RE: Failed to read data from remote connection

Not really. The amount of direct memory needed grows neither with the node count nor with the amount of data you store.

Stan

From: [hidden email]
Sent: 12 January 2019 9:30
To: [hidden email]
Subject: RE: Failed to read data from remote connection

wangsan

RE: Failed to read data from remote connection

But when connections are checked, many NIO sockets are created (one socket per node).
Doesn't direct memory then grow with the node count?