excessive timeouts and load on NODE_JOINED and NODE_LEFT events

ihalilaltun

Hi igniters,

Every time a client node connects to or disconnects from the grid, we get
an excessive number of timeouts and a huge load on the grid nodes. The current
grid-node metrics are as follows:

Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=cbdf5b45, uptime=40 days, 08:39:18.888]
    ^-- H/N/C [hosts=50, nodes=51, CPUs=288]
    ^-- CPU [cur=50.3%, avg=11.15%, GC=0%]
    ^-- PageMemory [pages=7115259]
    ^-- Heap [used=6753MB, free=17.56%, comm=8192MB]
    ^-- Off-heap [used=28119MB, free=2.94%, comm=28972MB]
    ^--   sysMemPlc region [used=0MB, free=99.99%, comm=100MB]
    ^--   default region [used=28118MB, free=1.93%, comm=28672MB]
    ^--   metastoreMemPlc region [used=1MB, free=98.76%, comm=100MB]
    ^--   TxLog region [used=0MB, free=100%, comm=100MB]
    ^-- Ignite persistence [used=32000MB]
    ^--   sysMemPlc region [used=0MB]
    ^--   default region [used=32000MB]
    ^--   metastoreMemPlc region [used=unknown]
    ^--   TxLog region [used=0MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=128, qSize=0]

This log line is from our internal monitoring tool:
PROBLEM: ignite-22: Warning: Processor load is high on pk-ignite: 5.485

Ignite logs and GC logs:
ignite.zip <http://apache-ignite-users.70518.x6.nabble.com/file/t2515/ignite.zip>
gc.zip <http://apache-ignite-users.70518.x6.nabble.com/file/t2515/gc.zip>

Any thoughts?





-----
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: excessive timeouts and load on NODE_JOINED and NODE_LEFT events

Hello!

When a client node joins or leaves, it will cause a Partition Map Exchange which includes a brief pause.

If you have requests with very short timeouts, you can see that some operations will fail.

I think there are some client node PME optimizations in the works. In the meantime, consider increasing timeouts or avoiding client joins/leaves during operation.
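
For callers that cannot raise their timeouts, one option is to retry the operations that fail during the brief pause. A minimal sketch, assuming the failing call surfaces a `java.util.concurrent.TimeoutException` (a stand-in; the exception type your Ignite calls actually throw may differ) and that the operation is safe to repeat. The class and method names here are hypothetical and not part of Ignite:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not an Ignite API: retries an operation that timed out.
final class RetryOnTimeout {
    /**
     * Runs op and retries it when it throws TimeoutException, up to
     * maxAttempts attempts in total, doubling the sleep between attempts.
     * Any other exception is propagated immediately.
     */
    static <T> T callWithRetry(Callable<T> op, int maxAttempts, long initialBackoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (TimeoutException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last timeout
                }
                Thread.sleep(backoffMs);
                backoffMs *= 2; // exponential backoff
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation: times out twice, then succeeds.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new TimeoutException("simulated pause");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Retrying like this only makes sense for idempotent operations, and it hides the pause rather than removing it.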

Regards,
--
Ilya Kasnacheev


On Mon, 11 Nov 2019 at 11:35, ihalilaltun <[hidden email]> wrote:
Every time a client node connects to or disconnects from the grid, we get an excessive number of timeouts and a huge load on the grid nodes. [...]
ihalilaltun

Re: excessive timeouts and load on NODE_JOINED and NODE_LEFT events

Hi Ilya,

Can we restrict PME operations for client nodes explicitly? As far as I know,
PME does not occur when client nodes connect. This is a production
environment and, as you may expect, we have many clients joining and leaving
the grid under heavy traffic.

Any suggestions other than increasing timeouts and avoiding client connections?
Maybe a configuration option that blocks PME when clients join or leave?

By the way, we did not have this issue before; the server nodes have been up and running for at least 40 days.

Which timeouts are we talking about?

Regards.



-----
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
ilya.kasnacheev

Re: excessive timeouts and load on NODE_JOINED and NODE_LEFT events

Hello!

If Ignite decides to hold a PME, there is no way you can 'block' that.

If these timeouts are indeed caused by PME, you should avoid actions that can trigger it.

Please note you have non-verbose logs, so it's hard to say exactly if this is the case.

Regards,
--
Ilya Kasnacheev


On Mon, 11 Nov 2019 at 17:56, ihalilaltun <[hidden email]> wrote:
Can we restrict PME operations for client nodes explicitly? [...]
ihalilaltun

Re: excessive timeouts and load on NODE_JOINED and NODE_LEFT events

Hi,

The timeouts always start when a NODE_JOINED event is fired; I am not sure
whether this event causes a PME or not.

As I said before, this is a live system and we cannot stop Ignite operations
while a PME is running :(

I'll try to change the log level to DEBUG; if I can do that, I'll share the logs
here.
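
Until more verbose logs are available, one rough way to check whether the timeouts really cluster around NODE_JOINED / NODE_LEFT events is to compare timestamps. A minimal sketch, assuming the two timestamp lists have already been extracted from the logs by some other means; the class name, method name, sample timestamps, and 30-second window are all hypothetical and not part of Ignite:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical helper: counts timeouts that follow a topology event closely.
final class TimeoutCorrelation {
    /**
     * Returns how many timeouts fall in [event, event + window] for at
     * least one topology event (for example NODE_JOINED or NODE_LEFT).
     */
    static long countWithinWindow(List<Instant> topologyEvents,
                                  List<Instant> timeouts,
                                  Duration window) {
        return timeouts.stream()
            .filter(t -> topologyEvents.stream().anyMatch(e ->
                !t.isBefore(e) && !t.isAfter(e.plus(window))))
            .count();
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        Instant joined = Instant.parse("2019-11-13T10:00:00Z");
        List<Instant> events = List.of(joined);
        List<Instant> timeouts = List.of(
            joined.plusSeconds(2),
            joined.plusSeconds(5),
            joined.plusSeconds(120));

        long near = countWithinWindow(events, timeouts, Duration.ofSeconds(30));
        System.out.println(near + " of " + timeouts.size()
            + " timeouts within 30s of a topology event");
    }
}
```

A high share of timeouts inside the window would be consistent with the join/leave hypothesis, but it does not by itself show that PME is the cause.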

Regards



-----
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
Maksim Stepachev

Re: excessive timeouts and load on NODE_JOINED and NODE_LEFT events

Hi,

I suppose this fix should solve your problem: https://issues.apache.org/jira/browse/IGNITE-9558


On Wed, 13 Nov 2019 at 10:43, ihalilaltun <[hidden email]> wrote:
The timeouts always start when a NODE_JOINED event is fired; I am not sure whether this event causes a PME or not. [...]
ihalilaltun

Re: excessive timeouts and load on NODE_JOINED and NODE_LEFT events

Hi Maksim,

Thanks, I think it will.

Regards



-----
İbrahim Halil Altun
Senior Software Engineer @ Segmentify