How to stop a node from dying when memory is full?

colinc

How to stop a node from dying when memory is full?

In a system that is not using native persistence, what is the recommended way
of stopping a cluster from running out of memory - or stopping it from
crashing when it does?

Per the JIRA issue below, memory monitoring appears to be unreliable in the
latest version of Ignite:
https://issues.apache.org/jira/browse/IGNITE-12096

Even when it works, the metric is only an estimate that is updated
periodically, which makes it hard to reliably avoid a critical OOM in a system
that is rapidly filling caches.

It is technically possible to create a custom failure handler - but I
understand that trapping the failure in this way is considered to be bad
practice, since it can leave Ignite in an inconsistent state.

How are people addressing this challenge?

Regards,
Colin.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Ilya Kasnacheev

Re: How to stop a node from dying when memory is full?

Hello!

You can try enabling page eviction. In that case, pages will be dropped along with the key-value pairs they contain.
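
For reference, a minimal sketch of what enabling page eviction might look like in Java configuration. The region name, sizes, and threshold below are illustrative assumptions, not values from this thread. Page eviction applies only to data regions that do not use native persistence.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataPageEvictionMode;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class PageEvictionExample {
    public static void main(String[] args) {
        // Hypothetical in-memory region; name and sizes are assumptions.
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("evictingRegion")
            .setInitialSize(100L * 1024 * 1024)   // 100 MB
            .setMaxSize(512L * 1024 * 1024)       // 512 MB
            // Evict data pages once the region is close to full.
            .setPageEvictionMode(DataPageEvictionMode.RANDOM_2_LRU)
            // Fraction of the region that may fill before eviction starts.
            .setEvictionThreshold(0.9);

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(storage);

        try (Ignite ignite = Ignition.start(cfg)) {
            // Caches created here use the default region unless configured otherwise.
            ignite.getOrCreateCache("exampleCache").put(1, "value");
        }
    }
}
```

Evicted entries are simply gone from the cache, so this only suits data that can be reloaded or safely lost.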

Regards,

On 2019/08/29 11:54:11, colinc <[hidden email]> wrote:

> [...]
colinc

Re: How to stop a node from dying when memory is full?

Thanks - that does seem to be effective at stopping the OOM condition at
least.

Do you know of any way to determine which cache entries were affected by the
page eviction? The EVT_CACHE_ENTRY_EVICTED event doesn't seem to be fired in
this case, as far as I can tell. Is that what you would expect?

This is important for performing clean-up of related cache entries to ensure
referential integrity.

Regards,
Colin.




danami

Re: How to stop a node from dying when memory is full?

In reply to this post by Ilya Kasnacheev
I'd like to extend Colin's question.

What if I'm using TRANSACTIONAL_SNAPSHOT mode and therefore can't use page
eviction?
How, other than with persistence, can I avoid OOM errors?
Can I write a custom failure handler that, for example, clears the cache or
data region? Is that technically possible (and if so, how)? Will it work, or
is it bad practice that leads to inconsistency, as Colin suggested?
Is persistence my only option for avoiding OOM errors, or do I have other
choices?

Thank you for your help,
Dana




mcherkasov

Re: How to stop a node from dying when memory is full?

Hi Dana,

Are you getting java.lang.OutOfMemoryError or IgniteOutOfMemoryException?

If you use TRANSACTIONAL_SNAPSHOT, your data is transactional. Page eviction simply removes random pages, so combining it with TRANSACTIONAL_SNAPSHOT makes no sense: dropping random data defeats the whole idea of transactions and data consistency.
I may be missing something about your case, but I would say that if you need TRANSACTIONAL_SNAPSHOT, you simply cannot use page eviction, and if you can use page eviction, you should not use TRANSACTIONAL_SNAPSHOT.

Regarding a custom failure handler: it should work, and I don't see any reason why it wouldn't. I would really appreciate it if you could send us an update on how this approach works out.
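
As a rough, untested sketch of the custom failure handler idea discussed here: the class name OomAwareFailureHandler is hypothetical. The sketch assumes that a full data region reaches the failure handler as an IgniteOutOfMemoryException (an internal Ignite class) somewhere in the cause chain. Whether the node is left in a consistent state after ignoring the failure is exactly the open question in this thread.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.failure.FailureContext;
import org.apache.ignite.failure.FailureHandler;
import org.apache.ignite.failure.StopNodeOrHaltFailureHandler;
import org.apache.ignite.internal.mem.IgniteOutOfMemoryException;

/** Hypothetical handler: keeps the node alive on IgniteOutOfMemoryException only. */
public class OomAwareFailureHandler implements FailureHandler {
    /** Every other failure is passed to Ignite's stop-or-halt handler. */
    private final FailureHandler fallback = new StopNodeOrHaltFailureHandler();

    @Override public boolean onFailure(Ignite ignite, FailureContext failureCtx) {
        // Walk the cause chain looking for a data-region OOM.
        for (Throwable t = failureCtx.error(); t != null; t = t.getCause()) {
            if (t instanceof IgniteOutOfMemoryException) {
                ignite.log().warning("Data region is full, node kept alive: " + t.getMessage());

                // false = do not invalidate the node. Assumption: the application
                // frees space afterwards (for example, by clearing caches elsewhere).
                return false;
            }
        }

        return fallback.onFailure(ignite, failureCtx);
    }
}
```

It would be registered with `new IgniteConfiguration().setFailureHandler(new OomAwareFailureHandler())`. Any cleanup, such as clearing a cache, is probably safer outside the handler itself, since the handler runs on the thread that hit the failure.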

Thanks,
Mike.


On Thu, Aug 27, 2020 at 2:13 AM danami <[hidden email]> wrote:
> [...]


--
Thanks,
Mikhail.
dmagda

Re: How to stop a node from dying when memory is full?

In reply to this post by danami
Folks, this article should be relevant to you. It covers all the techniques for avoiding OOM except swapping:

You can use swapping as part of your toolbox to survive periods when a node is running out of memory space:
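
A minimal sketch of enabling swap space for a data region in Java configuration. The region name, size, and swap directory are placeholders, not values from this thread. With a swap path set, the region is backed by a memory-mapped file and the operating system decides what stays in RAM, so performance can drop sharply once swapping starts.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class SwapSpaceExample {
    public static void main(String[] args) {
        // Hypothetical region; the swap directory is a placeholder path.
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("swappedRegion")
            // With swap enabled, the maximum size may exceed available RAM.
            .setMaxSize(8L * 1024 * 1024 * 1024)  // 8 GB
            .setSwapPath("/path/to/swap/dir");

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(region));

        try (Ignite ignite = Ignition.start(cfg)) {
            ignite.getOrCreateCache("exampleCache").put(1, "value");
        }
    }
}
```

Swap space does not make the data durable: it is still lost when the node stops.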

-
Denis


On Thu, Aug 27, 2020 at 2:13 AM danami <[hidden email]> wrote:
> [...]