recoveryBallotBoxes in MvccProcessorImpl memory leak?

mvkarp

recoveryBallotBoxes in MvccProcessorImpl memory leak?

Hi, I am on Ignite 2.7.5 with MVCC disabled for all caches (CacheAtomicityMode
is ATOMIC for every cache).

After analysing a few heap dumps of the CLIENT node JVM using Eclipse MAT,
there is a 'recoveryBallotBoxes' ConcurrentHashMap in MvccProcessorImpl that
grows indefinitely on the heap at a constant rate and cannot be garbage
collected. After 10 hours the map takes 600MB of the heap, and the only
solution so far has been to restart the JVM.

Would any of you know what might be causing this leak, what
recoveryBallotBoxes is used for when MVCC is disabled, and how to prevent it
from growing permanently?

https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/mvcc/MvccProcessorImpl.java
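For context, "MVCC disabled" here just means every cache is declared with ATOMIC atomicity. A minimal sketch of such a configuration (the cache name is hypothetical, shown for illustration only):

```java
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.configuration.CacheConfiguration;

// ATOMIC (the default) does not use MVCC; only TRANSACTIONAL_SNAPSHOT enables it.
CacheConfiguration<Integer, String> cacheCfg =
    new CacheConfiguration<Integer, String>("myCache") // hypothetical cache name
        .setAtomicityMode(CacheAtomicityMode.ATOMIC);
```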



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
mvkarp

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Hi team, any update/clarification on this? It is quite a critical bug in a
production environment, as it causes 100% CPU usage and leads to OOM /
crashes.

As more information: this also affects the Ignite server JVMs, causing them to
crash, and an MVCC coordinator node seems to be assigned despite there not
being a single cache with TRANSACTIONAL_SNAPSHOT atomicity.

If MVCC is disabled, shouldn't there be no MVCC coordinator node in the first
place, and nothing populated in the Mvcc classes (otherwise those entries
never get processed, which leads to a memory leak)?

Furthermore, is there a way to disable MVCC completely?



ilya.kasnacheev

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Hello!

Since you are the first to report such a problem, I recommend that you try to make a reproducer project and/or file an issue in the Ignite JIRA. Then somebody will check it.

Regards,
--
Ilya Kasnacheev


dmagda

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

In reply to this post by mvkarp
Please try to capture heap dumps that will show where the leak is. Share the dumps with us if the leak is not caused by the application code.

-
Denis


mvkarp

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

<http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heapanalysisMAT.jpg>

I've attached an Eclipse MAT heap analysis. As you can see, MVCC is disabled
(there are no TRANSACTIONAL_SNAPSHOT caches in the cluster).



Ivan Pavlukhin

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Hi,

Sounds like a bug. It would be great to have a ticket with a reproducer.




--
Best regards,
Ivan Pavlukhin
ilya.kasnacheev

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

In reply to this post by mvkarp
Hello!

Can you please show the contents of some of these records, as well as their reference path to MvccProcessorImpl?

Regards,
--
Ilya Kasnacheev


mvkarp

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

I've created a ticket, though I'm not too sure how to go about creating a
reproducer for this - https://issues.apache.org/jira/browse/IGNITE-12350

I've attached some extra screenshots showing what is inside these records
and the path to GC roots. heap.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>





mvkarp

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

I've attached another set of screenshots, which might be clearer.
heap.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t2658/heap.zip>







ilya.kasnacheev

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Hello!

Can you please check whether there are any especially large objects inside the recoveryBallotBoxes object graph? Sorting by retained heap may help in determining this. It would also be nice to know the type histogram of what's inside recoveryBallotBoxes and where the bulk of the heap usage resides.

Regards,
--
Ilya Kasnacheev


mvkarp

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Let me know if these help or if you need anything more specific.
recoveryBallotBoxes.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t2658/recoveryBallotBoxes.zip>  


ilya.kasnacheev

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Hello!

How many nodes do you have in your cluster?

From the dump, it seems that the number of server nodes is in the thousands. Is this the case?

Regards,
--
Ilya Kasnacheev


mvkarp

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Hi,

This is not the case; there are always at most two server nodes in total,
with one server JVM on each. However, there are many client JVMs that start
and stop caches with setClientMode=true. It looks like one of the server
instances is immune to the issue, whilst the most recently created one gets
the leak, with a lot of partition exchanges happening for EVT_NODE_JOINED and
EVT_NODE_LEFT. (One of the nodes doesn't get any of these partition
exchanges; however, the exact server node that gets the leak can alternate,
so it's not linked to one node in particular but seems to be linked to the
most recently launched server.)

ilya.kasnacheev

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Hello!

This is very strange, since we expect this collection to be cleared on exchange.

Please make sure you don't have any stray exceptions during exchange in your logs.

Regards,
--
Ilya Kasnacheev


mvkarp

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

CONTENTS DELETED: the author has deleted this message.
mvkarp

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

In reply to this post by ilya.kasnacheev
Hi,

There are no more exceptions or errors in the logs, only hundreds of
thousands of log entries like the ones below; heap usage is still increasing
steeply and the leak is still present.

[13:46:17,632][INFO][disco-event-worker-#102][GridDiscoveryManager] Topology
snapshot [ver=366003, locNode=6a9db3c2, servers=2, clients=17, state=ACTIVE,
CPUs=64, offheap=960.0GB, heap=46.0GB]
[13:46:17,632][INFO][disco-event-worker-#102][GridDiscoveryManager]   ^--
Baseline [id=0, size=2, online=0, offline=2]
[13:46:17,683][INFO][exchange-worker-#103][time] Started exchange init
[topVer=AffinityTopologyVersion [topVer=366003, minorTopVer=0],
mvccCrd=MvccCoordinator [nodeId=99624746-b624-49d6-9e36-bb6d648e9c3b,
crdVer=1571956920778, topVer=AffinityTopologyVersion [topVer=315751,
minorTopVer=0]], mvccCrdChange=false, crd=false, evt=NODE_LEFT,
evtNode=824dca07-a847-4fd7-81a5-ac0aa8644b26, customEvt=null,
allowMerge=true]
[13:46:17,685][INFO][exchange-worker-#103][GridDhtPartitionsExchangeFuture]
Finish exchange future [startVer=AffinityTopologyVersion [topVer=366003,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=366003,
minorTopVer=0], err=null]
[13:46:17,708][INFO][exchange-worker-#103][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=6a9db3c2-08df-4bc2-8a26-13df50b86207,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=366003, minorTopVer=0], evt=NODE_LEFT, evtNode=TcpDiscoveryNode
[id=824dca07-a847-4fd7-81a5-ac0aa8644b26, addrs=[10.16.1.47, 127.0.0.1],
sockAddrs=[/127.0.0.1:0, xxxxx.com.au/10.16.1.47:0], discPort=0,
order=365983, intOrder=183032, lastExchangeTime=1573393534700, loc=false,
ver=2.7.5#20190603-sha1:be4f2a15, isClient=true], done=true],
topVer=AffinityTopologyVersion [topVer=366003, minorTopVer=0],
durationFromInit=21]
[13:46:17,708][INFO][exchange-worker-#103][time] Finished exchange init
[topVer=AffinityTopologyVersion [topVer=366003, minorTopVer=0], crd=false]
[13:46:17,770][INFO][exchange-worker-#103][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=366003, minorTopVer=0], force=false, evt=NODE_LEFT,
node=824dca07-a847-4fd7-81a5-ac0aa8644b26]
[13:46:18,620][INFO][disco-event-worker-#102][GridDiscoveryManager] Added
new node to topology: TcpDiscoveryNode
[id=1115b6b7-7caf-4737-9c61-930e193468f6, addrs=[10.16.1.43, 127.0.0.1],
sockAddrs=[/127.0.0.1:0, xxxxx/10.16.1.43:0], discPort=0, order=366004,
intOrder=183041, lastExchangeTime=1573393578569, loc=false,
ver=2.7.5#20190603-sha1:be4f2a15, isClient=true]


Lots of these warnings:
[13:49:04,673][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible
too long JVM pause: 798 milliseconds.

and sometimes this:
[13:49:04,863][INFO][exchange-worker-#103][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=366038, minorTopVer=0], force=false, evt=NODE_JOINED,
node=7b25d879-b674-4e7d-b5f7-d1c6619e0091]
[13:49:05,677][INFO][grid-nio-worker-tcp-comm-0-#72][TcpCommunicationSpi]
Accepted incoming communication connection [locAddr=/10.16.1.47:47101,
rmtAddr=/10.16.1.48:50550]
[13:49:05,706][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery
accepted incoming connection [rmtAddr=/10.16.1.47, rmtPort=53836]
[13:49:05,706][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery
spawning a new thread for connection [rmtAddr=/10.16.1.47, rmtPort=53836]
[13:49:05,707][INFO][tcp-disco-sock-reader-#25013][TcpDiscoverySpi] Started
serving remote node connection [rmtAddr=/10.16.1.47:53836, rmtPort=53836]



mvkarp

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

We frequently stop and start clients in short-lived client JVM processes, as
required for our purposes. This seems to lead to a huge number of partition
map exchanges (but no rebalancing) and topology changes (topVer=300,000+).

I still cannot figure out why this map won't clear (there are no exceptions
or errors at all in the entire log).



Ivan Pavlukhin

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Hi,

I suspect the following here: some node treats itself as an MVCC coordinator
and creates a new RecoveryBallotBox each time a client node leaves. Some
(maybe all) other nodes think that MVCC is disabled and do not send a vote
(expected for the aforementioned ballot box) to the MVCC coordinator.
Consequently, a memory leak.

The following could be done:
1. Figure out why some node treats itself as an MVCC coordinator while
others think that MVCC is disabled.
2. Try to introduce some defensive measures in the Ignite code to protect
against the leak in a long-running cluster.

As a last-chance workaround, I can suggest writing custom code which cleans
the recoveryBallotBoxes map from time to time (most likely using
reflection).
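The reflection part of such a workaround can be sketched in plain Java. This is a hedged sketch: the FakeProcessor class below is a stand-in that only mimics the private recoveryBallotBoxes field; actually obtaining the live MvccProcessorImpl instance from a running node goes through Ignite internal APIs and is not shown.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BallotBoxCleaner {
    /** Clears the named private Map field on the target; returns the number of entries removed. */
    public static int clearMapField(Object target, String fieldName) throws Exception {
        Field f = target.getClass().getDeclaredField(fieldName);
        f.setAccessible(true); // the field is private in MvccProcessorImpl
        Map<?, ?> map = (Map<?, ?>) f.get(target);
        int removed = map.size();
        map.clear();
        return removed;
    }

    /** Stand-in with the same field name as MvccProcessorImpl, for illustration only. */
    public static class FakeProcessor {
        private final Map<Long, Object> recoveryBallotBoxes = new ConcurrentHashMap<>();

        public void add(long id) { recoveryBallotBoxes.put(id, new Object()); }
        public int size() { return recoveryBallotBoxes.size(); }
    }

    public static void main(String[] args) throws Exception {
        FakeProcessor proc = new FakeProcessor();
        proc.add(1L);
        proc.add(2L);
        int removed = clearMapField(proc, "recoveryBallotBoxes");
        System.out.println("cleared " + removed + " entries; size now " + proc.size());
    }
}
```

On a real node the target instance would have to be dug out of Ignite's internals, which ties the code to a specific Ignite version; treat this strictly as a stop-gap until the underlying bug is fixed.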




--
Best regards,
Ivan Pavlukhin
mvkarp

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Hi,

Would you have any suggestions on how to implement such a last-chance
workaround for this issue on the server JVM?


Ivan Pavlukhin

Re: recoveryBallotBoxes in MvccProcessorImpl memory leak?

Hi,

My first thought is deploying a service [1] (either dynamically via
Ignite.services().deploy() or statically via
IgniteConfiguration.setServiceConfiguration()) that clears the problematic
map periodically.

[1] https://apacheignite.readme.io/docs/service-grid
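The Ignite service plumbing itself (the init/execute/cancel lifecycle and deployment) needs the Ignite API, but the periodic-cleanup core that such a service's execute() would run can be sketched with plain JDK scheduling. The cleanup Runnable below is a placeholder for whatever code actually clears the map:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch: run a cleanup task at a fixed period, as the body of an
 *  Ignite service's execute() might. The Runnable is a placeholder. */
public class PeriodicCleaner implements AutoCloseable {
    private final ScheduledExecutorService exec =
        Executors.newSingleThreadScheduledExecutor();

    public PeriodicCleaner(Runnable cleanup, long period, TimeUnit unit) {
        // First run after one period, then repeat at that period.
        exec.scheduleAtFixedRate(cleanup, period, period, unit);
    }

    @Override public void close() {
        exec.shutdownNow(); // a real service would do this in Service.cancel()
    }
}
```

In a real deployment, the service would wrap this in the Service interface and be deployed with e.g. ignite.services().deployNodeSingleton(...), so one instance runs per server node.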




--
Best regards,
Ivan Pavlukhin