Re:OutOfMemoryException with Persistence and Eviction Enabled

classic Classic list List threaded Threaded
4 messages Options
Mitchell Rathbun (BLOOMBERG/ 731 LEX) Mitchell Rathbun (BLOOMBERG/ 731 LEX)
Reply | Threaded
Open this post in threaded view
|

Re:OutOfMemoryException with Persistence and Eviction Enabled

Here is the exception:

Sep 22, 2020 7:58:22 PM java.util.logging.LogManager$RootLogger log
SEVERE: Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException: Out of memory in data region [name=customformulacalcrts, initSize=190.7 MiB, maxSize=190.7 MiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies]]
class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Out of memory in data region [name=customformulacalcrts, initSize=190.7 MiB, maxSize=190.7 MiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.allocatePage(PageMemoryImpl.java:607)
at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.allocateDataPage(AbstractFreeList.java:464)
at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.insertDataRow(AbstractFreeList.java:491)
at org.apache.ignite.internal.processors.cache.persistence.freelist.CacheFreeListImpl.insertDataRow(CacheFreeListImpl.java:59)
at org.apache.ignite.internal.processors.cache.persistence.freelist.CacheFreeListImpl.insertDataRow(CacheFreeListImpl.java:35)
at org.apache.ignite.internal.processors.cache.persistence.RowStore.addRow(RowStore.java:103)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.createRow(IgniteCacheOffheapManagerImpl.java:1691)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.createRow(GridCacheOffheapManager.java:1910)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:5701)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:5643)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:3719)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5900(BPlusTree.java:3613)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1895)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1872)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1779)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1638)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1621)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1935)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:428)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4248)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4226)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdateLocal(GridCacheMapEntry.java:2106)
at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache.updateAllInternal(GridLocalAtomicCache.java:929)
at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache.access$100(GridLocalAtomicCache.java:86)
at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache$6.call(GridLocalAtomicCache.java:776)
at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6817)
at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Failed to find a page for eviction [segmentCapacity=6032, loaded=2365, maxDirtyPages=1774, dirtyPages=2365, cpPages=0, pinnedInSegment=0, failedToPrepare=2366]
Out of memory in data region [name=customformulacalcrts, initSize=190.7 MiB, maxSize=190.7 MiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$Segment.tryToFindSequentially(PageMemoryImpl.java:2427)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$Segment.removePageForReplacement(PageMemoryImpl.java:2321)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$Segment.access$900(PageMemoryImpl.java:1930)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.allocatePage(PageMemoryImpl.java:552)
... 30 more

Sep 22, 2020 7:58:23 PM java.util.logging.LogManager$RootLogger log
SEVERE: JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException: Out of memory in data region [name=customformulacalcrts, initSize=190.7 MiB, maxSize=190.7 MiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies]]

From: [hidden email] At: 09/23/20 12:34:36
To: [hidden email]
Subject: OutOfMemoryException with Persistence and Eviction Enabled

We currently have a cache that is structured with a key of record type and a value that is a map from field id to field. So to update this cache, which has persistence enabled, we need to atomically load the value map for a key, add to that map, and write the map back to the cache. This can be done using invokeAll and a CacheEntryProcessor. However, when I test with a higher load (100k records with 50 fields each), I run into an OOM exception that I will post below. The cause of the exception is reported to be the failure to find a page to evict. However, even when I set the DataRegion's eviction threshold to .5 and the page eviction mode to RANDOM_2_LRU, I am still getting the same error. I have 2 main questions from this:

1. Why is it failing to evict a page even with a lower threshold and eviction enabled? Is it failing to reach the threshold somehow? Are non-data pages like metadata and index pages taken into account when determining if the threshold has been reached?

2. We don't have this issue when using IgniteDataStreamer to write large amounts of data to the cache, we just can't get transactional support at the same time. Why is this OOME an issue with regular cache puts but not with IgniteDataStreamer? I would think that any issues with checkpointing and eviction would also occur with IgniteDataStreamer.

aealexsandrov aealexsandrov
Reply | Threaded
Open this post in threaded view
|

Re: OutOfMemoryException with Persistence and Eviction Enabled

Hi,

Did you try to increase the DataRegion size a little bit? It looks like 190 MB isn't enough for some internal structures that Ignite stores in OFF-HEAP except the data.

I suggest you increase the data region size to for example 512 MB - 1024 MB and take a look at how it will work.

If you still will see the issue then I guess we should create the ticket:

1)Collect the logs
2)Provide the java code example
3)Provide the configuration of the nodes

After that, we can take a look more deeply into it and if it's an issue then file JIRA.

BR,
Andrei

9/23/2020 7:36 PM, Mitchell Rathbun (BLOOMBERG/ 731 LEX) пишет:
Here is the exception:
Sep 22, 2020 7:58:22 PM java.util.logging.LogManager$RootLogger log
SEVERE: Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException: Out of memory in data region [name=customformulacalcrts, initSize=190.7 MiB, maxSize=190.7 MiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies]]
class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Out of memory in data region [name=customformulacalcrts, initSize=190.7 MiB, maxSize=190.7 MiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.allocatePage(PageMemoryImpl.java:607)
at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.allocateDataPage(AbstractFreeList.java:464)
at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.insertDataRow(AbstractFreeList.java:491)
at org.apache.ignite.internal.processors.cache.persistence.freelist.CacheFreeListImpl.insertDataRow(CacheFreeListImpl.java:59)
at org.apache.ignite.internal.processors.cache.persistence.freelist.CacheFreeListImpl.insertDataRow(CacheFreeListImpl.java:35)
at org.apache.ignite.internal.processors.cache.persistence.RowStore.addRow(RowStore.java:103)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.createRow(IgniteCacheOffheapManagerImpl.java:1691)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.createRow(GridCacheOffheapManager.java:1910)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:5701)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:5643)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:3719)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5900(BPlusTree.java:3613)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1895)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1872)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1779)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1638)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1621)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1935)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:428)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4248)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4226)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdateLocal(GridCacheMapEntry.java:2106)
at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache.updateAllInternal(GridLocalAtomicCache.java:929)
at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache.access$100(GridLocalAtomicCache.java:86)
at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache$6.call(GridLocalAtomicCache.java:776)
at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6817)
at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Failed to find a page for eviction [segmentCapacity=6032, loaded=2365, maxDirtyPages=1774, dirtyPages=2365, cpPages=0, pinnedInSegment=0, failedToPrepare=2366]
Out of memory in data region [name=customformulacalcrts, initSize=190.7 MiB, maxSize=190.7 MiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$Segment.tryToFindSequentially(PageMemoryImpl.java:2427)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$Segment.removePageForReplacement(PageMemoryImpl.java:2321)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$Segment.access$900(PageMemoryImpl.java:1930)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.allocatePage(PageMemoryImpl.java:552)
... 30 more
Sep 22, 2020 7:58:23 PM java.util.logging.LogManager$RootLogger log
SEVERE: JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException: Out of memory in data region [name=customformulacalcrts, initSize=190.7 MiB, maxSize=190.7 MiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies]]
From: [hidden email] At: 09/23/20 12:34:36
To: [hidden email] Subject: OutOfMemoryException with Persistence and Eviction Enabled
We currently have a cache that is structured with a key of record type and a value that is a map from field id to field. So to update this cache, which has persistence enabled, we need to atomically load the value map for a key, add to that map, and write the map back to the cache. This can be done using invokeAll and a CacheEntryProcessor. However, when I test with a higher load (100k records with 50 fields each), I run into an OOM exception that I will post below. The cause of the exception is reported to be the failure to find a page to evict. However, even when I set the DataRegion's eviction threshold to .5 and the page eviction mode to RANDOM_2_LRU, I am still getting the same error. I have 2 main questions from this:
1. Why is it failing to evict a page even with a lower threshold and eviction enabled? Is it failing to reach the threshold somehow? Are non-data pages like metadata and index pages taken into account when determining if the threshold has been reached?
2. We don't have this issue when using IgniteDataStreamer to write large amounts of data to the cache, we just can't get transactional support at the same time. Why is this OOME an issue with regular cache puts but not with IgniteDataStreamer? I would think that any issues with checkpointing and eviction would also occur with IgniteDataStreamer.
Mitchell Rathbun (BLOOMBERG/ 731 LEX) Mitchell Rathbun (BLOOMBERG/ 731 LEX)
Reply | Threaded
Open this post in threaded view
|

Re: OutOfMemoryException with Persistence and Eviction Enabled

In reply to this post by Mitchell Rathbun (BLOOMBERG/ 731 LEX)
I tried doubling it from 200 MB to 400 MB. My initial test worked, but as I increased the number of fields that I was writing per entry, the same issue occurred again. So it seems to just increase the capacity of what can be written, not actually prevent the exception from occurring. I guess my main question is why is it possible for Ignite to get an OOME when persistence and eviction are enabled? It seems like if there are a lot of writes, performance should degrade as the in memory cache evicts members of the cache, but no exceptions should occur. The error is always "Failed to find a page for eviction", which doesn't really make sense when eviction is enabled. What are the internal structures that Ignite holds in Off-heap memory?

Also, why isn't this an issue when using IgniteDataStreamer if the issue has to do with space for internal structures? Wouldn't Ignite need the same internal structures for either case?

From: [hidden email] At: 09/24/20 11:42:21
To: [hidden email]
Subject: Re: OutOfMemoryException with Persistence and Eviction Enabled

Hi,

Did you try to increase the DataRegion size a little bit? It looks like 190 MB isn't enough for some internal structures that Ignite stores in OFF-HEAP except the data.

I suggest you increase the data region size to for example 512 MB - 1024 MB and take a look at how it will work.

If you still will see the issue then I guess we should create the ticket:

1)Collect the logs
2)Provide the java code example
3)Provide the configuration of the nodes

After that, we can take a look more deeply into it and if it's an issue then file JIRA.

BR,
Andrei

9/23/2020 7:36 PM, Mitchell Rathbun (BLOOMBERG/ 731 LEX) пишет:
Here is the exception:
Sep 22, 2020 7:58:22 PM java.util.logging.LogManager$RootLogger log
SEVERE: Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException: Out of memory in data region [name=customformulacalcrts, initSize=190.7 MiB, maxSize=190.7 MiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies]]
class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Out of memory in data region [name=customformulacalcrts, initSize=190.7 MiB, maxSize=190.7 MiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.allocatePage(PageMemoryImpl.java:607)
at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.allocateDataPage(AbstractFreeList.java:464)
at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.insertDataRow(AbstractFreeList.java:491)
at org.apache.ignite.internal.processors.cache.persistence.freelist.CacheFreeListImpl.insertDataRow(CacheFreeListImpl.java:59)
at org.apache.ignite.internal.processors.cache.persistence.freelist.CacheFreeListImpl.insertDataRow(CacheFreeListImpl.java:35)
at org.apache.ignite.internal.processors.cache.persistence.RowStore.addRow(RowStore.java:103)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.createRow(IgniteCacheOffheapManagerImpl.java:1691)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.createRow(GridCacheOffheapManager.java:1910)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:5701)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:5643)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:3719)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5900(BPlusTree.java:3613)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1895)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1872)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1779)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1638)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1621)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1935)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:428)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4248)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4226)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdateLocal(GridCacheMapEntry.java:2106)
at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache.updateAllInternal(GridLocalAtomicCache.java:929)
at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache.access$100(GridLocalAtomicCache.java:86)
at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache$6.call(GridLocalAtomicCache.java:776)
at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6817)
at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Failed to find a page for eviction [segmentCapacity=6032, loaded=2365, maxDirtyPages=1774, dirtyPages=2365, cpPages=0, pinnedInSegment=0, failedToPrepare=2366]
Out of memory in data region [name=customformulacalcrts, initSize=190.7 MiB, maxSize=190.7 MiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$Segment.tryToFindSequentially(PageMemoryImpl.java:2427)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$Segment.removePageForReplacement(PageMemoryImpl.java:2321)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$Segment.access$900(PageMemoryImpl.java:1930)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.allocatePage(PageMemoryImpl.java:552)
... 30 more
Sep 22, 2020 7:58:23 PM java.util.logging.LogManager$RootLogger log
SEVERE: JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException: Out of memory in data region [name=customformulacalcrts, initSize=190.7 MiB, maxSize=190.7 MiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies]]
From: [hidden email] At: 09/23/20 12:34:36
To: [hidden email] Subject: OutOfMemoryException with Persistence and Eviction Enabled
We currently have a cache that is structured with a key of record type and a value that is a map from field id to field. So to update this cache, which has persistence enabled, we need to atomically load the value map for a key, add to that map, and write the map back to the cache. This can be done using invokeAll and a CacheEntryProcessor. However, when I test with a higher load (100k records with 50 fields each), I run into an OOM exception that I will post below. The cause of the exception is reported to be the failure to find a page to evict. However, even when I set the DataRegion's eviction threshold to .5 and the page eviction mode to RANDOM_2_LRU, I am still getting the same error. I have 2 main questions from this:
1. Why is it failing to evict a page even with a lower threshold and eviction enabled? Is it failing to reach the threshold somehow? Are non-data pages like metadata and index pages taken into account when determining if the threshold has been reached?
2. We don't have this issue when using IgniteDataStreamer to write large amounts of data to the cache, we just can't get transactional support at the same time. Why is this OOME an issue with regular cache puts but not with IgniteDataStreamer? I would think that any issues with checkpointing and eviction would also occur with IgniteDataStreamer.

aealexsandrov aealexsandrov
Reply | Threaded
Open this post in threaded view
|

Re: OutOfMemoryException with Persistence and Eviction Enabled

Hi,

It looks like that I found the issue:

https://issues.apache.org/jira/browse/IGNITE-8917

When you use put or removeAll in persistence cache with more data than data region size throw IgniteOutOfMemoryException. Data streamer looks like don't affected by this ticket.

The WA is pretty simple - use bigger data region.

BR,
Andrei

9/24/2020 11:44 PM, Mitchell Rathbun (BLOOMBERG/ 731 LEX) пишет:
I tried doubling it from 200 MB to 400 MB. My initial test worked, but as I increased the number of fields that I was writing per entry, the same issue occurred again. So it seems to just increase the capacity of what can be written, not actually prevent the exception from occurring. I guess my main question is why is it possible for Ignite to get an OOME when persistence and eviction are enabled? It seems like if there are a lot of writes, performance should degrade as the in memory cache evicts members of the cache, but no exceptions should occur. The error is always "Failed to find a page for eviction", which doesn't really make sense when eviction is enabled. What are the internal structures that Ignite holds in Off-heap memory?
Also, why isn't this an issue when using IgniteDataStreamer if the issue has to do with space for internal structures? Wouldn't Ignite need the same internal structures for either case?
From: [hidden email] At: 09/24/20 11:42:21
To: [hidden email] Subject: Re: OutOfMemoryException with Persistence and Eviction Enabled

Hi, Did you try to increase the DataRegion size a little bit? It looks like 190 MB isn't enough for some internal structures that Ignite stores in OFF-HEAP except the data. I suggest you increase the data region size to for example 512 MB - 1024 MB and take a look at how it will work. If you still will see the issue then I guess we should create the ticket: 1)Collect the logs 2)Provide the java code example 3)Provide the configuration of the nodes After that, we can take a look more deeply into it and if it's an issue then file JIRA. BR, Andrei

9/23/2020 7:36 PM, Mitchell Rathbun (BLOOMBERG/ 731 LEX) пишет:
Here is the exception:
Sep 22, 2020 7:58:22 PM java.util.logging.LogManager$RootLogger log
SEVERE: Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException: Out of memory in data region [name=customformulacalcrts, initSize=190.7 MiB, maxSize=190.7 MiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies]]
class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Out of memory in data region [name=customformulacalcrts, initSize=190.7 MiB, maxSize=190.7 MiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.allocatePage(PageMemoryImpl.java:607)
at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.allocateDataPage(AbstractFreeList.java:464)
at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.insertDataRow(AbstractFreeList.java:491)
at org.apache.ignite.internal.processors.cache.persistence.freelist.CacheFreeListImpl.insertDataRow(CacheFreeListImpl.java:59)
at org.apache.ignite.internal.processors.cache.persistence.freelist.CacheFreeListImpl.insertDataRow(CacheFreeListImpl.java:35)
at org.apache.ignite.internal.processors.cache.persistence.RowStore.addRow(RowStore.java:103)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.createRow(IgniteCacheOffheapManagerImpl.java:1691)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.createRow(GridCacheOffheapManager.java:1910)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:5701)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:5643)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:3719)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5900(BPlusTree.java:3613)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1895)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1872)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1779)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1638)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1621)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1935)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:428)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4248)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4226)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdateLocal(GridCacheMapEntry.java:2106)
at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache.updateAllInternal(GridLocalAtomicCache.java:929)
at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache.access$100(GridLocalAtomicCache.java:86)
at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache$6.call(GridLocalAtomicCache.java:776)
at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6817)
at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Failed to find a page for eviction [segmentCapacity=6032, loaded=2365, maxDirtyPages=1774, dirtyPages=2365, cpPages=0, pinnedInSegment=0, failedToPrepare=2366]
Out of memory in data region [name=customformulacalcrts, initSize=190.7 MiB, maxSize=190.7 MiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$Segment.tryToFindSequentially(PageMemoryImpl.java:2427)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$Segment.removePageForReplacement(PageMemoryImpl.java:2321)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl$Segment.access$900(PageMemoryImpl.java:1930)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.allocatePage(PageMemoryImpl.java:552)
... 30 more
Sep 22, 2020 7:58:23 PM java.util.logging.LogManager$RootLogger log
SEVERE: JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException: Out of memory in data region [name=customformulacalcrts, initSize=190.7 MiB, maxSize=190.7 MiB, persistenceEnabled=true] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies]]
From: [hidden email] At: 09/23/20 12:34:36
To: [hidden email] Subject: OutOfMemoryException with Persistence and Eviction Enabled
We currently have a cache that is structured with a key of record type and a value that is a map from field id to field. So to update this cache, which has persistence enabled, we need to atomically load the value map for a key, add to that map, and write the map back to the cache. This can be done using invokeAll and a CacheEntryProcessor. However, when I test with a higher load (100k records with 50 fields each), I run into an OOM exception that I will post below. The cause of the exception is reported to be the failure to find a page to evict. However, even when I set the DataRegion's eviction threshold to .5 and the page eviction mode to RANDOM_2_LRU, I am still getting the same error. I have 2 main questions from this:
1. Why is it failing to evict a page even with a lower threshold and eviction enabled? Is it failing to reach the threshold somehow? Are non-data pages like metadata and index pages taken into account when determining if the threshold has been reached?
2. We don't have this issue when using IgniteDataStreamer to write large amounts of data to the cache, we just can't get transactional support at the same time. Why is this OOME an issue with regular cache puts but not with IgniteDataStreamer? I would think that any issues with checkpointing and eviction would also occur with IgniteDataStreamer.