Enabling the eviction policy blocks the Ignite cluster; it does not use pages from the free list

pvprsd
Hi,



evictDataPage() always ends up with the Ignite cluster blocked, for some reason.

This method does not seem to consider the free list, which still has some/many pages available. Instead, evictDataPage() keeps trying to evict entries from filled pages, and after some time (a few minutes after memory reached the evictionThreshold) it can no longer find any pages/entries to evict and starts reporting "Too many failed attempts to evict page: 30".



My IgniteConfiguration is as follows:

    DataRegionConfiguration
        dataRegionConfig.setMaxSize(8L * 1024 * 1024 * 1024) // 8 GB
        dataRegionConfig.setPageEvictionMode(DataPageEvictionMode.RANDOM_LRU) // tried RANDOM_2_LRU as well
        ...
        igniteDataCfg.setPageSize(pageSizeKB) // 16 KB



       Ignite version - 2.8.0



We use only off-heap memory for caching. DataRegion persistence is disabled, as we have 3rd-party persistence configured with read-through & write-through enabled.



I tried different evictionThreshold values and still got the same result. I am not sure what is wrong with my configuration.
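For reference, spelled out as a complete snippet the configuration above would look roughly like this; the region name, the evictionThreshold value, and the way the pieces are wired together are my assumptions, not taken from the fragment above:

```java
import org.apache.ignite.configuration.DataPageEvictionMode;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ConfigSketch {
    public static IgniteConfiguration build() {
        // Off-heap data region with page eviction; persistence stays disabled
        // because a 3rd-party store handles read-through/write-through.
        DataRegionConfiguration dataRegionConfig = new DataRegionConfiguration()
            .setName("default")                                   // assumed name
            .setMaxSize(8L * 1024 * 1024 * 1024)                  // 8 GB
            .setPageEvictionMode(DataPageEvictionMode.RANDOM_LRU) // or RANDOM_2_LRU
            .setEvictionThreshold(0.9)                            // illustrative value
            .setPersistenceEnabled(false);

        DataStorageConfiguration igniteDataCfg = new DataStorageConfiguration()
            .setPageSize(16 * 1024)                               // 16 KB
            .setDefaultDataRegionConfiguration(dataRegionConfig);

        return new IgniteConfiguration().setDataStorageConfiguration(igniteDataCfg);
    }
}
```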



Many thanks in advance for your help.


ezhuravlev

Hi Prasad,

What operations do you run on the cluster? What is the size of objects? Is it possible to share full logs from nodes? Do you have some kind of small reproducer for this issue? It would be really helpful.

Thanks,
Evgenii

Mon, 5 Oct 2020 at 07:53, Prasad Pillala <[hidden email]>:


pvprsd
Hi Evgenii,

We are using EntryProcessor to update/insert objects into the cache, in addition to GET calls on cache objects. Multiple services make these calls through IgniteClients. We have enabled a NearCache on the Ignite clients.
Size of the objects: we will figure that out today and post it here.
Logs: I am attaching them here. I am currently running 4 Ignite server nodes; one of them reported this problem.

Reproducer: I will try to create it.
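The client-side pattern described above (EntryProcessor updates through a thick client with a near cache) can be sketched roughly as follows; the cache name, key/value types, and processor body are illustrative stand-ins, not from our actual services:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheEntryProcessor;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.NearCacheConfiguration;

public class ClientSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration().setClientMode(true);

        try (Ignite client = Ignition.start(cfg)) {
            // A near cache on the client makes the server track this node as a
            // reader ("rdr") for every entry cached locally.
            IgniteCache<String, String> cache =
                client.getOrCreateNearCache("myCache", new NearCacheConfiguration<>());

            // Update/insert through an EntryProcessor, executed on the primary node.
            cache.invoke("someKey", (CacheEntryProcessor<String, String, Void>) (entry, arguments) -> {
                entry.setValue(entry.exists() ? entry.getValue() + "|updated" : "initial");
                return null;
            });

            // Plain GET, served from the near cache when the entry is cached locally.
            String value = cache.get("someKey");
        }
    }
}
```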

Many thanks for your help.
ignite3.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t3000/ignite3.zip>

A few more questions on this:

It looks like evictInternal spends a lot of time executing the lines of code below. Do you have any idea what the functionality inside evictInternal is? Does it involve any network calls in this section that might be delaying the operation, e.g. calls to other Ignite server/client nodes to check the entry?



Thread [name="sys-stripe-12-#13", id=65, state=RUNNABLE, blockCnt=57, waitCnt=14996]
        at java.lang.ThreadLocal$ThreadLocalMap.cleanSomeSlots(ThreadLocal.java:661)
        at java.lang.ThreadLocal$ThreadLocalMap.set(ThreadLocal.java:483)
        at java.lang.ThreadLocal$ThreadLocalMap.access$100(ThreadLocal.java:298)
        at java.lang.ThreadLocal.set(ThreadLocal.java:203)
        at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:483)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
        at o.a.i.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.node(TcpDiscoveryNodesRing.java:336)
        at o.a.i.spi.discovery.tcp.ServerImpl.getNode(ServerImpl.java:331)
        at o.a.i.spi.discovery.tcp.TcpDiscoverySpi.getNode(TcpDiscoverySpi.java:478)
        at o.a.i.i.managers.discovery.GridDiscoveryManager.getAlive(GridDiscoveryManager.java:1627)
        at o.a.i.i.processors.cache.distributed.dht.GridDhtCacheEntry.checkReadersLocked(GridDhtCacheEntry.java:740)
        at o.a.i.i.processors.cache.distributed.dht.GridDhtCacheEntry.hasReaders(GridDhtCacheEntry.java:773)
        at o.a.i.i.processors.cache.GridCacheMapEntry.evictInternal(GridCacheMapEntry.java:4559)
        at o.a.i.i.processors.cache.persistence.evict.PageAbstractEvictionTracker.evictDataPage(PageAbstractEvictionTracker.java:164)

--------------------------

Some of the calls got stuck/delayed, as reported in the log file:
--------------------------
2020-10-06 01:48:17.035  WARN 11 [eout-worker-#31,[]] o.apache.ignite.internal.util.typedef.G  : >>> Possible starvation in striped pool.
    Thread name: sys-stripe-1-#2
    Queue: []
    Deadlock: false
    Completed: 8129
Thread [name="sys-stripe-1-#2", id=54, state=RUNNABLE, blockCnt=12, waitCnt=8668]
        at java.lang.ThreadLocal.setInitialValue(ThreadLocal.java:180)
        at java.lang.ThreadLocal.get(ThreadLocal.java:170)
        at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(ReentrantReadWriteLock.java:539)
        at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:488)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
        at o.a.i.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.node(TcpDiscoveryNodesRing.java:336)
        at o.a.i.spi.discovery.tcp.ServerImpl.getNode(ServerImpl.java:331)
        at o.a.i.spi.discovery.tcp.TcpDiscoverySpi.getNode(TcpDiscoverySpi.java:478)
        at o.a.i.i.managers.discovery.GridDiscoveryManager.getAlive(GridDiscoveryManager.java:1627)
        at o.a.i.i.processors.cache.distributed.dht.GridDhtCacheEntry.checkReadersLocked(GridDhtCacheEntry.java:740)
        at o.a.i.i.processors.cache.distributed.dht.GridDhtCacheEntry.hasReaders(GridDhtCacheEntry.java:773)
        at o.a.i.i.processors.cache.GridCacheMapEntry.evictInternal(GridCacheMapEntry.java:4559)
        at o.a.i.i.processors.cache.persistence.evict.PageAbstractEvictionTracker.evictDataPage(PageAbstractEvictionTracker.java:164)
        at o.a.i.i.processors.cache.persistence.evict.RandomLruPageEvictionTracker.evictDataPage(RandomLruPageEvictionTracker.java:163)
        at o.a.i.i.processors.cache.persistence.IgniteCacheDatabaseSharedManager.ensureFreeSpace(IgniteCacheDatabaseSharedManager.java:1086)

-----------------------

2020-10-06 01:48:17.133  WARN 11 [eout-worker-#31,[]] o.apache.ignite.internal.util.typedef.G  : >>> Possible starvation in striped pool.
    Thread name: sys-stripe-4-#5
    Queue: []
    Deadlock: false
    Completed: 10407
Thread [name="sys-stripe-4-#5", id=57, state=RUNNABLE, blockCnt=22, waitCnt=10921]
        at sun.nio.cs.UTF_8.newDecoder(UTF_8.java:68)
        at java.lang.StringCoding.decode(StringCoding.java:213)
        at java.lang.String.<init>(String.java:463)
        at o.a.i.i.binary.BinaryObjectImpl.fieldByOrder(BinaryObjectImpl.java:424)
        at o.a.i.i.binary.BinaryFieldImpl.value(BinaryFieldImpl.java:112)
        at o.a.i.i.processors.cache.CacheDefaultBinaryAffinityKeyMapper.affinityKey(CacheDefaultBinaryAffinityKeyMapper.java:81)
        at o.a.i.i.processors.cache.GridCacheAffinityManager.affinityKey(GridCacheAffinityManager.java:201)
        at o.a.i.i.processors.cache.GridCacheAffinityManager.partition(GridCacheAffinityManager.java:185)
        at o.a.i.i.processors.cache.GridCacheAffinityManager.partition(GridCacheAffinityManager.java:160)
        at o.a.i.i.processors.cache.distributed.dht.GridCachePartitionedConcurrentMap.localPartition(GridCachePartitionedConcurrentMap.java:68)
        at o.a.i.i.processors.cache.distributed.dht.GridCachePartitionedConcurrentMap.putEntryIfObsoleteOrAbsent(GridCachePartitionedConcurrentMap.java:89)
        at o.a.i.i.processors.cache.GridCacheAdapter.entryEx(GridCacheAdapter.java:1049)
        at o.a.i.i.processors.cache.distributed.dht.GridDhtCacheAdapter.entryEx(GridDhtCacheAdapter.java:550)
        at o.a.i.i.processors.cache.GridCacheAdapter.entryEx(GridCacheAdapter.java:1040)
        at o.a.i.i.processors.cache.persistence.evict.PageAbstractEvictionTracker.evictDataPage(PageAbstractEvictionTracker.java:162)
        at o.a.i.i.processors.cache.persistence.evict.RandomLruPageEvictionTracker.evictDataPage(RandomLruPageEvictionTracker.java:163)

------------------------


--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ezhuravlev

Hi,

From the logs, I see that all server nodes are pinging the node on host 10.180.48.14
all the time. This can be caused by a lack of connectivity between some of
the nodes (for example, if the node on 10.180.48.14 can't directly communicate with
client nodes). I think the client nodes might be asking server nodes to check whether the
node on 10.180.48.14 is still alive. I would recommend checking connectivity
between all the machines in the cluster, including the client machines.

Best Regards,
Evgenii



pvprsd

Hi Evgenii,

In later investigations, we found that this is happening because the near-cache
readers ("rdrs") are not released from the entries, i.e. from the GridDhtCacheEntry objects.

When a near-cache entry is evicted on an Ignite client node, the
corresponding rdrs are not cleared on the server. Because the rdrs are not empty,
the entries cannot be evicted once eviction starts.

Per the NearCache documentation, rdrs are maintained to track the
client nodes so that entry changes are propagated to their near-cache
entries. But what I have observed is that the rdrs are not cleared when
the local node evicts the entry. I confirmed the local-node eviction by
monitoring the local-node metrics.
When we turned off the near cache completely, eviction worked well, as
the rdrs no longer blocked it.

Is there any specific configuration required to clear the rdrs
from the GridDhtCacheEntry objects?
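For completeness, the near-cache knob we are aware of is the near eviction policy, which bounds the client-side copy of the cache; whether evicting near entries this way also clears the server-side rdrs is exactly the open question. The maximum size below is an illustrative value:

```java
import org.apache.ignite.cache.eviction.lru.LruEvictionPolicyFactory;
import org.apache.ignite.configuration.NearCacheConfiguration;

public class NearCacheSketch {
    public static NearCacheConfiguration<String, String> boundedNearCfg() {
        NearCacheConfiguration<String, String> nearCfg = new NearCacheConfiguration<>();
        // Evict near entries on the client with LRU once 100,000 entries are cached.
        nearCfg.setNearEvictionPolicyFactory(new LruEvictionPolicyFactory<>(100_000));
        return nearCfg;
    }
}
```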

Thanks,
Prasad



