Apache Ignite Persistence Issues/Warnings

adipro

Apache Ignite Persistence Issues/Warnings

We are running a write-intensive load test (with reads as well) against Apache
Ignite. After a few hours of testing we quite often see the following warnings
from the sys-stripe threads.

DB config ->
RAM: 8 GB, CPU cores: 64,
Persistence: on,
Heap: 2 GB, durable memory (off-heap): 2 GB, -XX:MaxDirectMemorySize: 1 GB,
WAL archiving: off,
checkpointBufferSize: 1 GB,
walSegmentSize: 256 MB

We start 70 query threads in the app server (an Ignite client); the client
machine has 64 CPU cores and a 2 GB max heap.
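For reference, the settings above roughly correspond to a data storage configuration like the following sketch in Ignite's Spring XML format. This is our reconstruction from the numbers listed, not the exact config; in particular, setting walArchivePath equal to walPath is the usual way to disable archiving:

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
      <!-- 256 MB WAL segments -->
      <property name="walSegmentSize" value="#{256 * 1024 * 1024}"/>
      <property name="defaultDataRegionConfiguration">
        <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
          <property name="persistenceEnabled" value="true"/>
          <!-- 2 GB off-heap durable memory -->
          <property name="maxSize" value="#{2L * 1024 * 1024 * 1024}"/>
          <!-- 1 GB checkpoint buffer -->
          <property name="checkpointPageBufferSize" value="#{1024L * 1024 * 1024}"/>
        </bean>
      </property>
    </bean>
  </property>
</bean>
```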


```
[WARNING][grid-timeout-worker-#135][G] >>> Possible starvation in striped
pool.
    Thread name: sys-stripe-1-#2
    Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE,
topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,
msg=GridNearGetRequest
[futId=49990d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd,
miniId=59990d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null,
keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4,
minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0,
createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]], Message
closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false,
msg=GridNearAtomicSingleUpdateFilterRequest
[filter=[o.a.i.i.processors.cache.CacheEntrySerializablePredicate@653d7f85],
parent=GridNearAtomicSingleUpdateRequest [key=KeyCacheObjectImpl [part=129,
val=null, hasValBytes=true],
parent=GridNearAtomicAbstractSingleUpdateRequest [nodeId=null,
futId=29605669, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0],
parent=GridNearAtomicAbstractUpdateRequest [res=null,
flags=needRes|keepBinary]]]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridNearTxFinishRequest [miniId=1,
mvccSnapshot=null, super=GridDistributedTxFinishRequest
[topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0],
futId=97a90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, threadId=1251,
commitVer=null, invalidate=false, commit=false, baseVer=null, txSize=0,
sys=true, plc=2, subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e,
taskNameHash=0, flags=32, syncMode=FULL_SYNC, txState=null,
super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=200084669,
order=1588652521556, nodeOrder=3], committedVers=null, rolledbackVers=null,
cnt=0, super=GridCacheIdMessage [cacheId=0, super=GridCacheMessage
[msgId=59316508, depInfo=null, lastAffChangedTopVer=AffinityTopologyVersion
[topVer=4, minorTopVer=0], err=null, skipPrepare=false]]]]]]], Message
closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearTxPrepareRequest
[futId=cca90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=1,
topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0],
subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, txLbl=null,
flags=, super=GridDistributedTxPrepareRequest [threadId=2528,
concurrency=PESSIMISTIC, isolation=REPEATABLE_READ,
writeVer=GridCacheVersion [topVer=200084669, order=1588652521620,
nodeOrder=3], timeout=0, reads=ArrayList [], writes=ArrayList [IgniteTxEntry
[txKey=null, val=TxEntryValueHolder [val=CacheObjectImpl [val=null,
hasValBytes=true], op=UPDATE], prevVal=TxEntryValueHolder [val=null,
op=NOOP], oldVal=TxEntryValueHolder [val=null, op=NOOP],
entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null,
explicitVer=null, dhtVer=null, filters=null, filtersPassed=false,
filtersSet=false, entry=null, prepared=0, locked=false, nodeId=null,
locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0,
partUpdateCntr=0, serReadVer=null, xidVer=null]], dhtVers=null, txSize=0,
plc=2, txState=null, flags=onePhase|last|sys,
super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=200084669,
order=1588652521620, nodeOrder=3], committedVers=null, rolledbackVers=null,
cnt=0, super=GridCacheIdMessage [cacheId=0, super=GridCacheMessage
[msgId=59316562, depInfo=null, lastAffChangedTopVer=AffinityTopologyVersion
[topVer=4, minorTopVer=0], err=null, skipPrepare=false]]]]]]], Message
closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearGetRequest
[futId=36c90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd,
miniId=46c90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null,
keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4,
minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0,
createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]], Message
closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearTxPrepareRequest
[futId=39c90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=1,
topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0],
subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, txLbl=null,
flags=, super=GridDistributedTxPrepareRequest [threadId=2141,
concurrency=PESSIMISTIC, isolation=REPEATABLE_READ,
writeVer=GridCacheVersion [topVer=200084669, order=1588652522004,
nodeOrder=3], timeout=0, reads=ArrayList [], writes=ArrayList [IgniteTxEntry
[txKey=null, val=TxEntryValueHolder [val=CacheObjectImpl [val=null,
hasValBytes=true], op=UPDATE], prevVal=TxEntryValueHolder [val=null,
op=NOOP], oldVal=TxEntryValueHolder [val=null, op=NOOP],
entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null,
explicitVer=null, dhtVer=null, filters=null, filtersPassed=false,
filtersSet=false, entry=null, prepared=0, locked=false, nodeId=null,
locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0,
partUpdateCntr=0, serReadVer=null, xidVer=null]], dhtVers=null, txSize=0,
plc=2, txState=null, flags=onePhase|last|sys,
super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=200084669,
order=1588652522004, nodeOrder=3], committedVers=null, rolledbackVers=null,
cnt=0, super=GridCacheIdMessage [cacheId=0, super=GridCacheMessage
[msgId=59316868, depInfo=null, lastAffChangedTopVer=AffinityTopologyVersion
[topVer=4, minorTopVer=0], err=null, skipPrepare=false]]]]]]], Message
closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearGetRequest
[futId=8cc90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd,
miniId=9cc90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null,
keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4,
minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0,
createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]], Message
closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearGetRequest
[futId=4cd90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd,
miniId=5cd90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null,
keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4,
minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0,
createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]], Message
closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearGetRequest
[futId=28e90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd,
miniId=38e90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null,
keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4,
minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0,
createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]], Message
closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearTxPrepareRequest
[futId=d9e90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, miniId=1,
topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0],
subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0, txLbl=null,
flags=, super=GridDistributedTxPrepareRequest [threadId=3430,
concurrency=PESSIMISTIC, isolation=REPEATABLE_READ,
writeVer=GridCacheVersion [topVer=200084669, order=1588652522452,
nodeOrder=3], timeout=0, reads=ArrayList [], writes=ArrayList [IgniteTxEntry
[txKey=null, val=TxEntryValueHolder [val=CacheObjectImpl [val=null,
hasValBytes=true], op=UPDATE], prevVal=TxEntryValueHolder [val=null,
op=NOOP], oldVal=TxEntryValueHolder [val=null, op=NOOP],
entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null,
explicitVer=null, dhtVer=null, filters=null, filtersPassed=false,
filtersSet=false, entry=null, prepared=0, locked=false, nodeId=null,
locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0,
partUpdateCntr=0, serReadVer=null, xidVer=null]], dhtVers=null, txSize=0,
plc=2, txState=null, flags=onePhase|last|sys,
super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=200084669,
order=1588652522452, nodeOrder=3], committedVers=null, rolledbackVers=null,
cnt=0, super=GridCacheIdMessage [cacheId=0, super=GridCacheMessage
[msgId=59317224, depInfo=null, lastAffChangedTopVer=AffinityTopologyVersion
[topVer=4, minorTopVer=0], err=null, skipPrepare=false]]]]]]], Message
closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearSingleGetRequest
[futId=1588665032453, key=KeyCacheObjectImpl [part=385, val=null,
hasValBytes=true], flags=1, topVer=AffinityTopologyVersion [topVer=4,
minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0,
createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]]]
    Deadlock: false
    Completed: 1094133
Thread [name="sys-stripe-1-#2", id=29, state=RUNNABLE, blockCnt=32,
waitCnt=1280827]
        at
o.a.i.i.processors.cache.persistence.wal.serializer.RecordDataV1Serializer.dataSize(RecordDataV1Serializer.java:2083)
        at
o.a.i.i.processors.cache.persistence.wal.serializer.RecordDataV1Serializer.plainSize(RecordDataV1Serializer.java:386)
        at
o.a.i.i.processors.cache.persistence.wal.serializer.RecordDataV2Serializer.plainSize(RecordDataV2Serializer.java:101)
        at
o.a.i.i.processors.cache.persistence.wal.serializer.RecordDataV1Serializer.size(RecordDataV1Serializer.java:181)
        at
o.a.i.i.processors.cache.persistence.wal.serializer.RecordV2Serializer$2.sizeWithHeaders(RecordV2Serializer.java:96)
        at
o.a.i.i.processors.cache.persistence.wal.serializer.RecordV2Serializer.size(RecordV2Serializer.java:226)
        at
o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:837)
        at
o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:796)
        at
o.a.i.i.processors.cache.GridCacheMapEntry.logUpdate(GridCacheMapEntry.java:4307)
        at
o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.remove(GridCacheMapEntry.java:6505)
        at
o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:6177)
        at
o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:5863)
        at
o.a.i.i.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:3820)
        at
o.a.i.i.processors.cache.persistence.tree.BPlusTree$Invoke.access$5700(BPlusTree.java:3714)
        at
o.a.i.i.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1969)
        at
o.a.i.i.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1940)
        at
o.a.i.i.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1847)
        at
o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1654)
        at
o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1637)
        at
o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2436)
        at
o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:433)
        at
o.a.i.i.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2309)
        at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2576)
        at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2036)
        at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1854)
        at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668)
        at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3241)
        at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:139)
        at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:273)
        at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:268)
        at
o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
        at
o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
        at
o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
        at
o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
        at
o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
        at
o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
        at
o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1635)
        at
o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1255)
        at
o.a.i.i.managers.communication.GridIoManager.access$4300(GridIoManager.java:144)
        at
o.a.i.i.managers.communication.GridIoManager$8.execute(GridIoManager.java:1144)
        at
o.a.i.i.managers.communication.TraceRunnable.run(TraceRunnable.java:50)
        at
o.a.i.i.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
        at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:119)
        at java.lang.Thread.run(Thread.java:748)
```

Another type of warning:
```
[WARNING][grid-timeout-worker-#135][G] >>> Possible starvation in striped
pool.
    Thread name: sys-stripe-0-#1
    Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE,
topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,
msg=GridNearGetRequest
[futId=a1f90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd,
miniId=b1f90d3e171-cd1ebc9c-a326-4009-851b-1a3f2a703edd, ver=null,
keyMap=null, flags=1, topVer=AffinityTopologyVersion [topVer=4,
minorTopVer=0], subjId=2ab5710f-6568-4940-b3cc-ce756a634f4e, taskNameHash=0,
createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]]]
    Deadlock: false
    Completed: 1396153
Thread [name="sys-stripe-0-#1", id=28, state=TIMED_WAITING, blockCnt=61,
waitCnt=1874266]
    Lock
[object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@1b08480,
ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
        at
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
        at
o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1638)
        at
o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onEntriesLocked(GridDhtTxPrepareFuture.java:368)
        at
o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1304)
        at
o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:709)
        at
o.a.i.i.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1102)
        at
o.a.i.i.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:410)
        at
o.a.i.i.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:576)
        at
o.a.i.i.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:373)
        at
o.a.i.i.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:182)
        at
o.a.i.i.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:160)
        at
o.a.i.i.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:122)
        at
o.a.i.i.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:204)
        at
o.a.i.i.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:202)
        at
o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
        at
o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
        at
o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
        at
o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
        at
o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
        at
o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
        at
o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1635)
        at
o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1255)
        at
o.a.i.i.managers.communication.GridIoManager.access$4300(GridIoManager.java:144)
        at
o.a.i.i.managers.communication.GridIoManager$8.execute(GridIoManager.java:1144)
        at
o.a.i.i.managers.communication.TraceRunnable.run(TraceRunnable.java:50)
        at
o.a.i.i.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
        at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:119)
        at java.lang.Thread.run(Thread.java:748)
```

Striped pool metric:
```
Striped thread pool [active=7, idle=57, qSize=9]
```


We assume the warnings above are what causes the delay on the SQL queries shown
below.
```
[WARNING][long-qry-#170][LongRunningQueryManager] Query execution is too
long [duration=3424ms, type=MAP, distributedJoin=false,
enforceJoinOrder=false, lazy=false, schema=PUBLIC, sql='SELECT
"__Z0"."ID" "__C0_0",
"__Z0"."URL" "__C0_1",
"__Z0"."SCORE" "__C0_2",
"__Z0"."APPNAME_ID" "__C0_3"
FROM "PUBLIC"."URLS" "__Z0"
WHERE "__Z0"."APPNAME_ID" = ?1
ORDER BY 3 FETCH FIRST ?2 ROWS ONLY', plan=SELECT
    __Z0.ID AS __C0_0,
    __Z0.URL AS __C0_1,
    __Z0.SCORE AS __C0_2,
    __Z0.APPNAME_ID AS __C0_3
FROM PUBLIC.URLS __Z0
    /* PUBLIC.IDX_2_URLS */
    /* scanCount: 101020 */
WHERE __Z0.APPNAME_ID = ?1
ORDER BY 3
FETCH FIRST ?2 ROWS ONLY
/* index sorted */, node=TcpDiscoveryNode
[id=2ab5710f-6568-4940-b3cc-ce756a634f4e,
consistentId=2ab5710f-6568-4940-b3cc-ce756a634f4e, addrs=ArrayList
[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.20.46.195], sockAddrs=HashSet
[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, /172.20.46.195:0], discPort=0,
order=3, intOrder=3, lastExchangeTime=1588604898769, loc=false,
ver=8.7.10#20191227-sha1:c481441d, isClient=true], reqId=292181, segment=0]
```
We don't think there is any issue with the query itself: in a read/write speed
test on a larger data set the query time stayed under 10 ms, yet here it is
delayed even though there is less data in the database.

After 12 hours of uptime, checkpointing is taking approximately 1 minute to
finish.

A few more warnings are printed in the logs frequently. Please find them
below:
```
2 checkpoint pages were not written yet due to unsuccessful page write lock
acquisition and will be retried
```
```
Throttling is applied to page modifications [percentOfPartTime=0.62,
markDirty=2440 pages/sec, checkpointWrite=1971 pages/sec,
estIdealMarkDirty=0 pages/sec, curDirty=0.00, maxDirty=0.02,
avgParkTime=253807 ns, pages: (total=132474, evicted=0, written=831,
synced=0, cpBufUsed=543, cpBufTotal=259107)]
```
```
[sys-stripe-38-#39][GridContinuousProcessor] Failed to wait for ack message.
[node=2ab5710f-6568-4940-b3cc-ce756a634f4e,
routine=b24f7959-546b-4242-81d4-c51de3ce0fc2]
```
```
Page replacements started, pages will be rotated with disk, this will affect
storage performance (consider increasing DataRegionConfiguration#setMaxSize
for data region)
```



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
akorensh

Re: Apache Ignite Persistence Issues/Warnings

Hi,
Throttling is applied to page modifications [percentOfPartTime=0.62,
markDirty=2440 pages/sec, checkpointWrite=1971 pages/sec,
estIdealMarkDirty=0 pages/sec, curDirty=0.00, maxDirty=0.02,
avgParkTime=253807 ns, pages: (total=132474, evicted=0, written=831,
synced=0, cpBufUsed=543, cpBufTotal=259107)]


   This means that your disk is not fast enough or your throughput is too
high.
see:
https://apacheignite.readme.io/docs/durable-memory-tuning#pages-writes-throttling
and :  
https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood#IgnitePersistentStore-underthehood-PagesWriteThrottling

Here percentOfPartTime=0.62 means the writing threads are parked in
LockSupport.parkNanos() for 62% of the time, which is significant.

Page replacements started, pages will be rotated with disk, this will affect
storage performance (consider increasing DataRegionConfiguration#setMaxSize
for data region)

Here Ignite is using the disk as swap space. Consider increasing the size of
that particular data region.

SQL queries can still be served by Ignite applications when only a subset of
pages is in memory; in that case the required pages are read from disk during
query execution.

Even if the amount of memory is smaller than the whole B+tree of a SQL index,
Ignite can still operate: when it runs out of RAM for new page allocations,
some pages are evicted from RAM (they can be reloaded from disk later).

see:
https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-underthehood-Pagereplacement(rotationwithdisk)

and: https://apacheignite.readme.io/docs/durable-memory-tuning
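Increasing the region size is a one-line configuration change; for example, a sketch assuming the default data region is the one hitting page replacement (the 4 GB figure is purely illustrative and should be sized to your hot data set):

```xml
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
  <property name="persistenceEnabled" value="true"/>
  <!-- raise maxSize so the working set fits in RAM, e.g. 4 GB -->
  <property name="maxSize" value="#{4L * 1024 * 1024 * 1024}"/>
</bean>
```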

Query execution is too long
  This is likely related to the above issues.

Possible starvation in striped
pool.
    Thread name: sys-stripe-1-#2
    Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE,
topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,

This means that these threads are stuck. Possible causes are network issues,
or very large objects.
see:
http://apache-ignite-users.70518.x6.nabble.com/Possible-starvation-in-striped-pool-td14892.html
and:
http://apache-ignite-users.70518.x6.nabble.com/Possible-starvation-in-striped-pool-message-td15993.html

But I would first fix the issues above (page replacements, memory sizing) and
come back to this later.

Thanks, Alex




adipro

Re: Apache Ignite Persistence Issues/Warnings

Hi Alex, thanks for the reply.

But since our application is write/update intensive, we want a solution with
good performance and high availability along with disk persistence.

In that case, what configuration would you suggest for the following?

1. Writing to disk (checkpointing) can run periodically; while a checkpoint is
running, operations may be delayed to some extent, but without excessive
delays or starvation.
2. Fast access to data is needed and queries should run quickly (which is what
we actually see now while no checkpoint is in progress).
3. All logical operations should be recorded in the WAL at any cost; even if
we configure checkpoints to run only once per hour, the WAL should still
contain all of those operations.
4. We are disabling WAL archiving. Is it still needed if point 3 is the case?
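For point 3, the knobs involved would roughly be the following (a hypothetical sketch; the 1-hour frequency and FSYNC mode are illustrative values taken from the question, not a recommendation):

```xml
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
  <!-- run a checkpoint at most once per hour (value is in milliseconds) -->
  <property name="checkpointFrequency" value="#{60 * 60 * 1000}"/>
  <!-- WAL mode controls the durability guarantee for logical records -->
  <property name="walMode" value="FSYNC"/>
</bean>
```

Note that the WAL always records updates between checkpoints regardless of the checkpoint interval; the trade-off is that a longer interval means more WAL data to replay on restart.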

Thanks,
Aditya.



akorensh

Re: Apache Ignite Persistence Issues/Warnings

Hi,
  You need to be careful with disabling WAL archiving.

  It is best to put the WAL on a separate (quicker) disk:
https://apacheignite.readme.io/docs/durable-memory-tuning#separate-disk-device-for-wal

  This section will be of benefit to you when tuning persistence:  
https://apacheignite.readme.io/docs/durable-memory-tuning#general-tuning 


  Tips to follow: https://apacheignite.readme.io/docs/performance-tips


   Try to achieve the best performance you can without changing WAL archiving.
If that doesn't help, the
   following section will guide you:

   WAL archive tuning:
https://apacheignite.readme.io/docs/write-ahead-log#tuning-wal-archive
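Putting the WAL (and, if enabled, its archive) on a separate device comes down to pointing the storage paths at different mounts, e.g. (a sketch; the mount paths are hypothetical):

```xml
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
  <!-- page store on one device, WAL on a separate, faster one -->
  <property name="storagePath" value="/mnt/hdd/ignite/persistence"/>
  <property name="walPath" value="/mnt/ssd/ignite/wal"/>
  <property name="walArchivePath" value="/mnt/ssd/ignite/wal-archive"/>
</bean>
```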


Thanks, Alex



adipro

Re: Apache Ignite Persistence Issues/Warnings

Hi Alex,

Are you saying there will be a problem if we disable WAL archiving?

Say I disable WAL archiving and 6 out of 10 WAL segments are filled; what
happens if the OS crashes?

When we restart the server, will all the operations in those 6 segments be
applied to the on-disk data by the checkpointing process? And what happens if
the crash occurs during a checkpoint while WAL archiving is disabled?

Thanks,
Aditya.



ilya.kasnacheev

Re: Apache Ignite Persistence Issues/Warnings

Hello!

In both cases Ignite will replay the WAL segments it needs and bring the persistent store up to date for any data that was updated by already completed operations.

Whether or not to archive is a logistical choice, as far as my understanding goes.

Regards,
--
Ilya Kasnacheev

