Corrupted B+ Tree Causing Repeated Crashes

Mitchell Rathbun (BLOOMBERG/ 731 LEX)

Corrupted B+ Tree Causing Repeated Crashes

We are encountering the following error repeatedly, which causes our node to crash:

2021-02-19 13:30:38,175 ERROR STDIO [pool-32-thread-5] {} Feb 19, 2021 1:30:38 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-128547534, val2=281474976721835]], msg=Runtime failure on lookup row: SearchRow [key=com.bloomberg.aim.wingman.cachemgr.Ts3DataCache$Ts3SecurityCacheKey [idHash=1436767547, hash=-931214342, accountCusip=com.bloomberg.aim.wingman.common.dto.submgr.AccountCusip [idHash=316813954, hash=343304888, accountId=0, cusip=com.bloomberg.aim.wingman.common.dto.Cusip [idHash=1325824124, hash=2123451959, cusip1=136125, cusip2=9001, cusip3=541401120, dept=2, subflag=2]]], hash=-931214342, cacheId=0]]]]
class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-128547534, val2=281474976721835]], msg=Runtime failure on lookup row: SearchRow [key=com.bloomberg.aim.wingman.cachemgr.Ts3DataCache$Ts3SecurityCacheKey [idHash=1436767547, hash=-931214342, accountCusip=
com.bloomberg.aim.wingman.common.dto.submgr.AccountCusip [idHash=316813954, hash=343304888, accountId=0, cusip=com.bloomberg.aim.wingman.common.dto.Cusip [idHash=1325824124, hash=2123451959, cusip1=136125, cusip2=9001, cusip3=541401120, dept=2, subflag=2]]], hash=-931214342, cacheId=0]]
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:6106)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findOne(BPlusTree.java:1367)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findOne(BPlusTree.java:1344)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.find(IgniteCacheOffheapManagerImpl.java:2755)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.find(GridCacheOffheapManager.java:2469)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.read(IgniteCacheOffheapManagerImpl.java:637)
at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache.getAllInternal(GridLocalAtomicCache.java:410)
at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache.getAll(GridLocalAtomicCache.java:323)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.repairableGetAll(GridCacheAdapter.java:4907)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAll(GridCacheAdapter.java:1617)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.getAll(IgniteCacheProxyImpl.java:1157)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.getAll(GatewayProtectedCacheProxy.java:724)
at com.bloomberg.aim.wingman.cachemgr.Ts3DataCache.fetchCalcrtDataByKeySync(Ts3DataCache.java:1535)
at com.bloomberg.aim.wingman.cachemgr.Ts3DataCache.lambda$fetchCalcrtDataBySecurityKeyAccountAsync$11(Ts3DataCache.java:895)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalStateException: Item not found: 1
at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.findIndirectItemIndex(AbstractDataPageIO.java:351)
at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.getDataOffset(AbstractDataPageIO.java:459)
at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.readPayload(AbstractDataPageIO.java:501)
at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compareKeys(CacheDataTree.java:447)
at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:386)
at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:63)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:5377)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findInsertionPoint(BPlusTree.java:5297)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$1100(BPlusTree.java:98)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run0(BPlusTree.java:302)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:5888)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run(BPlusTree.java:282)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:5874)
at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.readPage(PageHandler.java:169)
at org.apache.ignite.internal.processors.cache.persistence.DataStructure.read(DataStructure.java:364)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.read(BPlusTree.java:6075)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findDown(BPlusTree.java:1424)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findDown(BPlusTree.java:1433)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findDown(BPlusTree.java:1433)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doFind(BPlusTree.java:1391)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findOne(BPlusTree.java:1359)
... 16 more
2021-02-19 13:30:38,177 ERROR STDIO [pool-32-thread-5] {} Feb 19, 2021 1:30:38 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE: A critical problem with persistence data structures was detected. Please make backup of persistence storage and WAL files for further analysis. Persistence storage path: null WAL path: db/wal WAL archive path: db/wal/archive


I think we can fix this by just clearing the persistent storage and restarting our node, but we can't have this happen in production, so I want to understand two things:

1. How can this happen?

2. How can we prevent this from happening, or best respond when it does happen? We don't want our process to crash as a result of this; we would rather just invalidate the cache and clear it, if at all possible.
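Question 2 amounts to treating a corrupted read as a cache miss. The following is a minimal sketch of that idea, not an Ignite feature. It assumes the failure handler has been configured so the node is not halted, and that the `CorruptedTreeException` (possibly wrapped in another exception) actually reaches the calling thread. `CorruptionTolerantReader`, `CacheBackend` and its `invalidate()` method are hypothetical names; `invalidate()` stands in for whatever reset is feasible. On a corrupted Ignite cache a plain `clear()` may itself hit the broken tree, so destroying and re-creating the cache, or wiping its files while the node is down, may be what is actually needed. The exception is matched by simple class name so the sketch does not depend on Ignite's internal package.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.Function;

/**
 * Sketch: on a corrupted-tree read failure, invalidate the local cache and
 * fall back to the authoritative source instead of crashing.
 * CacheBackend is a hypothetical stand-in, not an Ignite API.
 */
public class CorruptionTolerantReader<K, V> {

    /** Hypothetical wrapper around the real cache. */
    public interface CacheBackend<K, V> {
        V get(K key);
        void put(K key, V value);
        /** Drop all local state, e.g. destroy and re-create the cache (assumption). */
        void invalidate();
    }

    private final CacheBackend<K, V> cache;
    private final Function<K, V> loader; // loads from the system of record

    public CorruptionTolerantReader(CacheBackend<K, V> cache, Function<K, V> loader) {
        this.cache = cache;
        this.loader = loader;
    }

    /** True if any exception in the cause chain is named CorruptedTreeException. */
    public static boolean isCorruptedTree(Throwable t) {
        // Identity set guards against cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if ("CorruptedTreeException".equals(cur.getClass().getSimpleName())) {
                return true;
            }
        }
        return false;
    }

    public V get(K key) {
        try {
            V cached = cache.get(key);
            if (cached != null) {
                return cached;
            }
        } catch (RuntimeException e) {
            if (!isCorruptedTree(e)) {
                throw e; // unrelated failure: do not mask it
            }
            cache.invalidate(); // corrupted storage: drop it rather than crash
        }
        V loaded = loader.apply(key); // cache miss or invalidated: reload
        if (loaded != null) {
            cache.put(key, loaded);
        }
        return loaded;
    }
}
```

The design choice is deliberately narrow: only corruption is swallowed, and every other exception still propagates, so genuine bugs are not hidden behind a silent reload.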
Данилов Семён

Re: Corrupted B+ Tree Causing Repeated Crashes

Hello! What version of Apache Ignite are you using?  

19.02.2021, 22:07, "Mitchell Rathbun (BLOOMBERG/ 731 LEX)" <[hidden email]>:

Mitchell Rathbun (BLOOMBERG/ 731 LEX)

Re: Corrupted B+ Tree Causing Repeated Crashes

In reply to this post by Mitchell Rathbun (BLOOMBERG/ 731 LEX)
2.9.1

From: [hidden email] At: 02/19/21 14:18:44
To: [hidden email], [hidden email]
Subject: Re: Corrupted B+ Tree Causing Repeated Crashes

Mitchell Rathbun (BLOOMBERG/ 731 LEX)

Re: Corrupted B+ Tree Causing Repeated Crashes

In reply to this post by Mitchell Rathbun (BLOOMBERG/ 731 LEX)
Actually, we were using 2.7.5 when the data was corrupted. We upgraded to 2.9.1 without clearing the corrupted data and got the error that was posted in the first message.
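Because the files corrupted under 2.7.5 were carried over unchanged into 2.9.1, the upgrade alone could not repair them. The log in the first message also asks for a backup of the persistence storage and WAL files before anything is removed. A minimal sketch of "back up, then clear" with `java.nio.file` follows. It assumes the node is fully stopped and that all persistence files sit under one directory: the log reports `Persistence storage path: null` with WAL paths `db/wal` and `db/wal/archive`, which suggests a default `db` directory relative to the Ignite work directory, but the real location should be verified before deleting anything. `PersistenceReset` and `backupAndClear` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper: back up a persistence directory, then delete it. Run only while the node is stopped. */
public final class PersistenceReset {

    private PersistenceReset() {}

    /** Copies everything under source into target, keeping the relative layout. */
    static void copyTree(Path source, Path target) throws IOException {
        try (Stream<Path> paths = Files.walk(source)) {
            // Files.walk visits a directory before its contents, so parents are created first.
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(source.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(p, dest, StandardCopyOption.COPY_ATTRIBUTES);
                }
            }
        }
    }

    /** Deletes root and everything under it, children before parents. */
    static void deleteTree(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse lexicographic order puts "a/b" before "a".
            for (Path p : (Iterable<Path>) paths.sorted(Comparator.reverseOrder())::iterator) {
                Files.delete(p);
            }
        }
    }

    /** Copies persistenceRoot into a timestamped folder under backupRoot, then deletes it. */
    public static Path backupAndClear(Path persistenceRoot, Path backupRoot) throws IOException {
        if (!Files.isDirectory(persistenceRoot)) {
            throw new NoSuchFileException(persistenceRoot.toString());
        }
        Path backup = backupRoot.resolve(persistenceRoot.getFileName() + "-" + System.currentTimeMillis());
        copyTree(persistenceRoot, backup); // keep a copy for later analysis
        deleteTree(persistenceRoot);       // the next node start begins with empty storage
        return backup;
    }
}
```

Copying before deleting means a failed copy aborts the whole operation with the original data still in place.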

Mitchell Rathbun (BLOOMBERG/ 731 LEX)

Re: Corrupted B+ Tree Causing Repeated Crashes

In reply to this post by Mitchell Rathbun (BLOOMBERG/ 731 LEX)
Any other thoughts on this? The data corruption occurred when we were using version 2.7.5. I have looked at a couple of tickets involving corrupted trees, but none of them seems to apply to our use case of Ignite. We would like to understand, at least, how we get into this corrupted state in the first place and how to handle it when it happens. Is there a way to detect and log this error without crashing the process?
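The crash itself comes from the configured `StopNodeOrHaltFailureHandler`, which receives a `CRITICAL_ERROR` failure context for the `CorruptedTreeException`. In Ignite the hook for changing that is a custom `org.apache.ignite.failure.FailureHandler` registered with `IgniteConfiguration.setFailureHandler(...)`, whose `onFailure(Ignite, FailureContext)` returns whether the node should be invalidated. The sketch below only models the decision such a handler could make, using JDK-only stand-ins so it runs without Ignite: the `FailureType` enum copies the constant names of Ignite's `FailureType` as we understand them, and `CorruptionAwareFailurePolicy`, `Action` and `decide` are hypothetical. Whether Ignite 2.9.1 still surfaces the exception to the caller after the handler declines to stop the node is an assumption to verify.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.Consumer;

/**
 * Sketch of a failure policy that keeps the node running when the only problem
 * is a corrupted B+ tree, and stops it for every other critical error.
 * FailureType mirrors org.apache.ignite.failure.FailureType (assumed names)
 * so this compiles without Ignite on the classpath.
 */
public final class CorruptionAwareFailurePolicy {

    public enum FailureType {
        SEGMENTATION, SYSTEM_WORKER_TERMINATION, SYSTEM_WORKER_BLOCKED,
        CRITICAL_ERROR, SYSTEM_CRITICAL_OPERATION_TIMEOUT
    }

    public enum Action { KEEP_RUNNING, STOP_NODE }

    private final Set<FailureType> ignoredTypes = EnumSet.noneOf(FailureType.class);
    private final Consumer<String> log;

    public CorruptionAwareFailurePolicy(Set<FailureType> ignoredTypes, Consumer<String> log) {
        this.ignoredTypes.addAll(ignoredTypes);
        this.log = log;
    }

    /** A real handler would return true from onFailure only for STOP_NODE. */
    public Action decide(FailureType type, Throwable error) {
        if (ignoredTypes.contains(type)) {
            return Action.KEEP_RUNNING; // same idea as ignoredFailureTypes in the log
        }
        if (type == FailureType.CRITICAL_ERROR && hasCorruptedTree(error)) {
            // Log loudly instead of halting; the affected cache still has to be invalidated.
            log.accept("Corrupted B+ tree detected, node kept running: " + error);
            return Action.KEEP_RUNNING;
        }
        return Action.STOP_NODE; // anything else: fail fast, as before
    }

    private static boolean hasCorruptedTree(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if ("CorruptedTreeException".equals(cur.getClass().getSimpleName())) {
                return true;
            }
        }
        return false;
    }
}
```

This deliberately weakens Ignite's fail-fast behaviour for one error class only. A node kept alive on a corrupted tree will keep failing on every read that touches the damaged pages, so it only makes sense together with application code that catches the failed read and invalidates the affected cache.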

From: [hidden email] At: 02/19/21 14:18:44
To: [hidden email], [hidden email]
Subject: Re: Corrupted B+ Tree Causing Repeated Crashes

Hello! What version of Apache Ignite are you using?  


Maxim Muzafarov

Re: Corrupted B+ Tree Causing Repeated Crashes

Mitchell,

Can you provide the full log and the cache configuration?
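For this kind of diagnostics, it may also help to decode the page reported in the CorruptedTreeException: `pages(groupId, pageId)=[-128547534, 281474976721835]`. The sketch below assumes the page-ID layout used by Ignite 2.x's internal `PageIdUtils`, as I understand it: the low 32 bits are the page index, the next 16 bits the partition ID, and the next 8 bits a flag. `PageIdDecoder` is a hypothetical name and not a supported Ignite API.

```java
/** Hypothetical decoder for an Ignite page ID, under an assumed bit layout. */
final class PageIdDecoder {
    // Assumed layout: bits 0-31 page index, bits 32-47 partition ID, bits 48-55 flag.
    static final int PAGE_IDX_SIZE = 32;
    static final int PART_ID_SIZE = 16;

    private PageIdDecoder() {}

    /** Low 32 bits, returned as an unsigned value. */
    static long pageIndex(long pageId) {
        return pageId & 0xFFFF_FFFFL;
    }

    static int partitionId(long pageId) {
        return (int) ((pageId >>> PAGE_IDX_SIZE) & 0xFFFF);
    }

    static int flag(long pageId) {
        return (int) ((pageId >>> (PAGE_IDX_SIZE + PART_ID_SIZE)) & 0xFF);
    }

    public static void main(String[] args) {
        long pageId = 281474976721835L; // val2 from the CorruptedTreeException
        System.out.printf("flag=%d partition=%d pageIdx=%d%n",
                flag(pageId), partitionId(pageId), pageIndex(pageId));
    }
}
```

For the reported ID this prints `flag=1 partition=0 pageIdx=11179` (281474976721835 = 2^48 + 11179). Under the assumed layout, that is page 11179 of partition 0 in cache group -128547534. This presumably points at the `part-0.bin` file of that group as the one to back up for analysis.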

Maxim Muzafarov

Re: Corrupted B+ Tree Causing Repeated Crashes

In reply to this post by Mitchell Rathbun (BLOOMBERG/ 731 LEX)
Mitchell,

I've created an issue [1] for your case, but it's really hard to
determine the root cause without additional information (the exception
stack trace isn't enough for analysis). Do you have a reproducer for
the issue? Does the issue still occur?

Several issues in the persistent data store have been fixed since
version 2.7.5, but some may still exist. I see the following options:
- As a workaround (WA): clean up the node's persistent data store (PDS)
and wait for the data to be rebalanced.
- Try to reproduce the issue on another cluster: take a snapshot,
restore it in another environment, and run the idle_verify check (it
may help).


[1] https://issues.apache.org/jira/browse/IGNITE-14252
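The first option amounts to deleting the node's persistence files while the node is stopped, then letting Ignite start with empty storage. Here is a minimal sketch. It assumes the default layout under the work directory that the SEVERE log line hints at (`WAL path: db/wal`, `WAL archive path: db/wal/archive`), so that everything to wipe sits under `<workDirectory>/db`. `PdsCleanup` is a hypothetical name. Binary metadata and marshaller directories may live elsewhere depending on the version, and this sketch does not touch them. Note also that the stack trace shows `GridLocalAtomicCache`, a LOCAL cache on a single node. There is presumably no other node to rebalance from, so the application would have to reload the data itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for the "clean up the node's PDS" workaround. Call only while the node is stopped. */
final class PdsCleanup {
    private PdsCleanup() {}

    /** Recursively deletes {@code dir}; does nothing if it does not exist. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse natural order puts children before their parent directories,
            // so each directory is already empty when it is deleted.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

A possible call sequence is `Ignition.stop("instanceName", true)`, then `PdsCleanup.deleteRecursively(Paths.get(workDirectory, "db"))`, then restart the node. Back up the directory first if the files are wanted for analysis, as the SEVERE message asks.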

On Fri, 26 Feb 2021 at 06:44, Mitchell Rathbun (BLOOMBERG/ 731 LEX)
<[hidden email]> wrote:

>
> The rest of the logs are mixed in with the rest of our process logs, so I can't really share that. The configuration looks as follows:
>
> DataRegionConfiguration dataRegionCfg = new DataRegionConfiguration();
> dataRegionCfg.setName(DATA_REGION_NAME)
> .setInitialSize(200_000_000)
> .setMaxSize(200_000_000)
> .setPersistenceEnabled(true)
> .setMetricsEnabled(true);
>
> DataStorageConfiguration storageCfg = new DataStorageConfiguration();
> storageCfg.setDataRegionConfigurations(dataRegionCfg)
> .setWriteThrottlingEnabled(true)
> .setMetricsEnabled(true);
>
> IgniteConfiguration ignCfg = new IgniteConfiguration();
> ignCfg.setWorkDirectory(workDirectory)
> .setDataStorageConfiguration(storageCfg)
> .setIgniteInstanceName("instanceName")
> .setSystemWorkerBlockedTimeout(10000)
> .setFailureDetectionTimeout(10000);
>
>
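Regarding the process crash itself: the log shows the default handler, `StopNodeOrHaltFailureHandler [tryStop=false, timeout=0]`. With `tryStop=false`, that handler halts the JVM. An embedding application that must survive could instead register `StopNodeFailureHandler`, which stops only the Ignite node. The snippet below is a configuration fragment to combine with the config above. `FailureHandlerConfig` is a hypothetical wrapper name.

```java
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.StopNodeFailureHandler;

/** Hypothetical wrapper: stop the Ignite node on a critical failure instead of halting the JVM. */
final class FailureHandlerConfig {
    private FailureHandlerConfig() {}

    static IgniteConfiguration stopNodeInsteadOfHalt(IgniteConfiguration cfg) {
        // Replaces the default StopNodeOrHaltFailureHandler(tryStop=false, timeout=0).
        return cfg.setFailureHandler(new StopNodeFailureHandler());
    }
}
```

This is a trade-off rather than a fix. After the node stops, the application still has to handle failing cache operations on that instance. The corrupted files also remain on disk, so the persistence directory would still need to be cleared before restarting the node.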
18624049226

Re: Corrupted B+ Tree Causing Repeated Crashes

Hello,

Has the following code fix solved this issue?

https://issues.apache.org/jira/browse/IGNITE-12489

On 2021/2/27 at 2:22 AM, Maxim Muzafarov wrote:
Mitchell,

I've created the issue [1] for your case, but it's really hard to
define the root cause without additional information (the exception
stack trace isn't enough for analysis). Do you have the issue
reproducer? Does this issue still appear?

There are some issues already been fixed since the 2.7.5 version of
the persistent data store, but maybe some of them still exist. So, I
see the following options here:
- as a WA: cleanup the node pds and wait for data being rebalanced.
- trying to reproduce the issue on another cluster - take a snapshot,
restore it on another environment, run idle_verify check (probably it
helps)


[1] https://issues.apache.org/jira/browse/IGNITE-14252

On Fri, 26 Feb 2021 at 06:44, Mitchell Rathbun (BLOOMBERG/ 731 LEX)
[hidden email] wrote:
The rest of the logs are mixed in with the rest of our process logs, so I can't really share that. The configuration looks as follows:

DataRegionConfiguration dataRegionCfg = new DataRegionConfiguration();
dataRegionCfg.setName(DATA_REGION_NAME)
.setInitialSize(200_000_000)
.setMaxSize(200_000_000)
.setPersistenceEnabled(true)
.setMetricsEnabled(true);

DataStorageConfiguration storageCfg = new DataStorageConfiguration();
storageCfg.setDataRegionConfigurations(dataRegionCfg)
.setWriteThrottlingEnabled(true)
.setMetricsEnabled(true);

IgniteConfiguration ignCfg = new IgniteConfiguration();
ignCfg.setWorkDirectory(workDirectory)
.setDataStorageConfiguration(storageCfg)
.setIgniteInstanceName("instanceName")
.setSystemWorkerBlockedTimeout(10000)
.setFailureDetectionTimeout(10000)


From: [hidden email] At: 02/25/21 18:19:15
To: Mitchell Rathbun (BLOOMBERG/ 731 LEX ) , [hidden email]
Subject: Re: Corrupted B+ Tree Causing Repeated Crashes

Mitchell,

Can you provide the full log and the cache configuration?

On Thu, 25 Feb 2021 at 03:55, Mitchell Rathbun (BLOOMBERG/ 731 LEX)
[hidden email] wrote:
Any other thoughts on this? The data corruption occurred when we were using
version 2.7.5. I have looked at a couple of tickets involving corrupted trees,
but it doesn't seem like any of them apply to our use case of Ignite. Would
like to understand at least how we get into this corrupted state in the first
place, and how to handle it when it happens. Is there a way to detect and log
this error while avoiding crashing the process?
From: [hidden email] At: 02/19/21 14:18:44
To: Mitchell Rathbun (BLOOMBERG/ 731 LEX ) , [hidden email]
Subject: Re: Corrupted B+ Tree Causing Repeated Crashes

Hello! What version of Apache Ignite are you using?

19.02.2021, 22:07, "Mitchell Rathbun (BLOOMBERG/ 731 LEX)"
[hidden email]:
We are encountering the following error repeatedly, which causes our node to
crash:
2021-02-19 13:30:38,175 ERROR STDIO [pool-32-thread-5] {} Feb 19, 2021
1:30:38 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Critical system error detected. Will be handled accordingly to
configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false,
timeout=0,
super=AbstractFailureHandler
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=CRITICAL_ERROR, err=class
o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is
corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-128547534,
val2=281474976721835]], msg=Runtime failure on lookup row: SearchRow
[key=com.bloomberg.aim.wingman.cachemgr.Ts3DataCache$Ts3SecurityCacheKey
[idHash=1436767547, hash=-931214342,
accountCusip=com.bloomberg.aim.wingman.common.dto.submgr.AccountCusip
[idHash=316813954, hash=343304888, accountId=0,
cusip=com.bloomberg.aim.wingman.common.dto.Cusip [idHash=1325824124,
hash=2123451959, cusip1=136125, cusip2=9001, cusip3=541401120, dept=2,
subflag=2]]], hash=-931214342, cacheId=0]]]]
class

        
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeExcept
ion: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple
[val1=-128547534, val2=281474976721835]], msg=Runtime failure on lookup row:
SearchRow
[key=com.bloomberg.aim.wingman.cachemgr.Ts3DataCache$Ts3SecurityCacheKey
[idHash=1436767547, hash=-931214342, accountCusip=
com.bloomberg.aim.wingman.common.dto.submgr.AccountCusip [idHash=316813954,
hash=343304888, accountId=0, cusip=com.bloomberg.aim.wingman.common.dto.Cusip
[idHash=1325824124, hash=2123451959, cusip1=136125, cusip2=9001,
cusip3=541401120, dept=2, subflag=2]]], hash=-931214342, cacheId=0]]
at

        
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corrupted
TreeException(BPlusTree.java:6106)
at

        
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findOne(B
PlusTree.java:1367)
at

        
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findOne(B
PlusTree.java:1344)
at

        
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheD
ataStoreImpl.find(IgniteCacheOffheapManagerImpl.java:2755)
at

        
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$
GridCacheDataStore.find(GridCacheOffheapManager.java:2469)
at

        
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.read(I
gniteCacheOffheapManagerImpl.java:637)
at

        
org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache.ge
tAllInternal(GridLocalAtomicCache.java:410)
at

        
org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache.ge
tAll(GridLocalAtomicCache.java:323)
at

        
org.apache.ignite.internal.processors.cache.GridCacheAdapter.repairableGetAll(Gr
idCacheAdapter.java:4907)
at

        
org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAll(GridCacheAda
pter.java:1617)
at

        
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.getAll(IgniteCa
cheProxyImpl.java:1157)
at

        
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.getAll(Ga
tewayProtectedCacheProxy.java:724)
at

        
com.bloomberg.aim.wingman.cachemgr.Ts3DataCache.fetchCalcrtDataByKeySync(Ts3Data
Cache.java:1535)
at

        
com.bloomberg.aim.wingman.cachemgr.Ts3DataCache.lambda$fetchCalcrtDataBySecurity
KeyAccountAsync$11(Ts3DataCache.java:895)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at

        
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
ava:1128)
at

        
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalStateException: Item not found: 1
at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.findIndirectItemIndex(AbstractDataPageIO.java:351)
at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.getDataOffset(AbstractDataPageIO.java:459)
at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.readPayload(AbstractDataPageIO.java:501)
at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compareKeys(CacheDataTree.java:447)
at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:386)
at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:63)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:5377)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findInsertionPoint(BPlusTree.java:5297)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$1100(BPlusTree.java:98)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run0(BPlusTree.java:302)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:5888)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run(BPlusTree.java:282)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:5874)
at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.readPage(PageHandler.java:169)
at org.apache.ignite.internal.processors.cache.persistence.DataStructure.read(DataStructure.java:364)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.read(BPlusTree.java:6075)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findDown(BPlusTree.java:1424)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findDown(BPlusTree.java:1433)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findDown(BPlusTree.java:1433)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doFind(BPlusTree.java:1391)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findOne(BPlusTree.java:1359)
... 16 more
2021-02-19 13:30:38,177 ERROR STDIO [pool-32-thread-5] {} Feb 19, 2021 1:30:38 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE: A critical problem with persistence data structures was detected. Please make backup of persistence storage and WAL files for further analysis. Persistence storage path: null WAL path: db/wal WAL archive path: db/wal/archive
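
The SEVERE message asks for a backup of the persistence storage and WAL files before anything is cleared. Below is a minimal sketch of that step using only java.nio.file; the class and method names (PersistenceBackup, copyTree) are hypothetical. It assumes the node is stopped while copying, so the files are not being written during the backup. The WAL path db/wal comes from the log, and since the archive (db/wal/archive) sits inside it, one copy covers both. The log prints the persistence storage path as null, so that directory has to be taken from the node's own configuration rather than guessed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Iterator;
import java.util.stream.Stream;

// Hypothetical helper: copies a persistence or WAL directory tree to a backup location.
final class PersistenceBackup {
    // Recursively copies every file under source into target, keeping the relative layout.
    // Returns the number of regular files copied (0 if source is not a directory).
    static long copyTree(Path source, Path target) throws IOException {
        if (!Files.isDirectory(source)) {
            return 0; // nothing to back up, e.g. the directory was never created
        }
        long copied = 0;
        // Files.walk visits a directory before its contents, so parents are created first.
        try (Stream<Path> paths = Files.walk(source)) {
            Iterator<Path> it = paths.iterator();
            while (it.hasNext()) {
                Path p = it.next();
                // relativize(source) is the empty path, and resolve("") yields target itself.
                Path dest = target.resolve(source.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING,
                            StandardCopyOption.COPY_ATTRIBUTES);
                    copied++;
                }
            }
        } catch (UncheckedIOException e) {
            throw e.getCause(); // traversal errors are wrapped by the stream; rethrow the checked cause
        }
        return copied;
    }
}
```

For example, copyTree(Paths.get("db/wal"), backupRoot.resolve("wal")) would copy the WAL and its archive, with a second call for the storage directory configured on the node.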

I think we can fix this by just clearing the persistent storage and restarting our node, but we can't have this happen in production, so I want to understand two things:

1. How can this happen?
2. How can we prevent this from happening, and how should we best respond when it does happen?

We don't want our process to crash as a result of this; we would rather just invalidate and clear the cache, if at all possible.
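
One way to express "invalidate instead of crash" on the application side is sketched below. It only helps if the corruption reaches the caller at all: the failure handler shown in the original error (StopNodeOrHaltFailureHandler, ignoring only SYSTEM_WORKER_BLOCKED and SYSTEM_CRITICAL_OPERATION_TIMEOUT) halts the JVM on CRITICAL_ERROR. The assumption here is that the node is configured otherwise, e.g. CRITICAL_ERROR added via AbstractFailureHandler.setIgnoredFailureTypes and the handler passed to IgniteConfiguration.setFailureHandler, which also means the node keeps running on a known-damaged structure. It further assumes the error surfaces as a RuntimeException (IgniteCache methods throw javax.cache.CacheException) with CorruptedTreeException somewhere in its cause chain. KeyValueCache, CorruptionTolerantReader, the invalidate action and the loader are all hypothetical names, and whether clearing or destroying a cache with a corrupted tree succeeds is not verified here.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

// Hypothetical minimal view of the cache, e.g. an adapter over IgniteCache#getAll.
interface KeyValueCache<K, V> {
    Map<K, V> getAll(Set<K> keys);
}

// Sketch: read through the cache, but on persistence corruption invalidate it once
// and serve all further reads from the source of truth instead of failing.
final class CorruptionTolerantReader<K, V> {
    // Matched by simple class name so the sketch compiles without Ignite on the classpath.
    static final String CORRUPTION_EXCEPTION = "CorruptedTreeException";

    private final KeyValueCache<K, V> cache;
    private final Runnable invalidate;                // hypothetical: clear or destroy/recreate the cache
    private final Function<Set<K>, Map<K, V>> loader; // hypothetical: loads values from the system of record
    private final AtomicBoolean corrupted = new AtomicBoolean(false);

    CorruptionTolerantReader(KeyValueCache<K, V> cache, Runnable invalidate,
                             Function<Set<K>, Map<K, V>> loader) {
        this.cache = cache;
        this.invalidate = invalidate;
        this.loader = loader;
    }

    // Walks the cause chain, guarding against cycles, looking for the given simple class name.
    static boolean hasCause(Throwable t, String simpleName) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (cur.getClass().getSimpleName().equals(simpleName)) {
                return true;
            }
        }
        return false;
    }

    Map<K, V> getAll(Set<K> keys) {
        if (corrupted.get()) {
            return loader.apply(keys); // cache is no longer trusted
        }
        try {
            return cache.getAll(keys);
        } catch (RuntimeException e) {
            if (!hasCause(e, CORRUPTION_EXCEPTION)) {
                throw e; // unrelated failure: let the caller see it
            }
            // Only the first thread to see the corruption runs the invalidation.
            if (corrupted.compareAndSet(false, true)) {
                invalidate.run();
            }
            return loader.apply(keys);
        }
    }

    boolean isBypassingCache() {
        return corrupted.get();
    }
}
```

The reader deliberately keeps bypassing the cache after the first corruption rather than trying to resume, leaving the decision to rebuild (for example, restarting with cleared persistence) to an operator.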