Node left from running cluster due to org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row

Kamlesh Joshi

Hello Igniters,

 

We recently upgraded to 2.7.6, and one server node suddenly left the running cluster. We are seeing the following exception in the logs:

 

Caused by: org.h2.jdbc.JdbcSQLException: General error: "class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row: Row@6d3d2e1d[ key: com.jio.digitalapi.edif.customer.model.PhysicalResourceKey [idHash=1472026580, hash=755487125, physicalResourceId=2, identifierId=2], val:

 

 

We do not use SQL to fetch data; we always read from the cluster by key. It looks like the key index is corrupted. Could you please help us understand which factors can lead to index corruption?

 

The full error log is below:

 

[tibusr@NVMBD2BMZ160D00 server]$ grep -i ERROR ignite-e6ad3f74.log

[2020-07-24T18:06:19,601][ERROR][sys-stripe-97-#98%EDIFCustomer%][GridCacheIoManager] Failed processing message [senderId=af763e24-8a4c-4554-b5f6-f3f43275fd69, msg=GridDhtAtomicUpdateRequest [keys=[com.jio.digitalapi.edif.customer.model.PhysicalResourceKey [idHash=1047464579, hash=603340567, physicalResourceId=8991869040000399251, identifierId=8991869040000399251], com.jio.digitalapi.edif.customer.model.PhysicalResourceKey [idHash=1472026580, hash=755487125, physicalResourceId=2, identifierId=2]], vals=[null, null], prevVals=null, ttls=null, conflictExpireTimes=null, nearTtls=null, nearExpireTimes=null, nearKeys=null, nearVals=null, obsoleteIndexes=null, forceTransformBackups=false, updateCntrs=GridLongList [idx=2, arr=[4475535,4403959]], super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null, nearFutId=0, flags=keepBinary]]]

org.h2.message.DbException: General error: "class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row: Row@6d3d2e1d[ key: com.jio.digitalapi.edif.customer.model.PhysicalResourceKey [idHash=1472026580, hash=755487125, physicalResourceId=2, identifierId=2], val: com.jio.digitalapi.edif.customer.model.PhysicalResource [idHash=755138422, hash=2012536510, updatedby=NO000000FBVT, identifierValue=2, identifierType=, characteristicValue=null, characteristicUom=null, identifierSubcategory=4, identifierName=SERIAL_NO, isManageable=true, resourceVendor=, resourceModel=, characteristicDesc=null, reasonCode=, serviceId=7010312653, characteristicName=SKU_NUMBER:491190045, identifierCategory=3, physicalResourceId=null, quantity=1, resourceInstallationDate=, resourceName=MIFI, deviceCode=DEV100012, characteristicId=null, identifierPriceId=null, fixedMobile=2, companyId=null, identifierContext=null, identifierId=null, warrantyExpiryDate=, updateddatetime=2017-08-12 16:01:50.399, priceId=CMP40001, reasonDesc=, resourceType=MIFI], ver: GridCacheVersion [topVer=155468517, order=0, nodeOrder=1] ][ 2, 7010312653, MIFI, , , , , null, true, 2, CMP40001, 1, , , DEV100012, MIFI, 2, SERIAL_NO, 2, , 3, 4, null, null, null, SKU_NUMBER:491190045, null, null, null, 2017-08-12 16:01:50.399, NO000000FBVT ]" [50000-197]

Caused by: org.h2.jdbc.JdbcSQLException: General error: "class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row: Row@6d3d2e1d[ key: com.jio.digitalapi.edif.customer.model.PhysicalResourceKey [idHash=1472026580, hash=755487125, physicalResourceId=2, identifierId=2], val: com.jio.digitalapi.edif.customer.model.PhysicalResource [idHash=755138422, hash=2012536510, updatedby=NO000000FBVT, identifierValue=2, identifierType=, characteristicValue=null, characteristicUom=null, identifierSubcategory=4, identifierName=SERIAL_NO, isManageable=true, resourceVendor=, resourceModel=, characteristicDesc=null, reasonCode=, serviceId=7010312653, characteristicName=SKU_NUMBER:491190045, identifierCategory=3, physicalResourceId=null, quantity=1, resourceInstallationDate=, resourceName=MIFI, deviceCode=DEV100012, characteristicId=null, identifierPriceId=null, fixedMobile=2, companyId=null, identifierContext=null, identifierId=null, warrantyExpiryDate=, updateddatetime=2017-08-12 16:01:50.399, priceId=CMP40001, reasonDesc=, resourceType=MIFI], ver: GridCacheVersion [topVer=155468517, order=0, nodeOrder=1] ][ 2, 7010312653, MIFI, , , , , null, true, 2, CMP40001, 1, , , DEV100012, MIFI, 2, SERIAL_NO, 2, , 3, 4, null, null, null, SKU_NUMBER:491190045, null, null, null, 2017-08-12 16:01:50.399, NO000000FBVT ]" [50000-197]

[2020-07-24T18:06:19,612][ERROR][sys-stripe-97-#98%EDIFCustomer%][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=org.h2.message.DbException: General error: "class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row: Row@6d3d2e1d[ key: com.jio.digitalapi.edif.customer.model.PhysicalResourceKey [idHash=1472026580, hash=755487125, physicalResourceId=2, identifierId=2], val: com.jio.digitalapi.edif.customer.model.PhysicalResource [idHash=755138422, hash=2012536510, updatedby=NO000000FBVT, identifierValue=2, identifierType=, characteristicValue=null, characteristicUom=null, identifierSubcategory=4, identifierName=SERIAL_NO, isManageable=true, resourceVendor=, resourceModel=, characteristicDesc=null, reasonCode=, serviceId=7010312653, characteristicName=SKU_NUMBER:491190045, identifierCategory=3, physicalResourceId=null, quantity=1, resourceInstallationDate=, resourceName=MIFI, deviceCode=DEV100012, characteristicId=null, identifierPriceId=null, fixedMobile=2, companyId=null, identifierContext=null, identifierId=null, warrantyExpiryDate=, updateddatetime=2017-08-12 16:01:50.399, priceId=CMP40001, reasonDesc=, resourceType=MIFI], ver: GridCacheVersion [topVer=155468517, order=0, nodeOrder=1] ][ 2, 7010312653, MIFI, , , , , null, true, 2, CMP40001, 1, , , DEV100012, MIFI, 2, SERIAL_NO, 2, , 3, 4, null, null, null, SKU_NUMBER:491190045, null, null, null, 2017-08-12 16:01:50.399, NO000000FBVT ]" [50000-197]]]

org.h2.message.DbException: General error: "class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row: Row@6d3d2e1d[ key: com.jio.digitalapi.edif.customer.model.PhysicalResourceKey [idHash=1472026580, hash=755487125, physicalResourceId=2, identifierId=2], val: com.jio.digitalapi.edif.customer.model.PhysicalResource [idHash=755138422, hash=2012536510, updatedby=NO000000FBVT, identifierValue=2, identifierType=, characteristicValue=null, characteristicUom=null, identifierSubcategory=4, identifierName=SERIAL_NO, isManageable=true, resourceVendor=, resourceModel=, characteristicDesc=null, reasonCode=, serviceId=7010312653, characteristicName=SKU_NUMBER:491190045, identifierCategory=3, physicalResourceId=null, quantity=1, resourceInstallationDate=, resourceName=MIFI, deviceCode=DEV100012, characteristicId=null, identifierPriceId=null, fixedMobile=2, companyId=null, identifierContext=null, identifierId=null, warrantyExpiryDate=, updateddatetime=2017-08-12 16:01:50.399, priceId=CMP40001, reasonDesc=, resourceType=MIFI], ver: GridCacheVersion [topVer=155468517, order=0, nodeOrder=1] ][ 2, 7010312653, MIFI, , , , , null, true, 2, CMP40001, 1, , , DEV100012, MIFI, 2, SERIAL_NO, 2, , 3, 4, null, null, null, SKU_NUMBER:491190045, null, null, null, 2017-08-12 16:01:50.399, NO000000FBVT ]" [50000-197]

Caused by: org.h2.jdbc.JdbcSQLException: General error: "class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row: Row@6d3d2e1d[ key: com.jio.digitalapi.edif.customer.model.PhysicalResourceKey [idHash=1472026580, hash=755487125, physicalResourceId=2, identifierId=2], val: com.jio.digitalapi.edif.customer.model.PhysicalResource [idHash=755138422, hash=2012536510, updatedby=NO000000FBVT, identifierValue=2, identifierType=, characteristicValue=null, characteristicUom=null, identifierSubcategory=4, identifierName=SERIAL_NO, isManageable=true, resourceVendor=, resourceModel=, characteristicDesc=null, reasonCode=, serviceId=7010312653, characteristicName=SKU_NUMBER:491190045, identifierCategory=3, physicalResourceId=null, quantity=1, resourceInstallationDate=, resourceName=MIFI, deviceCode=DEV100012, characteristicId=null, identifierPriceId=null, fixedMobile=2, companyId=null, identifierContext=null, identifierId=null, warrantyExpiryDate=, updateddatetime=2017-08-12 16:01:50.399, priceId=CMP40001, reasonDesc=, resourceType=MIFI], ver: GridCacheVersion [topVer=155468517, order=0, nodeOrder=1] ][ 2, 7010312653, MIFI, , , , , null, true, 2, CMP40001, 1, , , DEV100012, MIFI, 2, SERIAL_NO, 2, , 3, 4, null, null, null, SKU_NUMBER:491190045, null, null, null, 2017-08-12 16:01:50.399, NO000000FBVT ]" [50000-197]

[2020-07-24T18:06:35,427][ERROR][sys-stripe-97-#98%EDIFCustomer%][] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=org.h2.message.DbException: General error: "class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row: Row@6d3d2e1d[ key: com.jio.digitalapi.edif.customer.model.PhysicalResourceKey [idHash=1472026580, hash=755487125, physicalResourceId=2, identifierId=2], val: com.jio.digitalapi.edif.customer.model.PhysicalResource [idHash=755138422, hash=2012536510, updatedby=NO000000FBVT, identifierValue=2, identifierType=, characteristicValue=null, characteristicUom=null, identifierSubcategory=4, identifierName=SERIAL_NO, isManageable=true, resourceVendor=, resourceModel=, characteristicDesc=null, reasonCode=, serviceId=7010312653, characteristicName=SKU_NUMBER:491190045, identifierCategory=3, physicalResourceId=null, quantity=1, resourceInstallationDate=, resourceName=MIFI, deviceCode=DEV100012, characteristicId=null, identifierPriceId=null, fixedMobile=2, companyId=null, identifierContext=null, identifierId=null, warrantyExpiryDate=, updateddatetime=2017-08-12 16:01:50.399, priceId=CMP40001, reasonDesc=, resourceType=MIFI], ver: GridCacheVersion [topVer=155468517, order=0, nodeOrder=1] ][ 2, 7010312653, MIFI, , , , , null, true, 2, CMP40001, 1, , , DEV100012, MIFI, 2, SERIAL_NO, 2, , 3, 4, null, null, null, SKU_NUMBER:491190045, null, null, null, 2017-08-12 16:01:50.399, NO000000FBVT ]" [50000-197]]]
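Since each of these log lines runs to several thousand characters, a small script can make them easier to compare. The following is only a rough Python sketch: `summarize` is a hypothetical helper, and the two regular expressions are assumptions derived from the lines pasted above, not from any documented Ignite log format.

```python
import re

# Assumed layout of a log line, taken from the output above:
# [timestamp][LEVEL][thread][logger] message
HEADER = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[A-Z]+)\s*\]"
    r"\[(?P<thread>[^\]]*)\]\[(?P<logger>[^\]]*)\]\s*(?P<msg>.*)$"
)

# Assumed shape of the key reported by CorruptedTreeException.
CORRUPTED_KEY = re.compile(
    r"CorruptedTreeException: Runtime failure on search row: "
    r"Row@\w+\[ key: (?P<type>[\w.$]+) \[(?P<fields>[^\]]*)\]"
)


def summarize(line, width=100):
    """Return a short summary of one log line, or None for continuation lines."""
    header = HEADER.match(line)
    if header is None:
        return None
    key = CORRUPTED_KEY.search(header.group("msg"))
    return {
        "ts": header.group("ts"),
        "level": header.group("level"),
        "thread": header.group("thread"),
        "message": header.group("msg")[:width],
        "key_type": key.group("type") if key else None,
        "key_fields": key.group("fields") if key else None,
    }
```

Running `summarize` over every line of the grep output and printing the non-None results gives one short record per error, which shows quickly whether all failures point at the same key.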

 

 

 

 

Thanks and Regards,

Kamlesh Joshi

 


"Confidentiality Warning: This message and any attachments are intended only for the use of the intended recipient(s), are confidential and may be privileged. If you are not the intended recipient, you are hereby notified that any review, re-transmission, conversion to hard copy, copying, circulation or other use of this message and any attachments is strictly prohibited. If you are not the intended recipient, please notify the sender immediately by return email and delete this message and any attachments from your system.

Virus Warning: Although the company has taken reasonable precautions to ensure no viruses are present in this email. The company cannot accept responsibility for any loss or damage arising from the use of this email or attachment."

aealexsandrov

Re: Node left from running cluster due to org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row

Hi,

Can you please clarify the version from which you upgraded?

If you were on a fairly old version, you may also need to rebuild your
indexes as part of the upgrade, because the inline size calculation logic
changed in recent releases.

This can be done by removing the index.bin files. The procedure is as
follows:

1) Stop the Ignite nodes.
2) Remove the index.bin files at
$IGNITE_HOME/db/<NODE_ID>/<CACHE_NAME>/index.bin
3) Start the Ignite nodes one by one. For each node, wait for the following
message:

[2020-05-29 07:55:04,984][INFO ][build-idx-runner-#61][GridCacheDatabaseSharedManager] Indexes rebuilding completed for all caches.
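
The file-removal step could be scripted roughly as below. This is a minimal Python sketch, not something shipped with Ignite: `find_index_files` and `move_index_files` are hypothetical helpers, and both directory arguments are assumptions that must match your own storage layout. It moves the files to a backup directory outside the persistence root instead of deleting them, so the step can be undone, and it should only be run while the node is stopped.

```python
import shutil
from pathlib import Path


def find_index_files(persistence_root):
    """Return every file named index.bin below the persistence directory."""
    root = Path(persistence_root)
    return sorted(p for p in root.rglob("index.bin") if p.is_file())


def move_index_files(persistence_root, backup_root, dry_run=True):
    """Move each index.bin under backup_root, keeping its relative path.

    With dry_run=True nothing is touched; the planned moves are only returned.
    backup_root is assumed to live outside persistence_root.
    """
    root = Path(persistence_root)
    planned = []
    for src in find_index_files(root):
        dst = Path(backup_root) / src.relative_to(root)
        planned.append((src, dst))
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
    return planned
```

Calling `move_index_files(root, backup)` first with the default `dry_run=True` prints nothing and changes nothing, so the returned list can be reviewed before repeating the call with `dry_run=False`.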

I think this may help with your upgrade. Please test it before applying it
in production, because the root cause may be different.
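
For the rolling start in step 3, waiting for the rebuild message can be automated along these lines. Again this is only a sketch under assumptions: `wait_for_rebuild` is a hypothetical helper, the log path is whatever your node writes to, and the log file is assumed to contain output from the current start only (an older copy of the message would be matched as well).

```python
import time

# Message text as it appears in the log line quoted above.
REBUILD_DONE = "Indexes rebuilding completed for all caches."


def rebuild_completed(log_lines):
    """Return True if any of the given log lines carries the completion message."""
    return any(REBUILD_DONE in line for line in log_lines)


def wait_for_rebuild(log_path, timeout_s=3600.0, poll_s=5.0):
    """Poll the node's log file until the message appears or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with open(log_path, encoding="utf-8", errors="replace") as log:
                if rebuild_completed(log):
                    return True
        except FileNotFoundError:
            pass  # The node may not have created its log file yet.
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A restart script would start one node, call `wait_for_rebuild` on that node's log, and move on to the next node only when it returns True.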

BR,
Andrei



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Kamlesh Joshi

RE: [External]Re: Node left from running cluster due to org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row

Thanks for the update Andrei. We upgraded from 2.6.0 to 2.7.6.

We followed the same approach and it worked properly.

Can we change the INLINE SIZE for every cache at runtime?

Thanks and Regards,
Kamlesh Joshi


-----Original Message-----
From: aealexsandrov <[hidden email]>
Sent: 27 July 2020 19:35
To: [hidden email]
Subject: [External]Re: Node left from running cluster due to org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row


ilya.kasnacheev

Re: [External]Re: Node left from running cluster due to org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row

Hello!

You can't do that at runtime, but you can certainly remove the index.bin files as a bulk operation.

Regards,
--
Ilya Kasnacheev


Tue, 28 Jul 2020 at 08:48, Kamlesh Joshi <[hidden email]>:
Thanks for the update Andrei. We upgraded from 2.6.0 to 2.7.6.

We followed the same approach and it worked properly.

Can we change INLINE SIZE for every cache at runtime ?

Thanks and Regards,
Kamlesh Joshi

