index corrupted error : org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row

shivakumar

Hi all,
I have deployed a 3-node Ignite cluster with native persistence on Kubernetes, and one of the nodes crashed with the error message below:

org.h2.message.DbException: General error: "class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row: Row@8cfe967[ key: epro_model_abcdKey [idHash=822184780, hash=737706081, NE_ID=, NAME=], val: epro_model_abcd [idHash=60444003, hash=1539928610, epro_ID=51, LONGITUDE=null, DELETE_TIME=null, VENDOR=null, CREATE_TIME=2019-09-19T20:38:32.361929Z, UPDATE_TIME=2019-09-19T20:40:05.821447Z, ADDITIONAL_INFO=null, VALID_UNTIL=2019-11-18T20:38:32.362036Z, TYPE=null, LATITUDE=null], ver: GridCacheVersion [topVer=180326822, order=1568925345552, nodeOrder=6] ][ 51, 2019-09-19T20:38:32.361929Z, 2019-09-19T20:40:05.821447Z, null, 2019-11-18T20:38:32.362036Z, , , null, null, null, null, null ]" [50000-197]|

Please find the attached file [index_corruption.txt] for the complete backtrace.

It looks like the index got corrupted, but I am not sure what exactly caused the corruption. Are there any known issues related to this?

In my cluster, many applications write into many tables simultaneously, some queries run against many tables concurrently, and the applications frequently delete unwanted rows (old data) from the tables using SQL DELETE FROM statements.
 

Attachment: index_corruption.txt (11K)
ibelyakov

Hi,

Could you please clarify what version of Ignite you're currently using?
Also, can you attach full logs from all nodes and, if possible, provide your persistence data for the cache with the corrupted index tree ("epro_model_abcd")?
By default it should be in the ${IGNITE_HOME}/work/db/{node}/{cache} directory.
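To make the default layout concrete, here is a small sketch that builds the expected path. The `cache-` and `cacheGroup-` directory prefixes are taken from the index.bin listing later in this thread; the class and method names are hypothetical, and that listing shows a different root (/opt/ignite/persistence), presumably because a custom storage path was configured there.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PersistencePaths {
    // Hypothetical helper: resolves ${IGNITE_HOME}/work/db/{node}/{cache} for the default layout.
    // Prefixes follow the listing in this thread: "cache-<name>" for a cache stored on its own,
    // "cacheGroup-<group>" for caches that share a cache group.
    static Path cacheDir(String igniteHome, String nodeFolder, String name, boolean isGroup) {
        String dirName = (isGroup ? "cacheGroup-" : "cache-") + name;
        return Paths.get(igniteHome, "work", "db", nodeFolder, dirName);
    }

    public static void main(String[] args) {
        // Falls back to /opt/ignite if IGNITE_HOME is not set (an assumption for this example).
        String home = System.getenv().getOrDefault("IGNITE_HOME", "/opt/ignite");
        System.out.println(cacheDir(home, "node00-a6103519-fb67-45fd-8646-2b6d8cfac53e", "groupmin5", true));
    }
}
```

If "epro_model_abcd" belongs to a cache group, its data sits in that group's directory rather than in a directory of its own.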

Regards,
Igor

(In reply to Shiva Kumar, Fri, Sep 20, 2019 at 1:21 PM)

shivakumar

Hi Igor,
Thanks for the response!
The version I am using is 2.7.0.
Unfortunately, I do not have logs from all the nodes, but I have attached more extensive logs (along with a thread dump) from the node that reported the index corruption.
Sorry, for now I can't share the persistence data here.
I have 4 cache groups, each containing many tables.

Here are all the index.bin files under the persistence directory:

[ignite@ignite-cluster-ignite-e-1 persistence]$
[ignite@ignite-cluster-ignite-esoc-1 persistence]$ find /opt/ignite/persistence/ -name index.bin
/opt/ignite/persistence/node00-a6103519-fb67-45fd-8646-2b6d8cfac53e/metastorage/index.bin
/opt/ignite/persistence/node00-a6103519-fb67-45fd-8646-2b6d8cfac53e/cache-ignite-sys-cache/index.bin
/opt/ignite/persistence/node00-a6103519-fb67-45fd-8646-2b6d8cfac53e/cache-PUBLIC/index.bin
/opt/ignite/persistence/node00-a6103519-fb67-45fd-8646-2b6d8cfac53e/cacheGroup-groupEternal/index.bin
/opt/ignite/persistence/node00-a6103519-fb67-45fd-8646-2b6d8cfac53e/cacheGroup-groupmin15/index.bin
/opt/ignite/persistence/node00-a6103519-fb67-45fd-8646-2b6d8cfac53e/cacheGroup-groupmin1/index.bin
/opt/ignite/persistence/node00-a6103519-fb67-45fd-8646-2b6d8cfac53e/cacheGroup-groupmin5/index.bin
[ignite@ignite-cluster-ignite-e-1 persistence]$
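Roughly the same listing can be produced from Java, for example in a diagnostic tool that runs where `find` is unavailable. This is only a sketch: the root directory is passed in (defaulting here to /opt/ignite/persistence, as in the listing above) and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class IndexFileLister {
    // Walks the persistence root and returns every regular file named index.bin,
    // mirroring `find <root> -name index.bin`.
    static List<Path> findIndexFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().equals("index.bin"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "/opt/ignite/persistence");
        for (Path p : findIndexFiles(root)) {
            // The parent directory names the owner, e.g. metastorage or cacheGroup-groupmin5.
            System.out.println(p.getParent().getFileName() + " -> " + p);
        }
    }
}
```

There is one index.bin per cache group (or standalone cache) directory, which matches the 4 cacheGroup-* entries plus the system directories above.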
 

The steps to recover from index corruption are documented in this ticket: https://issues.apache.org/jira/browse/IGNITE-11252, but what exactly caused the index corruption in my case is still unknown.

I think it would be great if, when an index gets corrupted for some reason, the node deleted the index and rebuilt it without shutting down.
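Without such automatic handling, the rebuild has to be triggered by hand. The sketch below assumes the commonly suggested manual workaround: stop the affected node, move the cache group's index.bin out of its directory, and restart so that Ignite rebuilds the SQL indexes from the data files. That rebuild-on-start behaviour is an assumption here; check it against the steps in IGNITE-11252 and your Ignite version first. The class and method names are hypothetical, and the file is moved to a backup directory rather than deleted so it can be restored.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class IndexReset {
    // Hypothetical helper. Run ONLY while the Ignite node is stopped.
    // Moves <cacheGroupDir>/index.bin into backupDir, named after the group directory,
    // instead of deleting it. Assumption: on restart, the node rebuilds indexes when index.bin is missing.
    static Path moveIndexAside(Path cacheGroupDir, Path backupDir) throws IOException {
        Path index = cacheGroupDir.resolve("index.bin");
        if (!Files.isRegularFile(index)) {
            throw new IOException("No index.bin in " + cacheGroupDir);
        }
        Files.createDirectories(backupDir);
        Path target = backupDir.resolve(cacheGroupDir.getFileName() + "-index.bin");
        return Files.move(index, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java IndexReset <cache group directory> <backup directory>
        Path moved = moveIndexAside(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Moved index.bin to " + moved);
    }
}
```

Only the index file is touched; the partition files (part-*.bin) that hold the actual data stay in place.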


(In reply to Igor Belyakov, Fri, Sep 20, 2019 at 4:19 PM)

Attachment: threaddump_index_corruption.txt (538K)

ilya.kasnacheev

Hello!

It's recommended to upgrade to 2.7.6 because it contains fixes for persistence corruption.
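If you want a deployment script or startup check to flag nodes still running 2.7.0, a plain version comparison is enough. The helper below is hypothetical (not part of Ignite) and assumes dotted numeric versions; any non-numeric suffix on a component, such as "-SNAPSHOT", is ignored.

```java
public class VersionCheck {
    // Compares dotted numeric versions such as "2.7.0" and "2.7.6".
    // Missing components count as 0, so "2.7" equals "2.7.0".
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? leadingInt(x[i]) : 0;
            int yi = i < y.length ? leadingInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Parses the leading digits of a component, e.g. "0-SNAPSHOT" -> 0.
    private static int leadingInt(String s) {
        int end = 0;
        while (end < s.length() && Character.isDigit(s.charAt(end))) {
            end++;
        }
        return end == 0 ? 0 : Integer.parseInt(s.substring(0, end));
    }

    static boolean needsUpgrade(String running, String recommended) {
        return compare(running, recommended) < 0;
    }

    public static void main(String[] args) {
        System.out.println(needsUpgrade("2.7.0", "2.7.6")); // prints "true"
    }
}
```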

Regards,
--
Ilya Kasnacheev


(In reply to Shiva Kumar, Thu, Sep 26, 2019 at 12:04)