checkpoint marker is present on disk, but checkpoint record is missed in WAL

KR Kumar


Hi guys, I am using Ignite persistence with an 8-node cluster, currently in
the dev/PoC stage. I get the following exception when I try to restart a node
after killing the process with "kill <pid>". I have a shutdown hook in the
code that shuts down Ignite with G.stop(false). I read in a blog that when
you stop Ignite with cancel=false, it checkpoints the data and then stops
the cluster, so there should not be any issues with restart. Any help
is greatly appreciated.
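
A minimal sketch of the shutdown arrangement described above, assuming a plain JVM shutdown hook. `NodeStopper` and `GracefulShutdown` are hypothetical names introduced for illustration and are not part of Ignite; in the real application the stopper would wrap the `G.stop(false)` call mentioned in the post.

```java
/** Stand-in for the real stop call (hypothetical; not part of Ignite). */
interface NodeStopper {
    void stop(boolean cancel);
}

final class GracefulShutdown {
    private GracefulShutdown() {}

    /** Builds the hook thread; kept separate so it can be exercised without exiting the JVM. */
    static Thread newHook(NodeStopper stopper) {
        // cancel=false: request a graceful stop rather than cancelling running work.
        return new Thread(() -> stopper.stop(false), "graceful-stop-hook");
    }

    /**
     * Registers the hook with the JVM. Shutdown hooks run on SIGTERM
     * (plain "kill <pid>") but never on SIGKILL ("kill -9").
     */
    static Thread install(NodeStopper stopper) {
        Thread hook = newHook(stopper);
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

With Ignite on the classpath, the call would look something like `GracefulShutdown.install(cancel -> G.stop(cancel))`. Note that the hook only gets a chance to run when the process is terminated with a catchable signal.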

Invocation of init method failed; nested exception is class
org.apache.ignite.IgniteCheckedException: Failed to restore memory state
(checkpoint marker is present on disk, but checkpoint record is missed in
WAL) [cpStatus=CheckpointStatus [cpStartTs=1507546382988,
cpStartId=abeb760a-0388-4ad5-8473-62ed9c7bc0f3, startPtr=FileWALPointer
[idx=6, fileOffset=33982453, len=2380345, forceFlush=false],
cpEndId=c257dd1f-c350-4b0d-aefc-cad6d2c2082b, endPtr=FileWALPointer [idx=4,
fileOffset=38761373, len=1586221, forceFlush=false]], lastRead=null]
06:55:09.341 [main] WARN
org.springframework.context.support.ClassPathXmlApplicationContext -
Exception encountered during context initialization - cancelling refresh
attempt: org.springframework.beans.factory.BeanCreationException: Error
creating bean with name 'igniteContainer' defined in class path resource
[mihi-gridworker-s.xml]: Invocation of init method failed; nested exception
is class org.apache.ignite.IgniteCheckedException: Failed to restore memory
state (checkpoint marker is present on disk, but checkpoint record is missed
in WAL) [cpStatus=CheckpointStatus [cpStartTs=1507546382988,
cpStartId=abeb760a-0388-4ad5-8473-62ed9c7bc0f3, startPtr=FileWALPointer
[idx=6, fileOffset=33982453, len=2380345, forceFlush=false],
cpEndId=c257dd1f-c350-4b0d-aefc-cad6d2c2082b, endPtr=FileWALPointer [idx=4,
fileOffset=38761373, len=1586221, forceFlush=false]], lastRead=null]
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1628)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:555)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:483)
        at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:761)
        at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:866)
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:542)
        at org.springframework.context.support.ClassPathXmlApplicationContext.<init>(ClassPathXmlApplicationContext.java:139)
        at org.springframework.context.support.ClassPathXmlApplicationContext.<init>(ClassPathXmlApplicationContext.java:83)
        at com.pointillist.gridworker.agent.MihiGridWorker.start(MihiGridWorker.java:32)
        at com.pointillist.gridworker.MihiWorker.main(MihiWorker.java:20)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to restore
memory state (checkpoint marker is present on disk, but checkpoint record is
missed in WAL) [cpStatus=CheckpointStatus [cpStartTs=1507546382988,
cpStartId=abeb760a-0388-4ad5-8473-62ed9c7bc0f3, startPtr=FileWALPointer
[idx=6, fileOffset=33982453, len=2380345, forceFlush=false],
cpEndId=c257dd1f-c350-4b0d-aefc-cad6d2c2082b, endPtr=FileWALPointer [idx=4,
fileOffset=38761373, len=1586221, forceFlush=false]], lastRead=null]
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1433)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:539)
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:616)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1901)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
        at java.lang.Thread.run(Thread.java:745)
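
The cpStatus part of the message above can be read mechanically. The following is a rough sketch, assuming the text format shown in this trace; `CpStatusReader` and its members are hypothetical names for illustration, not Ignite API. It pulls out the two checkpoint IDs and the two WAL pointers and reports whether the IDs differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading the cpStatus text of the exception; not Ignite API. */
final class CpStatusReader {
    // Matches e.g. "startPtr=FileWALPointer [idx=6, fileOffset=33982453"; \s* tolerates mail line wraps.
    private static final Pattern PTR =
        Pattern.compile("(startPtr|endPtr)=FileWALPointer\\s*\\[idx=(\\d+),\\s*fileOffset=(\\d+)");
    // Matches e.g. "cpStartId=abeb760a-0388-4ad5-8473-62ed9c7bc0f3".
    private static final Pattern ID =
        Pattern.compile("(cpStartId|cpEndId)=([0-9a-f-]{36})");

    final String cpStartId, cpEndId;
    final long startIdx, startOffset, endIdx, endOffset;

    private CpStatusReader(String startId, String endId, long[] start, long[] end) {
        cpStartId = startId; cpEndId = endId;
        startIdx = start[0]; startOffset = start[1];
        endIdx = end[0]; endOffset = end[1];
    }

    /** Parses the first cpStartId/cpEndId and startPtr/endPtr values found in the message. */
    static CpStatusReader parse(String msg) {
        String startId = null, endId = null;
        long[] start = null, end = null;
        Matcher m = ID.matcher(msg);
        while (m.find()) {
            if (m.group(1).equals("cpStartId")) { if (startId == null) startId = m.group(2); }
            else if (endId == null) endId = m.group(2);
        }
        m = PTR.matcher(msg);
        while (m.find()) {
            long[] p = { Long.parseLong(m.group(2)), Long.parseLong(m.group(3)) };
            if (m.group(1).equals("startPtr")) { if (start == null) start = p; }
            else if (end == null) end = p;
        }
        if (startId == null || endId == null || start == null || end == null)
            throw new IllegalArgumentException("cpStatus fields not found in message");
        return new CpStatusReader(startId, endId, start, end);
    }

    /** True when the last started checkpoint is not the last finished one. */
    boolean idsDiffer() {
        return !cpStartId.equals(cpEndId);
    }
}
```

For the message in this thread the IDs differ, and the start pointer (WAL segment idx=6) lies in a later segment than the end pointer (idx=4). That would be consistent with a checkpoint that was started but never marked as finished, though this reading is an interpretation and is not confirmed anywhere in the thread.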


I appreciate your help.

Thanks and regards,
KR Kumar



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
alexey.goncharuk

Re: checkpoint marker is present on disk, but checkpoint record is missed in WAL

Hi,

This should never happen in BACKGROUND mode unless your Ignite node suffers a hard power-off (which is not your case). I've reviewed the related parts of the code and found a few tickets fixed in 2.3 (e.g. IGNITE-5772) that address bugs which may have caused this issue. Could you build Ignite from the ignite-2.3 branch and check whether the issue is still present?

Thanks,
AG

2017-10-09 14:18 GMT+03:00 KR Kumar <[hidden email]>:
> Hi Guys - I am using ignite persistence with a 8 node cluster. [...]

KR Kumar

Re: checkpoint marker is present on disk, but checkpoint record is missed in WAL

Hi AG,

Thanks for responding to the thread. I have tried with 2.3 and I still face
the same problem.

Just to explore further, I also killed the Ignite instance with kill -9 and,
separately, rebooted the machine. In both situations, Ignite just hangs
during restart.

Thanks and regards,
KR Kumar



dsetrakyan

Re: checkpoint marker is present on disk, but checkpoint record is missed in WAL

KR, any chance you can provide a reproducer? It would really help us properly debug your issue. If not, can we get a copy of your configuration?

On Thu, Oct 12, 2017 at 10:31 AM, KR Kumar <[hidden email]> wrote:
> I have tried with 2.3 and I still face the same problem. [...]