Creating off-heap REPLICATED cache with Eviction policies borks the cluster.

4 messages
javadevmtl

Hi, I'm running 2.7.0.

I have a 4-node cluster running with off-heap persistence, and it works great!

I then tried, by mistake, to create a REPLICATED cache with LruEvictionPolicy. We already know that such a cache cannot be created when entries are stored off-heap, so I expected the operation to simply fail.
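For reference, a minimal sketch of the kind of cache creation that triggers it (this is a hypothetical repro; the cache name, config path, and size are placeholders, not my real settings):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.eviction.lru.LruEvictionPolicy;
import org.apache.ignite.configuration.CacheConfiguration;

public class EvictionRepro {
    public static void main(String[] args) {
        // Placeholder config path: any node config with native persistence enabled.
        Ignite ignite = Ignition.start("ignite-config.xml");

        CacheConfiguration<Long, String> cfg = new CacheConfiguration<>("someCache");
        cfg.setCacheMode(CacheMode.REPLICATED);

        // onheapCacheEnabled defaults to false, so an eviction policy is invalid
        // here: "Onheap cache must be enabled if eviction policy is configured".
        cfg.setEvictionPolicy(new LruEvictionPolicy<>(100_000));

        ignite.getOrCreateCache(cfg); // the call that triggered the failure
    }
}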

But this seems to have borked the cluster: it shut down, and now it will no longer start... THE ONLY WAY TO RECOVER IS TO DELETE THE WORK FOLDER. Here is the log from the failed restart:

[19:47:01,758][SEVERE][exchange-worker-#43%xxxxxx%][CacheAffinitySharedManager] Failed to initialize cache. Will try to rollback cache start routine. [cacheName=xxxxxx]
class org.apache.ignite.IgniteCheckedException: Onheap cache must be enabled if eviction policy is configured [cacheName=xxxxxx]
...
[19:47:01,759][INFO][exchange-worker-#43%xxxxxx%][GridCacheProcessor] Can not finish proxy initialization because proxy does not exist, cacheName=xxxxxx, localNodeId=a45103eb-4fe1-4b10-8d2a-3e46b5186068
...
[19:47:01,972][INFO][exchange-worker-#43%xxxxxx%][GridCacheDatabaseSharedManager] Finished applying WAL changes [updatesApplied=0, time=178ms]
[19:47:01,972][INFO][exchange-worker-#43%xxxxxx%][GridCacheDatabaseSharedManager] Logical recovery performed in 173 ms.
...
[19:47:01,978][SEVERE][exchange-worker-#43%xxxxxx%][GridDhtPartitionsExchangeFuture] Failed to reinitialize local partitions (rebalancing will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=1], discoEvt=DiscoveryCustomEvent [customMsg=ChangeGlobalStateMessage [id=994851a7a61-2637d9af-479a-455f-8408-c3f6e4a28782, reqId=746a7c79-2921-4e3f-8b2c-22d43c0c0c6d, initiatingNodeId=cda68e29-5639-4c41-bd50-a9d398f8d7f2, activate=true, baselineTopology=BaselineTopology [id=0, branchingHash=-999309033, branchingType='Cluster activation', baselineNodes=[01a3df31-6d32-4386-bb46-847a16b1dea3, a532b052-9d42-4342-a007-26437525f209, 5369a6f5-edc6-4895-affb-d43b04e2e914, 5b205b68-c665-43ef-937d-90082e22f15e]], forceChangeBaselineTopology=false, timestamp=1556826421083], affTopVer=AffinityTopologyVersion [topVer=4, minorTopVer=1], super=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=cda68e29-5639-4c41-bd50-a9d398f8d7f2, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.17.65], sockAddrs=[ignite-dev-v-0003/172.17.17.65:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=4, intOrder=4, lastExchangeTime=1556826394021, loc=false, ver=2.7.0#20181130-sha1:256ae401, isClient=false], topVer=4, nodeId8=a45103eb, msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1556826421114]], nodeId=cda68e29, evt=DISCOVERY_CUSTOM_EVT]
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
...
[19:47:02,081][WARNING][exchange-worker-#43%xxxxxx%][FailureProcessor] Thread dump at 2019/05/02 19:47:02 UTC
Thread [name="ttl-cleanup-worker-#72%xxxxxx%", id=125, state=TIMED_WAITING, blockCnt=0, waitCnt=1]
        at java.lang.Thread.sleep(Native Method)
        at o.a.i.i.util.IgniteUtils.sleep(IgniteUtils.java:7774)
        at o.a.i.i.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:149)
        at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:120)
        at java.lang.Thread.run(Thread.java:748)

Thread [name="sys-#71%xxxxxx%", id=124, state=TIMED_WAITING, blockCnt=0, waitCnt=1]
    Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@15dc7079, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
...
[19:47:02,086][SEVERE][exchange-worker-#43%xxxxxx%][] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteCheckedException: Index: 0, Size: 0]]



ezhuravlev

Re: Creating off-heap REPLICATED cache with Eviction policies borks the cluster.

Hi,

I believe you can just remove the folder related to the newly created cache. Have you tried that?

Evgenii

ezhuravlev

Re: Creating off-heap REPLICATED cache with Eviction policies borks the cluster.

Yes, I tried this myself and it works: just delete the folder work/db/node-NODEID/CACHE_NAME. I have also created a ticket to add validation for this situation: https://issues.apache.org/jira/browse/IGNITE-11832
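For completeness: the configuration only passes validation when the on-heap cache is enabled, because eviction policies apply to the on-heap copies of entries, not to the off-heap data itself. A sketch of a valid version, assuming you really do want on-heap eviction (same imports as the repro snippet above; the cache name and size are placeholders):

CacheConfiguration<Long, String> cfg = new CacheConfiguration<>("someCache");
cfg.setCacheMode(CacheMode.REPLICATED);
cfg.setOnheapCacheEnabled(true); // required whenever an eviction policy is configured
cfg.setEvictionPolicy(new LruEvictionPolicy<>(100_000)); // bounds only the on-heap copies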

Evgenii

javadevmtl

Re: Creating off-heap REPLICATED cache with Eviction policies borks the cluster.

Yes, correct, that works! I deleted the whole work folder first because I wasn't sure, and then tried deleting just the cache folder itself.

Let me know if you need more info for the ticket. It is very easy to reproduce and behaves exactly as described.

Thanks for your help.
