What does all partition owners have left the grid on the client side mean?

javadevmtl

What does all partition owners have left the grid on the client side mean?

The cluster is showing active when running control.sh

But the client is showing "all partition owners have left the grid"

The client node is marked as client=true so it's not a server node.

Is this split brain as well?
javadevmtl

Re: What does all partition owners have left the grid on the client side mean?

Also I'm assuming that the thin client wouldn't be susceptible to this error?

On Wed, 24 Jun 2020 at 12:38, John Smith <[hidden email]> wrote:
The cluster is showing active when running control.sh

But the client is showing "all partition owners have left the grid"

The client node is marked as client=true so it's not a server node.

Is this split brain as well?
ezhuravlev

Re: What does all partition owners have left the grid on the client side mean?

Hi, 

It means that no node in the cluster holds certain partitions. So, most likely either you have a wrong configuration, or some of the nodes left the cluster and there are no backups in the cluster for these partitions. I believe more can be found in the logs.

Evgenii
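Evgenii's explanation can be illustrated with a toy model. The sketch below is a hypothetical simplification, not Ignite's actual `RendezvousAffinityFunction`. Each partition gets a primary owner plus `backups` backup owners, assigned round-robin, and a partition counts as lost once every one of its owners has left the topology. Client nodes never own partitions, so they play no part in the calculation. The class and method names are illustrative assumptions. In a real cluster the relevant settings are `CacheConfiguration.setBackups(...)` (0 by default for a partitioned cache) and `CacheConfiguration.setPartitionLossPolicy(...)`, and `IgniteCache.lostPartitions()` reports the affected partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: round-robin placement, NOT Ignite's RendezvousAffinityFunction.
final class PartitionLossModel {

    // Hypothetical key-to-partition mapping. Ignite computes this differently,
    // so it will not reproduce the part=580 seen in the client's stack trace.
    static int partitionOf(Object key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Gives each partition a primary plus `backups` backup owners on distinct nodes.
    static Map<Integer, List<String>> assign(List<String> nodes, int partitions, int backups) {
        int copies = Math.min(1 + backups, nodes.size());
        Map<Integer, List<String>> owners = new TreeMap<>();
        for (int p = 0; p < partitions; p++) {
            List<String> owned = new ArrayList<>();
            for (int c = 0; c < copies; c++) {
                owned.add(nodes.get((p + c) % nodes.size()));
            }
            owners.put(p, owned);
        }
        return owners;
    }

    // A partition is lost when none of its owners is still in the topology.
    static Set<Integer> lostPartitions(Map<Integer, List<String>> owners, Set<String> alive) {
        Set<Integer> lost = new TreeSet<>();
        for (Map.Entry<Integer, List<String>> e : owners.entrySet()) {
            boolean anyOwnerAlive = false;
            for (String node : e.getValue()) {
                if (alive.contains(node)) {
                    anyOwnerAlive = true;
                    break;
                }
            }
            if (!anyOwnerAlive) {
                lost.add(e.getKey());
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("server1", "server2", "server3");
        // server2 has left the grid. Client nodes own no partitions either way.
        Set<String> alive = new HashSet<>(Arrays.asList("server1", "server3"));

        for (int backups = 0; backups <= 1; backups++) {
            Map<Integer, List<String>> owners = assign(servers, 8, backups);
            System.out.println("backups=" + backups + " lost=" + lostPartitions(owners, alive));
        }
        System.out.println("key 14385045508 -> toy partition " + partitionOf(14385045508L, 1024));
    }
}
```

With 8 partitions and `server2` gone, the run prints `backups=0 lost=[1, 4, 7]` and `backups=1 lost=[]`. With no backups, losing a single server is enough to leave some partitions without any owner. With one backup, the same failure loses nothing.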

On Wed, 24 Jun 2020 at 09:52, John Smith <[hidden email]> wrote:
Also I'm assuming that the thin client wouldn't be susceptible to this error?

javadevmtl

Re: What does all partition owners have left the grid on the client side mean?

Not sure about the wrong configuration... All the apps work, and this seems to happen every few weeks. We don't have any particularly heavy load.

I just bounced the client application and the errors went away.

On Wed, 24 Jun 2020 at 12:57, Evgenii Zhuravlev <[hidden email]> wrote:
Hi, 

It means that no node in the cluster holds certain partitions. So, most likely either you have a wrong configuration, or some of the nodes left the cluster and there are no backups in the cluster for these partitions. I believe more can be found in the logs.

Evgenii

javadevmtl

Re: What does all partition owners have left the grid on the client side mean?

The server logs are here: https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0

The error from the client:

javax.cache.CacheException: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute cache operation (all partition owners have left the grid, partition data has been lost) [cacheName=cache1, part=580, key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337)
at org.apache.ignite.internal.processors.cache.IgniteCacheFutureImpl.convertException(IgniteCacheFutureImpl.java:62)
at org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:137)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$executeAsync$d94e711a$1(IgniteCacheRepository.java:55)
at org.apache.ignite.internal.util.future.AsyncFutureListener$1.run(AsyncFutureListener.java:53)
at com.xxxxxx.common.vertx.ext.data.impl.VertxIgniteExecutorAdapter.lambda$execute$0(VertxIgniteExecutorAdapter.java:18)
at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369)
at io.vertx.core.impl.WorkerContext.lambda$wrapTask$0(WorkerContext.java:35)
at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute cache operation (all partition owners have left the grid, partition data has been lost) [cacheName=cache1, part=580, key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validatePartitionOperation(GridDhtTopologyFutureAdapter.java:169)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:116)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.init(GridPartitionedSingleGetFuture.java:208)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync0(GridDhtAtomicCache.java:1428)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$1600(GridDhtAtomicCache.java:135)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:474)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:472)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.asyncOp(GridDhtAtomicCache.java:761)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync(GridDhtAtomicCache.java:472)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:4749)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:1477)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.getAsync(IgniteCacheProxyImpl.java:937)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.getAsync(GatewayProtectedCacheProxy.java:652)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$get$1(IgniteCacheRepository.java:28)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.executeAsync(IgniteCacheRepository.java:51)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.get(IgniteCacheRepository.java:28)
at com.xxxxxx.impl.CarrierCodeServiceImpl.getCarrierIdOfPhone(CarrierCodeServiceImpl.java:65)
at com.xxxxxx.impl.SmppGatewayServiceImpl.sendSms(SmppGatewayServiceImpl.java:39)
at com.xxxxxx.impl.MtEventProcessor.process(MtEventProcessor.java:46)
at com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$4(KafkaProcessorImpl.java:83)
at io.reactivex.internal.operators.completable.CompletableCreate.subscribeActual(CompletableCreate.java:39)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableTimeout.subscribeActual(CompletableTimeout.java:53)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletablePeek.subscribeActual(CompletablePeek.java:51)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableResumeNext.subscribeActual(CompletableResumeNext.java:41)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableToFlowable.subscribeActual(CompletableToFlowable.java:32)
at io.reactivex.Flowable.subscribe(Flowable.java:14918)
at io.reactivex.Flowable.subscribe(Flowable.java:14865)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.onNext(FlowableFlatMap.java:163)
at io.reactivex.internal.operators.flowable.FlowableFromIterable$IteratorSubscription.slowPath(FlowableFromIterable.java:236)
at io.reactivex.internal.operators.flowable.FlowableFromIterable$BaseRangeSubscription.request(FlowableFromIterable.java:124)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drainLoop(FlowableFlatMap.java:546)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drain(FlowableFlatMap.java:366)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$InnerSubscriber.onComplete(FlowableFlatMap.java:678)
at io.reactivex.internal.observers.SubscriberCompletableObserver.onComplete(SubscriberCompletableObserver.java:33)
at io.reactivex.internal.operators.completable.CompletableResumeNext$ResumeNextObserver.onComplete(CompletableResumeNext.java:68)
at io.reactivex.internal.operators.completable.CompletablePeek$CompletableObserverImplementation.onComplete(CompletablePeek.java:115)
at io.reactivex.internal.operators.completable.CompletableTimeout$TimeOutObserver.onComplete(CompletableTimeout.java:87)
at io.reactivex.internal.operators.completable.CompletableCreate$Emitter.onComplete(CompletableCreate.java:64)
at com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$3(KafkaProcessorImpl.java:86)
at io.vertx.core.impl.FutureImpl.dispatch(FutureImpl.java:105)
at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:150)
at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:157)
at io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:118)
at com.xxxxxx.impl.MtEventProcessor.lambda$process$0(MtEventProcessor.java:83)
at io.vertx.ext.web.client.impl.HttpContext.handleDispatchResponse(HttpContext.java:310)
at io.vertx.ext.web.client.impl.HttpContext.execute(HttpContext.java:297)
at io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:272)
at io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:69)
at io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:32)
at io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:269)
at io.vertx.ext.web.client.impl.HttpContext.fire(HttpContext.java:279)
at io.vertx.ext.web.client.impl.HttpContext.dispatchResponse(HttpContext.java:240)
at io.vertx.ext.web.client.impl.HttpContext.lambda$null$2(HttpContext.java:370)
... 7 common frames omitted
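In the trace above, the public `javax.cache.CacheException` wraps Ignite's internal `CacheInvalidStateException`, and the error surfaces from `IgniteFutureImpl.get` inside the application's async callback. To tell this condition apart from ordinary failures in your own logging, you could walk the cause chain and check the message text. The stdlib-only sketch below does that. The class name is a hypothetical choice, and matching on message text is fragile, because the wording may change between Ignite versions.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: relies on the message text, which is not a stable API.
final class LostPartitionDetector {

    // Text taken from the client-side error quoted above.
    static final String LOST_MARKER = "all partition owners have left the grid";

    // Walks the cause chain (guarding against cycles) and returns the first
    // throwable whose message mentions the lost-partition condition.
    static Optional<Throwable> findLostPartitionCause(Throwable error) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable cur = error; cur != null && seen.add(cur); cur = cur.getCause()) {
            String msg = cur.getMessage();
            if (msg != null && msg.contains(LOST_MARKER)) {
                return Optional.of(cur);
            }
        }
        return Optional.empty();
    }
}
```

In the trace above, the top-level `CacheException` message already embeds the cause's text, so the helper would match on the first element of the chain.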

On Wed, 24 Jun 2020 at 13:28, John Smith <[hidden email]> wrote:
Not sure about the wrong configuration... All the apps work, and this seems to happen every few weeks. We don't have any particularly heavy load.

I just bounced the client application and the errors went away.

ezhuravlev

Re: What does all partition owners have left the grid on the client side mean?

Can you share the full log files from the server nodes?

On Wed, 24 Jun 2020 at 10:47, John Smith <[hidden email]> wrote:
The logs for server are here: https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0

javadevmtl

Re: What does all partition owners have left the grid on the client side mean?

I thought I did! The link doesn't have them?

On Wed., Jun. 24, 2020, 2:43 p.m. Evgenii Zhuravlev, <[hidden email]> wrote:
Can you share full log files from server nodes?

ezhuravlev

Re: What does all partition owners have left the grid on the client side mean?

John, right, I didn't notice them before. Can you share the full log for the client node that has the issue?

Evgenii

On Wed, 24 Jun 2020 at 12:29, John Smith <[hidden email]> wrote:
I thought I did! The link doesn't have them?
at com.xxxxxx.impl.SmppGatewayServiceImpl.sendSms(SmppGatewayServiceImpl.java:39)
at com.xxxxxx.impl.MtEventProcessor.process(MtEventProcessor.java:46)
at com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$4(KafkaProcessorImpl.java:83)
at io.reactivex.internal.operators.completable.CompletableCreate.subscribeActual(CompletableCreate.java:39)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableTimeout.subscribeActual(CompletableTimeout.java:53)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletablePeek.subscribeActual(CompletablePeek.java:51)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableResumeNext.subscribeActual(CompletableResumeNext.java:41)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableToFlowable.subscribeActual(CompletableToFlowable.java:32)
at io.reactivex.Flowable.subscribe(Flowable.java:14918)
at io.reactivex.Flowable.subscribe(Flowable.java:14865)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.onNext(FlowableFlatMap.java:163)
at io.reactivex.internal.operators.flowable.FlowableFromIterable$IteratorSubscription.slowPath(FlowableFromIterable.java:236)
at io.reactivex.internal.operators.flowable.FlowableFromIterable$BaseRangeSubscription.request(FlowableFromIterable.java:124)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drainLoop(FlowableFlatMap.java:546)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drain(FlowableFlatMap.java:366)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$InnerSubscriber.onComplete(FlowableFlatMap.java:678)
at io.reactivex.internal.observers.SubscriberCompletableObserver.onComplete(SubscriberCompletableObserver.java:33)
at io.reactivex.internal.operators.completable.CompletableResumeNext$ResumeNextObserver.onComplete(CompletableResumeNext.java:68)
at io.reactivex.internal.operators.completable.CompletablePeek$CompletableObserverImplementation.onComplete(CompletablePeek.java:115)
at io.reactivex.internal.operators.completable.CompletableTimeout$TimeOutObserver.onComplete(CompletableTimeout.java:87)
at io.reactivex.internal.operators.completable.CompletableCreate$Emitter.onComplete(CompletableCreate.java:64)
at com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$3(KafkaProcessorImpl.java:86)
at io.vertx.core.impl.FutureImpl.dispatch(FutureImpl.java:105)
at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:150)
at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:157)
at io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:118)
at com.xxxxxx.impl.MtEventProcessor.lambda$process$0(MtEventProcessor.java:83)
at io.vertx.ext.web.client.impl.HttpContext.handleDispatchResponse(HttpContext.java:310)
at io.vertx.ext.web.client.impl.HttpContext.execute(HttpContext.java:297)
at io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:272)
at io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:69)
at io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:32)
at io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:269)
at io.vertx.ext.web.client.impl.HttpContext.fire(HttpContext.java:279)
at io.vertx.ext.web.client.impl.HttpContext.dispatchResponse(HttpContext.java:240)
at io.vertx.ext.web.client.impl.HttpContext.lambda$null$2(HttpContext.java:370)
... 7 common frames omitted
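
By the time it reaches the application's future listener, the trace shows the real failure (CacheInvalidStateException) wrapped in a javax.cache.CacheException. Suppose the application wants to treat this case differently from other cache errors, for example to alert on it. A minimal sketch of one way to do that is to walk the cause chain and read cacheName and part out of the message text. LostPartitionErrors and its methods are hypothetical helpers, not Ignite API. Matching on the simple class name avoids depending on a class from Ignite's internal package. Parsing the message relies on its exact wording, which may differ between Ignite versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not Ignite API): detects the "partition data has been lost"
// failure in an exception chain and extracts the cache name and partition number.
public final class LostPartitionErrors {
    // Matches the "[cacheName=cache1, part=580" fragment of the exception message.
    private static final Pattern DETAILS =
            Pattern.compile("cacheName=([^,\\]]+), part=(\\d+)");

    private LostPartitionErrors() {}

    /** Returns the first throwable in the cause chain with the given simple class name, or null. */
    public static Throwable findCause(Throwable t, String simpleClassName) {
        // The depth limit guards against pathological cyclic cause chains.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t.getClass().getSimpleName().equals(simpleClassName)) {
                return t;
            }
        }
        return null;
    }

    /** True if the chain contains a CacheInvalidStateException (matched by simple name only). */
    public static boolean isLostPartitionError(Throwable t) {
        return findCause(t, "CacheInvalidStateException") != null;
    }

    /** Partition number from the message, or -1 if the message does not contain one. */
    public static int partitionOf(Throwable t) {
        Matcher m = matcher(t);
        return m != null ? Integer.parseInt(m.group(2)) : -1;
    }

    /** Cache name from the message, or null if the message does not contain one. */
    public static String cacheNameOf(Throwable t) {
        Matcher m = matcher(t);
        return m != null ? m.group(1) : null;
    }

    // Returns a matcher positioned on the first match, or null when there is none.
    private static Matcher matcher(Throwable t) {
        if (t == null || t.getMessage() == null) {
            return null;
        }
        Matcher m = DETAILS.matcher(t.getMessage());
        return m.find() ? m : null;
    }
}
```

Both the outer CacheException and the inner cause carry the "[cacheName=..., part=...]" fragment in their messages, so partitionOf can be applied to either one.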

On Wed, 24 Jun 2020 at 13:28, John Smith <[hidden email]> wrote:
Not sure about the wrong configuration... All the apps work; this seems to happen every few weeks. We don't have any particularly heavy load.

I just bounced the client application and the errors went away.

On Wed, 24 Jun 2020 at 12:57, Evgenii Zhuravlev <[hidden email]> wrote:
Hi, 

It means that there are no nodes in the cluster that hold certain partitions. So either your configuration is wrong, or some of the nodes left the cluster and there are no backups of these partitions on the remaining nodes. I believe the logs will show more.

Evgenii
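
A toy model makes this concrete. A partition is lost only when every node that owns a copy of it (the primary plus its backups) has left. The sketch below assumes a simple round-robin placement, which is NOT Ignite's affinity function. PartitionLossModel is a hypothetical name used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model, NOT Ignite's affinity function: partition p is owned by nodes
// p % n, (p + 1) % n, ..., i.e. one primary plus `backups` backup copies.
public final class PartitionLossModel {
    private PartitionLossModel() {}

    /** Owners (node indexes) of partition p in a cluster of `nodes` server nodes. */
    static List<Integer> owners(int p, int nodes, int backups) {
        List<Integer> result = new ArrayList<>();
        int copies = Math.min(backups + 1, nodes); // cannot place more copies than there are nodes
        for (int i = 0; i < copies; i++) {
            result.add((p + i) % nodes);
        }
        return result;
    }

    /** Partitions whose every owner is in `left`, i.e. whose data is gone from the cluster. */
    public static List<Integer> lostPartitions(int parts, int nodes, int backups, Set<Integer> left) {
        if (nodes <= 0) {
            throw new IllegalArgumentException("need at least one server node");
        }
        List<Integer> lost = new ArrayList<>();
        for (int p = 0; p < parts; p++) {
            if (left.containsAll(owners(p, nodes, backups))) {
                lost.add(p);
            }
        }
        return lost;
    }
}
```

In this model, with backups = 0, a single server leaving loses every partition it owned. With backups = 1, a partition is lost only when both nodes holding it leave. In Ignite itself, the number of copies is set per cache with CacheConfiguration#setBackups(int). IgniteCache#lostPartitions() reports which partitions a cache currently considers lost.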

On Wed, 24 Jun 2020 at 09:52, John Smith <[hidden email]> wrote:
Also I'm assuming that the thin client wouldn't be susceptible to this error?

On Wed, 24 Jun 2020 at 12:38, John Smith <[hidden email]> wrote:
The cluster shows as active when I run control.sh.

But the client is showing "all partition owners have left the grid"

The client node is marked as client=true so it's not a server node.

Is this split brain as well?
javadevmtl

Re: What does all partition owners have left the grid on the client side mean?

OK, I'll try... Isn't the stack trace enough?

On Wed., Jun. 24, 2020, 4:30 p.m. Evgenii Zhuravlev, <[hidden email]> wrote:
John, right, I didn't notice them before. Can you share the full log for the client node that has the issue?

Evgenii

On Wed, 24 Jun 2020 at 12:29, John Smith <[hidden email]> wrote:
I thought I did! The link doesn't have them?

On Wed., Jun. 24, 2020, 2:43 p.m. Evgenii Zhuravlev, <[hidden email]> wrote:
Can you share the full log files from the server nodes?

ezhuravlev

Re: What does all partition owners have left the grid on the client side mean?

No, it's not. From the stack trace alone, it's not clear when this happened or what state the cluster and the client node itself were in at that moment.

Evgenii
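
What is missing is the timing: which servers left, and when, relative to the client error. One rough sketch is to reduce each log to its topology-change lines and compare their timestamps with the time of the exception. This assumes the server and client logs are plain text. The marker strings are an assumption based on messages Ignite commonly writes on discovery events. Check them against the actual files, such as the shared stdout.copy, before relying on them. TopologyEventFilter is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps only topology-change lines from a log so they can be
// lined up against the timestamp of the client-side exception.
public final class TopologyEventFilter {
    // ASSUMPTION: marker strings as they commonly appear in Ignite logs;
    // verify them against your own log files.
    static final String[] MARKERS = {
        "Node left topology", "Node FAILED", "Topology snapshot"
    };

    private TopologyEventFilter() {}

    public static List<String> topologyEvents(List<String> lines) {
        List<String> events = new ArrayList<>();
        for (String line : lines) {
            for (String marker : MARKERS) {
                if (line.contains(marker)) {
                    events.add(line);
                    break; // one match is enough; never add the same line twice
                }
            }
        }
        return events;
    }
}
```

To run it over a file, read the lines first with java.nio.file.Files.readAllLines(Path.of("stdout.copy")). Path.of needs Java 11 or later.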

On Wed, 24 Jun 2020 at 18:16, John Smith <[hidden email]> wrote:
OK, I'll try... Isn't the stack trace enough?

javadevmtl

Re: What does all partition owners have left the grid on the client side mean?

Hi Evgenii, I shared stdout.copy in the same folder.

Just in case: https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0

On Wed, 24 Jun 2020 at 21:23, Evgenii Zhuravlev <[hidden email]> wrote:
No, it's not. From the stack trace alone, it's not clear when this happened or what state the cluster and the client node itself were in at that moment.

Evgenii

at io.vertx.ext.web.client.impl.HttpContext.fire(HttpContext.java:279)
at io.vertx.ext.web.client.impl.HttpContext.dispatchResponse(HttpContext.java:240)
at io.vertx.ext.web.client.impl.HttpContext.lambda$null$2(HttpContext.java:370)
... 7 common frames omitted

On Wed, 24 Jun 2020 at 13:28, John Smith <[hidden email]> wrote:
Not sure about the wrong configuration... All the apps work; this just seems to happen every few weeks. We don't have any particularly heavy load.

I just bounced the client application and the errors went away.

On Wed, 24 Jun 2020 at 12:57, Evgenii Zhuravlev <[hidden email]> wrote:
Hi, 

It means that there are no nodes in the cluster that hold certain partitions. So either you have a wrong configuration, or some of the nodes left the cluster and you don't have backups for these partitions in the cluster. I believe more can be found in the logs.

Evgenii

On Wed, 24 Jun 2020 at 09:52, John Smith <[hidden email]> wrote:
Also I'm assuming that the thin client wouldn't be susceptible to this error?

On Wed, 24 Jun 2020 at 12:38, John Smith <[hidden email]> wrote:
The cluster is showing active when running control.sh

But the client is showing "all partition owners have left the grid"

The client node is marked as client=true so it's not a server node.

Is this split brain as well?
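To make the explanation above concrete (a partition is lost once every node holding a copy of it has left), here is a toy model in plain Java. It is a hypothetical sketch: the round-robin placement and the class and method names are made up for illustration and are not how Ignite's RendezvousAffinityFunction actually assigns partitions. In Ignite itself, the number of extra copies per partition is set with CacheConfiguration#setBackups, and the cache's PartitionLossPolicy decides how operations on lost partitions behave.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: NOT Ignite's RendezvousAffinityFunction. Each partition gets a
// primary copy plus `backups` extra copies, placed round-robin on distinct nodes.
public class PartitionLossModel {

    // Returns, for every partition, the ids of the nodes holding a copy of it.
    static List<List<Integer>> assign(int partitions, int nodes, int backups) {
        int copies = Math.min(backups + 1, nodes); // never more copies than nodes
        List<List<Integer>> owners = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            List<Integer> holders = new ArrayList<>();
            for (int c = 0; c < copies; c++) {
                holders.add((p + c) % nodes);
            }
            owners.add(holders);
        }
        return owners;
    }

    // A partition is lost once every node that held a copy has left the cluster.
    static Set<Integer> lostPartitions(List<List<Integer>> owners, Set<Integer> leftNodes) {
        Set<Integer> lost = new TreeSet<>();
        for (int p = 0; p < owners.size(); p++) {
            if (leftNodes.containsAll(owners.get(p))) {
                lost.add(p);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        Set<Integer> left = Collections.singleton(1); // node 1 leaves
        System.out.println(lostPartitions(assign(8, 3, 0), left)); // [1, 4, 7]
        System.out.println(lostPartitions(assign(8, 3, 1), left)); // []
    }
}
```

With no backups, node 1 leaving takes every partition it owned with it; with one backup, the same departure loses nothing because each partition still has a live copy elsewhere.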
ezhuravlev

Re: What does all partition owners have left the grid on the client side mean?

This doesn't seem to be a full log. There is a gap of more than 13 hours in the log:
{"appTimestamp":"2020-06-23T23:06:41.658+00:00","threadName":"ignite-update-notifier-timer","level":"WARN","loggerName":"org.apache.ignite.internal.processors.cluster.GridUpdateNotifier","message":"New version is available at ignite.apache.org: 2.8.1"}
{"appTimestamp":"2020-06-24T12:58:42.294+00:00","threadName":"disco-event-worker-#35%xxxxxx%","level":"INFO","loggerName":"org.apache.ignite.internal.managers.discovery.GridDiscoveryManager","message":"Node left topology: TcpDiscoveryNode [id=02949ae0-4eea-4dc9-8aed-b3f50e8d7238, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, xxx.xxx.xxx.73], sockAddrs=[0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, xxxxxx-task-0003/xxx.xxx.xxx.73:0], discPort=0, order=1258, intOrder=632, lastExchangeTime=1592890182021, loc=false, ver=2.7.0#20181130-sha1:256ae401, isClient=true]"}

I don't see any exceptions in the log. When did the issue happen? Can you share the full log?

Evgenii
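As a quick check on the "gap of more than 13 hours", the two appTimestamp values in the log excerpt above can be subtracted with java.time. The sketch also decodes lastExchangeTime from the "Node left topology" entry, assuming (this is not confirmed in the thread) that it is a Unix epoch timestamp in milliseconds. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.OffsetDateTime;

public class LogGap {

    // Parses two ISO-8601 timestamps with offsets (the "appTimestamp" format) and returns the gap.
    static Duration gap(String earlier, String later) {
        return Duration.between(OffsetDateTime.parse(earlier), OffsetDateTime.parse(later));
    }

    // Assumption: lastExchangeTime is milliseconds since the Unix epoch.
    static Instant fromEpochMillis(long millis) {
        return Instant.ofEpochMilli(millis);
    }

    public static void main(String[] args) {
        Duration d = gap("2020-06-23T23:06:41.658+00:00", "2020-06-24T12:58:42.294+00:00");
        System.out.println(d);                                 // PT13H52M0.636S
        System.out.println(fromEpochMillis(1592890182021L));   // 2020-06-23T05:29:42.021Z
    }
}
```

So the excerpt spans roughly 13 hours 52 minutes with no Ignite entries in between.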

On Thu, 25 Jun 2020 at 07:36, John Smith <[hidden email]> wrote:
Hi Evgenii, I've shared stdout.copy in the same folder.

Just in case: https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0

On Wed, 24 Jun 2020 at 21:23, Evgenii Zhuravlev <[hidden email]> wrote:
No, it's not. It's not clear when it happened and what state the cluster and the client node itself were in at that moment.

Evgenii

On Wed, 24 Jun 2020 at 18:16, John Smith <[hidden email]> wrote:
OK, I'll try... Isn't the stack trace enough?

On Wed., Jun. 24, 2020, 4:30 p.m. Evgenii Zhuravlev, <[hidden email]> wrote:
John, right, I didn't notice them before. Can you share the full log for the client node that had the issue?

Evgenii

On Wed, 24 Jun 2020 at 12:29, John Smith <[hidden email]> wrote:
I thought I did! The link doesn't have them?

On Wed., Jun. 24, 2020, 2:43 p.m. Evgenii Zhuravlev, <[hidden email]> wrote:
Can you share the full log files from the server nodes?

On Wed, 24 Jun 2020 at 10:47, John Smith <[hidden email]> wrote:
The server logs are here: https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0

The error from the client:

javax.cache.CacheException: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute cache operation (all partition owners have left the grid, partition data has been lost) [cacheName=cache1, part=580, key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337)
at org.apache.ignite.internal.processors.cache.IgniteCacheFutureImpl.convertException(IgniteCacheFutureImpl.java:62)
at org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:137)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$executeAsync$d94e711a$1(IgniteCacheRepository.java:55)
at org.apache.ignite.internal.util.future.AsyncFutureListener$1.run(AsyncFutureListener.java:53)
at com.xxxxxx.common.vertx.ext.data.impl.VertxIgniteExecutorAdapter.lambda$execute$0(VertxIgniteExecutorAdapter.java:18)
at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369)
at io.vertx.core.impl.WorkerContext.lambda$wrapTask$0(WorkerContext.java:35)
at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute cache operation (all partition owners have left the grid, partition data has been lost) [cacheName=cache1, part=580, key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validatePartitionOperation(GridDhtTopologyFutureAdapter.java:169)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:116)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.init(GridPartitionedSingleGetFuture.java:208)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync0(GridDhtAtomicCache.java:1428)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$1600(GridDhtAtomicCache.java:135)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:474)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:472)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.asyncOp(GridDhtAtomicCache.java:761)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync(GridDhtAtomicCache.java:472)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:4749)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:1477)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.getAsync(IgniteCacheProxyImpl.java:937)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.getAsync(GatewayProtectedCacheProxy.java:652)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$get$1(IgniteCacheRepository.java:28)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.executeAsync(IgniteCacheRepository.java:51)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.get(IgniteCacheRepository.java:28)
at com.xxxxxx.impl.CarrierCodeServiceImpl.getCarrierIdOfPhone(CarrierCodeServiceImpl.java:65)
at com.xxxxxx.impl.SmppGatewayServiceImpl.sendSms(SmppGatewayServiceImpl.java:39)
at com.xxxxxx.impl.MtEventProcessor.process(MtEventProcessor.java:46)
at com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$4(KafkaProcessorImpl.java:83)
at io.reactivex.internal.operators.completable.CompletableCreate.subscribeActual(CompletableCreate.java:39)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableTimeout.subscribeActual(CompletableTimeout.java:53)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletablePeek.subscribeActual(CompletablePeek.java:51)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableResumeNext.subscribeActual(CompletableResumeNext.java:41)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableToFlowable.subscribeActual(CompletableToFlowable.java:32)
at io.reactivex.Flowable.subscribe(Flowable.java:14918)
at io.reactivex.Flowable.subscribe(Flowable.java:14865)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.onNext(FlowableFlatMap.java:163)
at io.reactivex.internal.operators.flowable.FlowableFromIterable$IteratorSubscription.slowPath(FlowableFromIterable.java:236)
at io.reactivex.internal.operators.flowable.FlowableFromIterable$BaseRangeSubscription.request(FlowableFromIterable.java:124)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drainLoop(FlowableFlatMap.java:546)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drain(FlowableFlatMap.java:366)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$InnerSubscriber.onComplete(FlowableFlatMap.java:678)
at io.reactivex.internal.observers.SubscriberCompletableObserver.onComplete(SubscriberCompletableObserver.java:33)
at io.reactivex.internal.operators.completable.CompletableResumeNext$ResumeNextObserver.onComplete(CompletableResumeNext.java:68)
at io.reactivex.internal.operators.completable.CompletablePeek$CompletableObserverImplementation.onComplete(CompletablePeek.java:115)
at io.reactivex.internal.operators.completable.CompletableTimeout$TimeOutObserver.onComplete(CompletableTimeout.java:87)
at io.reactivex.internal.operators.completable.CompletableCreate$Emitter.onComplete(CompletableCreate.java:64)
at com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$3(KafkaProcessorImpl.java:86)
at io.vertx.core.impl.FutureImpl.dispatch(FutureImpl.java:105)
at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:150)
at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:157)
at io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:118)
at com.xxxxxx.impl.MtEventProcessor.lambda$process$0(MtEventProcessor.java:83)
at io.vertx.ext.web.client.impl.HttpContext.handleDispatchResponse(HttpContext.java:310)
at io.vertx.ext.web.client.impl.HttpContext.execute(HttpContext.java:297)
at io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:272)
at io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:69)
at io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:32)
at io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:269)
at io.vertx.ext.web.client.impl.HttpContext.fire(HttpContext.java:279)
at io.vertx.ext.web.client.impl.HttpContext.dispatchResponse(HttpContext.java:240)
at io.vertx.ext.web.client.impl.HttpContext.lambda$null$2(HttpContext.java:370)
... 7 common frames omitted

On Wed, 24 Jun 2020 at 13:28, John Smith <[hidden email]> wrote:
Not sure about the wrong configuration... All the apps work; this just seems to happen every few weeks. We don't have any particularly heavy load.

I just bounced the client application and the errors went away.

On Wed, 24 Jun 2020 at 12:57, Evgenii Zhuravlev <[hidden email]> wrote:
Hi, 

It means that there are no nodes in the cluster that hold certain partitions. So either you have a wrong configuration, or some of the nodes left the cluster and you don't have backups for these partitions in the cluster. I believe more can be found in the logs.

Evgenii

On Wed, 24 Jun 2020 at 09:52, John Smith <[hidden email]> wrote:
Also I'm assuming that the thin client wouldn't be susceptible to this error?

On Wed, 24 Jun 2020 at 12:38, John Smith <[hidden email]> wrote:
The cluster is showing active when running control.sh

But the client is showing "all partition owners have left the grid"

The client node is marked as client=true so it's not a server node.

Is this split brain as well?
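In the stack trace above, the failure surfaces as a javax.cache.CacheException wrapping an internal CacheInvalidStateException. If the application wanted to log or alert on this case specifically, one hedged option is to walk the cause chain and match the message text. This is a hypothetical sketch: LostPartitionCheck is not part of Ignite, matching on an internal message is brittle and may break between versions, and an IllegalStateException stands in for the internal exception so the example runs without Ignite on the classpath.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical application-side helper, not an Ignite API: reports whether any
// Throwable in the cause chain carries the lost-partition message from this thread.
public class LostPartitionCheck {

    // Text copied from the stack trace above; it may differ in other Ignite versions.
    static final String MARKER = "partition data has been lost";

    static boolean isLostPartition(Throwable t) {
        // Identity set guards against a cause chain that loops back on itself.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            String msg = cur.getMessage();
            if (msg != null && msg.contains(MARKER)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // IllegalStateException stands in for Ignite's internal CacheInvalidStateException.
        Throwable cause = new IllegalStateException("Failed to execute cache operation "
                + "(all partition owners have left the grid, partition data has been lost)");
        System.out.println(isLostPartition(new RuntimeException("wrapper", cause))); // true
        System.out.println(isLostPartition(new RuntimeException("timeout")));        // false
    }
}
```

Matching on the message rather than the class avoids importing a class from Ignite's internal package, at the cost of depending on wording that Ignite does not guarantee.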
javadevmtl

Re: What does all partition owners have left the grid on the client side mean?

Because everything in between is business logging. Let me make sure I didn't filter out anything relevant. So maybe nothing happened in those 13 hours?
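One way to strip business logging without accidentally dropping Ignite's own messages is to keep only entries whose loggerName is under org.apache.ignite. This is a minimal sketch assuming the one-JSON-object-per-line format seen in the excerpts in this thread; the class name and the regex-based field extraction are assumptions (a real JSON parser would be more robust), and the business log entry in the sample is made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical filter for one-JSON-object-per-line logs:
// keeps only entries whose loggerName is under org.apache.ignite.
public class IgniteLogFilter {

    // Naive extraction: assumes the logger name contains no escaped quotes.
    private static final Pattern LOGGER =
            Pattern.compile("\"loggerName\"\\s*:\\s*\"([^\"]*)\"");

    static boolean isIgniteEntry(String line) {
        Matcher m = LOGGER.matcher(line);
        return m.find() && m.group(1).startsWith("org.apache.ignite");
    }

    static List<String> igniteOnly(List<String> lines) {
        return lines.stream()
                .filter(IgniteLogFilter::isIgniteEntry)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "{\"level\":\"INFO\",\"loggerName\":\"org.apache.ignite.internal.managers.discovery.GridDiscoveryManager\",\"message\":\"Node left topology\"}",
                // Made-up business entry standing in for the application's own logging
                "{\"level\":\"INFO\",\"loggerName\":\"com.example.Business\",\"message\":\"sms sent\"}");
        igniteOnly(sample).forEach(System.out::println); // prints only the first line
    }
}
```

To run it over the real, unfiltered file, the input list could come from java.nio.file.Files.readAllLines(Paths.get("stdout.copy")).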


javadevmtl

Re: What does all partition owners have left the grid on the client side mean?


On Thu, 25 Jun 2020 at 11:01, John Smith <[hidden email]> wrote:
Because everything in between is business logging. Let me make sure I didn't filter out anything relevant. So maybe nothing happened in those 13 hours?


On Thu, 25 Jun 2020 at 10:53, Evgenii Zhuravlev <[hidden email]> wrote:
This doesn't seem to be a full log. There is a gap of more than 13 hours in the log:
{"appTimestamp":"2020-06-23T23:06:41.658+00:00","threadName":"ignite-update-notifier-timer","level":"WARN","loggerName":"org.apache.ignite.internal.processors.cluster.GridUpdateNotifier","message":"New version is available at ignite.apache.org: 2.8.1"}
{"appTimestamp":"2020-06-24T12:58:42.294+00:00","threadName":"disco-event-worker-#35%xxxxxx%","level":"INFO","loggerName":"org.apache.ignite.internal.managers.discovery.GridDiscoveryManager","message":"Node left topology: TcpDiscoveryNode [id=02949ae0-4eea-4dc9-8aed-b3f50e8d7238, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, xxx.xxx.xxx.73], sockAddrs=[0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, xxxxxx-task-0003/xxx.xxx.xxx.73:0], discPort=0, order=1258, intOrder=632, lastExchangeTime=1592890182021, loc=false, ver=2.7.0#20181130-sha1:256ae401, isClient=true]"}

I don't see any exceptions in the log. When did the issue happen? Can you share the full log?

Evgenii

On Thu, 25 Jun 2020 at 07:36, John Smith <[hidden email]> wrote:
Hi Evgenii, I've shared stdout.copy in the same folder.

Just in case: https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0

On Wed, 24 Jun 2020 at 21:23, Evgenii Zhuravlev <[hidden email]> wrote:
No, it's not. It's not clear when it happened and what state the cluster and the client node itself were in at that moment.

Evgenii

On Wed, 24 Jun 2020 at 18:16, John Smith <[hidden email]> wrote:
OK, I'll try... Isn't the stack trace enough?

On Wed., Jun. 24, 2020, 4:30 p.m. Evgenii Zhuravlev, <[hidden email]> wrote:
John, right, I didn't notice them before. Can you share the full log for the client node that had the issue?

Evgenii

On Wed, 24 Jun 2020 at 12:29, John Smith <[hidden email]> wrote:
I thought I did! The link doesn't have them?

On Wed., Jun. 24, 2020, 2:43 p.m. Evgenii Zhuravlev, <[hidden email]> wrote:
Can you share the full log files from the server nodes?

On Wed, 24 Jun 2020 at 10:47, John Smith <[hidden email]> wrote:
The server logs are here: https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0

The error from the client:

javax.cache.CacheException: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute cache operation (all partition owners have left the grid, partition data has been lost) [cacheName=cache1, part=580, key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337)
at org.apache.ignite.internal.processors.cache.IgniteCacheFutureImpl.convertException(IgniteCacheFutureImpl.java:62)
at org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:137)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$executeAsync$d94e711a$1(IgniteCacheRepository.java:55)
at org.apache.ignite.internal.util.future.AsyncFutureListener$1.run(AsyncFutureListener.java:53)
at com.xxxxxx.common.vertx.ext.data.impl.VertxIgniteExecutorAdapter.lambda$execute$0(VertxIgniteExecutorAdapter.java:18)
at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369)
at io.vertx.core.impl.WorkerContext.lambda$wrapTask$0(WorkerContext.java:35)
at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute cache operation (all partition owners have left the grid, partition data has been lost) [cacheName=cache1, part=580, key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validatePartitionOperation(GridDhtTopologyFutureAdapter.java:169)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:116)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.init(GridPartitionedSingleGetFuture.java:208)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync0(GridDhtAtomicCache.java:1428)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$1600(GridDhtAtomicCache.java:135)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:474)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:472)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.asyncOp(GridDhtAtomicCache.java:761)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync(GridDhtAtomicCache.java:472)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:4749)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:1477)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.getAsync(IgniteCacheProxyImpl.java:937)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.getAsync(GatewayProtectedCacheProxy.java:652)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$get$1(IgniteCacheRepository.java:28)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.executeAsync(IgniteCacheRepository.java:51)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.get(IgniteCacheRepository.java:28)
at com.xxxxxx.impl.CarrierCodeServiceImpl.getCarrierIdOfPhone(CarrierCodeServiceImpl.java:65)
at com.xxxxxx.impl.SmppGatewayServiceImpl.sendSms(SmppGatewayServiceImpl.java:39)
at com.xxxxxx.impl.MtEventProcessor.process(MtEventProcessor.java:46)
at com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$4(KafkaProcessorImpl.java:83)
at io.reactivex.internal.operators.completable.CompletableCreate.subscribeActual(CompletableCreate.java:39)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableTimeout.subscribeActual(CompletableTimeout.java:53)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletablePeek.subscribeActual(CompletablePeek.java:51)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableResumeNext.subscribeActual(CompletableResumeNext.java:41)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableToFlowable.subscribeActual(CompletableToFlowable.java:32)
at io.reactivex.Flowable.subscribe(Flowable.java:14918)
at io.reactivex.Flowable.subscribe(Flowable.java:14865)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.onNext(FlowableFlatMap.java:163)
at io.reactivex.internal.operators.flowable.FlowableFromIterable$IteratorSubscription.slowPath(FlowableFromIterable.java:236)
at io.reactivex.internal.operators.flowable.FlowableFromIterable$BaseRangeSubscription.request(FlowableFromIterable.java:124)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drainLoop(FlowableFlatMap.java:546)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drain(FlowableFlatMap.java:366)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$InnerSubscriber.onComplete(FlowableFlatMap.java:678)
at io.reactivex.internal.observers.SubscriberCompletableObserver.onComplete(SubscriberCompletableObserver.java:33)
at io.reactivex.internal.operators.completable.CompletableResumeNext$ResumeNextObserver.onComplete(CompletableResumeNext.java:68)
at io.reactivex.internal.operators.completable.CompletablePeek$CompletableObserverImplementation.onComplete(CompletablePeek.java:115)
at io.reactivex.internal.operators.completable.CompletableTimeout$TimeOutObserver.onComplete(CompletableTimeout.java:87)
at io.reactivex.internal.operators.completable.CompletableCreate$Emitter.onComplete(CompletableCreate.java:64)
at com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$3(KafkaProcessorImpl.java:86)
at io.vertx.core.impl.FutureImpl.dispatch(FutureImpl.java:105)
at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:150)
at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:157)
at io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:118)
at com.xxxxxx.impl.MtEventProcessor.lambda$process$0(MtEventProcessor.java:83)
at io.vertx.ext.web.client.impl.HttpContext.handleDispatchResponse(HttpContext.java:310)
at io.vertx.ext.web.client.impl.HttpContext.execute(HttpContext.java:297)
at io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:272)
at io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:69)
at io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:32)
at io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:269)
at io.vertx.ext.web.client.impl.HttpContext.fire(HttpContext.java:279)
at io.vertx.ext.web.client.impl.HttpContext.dispatchResponse(HttpContext.java:240)
at io.vertx.ext.web.client.impl.HttpContext.lambda$null$2(HttpContext.java:370)
... 7 common frames omitted

On Wed, 24 Jun 2020 at 13:28, John Smith <[hidden email]> wrote:
Not sure about the wrong configuration... All the apps work; this seems to happen every few weeks. We don't have any particularly heavy load.

I just bounced the client application and the errors went away.

On Wed, 24 Jun 2020 at 12:57, Evgenii Zhuravlev <[hidden email]> wrote:
Hi, 

It means that no node in the cluster holds certain partitions. So either you have a wrong configuration, or some of the nodes left the cluster and there were no backups of these partitions on the remaining nodes. I believe more can be found in the logs.

Evgenii

On Wed, 24 Jun 2020 at 09:52, John Smith <[hidden email]> wrote:
Also I'm assuming that the thin client wouldn't be susceptible to this error?

On Wed, 24 Jun 2020 at 12:38, John Smith <[hidden email]> wrote:
The cluster is showing active when running control.sh

But the client is showing "all partition owners have left the grid"

The client node is marked as client=true so it's not a server node.

Is this split brain as well?
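Evgenii's explanation (no node holds certain partitions, and no backup copy survived) can be illustrated with a toy model. The sketch below is not Ignite code: it assigns each of 1024 partitions to 1 + backups distinct nodes using rendezvous (highest-random-weight) hashing, in the spirit of Ignite's RendezvousAffinityFunction but with a made-up hash, and counts partitions left with no live owner. The node names, the hash and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

/**
 * Toy model (not Ignite code): a partition is "lost" when every node that
 * owned a copy of it (primary plus backups) has left the cluster.
 */
public class LostPartitionsModel {

    // Deterministic 64-bit mix of (node, partition); stands in for a real hash.
    static long weight(String node, int part) {
        long h = node.hashCode() * 0x9E3779B97F4A7C15L + part;
        h ^= (h >>> 33);
        h *= 0xFF51AFD7ED558CCDL;
        h ^= (h >>> 33);
        return h;
    }

    // Owners of a partition: the (1 + backups) distinct nodes with the highest weight.
    static List<String> owners(List<String> nodes, int part, int backups) {
        List<String> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingLong((String n) -> weight(n, part)).reversed());
        return sorted.subList(0, Math.min(1 + backups, sorted.size()));
    }

    // Counts partitions whose owners are all among the nodes that left.
    static int lostPartitions(List<String> nodes, int partitions, int backups, Set<String> left) {
        int lost = 0;
        for (int p = 0; p < partitions; p++) {
            boolean anyAlive = false;
            for (String owner : owners(nodes, p, backups)) {
                if (!left.contains(owner)) {
                    anyAlive = true;
                    break;
                }
            }
            if (!anyAlive) {
                lost++;
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("server-1", "server-2", "server-3");
        Set<String> left = Set.of("server-2");
        System.out.println("backups=0, one node left: lost=" + lostPartitions(nodes, 1024, 0, left));
        System.out.println("backups=1, one node left: lost=" + lostPartitions(nodes, 1024, 1, left));
    }
}
```

With backups=0, any server that leaves takes its primary partitions with it; with backups=1, a single departure loses nothing, but two overlapping departures can still lose the partitions both of them owned.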
javadevmtl javadevmtl

Re: What does all partition owners have left the grid on the client side mean?

Hi Evgenii, did you have a chance to look at the latest logs?

On Thu, 25 Jun 2020 at 11:32, John Smith <[hidden email]> wrote:

On Thu, 25 Jun 2020 at 11:01, John Smith <[hidden email]> wrote:
Because everything in between is business logs. Let me make sure I didn't filter out anything relevant. So maybe nothing happened in those 13 hours?


On Thu, 25 Jun 2020 at 10:53, Evgenii Zhuravlev <[hidden email]> wrote:
This doesn't seem to be the full log. There is a gap of more than 13 hours in it:
{"appTimestamp":"2020-06-23T23:06:41.658+00:00","threadName":"ignite-update-notifier-timer","level":"WARN","loggerName":"org.apache.ignite.internal.processors.cluster.GridUpdateNotifier","message":"New version is available at ignite.apache.org: 2.8.1"}
{"appTimestamp":"2020-06-24T12:58:42.294+00:00","threadName":"disco-event-worker-#35%xxxxxx%","level":"INFO","loggerName":"org.apache.ignite.internal.managers.discovery.GridDiscoveryManager","message":"Node left topology: TcpDiscoveryNode [id=02949ae0-4eea-4dc9-8aed-b3f50e8d7238, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, xxx.xxx.xxx.73], sockAddrs=[0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, xxxxxx-task-0003/xxx.xxx.xxx.73:0], discPort=0, order=1258, intOrder=632, lastExchangeTime=1592890182021, loc=false, ver=2.7.0#20181130-sha1:256ae401, isClient=true]"}

I don't see any exceptions in the log. When did the issue happen? Can you share the full log?

Evgenii
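The 13-hour gap Evgenii spotted can be found mechanically by comparing consecutive appTimestamp values. A minimal sketch, assuming one JSON object per line with an ISO-8601 appTimestamp field as in the excerpt above; the class name and the regex-based extraction are illustrative choices (a real JSON parser would be more robust).

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: report gaps between consecutive timestamped lines of a JSON-per-line log. */
public class LogGapFinder {
    // Assumes a field like "appTimestamp":"2020-06-23T23:06:41.658+00:00" on each log line.
    private static final Pattern TS = Pattern.compile("\"appTimestamp\":\"([^\"]+)\"");

    /** Returns every gap between consecutive timestamps that is longer than threshold. */
    static List<Duration> findGaps(List<String> lines, Duration threshold) {
        List<Duration> gaps = new ArrayList<>();
        OffsetDateTime prev = null;
        for (String line : lines) {
            Matcher m = TS.matcher(line);
            if (!m.find()) {
                continue; // skip lines without a timestamp (e.g. stack trace frames)
            }
            OffsetDateTime ts = OffsetDateTime.parse(m.group(1));
            if (prev != null) {
                Duration gap = Duration.between(prev, ts);
                if (gap.compareTo(threshold) > 0) {
                    gaps.add(gap);
                }
            }
            prev = ts;
        }
        return gaps;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "{\"appTimestamp\":\"2020-06-23T23:06:41.658+00:00\",\"level\":\"WARN\"}",
            "{\"appTimestamp\":\"2020-06-24T12:58:42.294+00:00\",\"level\":\"INFO\"}");
        for (Duration gap : findGaps(lines, Duration.ofHours(1))) {
            // prints: gap of 13 h 52 min
            System.out.println("gap of " + gap.toHours() + " h " + gap.toMinutesPart() + " min");
        }
    }
}
```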

On Thu, 25 Jun 2020 at 07:36, John Smith <[hidden email]> wrote:
Hi Evgenii, I shared stdout.copy in the same folder.

Just in case: https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0

On Wed, 24 Jun 2020 at 21:23, Evgenii Zhuravlev <[hidden email]> wrote:
No, it's not. It's not clear when it happened or what state the cluster and the client node itself were in at that moment.

Evgenii

On Wed, 24 Jun 2020 at 18:16, John Smith <[hidden email]> wrote:
OK, I'll try... The stack trace isn't enough?

On Wed., Jun. 24, 2020, 4:30 p.m. Evgenii Zhuravlev, <[hidden email]> wrote:
John, right, I didn't notice them before. Can you share the full log for the client node that has the issue?

Evgenii

On Wed, 24 Jun 2020 at 12:29, John Smith <[hidden email]> wrote:
I thought I did! The link doesn't have them?

On Wed., Jun. 24, 2020, 2:43 p.m. Evgenii Zhuravlev, <[hidden email]> wrote:
Can you share the full log files from the server nodes?

On Wed, 24 Jun 2020 at 10:47, John Smith <[hidden email]> wrote:
The server logs are here: https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0

The error from the client:

javax.cache.CacheException: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute cache operation (all partition owners have left the grid, partition data has been lost) [cacheName=cache1, part=580, key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337)
at org.apache.ignite.internal.processors.cache.IgniteCacheFutureImpl.convertException(IgniteCacheFutureImpl.java:62)
at org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:137)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$executeAsync$d94e711a$1(IgniteCacheRepository.java:55)
at org.apache.ignite.internal.util.future.AsyncFutureListener$1.run(AsyncFutureListener.java:53)
at com.xxxxxx.common.vertx.ext.data.impl.VertxIgniteExecutorAdapter.lambda$execute$0(VertxIgniteExecutorAdapter.java:18)
at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369)
at io.vertx.core.impl.WorkerContext.lambda$wrapTask$0(WorkerContext.java:35)
at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute cache operation (all partition owners have left the grid, partition data has been lost) [cacheName=cache1, part=580, key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validatePartitionOperation(GridDhtTopologyFutureAdapter.java:169)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:116)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.init(GridPartitionedSingleGetFuture.java:208)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync0(GridDhtAtomicCache.java:1428)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$1600(GridDhtAtomicCache.java:135)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:474)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:472)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.asyncOp(GridDhtAtomicCache.java:761)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync(GridDhtAtomicCache.java:472)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:4749)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:1477)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.getAsync(IgniteCacheProxyImpl.java:937)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.getAsync(GatewayProtectedCacheProxy.java:652)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$get$1(IgniteCacheRepository.java:28)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.executeAsync(IgniteCacheRepository.java:51)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.get(IgniteCacheRepository.java:28)
at com.xxxxxx.impl.CarrierCodeServiceImpl.getCarrierIdOfPhone(CarrierCodeServiceImpl.java:65)
at com.xxxxxx.impl.SmppGatewayServiceImpl.sendSms(SmppGatewayServiceImpl.java:39)
at com.xxxxxx.impl.MtEventProcessor.process(MtEventProcessor.java:46)
at com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$4(KafkaProcessorImpl.java:83)
at io.reactivex.internal.operators.completable.CompletableCreate.subscribeActual(CompletableCreate.java:39)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableTimeout.subscribeActual(CompletableTimeout.java:53)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletablePeek.subscribeActual(CompletablePeek.java:51)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableResumeNext.subscribeActual(CompletableResumeNext.java:41)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableToFlowable.subscribeActual(CompletableToFlowable.java:32)
at io.reactivex.Flowable.subscribe(Flowable.java:14918)
at io.reactivex.Flowable.subscribe(Flowable.java:14865)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.onNext(FlowableFlatMap.java:163)
at io.reactivex.internal.operators.flowable.FlowableFromIterable$IteratorSubscription.slowPath(FlowableFromIterable.java:236)
at io.reactivex.internal.operators.flowable.FlowableFromIterable$BaseRangeSubscription.request(FlowableFromIterable.java:124)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drainLoop(FlowableFlatMap.java:546)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drain(FlowableFlatMap.java:366)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$InnerSubscriber.onComplete(FlowableFlatMap.java:678)
at io.reactivex.internal.observers.SubscriberCompletableObserver.onComplete(SubscriberCompletableObserver.java:33)
at io.reactivex.internal.operators.completable.CompletableResumeNext$ResumeNextObserver.onComplete(CompletableResumeNext.java:68)
at io.reactivex.internal.operators.completable.CompletablePeek$CompletableObserverImplementation.onComplete(CompletablePeek.java:115)
at io.reactivex.internal.operators.completable.CompletableTimeout$TimeOutObserver.onComplete(CompletableTimeout.java:87)
at io.reactivex.internal.operators.completable.CompletableCreate$Emitter.onComplete(CompletableCreate.java:64)
at com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$3(KafkaProcessorImpl.java:86)
at io.vertx.core.impl.FutureImpl.dispatch(FutureImpl.java:105)
at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:150)
at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:157)
at io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:118)
at com.xxxxxx.impl.MtEventProcessor.lambda$process$0(MtEventProcessor.java:83)
at io.vertx.ext.web.client.impl.HttpContext.handleDispatchResponse(HttpContext.java:310)
at io.vertx.ext.web.client.impl.HttpContext.execute(HttpContext.java:297)
at io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:272)
at io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:69)
at io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:32)
at io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:269)
at io.vertx.ext.web.client.impl.HttpContext.fire(HttpContext.java:279)
at io.vertx.ext.web.client.impl.HttpContext.dispatchResponse(HttpContext.java:240)
at io.vertx.ext.web.client.impl.HttpContext.lambda$null$2(HttpContext.java:370)
... 7 common frames omitted

On Wed, 24 Jun 2020 at 13:28, John Smith <[hidden email]> wrote:
Not sure about the wrong configuration... All the apps work; this seems to happen every few weeks. We don't have any particularly heavy load.

I just bounced the client application and the errors went away.

On Wed, 24 Jun 2020 at 12:57, Evgenii Zhuravlev <[hidden email]> wrote:
Hi, 

It means that no node in the cluster holds certain partitions. So either you have a wrong configuration, or some of the nodes left the cluster and there were no backups of these partitions on the remaining nodes. I believe more can be found in the logs.

Evgenii

On Wed, 24 Jun 2020 at 09:52, John Smith <[hidden email]> wrote:
Also I'm assuming that the thin client wouldn't be susceptible to this error?

On Wed, 24 Jun 2020 at 12:38, John Smith <[hidden email]> wrote:
The cluster is showing active when running control.sh

But the client is showing "all partition owners have left the grid"

The client node is marked as client=true so it's not a server node.

Is this split brain as well?
ezhuravlev ezhuravlev

Re: What does all partition owners have left the grid on the client side mean?

John,

Unfortunately, I didn't find any messages about lost partitions for this cache; it may have happened earlier. Which partition loss policy do you use?
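The partition loss policy decides what an operation on a lost partition does. The sketch below is a toy model, not Ignite's implementation: it encodes the policies as the Ignite Javadoc describes them, redeclared locally so there is no Ignite dependency, and real behaviour may vary with the Ignite version and with whether persistence is enabled. Under this reading, the rejected get() on part=580 would point to one of the *_SAFE policies, which is an inference rather than something confirmed in the thread. In Ignite itself the policy is set with CacheConfiguration.setPartitionLossPolicy(...), lost partitions can be listed with IgniteCache.lostPartitions(), and cleared with Ignite.resetLostPartitions(...) once the owners are back.

```java
/**
 * Toy model (not Ignite's implementation) of the partition loss policies as the
 * Ignite Javadoc describes them: may an operation proceed once partitions are lost?
 */
public class LossPolicyModel {
    // Same constant names as Ignite's PartitionLossPolicy, redeclared so the sketch is standalone.
    enum Policy { READ_ONLY_SAFE, READ_ONLY_ALL, READ_WRITE_SAFE, READ_WRITE_ALL, IGNORE }

    /**
     * @param write            true for a put/remove, false for a get
     * @param anyLost          true if the cache currently has at least one lost partition
     * @param keyPartitionLost true if the key maps to a lost partition (implies anyLost)
     */
    static boolean allowed(Policy policy, boolean write, boolean anyLost, boolean keyPartitionLost) {
        if (!anyLost) {
            return true; // the policy only matters after a loss has been detected
        }
        switch (policy) {
            case READ_ONLY_SAFE:  return !write && !keyPartitionLost; // writes fail; reads only on valid partitions
            case READ_ONLY_ALL:   return !write;                      // writes fail; reads proceed everywhere
            case READ_WRITE_SAFE: return !keyPartitionLost;           // any op on a lost partition fails
            case READ_WRITE_ALL:  return true;                        // proceeds; lost-partition reads undefined
            case IGNORE:          return true;                        // loss state reset; data may be incomplete
            default: throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        // A get() on a lost partition, like the one on part=580 in the client stack trace.
        for (Policy p : Policy.values()) {
            System.out.println(p + ": get on lost partition allowed = " + allowed(p, false, true, true));
        }
    }
}
```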

The logs show a problem with partition distribution:
"Local node affinity assignment distribution is not ideal [cache=cache1, expectedPrimary=512.00, actualPrimary=493, expectedBackups=512.00, actualBackups=171, warningThreshold=50.00%]"
How do you restart nodes? Do you wait until rebalancing has completed?

Evgenii
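The distribution warning quoted above can be checked by hand. This sketch assumes the expected counts are partitions divided by server nodes (times the backup count for backup copies) and that the deviation is measured relative to the expected count. The setup of 1024 partitions (the RendezvousAffinityFunction default), 2 server nodes and 1 backup is a hypothetical one that happens to reproduce the logged 512.00 values; the thread does not state it.

```java
import java.util.Locale;

/**
 * Worked arithmetic for the "affinity assignment distribution is not ideal" warning,
 * assuming expected = partitions / serverNodes (times backups for backup copies) and a
 * deviation measured relative to the expected count.
 */
public class DistributionCheck {
    static double deviationPercent(double expected, int actual) {
        return Math.abs(expected - actual) / expected * 100.0;
    }

    public static void main(String[] args) {
        // Hypothetical setup consistent with the logged expected values (512.00 / 512.00).
        int partitions = 1024, serverNodes = 2, backups = 1;
        double expectedPrimary = (double) partitions / serverNodes;
        double expectedBackups = expectedPrimary * backups;
        double thresholdPercent = 50.0; // warningThreshold=50.00% from the log line

        double primaryDev = deviationPercent(expectedPrimary, 493); // actualPrimary=493
        double backupDev = deviationPercent(expectedBackups, 171);  // actualBackups=171

        // prints: primary 3.71%, backups 66.60%, warn=true
        System.out.println(String.format(Locale.ROOT, "primary %.2f%%, backups %.2f%%, warn=%b",
                primaryDev, backupDev, primaryDev > thresholdPercent || backupDev > thresholdPercent));
    }
}
```

Under these assumptions the primary count is within 4% of ideal, while the node holds only about a third of its expected backup copies, which is what exceeds the 50% threshold. A backup shortfall like this is consistent with, but does not prove, rebalancing that had not finished.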



On Fri, 3 Jul 2020 at 09:03, John Smith <[hidden email]> wrote:
Hi Evgenii, did you have a chance to look at the latest logs?

javadevmtl javadevmtl

Re: What does all partition owners have left the grid on the client side mean?

Yeah, I restarted the server nodes. But I guess the client didn't reconnect... Hmmm...
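Evgenii's question about waiting for rebalancing matters most for in-memory caches. The toy simulation below is not Ignite code and assumes no native persistence (with persistence and a baseline topology, a restarted node keeps its data on disk, so the picture differs). It uses a hypothetical two-server cluster with backups=1, so every partition starts with a copy on both nodes. Node A is restarted and comes back empty, regains some partitions through rebalancing, and then node B is restarted.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Toy simulation (not Ignite code): rolling restart of an in-memory cache on two
 * server nodes with backups=1. A restarted node comes back empty and regains
 * data only through rebalancing from the other node.
 */
public class RollingRestartModel {

    /**
     * Restarts node A, lets it rebalance the given number of partitions from B,
     * then restarts node B. Returns the number of partitions left with no copy.
     */
    static int lostAfterRollingRestart(int partitions, int rebalancedBeforeSecondRestart) {
        Set<Integer> nodeA = new HashSet<>();
        Set<Integer> nodeB = new HashSet<>();
        for (int p = 0; p < partitions; p++) {
            nodeA.add(p);
            nodeB.add(p);
        }

        nodeA.clear(); // restart A: its in-memory data is gone
        for (int p = 0; p < Math.min(rebalancedBeforeSecondRestart, partitions); p++) {
            if (nodeB.contains(p)) {
                nodeA.add(p); // rebalancing copies partition p from B to A
            }
        }
        nodeB.clear(); // restart B, possibly before rebalancing finished

        int lost = 0;
        for (int p = 0; p < partitions; p++) {
            if (!nodeA.contains(p) && !nodeB.contains(p)) {
                lost++;
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        System.out.println("restart B immediately:   lost=" + lostAfterRollingRestart(1024, 0));
        System.out.println("restart B mid-rebalance: lost=" + lostAfterRollingRestart(1024, 600));
        System.out.println("wait for full rebalance: lost=" + lostAfterRollingRestart(1024, 1024));
    }
}
```

In this model, every partition that A had not yet received when B went down ends up with no owner, which is the situation the "all partition owners have left the grid" message describes.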

On Tue., Jul. 7, 2020, 5:52 p.m. Evgenii Zhuravlev, <[hidden email]> wrote:
John,

Unfortunately, I didn't find any messages about lost partitions for this cache; it may have happened earlier. Which partition loss policy do you use?

The logs show a problem with partition distribution:
"Local node affinity assignment distribution is not ideal [cache=cache1, expectedPrimary=512.00, actualPrimary=493, expectedBackups=512.00, actualBackups=171, warningThreshold=50.00%]"
How do you restart nodes? Do you wait until rebalancing has completed?

Evgenii



On Fri, 3 Jul 2020 at 09:03, John Smith <[hidden email]> wrote:
Hi Evgenii, did you have a chance to look at the latest logs?

On Thu, 25 Jun 2020 at 11:32, John Smith <[hidden email]> wrote:

On Thu, 25 Jun 2020 at 11:01, John Smith <[hidden email]> wrote:
Because everything in between is business logs. Let me make sure I didn't filter out anything relevant. So maybe nothing happened in those 13 hours?


On Thu, 25 Jun 2020 at 10:53, Evgenii Zhuravlev <[hidden email]> wrote:
This doesn't seem to be the full log. There is a gap of more than 13 hours in it:
{"appTimestamp":"2020-06-23T23:06:41.658+00:00","threadName":"ignite-update-notifier-timer","level":"WARN","loggerName":"org.apache.ignite.internal.processors.cluster.GridUpdateNotifier","message":"New version is available at ignite.apache.org: 2.8.1"}
{"appTimestamp":"2020-06-24T12:58:42.294+00:00","threadName":"disco-event-worker-#35%xxxxxx%","level":"INFO","loggerName":"org.apache.ignite.internal.managers.discovery.GridDiscoveryManager","message":"Node left topology: TcpDiscoveryNode [id=02949ae0-4eea-4dc9-8aed-b3f50e8d7238, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, xxx.xxx.xxx.73], sockAddrs=[0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, xxxxxx-task-0003/xxx.xxx.xxx.73:0], discPort=0, order=1258, intOrder=632, lastExchangeTime=1592890182021, loc=false, ver=2.7.0#20181130-sha1:256ae401, isClient=true]"}

I don't see any exceptions in the log. When did the issue happen? Can you share the full log?

Evgenii

On Thu, 25 Jun 2020 at 07:36, John Smith <[hidden email]> wrote:
Hi Evgenii, I shared stdout.copy in the same folder.

Just in case: https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0

On Wed, 24 Jun 2020 at 21:23, Evgenii Zhuravlev <[hidden email]> wrote:
No, it's not. It's not clear when it happened or what state the cluster and the client node itself were in at that moment.

Evgenii

On Wed, 24 Jun 2020 at 18:16, John Smith <[hidden email]> wrote:
OK, I'll try... The stack trace isn't enough?

On Wed., Jun. 24, 2020, 4:30 p.m. Evgenii Zhuravlev, <[hidden email]> wrote:
John, right, I didn't notice them before. Can you share the full log for the client node that has the issue?

Evgenii

On Wed, 24 Jun 2020 at 12:29, John Smith <[hidden email]> wrote:
I thought I did! The link doesn't have them?

On Wed., Jun. 24, 2020, 2:43 p.m. Evgenii Zhuravlev, <[hidden email]> wrote:
Can you share the full log files from the server nodes?

On Wed, 24 Jun 2020 at 10:47, John Smith <[hidden email]> wrote:
The server logs are here: https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0

The error from the client:

javax.cache.CacheException: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute cache operation (all partition owners have left the grid, partition data has been lost) [cacheName=cache1, part=580, key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337)
at org.apache.ignite.internal.processors.cache.IgniteCacheFutureImpl.convertException(IgniteCacheFutureImpl.java:62)
at org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:137)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$executeAsync$d94e711a$1(IgniteCacheRepository.java:55)
at org.apache.ignite.internal.util.future.AsyncFutureListener$1.run(AsyncFutureListener.java:53)
at com.xxxxxx.common.vertx.ext.data.impl.VertxIgniteExecutorAdapter.lambda$execute$0(VertxIgniteExecutorAdapter.java:18)
at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369)
at io.vertx.core.impl.WorkerContext.lambda$wrapTask$0(WorkerContext.java:35)
at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute cache operation (all partition owners have left the grid, partition data has been lost) [cacheName=cache1, part=580, key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validatePartitionOperation(GridDhtTopologyFutureAdapter.java:169)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:116)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.init(GridPartitionedSingleGetFuture.java:208)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync0(GridDhtAtomicCache.java:1428)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$1600(GridDhtAtomicCache.java:135)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:474)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:472)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.asyncOp(GridDhtAtomicCache.java:761)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync(GridDhtAtomicCache.java:472)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:4749)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:1477)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.getAsync(IgniteCacheProxyImpl.java:937)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.getAsync(GatewayProtectedCacheProxy.java:652)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$get$1(IgniteCacheRepository.java:28)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.executeAsync(IgniteCacheRepository.java:51)
at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.get(IgniteCacheRepository.java:28)
at com.xxxxxx.impl.CarrierCodeServiceImpl.getCarrierIdOfPhone(CarrierCodeServiceImpl.java:65)
at com.xxxxxx.impl.SmppGatewayServiceImpl.sendSms(SmppGatewayServiceImpl.java:39)
at com.xxxxxx.impl.MtEventProcessor.process(MtEventProcessor.java:46)
at com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$4(KafkaProcessorImpl.java:83)
at io.reactivex.internal.operators.completable.CompletableCreate.subscribeActual(CompletableCreate.java:39)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableTimeout.subscribeActual(CompletableTimeout.java:53)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletablePeek.subscribeActual(CompletablePeek.java:51)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableResumeNext.subscribeActual(CompletableResumeNext.java:41)
at io.reactivex.Completable.subscribe(Completable.java:2309)
at io.reactivex.internal.operators.completable.CompletableToFlowable.subscribeActual(CompletableToFlowable.java:32)
at io.reactivex.Flowable.subscribe(Flowable.java:14918)
at io.reactivex.Flowable.subscribe(Flowable.java:14865)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.onNext(FlowableFlatMap.java:163)
at io.reactivex.internal.operators.flowable.FlowableFromIterable$IteratorSubscription.slowPath(FlowableFromIterable.java:236)
at io.reactivex.internal.operators.flowable.FlowableFromIterable$BaseRangeSubscription.request(FlowableFromIterable.java:124)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drainLoop(FlowableFlatMap.java:546)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drain(FlowableFlatMap.java:366)
at io.reactivex.internal.operators.flowable.FlowableFlatMap$InnerSubscriber.onComplete(FlowableFlatMap.java:678)
at io.reactivex.internal.observers.SubscriberCompletableObserver.onComplete(SubscriberCompletableObserver.java:33)
at io.reactivex.internal.operators.completable.CompletableResumeNext$ResumeNextObserver.onComplete(CompletableResumeNext.java:68)
at io.reactivex.internal.operators.completable.CompletablePeek$CompletableObserverImplementation.onComplete(CompletablePeek.java:115)
at io.reactivex.internal.operators.completable.CompletableTimeout$TimeOutObserver.onComplete(CompletableTimeout.java:87)
at io.reactivex.internal.operators.completable.CompletableCreate$Emitter.onComplete(CompletableCreate.java:64)
at com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$3(KafkaProcessorImpl.java:86)
at io.vertx.core.impl.FutureImpl.dispatch(FutureImpl.java:105)
at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:150)
at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:157)
at io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:118)
at com.xxxxxx.impl.MtEventProcessor.lambda$process$0(MtEventProcessor.java:83)
at io.vertx.ext.web.client.impl.HttpContext.handleDispatchResponse(HttpContext.java:310)
at io.vertx.ext.web.client.impl.HttpContext.execute(HttpContext.java:297)
at io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:272)
at io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:69)
at io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:32)
at io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:269)
at io.vertx.ext.web.client.impl.HttpContext.fire(HttpContext.java:279)
at io.vertx.ext.web.client.impl.HttpContext.dispatchResponse(HttpContext.java:240)
at io.vertx.ext.web.client.impl.HttpContext.lambda$null$2(HttpContext.java:370)
... 7 common frames omitted
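To make the error concrete, here is a hypothetical, simplified model in plain Java (not Ignite code) of the check that fails above: a key is hashed to one of 1024 partitions (the default partition count), and a read is rejected when that partition has no live owner. The hash-to-partition formula, the `liveOwners` map, the node names and the use of `IllegalStateException` are all assumptions for illustration; Ignite's own affinity function and its `CacheInvalidStateException` path differ in detail.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the failing check, NOT Ignite code: key -> partition -> live owners.
class PartitionCheckDemo {
    static final int PARTS = 1024; // a power of two, so the mask below yields 0..PARTS-1

    // Simplified mapping (assumption): spread the key's hash, then mask it to a partition index.
    static int partition(Object key) {
        int h = key.hashCode();
        return (h ^ (h >>> 16)) & (PARTS - 1);
    }

    // Returns a live owner for the key's partition, or fails when every owner has left.
    static String ownerFor(Object key, Map<Integer, List<String>> liveOwners) {
        int part = partition(key);
        List<String> owners = liveOwners.getOrDefault(part, List.of());
        if (owners.isEmpty())
            throw new IllegalStateException(
                    "all partition owners have left the grid [part=" + part + ", key=" + key + "]");
        return owners.get(0); // in this model the first live owner acts as the primary
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> liveOwners = new HashMap<>();
        for (int p = 0; p < PARTS; p++)
            liveOwners.put(p, List.of("server-" + (p % 3 + 1))); // hypothetical node names
        liveOwners.put(partition(43L), List.of()); // simulate one partition whose owners all left

        System.out.println(ownerFor(42L, liveOwners)); // key in a healthy partition: still served
        try {
            ownerFor(43L, liveOwners); // key in the lost partition: rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model only keys that map to a lost partition fail; keys in every other partition are still served.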

On Wed, 24 Jun 2020 at 13:28, John Smith <[hidden email]> wrote:
Not sure about a wrong configuration... All the apps work; this seems to happen every few weeks. We don't have any particularly heavy load.

I just bounced the client application and the errors went away.

On Wed, 24 Jun 2020 at 12:57, Evgenii Zhuravlev <[hidden email]> wrote:
Hi, 

It means that there are no nodes in the cluster that hold certain partitions. So either you have a wrong configuration, or some of the nodes left the cluster and there are no backups for these partitions. I believe the logs will show more.

Evgenii
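A rough way to see why backups matter here: the sketch below assigns each of 1024 partitions to `1 + backups` of three hypothetical nodes using a simplified highest-random-weight (rendezvous-style) hash, then counts how many partitions have no surviving owner after some nodes leave. The node names and the hash function are assumptions; this is not Ignite's `RendezvousAffinityFunction`, only the same general idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified model of partition ownership, NOT Ignite's RendezvousAffinityFunction.
// Each partition is owned by (1 + backups) nodes picked by highest-random-weight hashing.
class LostPartitionsDemo {

    // MurmurHash3 64-bit finalizer, used here only to scramble (node, partition) pairs.
    static long mix(long z) {
        z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
        z = (z ^ (z >>> 33)) * 0xc4ceb9fe1a85ec53L;
        return z ^ (z >>> 33);
    }

    // Weight of a node for a given partition; higher weight wins ownership.
    static long weight(String node, int part) {
        return mix(((long) node.hashCode() << 32) ^ part);
    }

    // Primary + backup owners: the (1 + backups) nodes with the highest weight for this partition.
    static List<String> owners(int part, List<String> nodes, int backups) {
        List<String> sorted = new ArrayList<>(nodes);
        sorted.sort((a, b) -> Long.compare(weight(b, part), weight(a, part)));
        return sorted.subList(0, Math.min(1 + backups, sorted.size()));
    }

    // A partition is lost when every node that owned it is in the 'left' set.
    static int lostPartitions(int parts, List<String> nodes, int backups, Set<String> left) {
        int lost = 0;
        for (int p = 0; p < parts; p++) {
            if (left.containsAll(owners(p, nodes, backups)))
                lost++;
        }
        return lost;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("server-1", "server-2", "server-3"); // hypothetical topology
        int parts = 1024; // default partition count
        for (int backups = 0; backups <= 1; backups++) {
            for (Set<String> left : List.of(Set.of("server-2"), Set.of("server-2", "server-3"))) {
                System.out.printf("backups=%d, left=%s -> %d lost partitions%n",
                        backups, left, lostPartitions(parts, nodes, backups, left));
            }
        }
    }
}
```

With `backups=0`, any node leaving loses roughly its share of partitions; with `backups=1`, losing a single node loses nothing, while losing two of the three nodes can still lose partitions. The model ignores rebalancing, so it only describes the moment the nodes leave.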

On Wed, 24 Jun 2020 at 09:52, John Smith <[hidden email]> wrote:
Also I'm assuming that the thin client wouldn't be susceptible to this error?

On Wed, 24 Jun 2020 at 12:38, John Smith <[hidden email]> wrote:
The cluster shows as active when I run control.sh

But the client is showing "all partition owners have left the grid"

The client node is marked as client=true so it's not a server node.

Is this split brain as well?