Cache atomic conditional remove returns error when node leaves cluster

Shaun Mcginnity

Hi,

I'm attempting to create a distributed locking mechanism using a distributed cache with the atomic putIfAbsent(key, value) and conditional remove(key, value) operations.

The cache configuration is as follows:

CacheConfiguration [name=lock_data0, storeConcurrentLoadAllThreshold=5, rebalancePoolSize=2, rebalanceTimeout=10000, evictPlc=FifoEvictionPolicy [max=10, batchSize=1, maxMemSize=0, memSize=0], evictSync=false, evictKeyBufSize=1024, evictSyncConcurrencyLvl=4, evictSyncTimeout=10000, evictFilter=null, evictMaxOverflowRatio=10.0, eagerTtl=true, dfltLockTimeout=0, startSize=1500000, nearCfg=null, writeSync=FULL_SYNC, storeFactory=null, storeKeepBinary=false, loadPrevVal=false, aff=org.apache.ignite.cache.affinity.fair.FairAffinityFunction@4e31276e, cacheMode=PARTITIONED, atomicityMode=ATOMIC, atomicWriteOrderMode=null, backups=1, invalidate=false, tmLookupClsName=null, rebalanceMode=ASYNC, rebalanceOrder=0, rebalanceBatchSize=524288, rebalanceBatchesPrefetchCount=2, offHeapMaxMem=0, swapEnabled=false, maxConcurrentAsyncOps=500, writeBehindEnabled=false, writeBehindFlushSize=10240, writeBehindFlushFreq=5000, writeBehindFlushThreadCnt=1, writeBehindBatchSize=512, memMode=OFFHEAP_TIERED, affMapper=null, rebalanceDelay=0, rebalanceThrottle=0, interceptor=null, longQryWarnTimeout=3000, readFromBackup=false, nodeFilter=null, sqlSchema=null, sqlEscapeAll=false, sqlOnheapRowCacheSize=10240, snapshotableIdx=false, cpOnRead=true, topValidator=null]

The code to put is:

while (!lockCache.putIfAbsent(entry, lockId)) {
    try {
        attempt++;
        Thread.sleep(2);
    } catch (InterruptedException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

I have wrapped the code to remove in a loop to monitor the error:

while (attempt < maxAttempts && !lockCache.remove(entry, lockId)) {
    String v = (String) lockCache.get(entry);
    System.err.println("ERROR : " + entry + " : lock invalid " + (v == null ? "null" : v) + " expecting " + lockId);
    try {
        Thread.sleep(2);
    } catch (InterruptedException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    attempt++;
}

I have 4 nodes in the cluster. If one node shuts down (e.g. after Ctrl-C), I get a small number of errors when trying to conditionally remove a key, just at the point when the node's departure is detected:

INFO: Node left topology: TcpDiscoveryNode [id=fcea159d-2183-45b1-a985-1fb1ffa8b552, addrs=[0:0:0:0:0:0:0:1%lo, 10.20.50.160, 10.20.75.17, 127.0.0.1, 2a00:2381:757:50:2e76:8aff:fe57:3714%eth2, 2a00:2381:757:75:ea39:35ff:fec4:6d98%eth0], sockAddrs=[/2a00:2381:757:50:2e76:8aff:fe57:3714%eth2:47503, /0:0:0:0:0:0:0:1%lo:47503, bfs-dl380pg8-03.bfs.openwave.com/10.20.50.160:47503, /10.20.50.160:47503, /2a00:2381:757:75:ea39:35ff:fec4:6d98%eth0:47503, /10.20.75.17:47503, bfs-dl380pg8-03t.bfs.openwave.com/10.20.75.17:47503, /127.0.0.1:47503, /2a00:2381:757:50:2e76:8aff:fe57:3714%eth2:47503, /2a00:2381:757:75:ea39:35ff:fec4:6d98%eth0:47503], discPort=47503, order=4, intOrder=4, lastExchangeTime=1460716761999, loc=false, ver=1.5.0#20151229-sha1:f1f8cda2, isClient=false]
Apr 15, 2016 11:41:17 AM org.apache.ignite.logger.java.JavaLogger info
INFO: Topology snapshot [ver=5, servers=3, clients=0, CPUs=32, heap=6.0GB]
ERROR : 0_k0001162115 : lock invalid null expecting 428696898473974
ERROR : 0_k0001132312 : lock invalid null expecting 428696898473973

So remove(entry, lockId) returns false even though the putIfAbsent was successful. I don't see any exception being thrown by remove.

Is there an explanation for this, and any workaround?

Regards,

Shaun
vkulichenko

Re: Cache atomic conditional remove returns error when node leaves cluster

Hi,

This is possible in an ATOMIC cache: if the client doesn't receive the response from the failed node, it doesn't know whether the value was removed or not. It will then get an exception and retry, but if the value was already removed on the first attempt, the retry will return false.
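The sequence described above can be reproduced without a cluster. The following is only a rough sketch under stated assumptions: a local ConcurrentHashMap stands in for the cache, and ResponseLostException, remoteRemove and removeWithRetry are hypothetical names invented for this illustration, not Ignite APIs. It shows how a conditional remove that is applied once, but whose reply is lost, ends up reporting false after the retry.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class LostResponseRetry {
    /** Hypothetical signal that the reply to a request never arrived. */
    static class ResponseLostException extends RuntimeException {}

    /** Local stand-in for the distributed cache (an assumption, not an Ignite cache). */
    static final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    /** Applies the conditional remove, then optionally "loses" the reply. */
    static boolean remoteRemove(String key, String expected, boolean loseReply) {
        boolean removed = cache.remove(key, expected);
        if (loseReply) {
            // The remove has already taken effect, but the caller never learns the result.
            throw new ResponseLostException();
        }
        return removed;
    }

    /** Retries once after a lost reply, mirroring the behaviour described above. */
    static boolean removeWithRetry(String key, String expected, boolean loseFirstReply) {
        try {
            return remoteRemove(key, expected, loseFirstReply);
        } catch (ResponseLostException e) {
            // Second attempt: the entry is already gone, so the condition no longer matches.
            return remoteRemove(key, expected, false);
        }
    }

    public static void main(String[] args) {
        cache.putIfAbsent("0_k0001162115", "428696898473974");
        boolean result = removeWithRetry("0_k0001162115", "428696898473974", true);
        System.out.println("remove returned " + result
            + ", value now " + cache.get("0_k0001162115"));
    }
}
```

In this sketch the caller sees false and a null value afterwards, which matches the "lock invalid null expecting ..." lines in the log above.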

Can you try switching to a TRANSACTIONAL cache [1]? It guarantees strict semantics and should fix your case.

[1] https://apacheignite.readme.io/docs/transactions#atomicity-mode
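For reference, switching the atomicity mode might look roughly like the sketch below. It assumes Ignite is on the classpath, that keys and values are Strings (the original post does not state the types), and that a default-configured node is acceptable. Only a few settings from the posted configuration are carried over; the class name is made up for the example.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;

class TransactionalLockCache {
    public static void main(String[] args) {
        // Same cache name, mode, backups and write sync as in the posted configuration.
        CacheConfiguration<String, String> cfg = new CacheConfiguration<>("lock_data0");
        cfg.setCacheMode(CacheMode.PARTITIONED);
        cfg.setBackups(1);
        cfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
        // The one change suggested above: TRANSACTIONAL instead of ATOMIC.
        cfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);

        // Starts a node with default settings (an assumption for this sketch).
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<String, String> lockCache = ignite.getOrCreateCache(cfg);

            // Acquire and release using the same two operations as the original code.
            boolean acquired = lockCache.putIfAbsent("0_k0001162115", "428696898473974");
            boolean released = lockCache.remove("0_k0001162115", "428696898473974");
            System.out.println("acquired=" + acquired + ", released=" + released);
        }
    }
}
```

The putIfAbsent and remove calls stay the same as in the original code; only the cache configuration changes.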

-Val
Shaun Mcginnity

Hi Val,

Thanks, yes. This fixes the error case, and I now see an exception when I kill one of the nodes.

Regards,

Shaun

On Sat, Apr 16, 2016 at 5:40 AM, vkulichenko <[hidden email]> wrote:
Hi,

This is possible in ATOMIC cache, because if the client doesn't receive the
response from the failed node, it doesn't know if the value was removed or
not. It will then get an exception and retry, but if the value was removed
on the first iteration, false will be returned.

Can you try switching to TRANSACTIONAL cache [1]? It guarantees strict
semantics and should fix your case.

[1] https://apacheignite.readme.io/docs/transactions#atomicity-mode

-Val



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Cache-atomic-conditional-remove-returns-error-when-node-leaves-cluster-tp4223p4245.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.