Quantcast

Pessimistic TXN did not release lock on a key, all subsequent txns failed

classic Classic list List threaded Threaded
51 messages Options
123
bintisepaha bintisepaha
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Pessimistic TXN did not release lock on a key, all subsequent txns failed

Guys,

We use pessimistic txns in ignite and they have been working great for months now, however yesterday we ran into an issue, where on multiple updates to the same key (although in interval of minutes), many updates went through, but at some point, we started getting TransactionTimeoutException because the key was locked. After that any update for the same key failed with the same error.

We checked the logs for the last successful update and all looks good. Something happened that although the txn was completed, they key was still not released. We had to actually restart the cluster post business hours to resolve the issue.

1) Is there a way to avoid this?
2) Can we monitor objects via visor or jconsole to see if some keys are locked for a prolonged period of time?
3) Is there a way to free up the lock remotely or by running some salvaging code?

Please advise, your help is much appreciated.

Thanks,
Binti
Andrey Mashenkov Andrey Mashenkov
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

Hi Binti,

It looks like smth happed while transaction was rollingback and some locks was not released properly.
What Ignite version do you use?

Is it happened on stable topology or not? 
Did you see hanging threads?
Are there any clients on topology that was involved in transaction?

Please provide more information, logs and thread-dumps if possible.

On Fri, Feb 10, 2017 at 1:36 AM, bintisepaha <[hidden email]> wrote:
Guys,

We use pessimistic txns in ignite and they have been working great for
months now, however yesterday we ran into an issue, where on multiple
updates to the same key (although in interval of minutes), many updates went
through, but at some point, we started getting TransactionTimeoutException
because the key was locked. After that any update for the same key failed
with the same error.

We checked the logs for the last successful update and all looks good.
Something happened that although the txn was completed, they key was still
not released. We had to actually restart the cluster post business hours to
resolve the issue.

1) Is there a way to avoid this?
2) Can we monitor objects via visor or jconsole to see if some keys are
locked for a prolonged period of time?
3) Is there a way to free up the lock remotely or by running some salvaging
code?

Please advise, your help is much appreciated.

Thanks,
Binti




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

bintisepaha bintisepaha
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

Hi Andrey,

We are using ignite 1.7.0.
I am attaching the thread dumps. As far as we can tell the grid was working fine, no threads were hanging. Only subsequent updates to this key were hung.

The topology was stable. And the client sends an object to be saved to the server and the transaction is actually only on the server side.
prodThreadDump20170208.zip

Please let us know if you see something useful.

Thanks,
Binti
bintisepaha bintisepaha
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

Hello, any help on this would be very much appreciated.
Andrew Mashenkov Andrew Mashenkov
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

Hi,

I see nothing in thread dump. Possibly, there is a deadlock, as there are no known issues with pessimistic TX on stable topology for ignite 1.7. 
Deadlock detection implemented in ignite 1.8. You can try to upgrade to 1.8.


On Thu, Feb 16, 2017 at 9:32 PM, bintisepaha <[hidden email]> wrote:
Hello, any help on this would be very much appreciated.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p10683.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.



--
Best regards,
Andrey V. Mashenkov
Regards, Andrew.
bintisepaha bintisepaha
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

Thanks Andrew,

The same thing happened again today. Clearly the key is locked, we get the timeout exceptions. But prior update to the same has not thrown any exceptions. Suddenly one update fails with timeout exceptions and we are notified due to those exceptions that the key is locked.

We will upgrade to 1.8, but in the meantime is there a way to free up this locked key using some code?

We try killing nodes, but we have one back up and it looks like the lock is carried over too, which would be the right thing to do.

Outside the transaction we can read this key (dirty read). This is becoming an issue for us, since its a production system and the only way to free up is to restart the cluster.

Please point us in a direction where we can avoid this or free it up.

Thanks,
Binti
agura agura
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

Hi,

Actualy dead lock detection for pessimistic transactions was
introduced in Apache Ignite 1.7 release. But anyway it is good idea to
try the same code on the newest release.

In case of trasaction timeout dead lock detection should be triggered
and TransactionTimoutException should contain
TransactionDeadlockException as cause if dead lock was detected. Is it
your case or not?

If transaction doesn't release any locks (or non transactional locks)
then you will get messages in logs about long running transactions or
futures because exchange can't be finished in this case. Do you have
something like to this messages?

Is it possible to create some reproducer project that we can analyze?


On Sat, Feb 18, 2017 at 1:42 AM, bintisepaha <[hidden email]> wrote:

> Thanks Andrew,
>
> The same thing happened again today. Clearly the key is locked, we get the
> timeout exceptions. But prior update to the same has not thrown any
> exceptions. Suddenly one update fails with timeout exceptions and we are
> notified due to those exceptions that the key is locked.
>
> We will upgrade to 1.8, but in the meantime is there a way to free up this
> locked key using some code?
>
> We try killing nodes, but we have one back up and it looks like the lock is
> carried over too, which would be the right thing to do.
>
> Outside the transaction we can read this key (dirty read). This is becoming
> an issue for us, since its a production system and the only way to free up
> is to restart the cluster.
>
> Please point us in a direction where we can avoid this or free it up.
>
> Thanks,
> Binti
>
>
>
> --
> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p10713.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
bintisepaha bintisepaha
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

This post was updated on .
Andrey, thanks for getting back.
I am attaching the stack trace. Don't think the cause is a deadloc, but the trace is long so maybe I am missing out something, let me know if you find something useful.

LockedKeyException.txt

We cannot ourselves reproduce this issue as there are no errors on the prior successful update. It feels like the txn was marked successful, but on one of the keys the lock was not released. and later when we try to access the key, its locked, hence the exceptions.

No messages in the logs for long running txns or futures.
By killing the node that holds the key, the lock is not released.

Is there a way to query ignite to see if the locks are being held on a particular key? Any code we can run to salvage such locks?

Any other suggestions?

Thanks,
Binti


agura agura
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

There are no something suspicious in stack trace.

You can check that key is locked using IgniteCache.isLocalLocked()
method. For remote nodes you can run task that performs this checking.

Could you please provide full logs for analysis?

On Tue, Feb 21, 2017 at 6:48 PM, bintisepaha <[hidden email]> wrote:

> Andrey, thanks for getting back.
> I am attaching the stack trace. Don't think the cause is a deadloc, but the
> trace is long so maybe I am missing out something, let me know if you find
> something useful.
>
> We cannot ourselves reproduce this issue as there are no errors on the prior
> successful update. It feels like the txn was marked successful, but on one
> of the keys the lock was not released. and later when we try to access the
> key, its locked, hence the exceptions.
>
> No messages in the logs for long running txns or futures.
> By killing the node that holds the key, the lock is not released.
>
> Is there a way to query ignite to see if the locks are being held on a
> particular key? Any code we can run to salvage such locks?
>
> Any other suggestions?
>
> Thanks,
> Binti
>
>
>
>
>
>
> --
> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p10764.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
bintisepaha bintisepaha
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

Hi, I will try this code the next time this issue happens.

Attaching one nodes full logs. It has a lot of info.
Ignite-11221.gz

However, I found this on the console logs on the first exception that occurred. Is this usual?

 14764 Feb 17, 2017 1:16:04 PM org.apache.ignite.logger.java.JavaLogger error
 14765 SEVERE: Failed to execute job [jobId=fcedb9c4a51-9aa7d7c5-f6fa-4bdd-9473-0439b889d46f, ses=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=com.tudor.datagridI.server.tradegen.OrderHolderSaveRunnable, dep=LocalDeployment [super=GridDeployment        [ts=1486832074045, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, clsLdrId=678f81e2a51-b602d584-5565-434b-9727-94e218108073, userVer=0, loc=true, sampleClsName=java.lang.String, pendingUndeploy=false, undeployed=false, usage=0]],        taskClsName=com.tudor.datagridI.server.tradegen.OrderHolderSaveRunnable, sesId=ecedb9c4a51-9aa7d7c5-f6fa-4bdd-9473-0439b889d46f, startTime=1487355352478, endTime=9223372036854775807, taskNodeId=9aa7d7c5-f6fa-4bdd-9473-0439b889d46f, clsLdr=sun.misc       .Launcher$AppClassLoader@18b4aac2, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, subjId=9aa7d7c5-f6fa-4bdd-9473-0439b889d46f, mapFut=IgniteFuture [orig=GridFutureAdapter [resFlag=0, res=null, startTime=1487355352483,        endTime=0, ignoreInterrupts=false, state=INIT]]], jobId=fcedb9c4a51-9aa7d7c5-f6fa-4bdd-9473-0439b889d46f]]
 14766 class org.apache.ignite.IgniteException: Transaction has been already completed.
 14767         at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:908)
 14768         at org.apache.ignite.internal.processors.cache.transactions.TransactionProxyImpl.rollback(TransactionProxyImpl.java:299)
 14769         at com.tudor.datagridI.server.cache.transaction.IgniteCacheTransaction.rollback(IgniteCacheTransaction.java:19)
 14770         at com.tudor.datagridI.server.tradegen.OrderHolderSaveRunnable.processOrderHolders(OrderHolderSaveRunnable.java:509)
 14771         at com.tudor.datagridI.server.tradegen.OrderHolderSaveRunnable.run(OrderHolderSaveRunnable.java:105)
 14772         at org.apache.ignite.internal.processors.closure.GridClosureProcessor$C4V2.execute(GridClosureProcessor.java:2184)
 14773         at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:509)
 14774         at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6521)
 14775         at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:503)
 14776         at org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:456)
 14777         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
 14778         at org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1161)
 14779         at org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1766)
 14780         at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1238)
 14781         at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:866)
 14782         at org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:106)
 14783         at org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:829)
 14784         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 14785         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 14786         at java.lang.Thread.run(Thread.java:745)
 14787 Caused by: class org.apache.ignite.IgniteCheckedException: Transaction has been already completed.
 14788         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finishDhtLocal(IgniteTxHandler.java:776)
 14789         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:718)
 14790         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxFinishRequest(IgniteTxHandler.java:681)
 14791         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:156)
 14792         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:154)
 14793         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:748)
 14794         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:353)
 14795         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:277)
 14796         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:88)
 14797         at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1$2$1.run(GridCacheIoManager.java:222)
 14798         at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6577)
 14799         at org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:781)
 14800         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
 14801         ... 3 more
 14802
bintisepaha bintisepaha
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

Attaching the console log file too, it has the above error.
Unfortunately we lost the files for older updates to this key now, they rolled over.
Ignite-Console-1.zip

but the "Transaction has been already completed" error could also happen because maybe we call explicit rollback in case of any Runtime Exception.

tx.rollback();

We do get genuine timeouts in the code, but at that time, we never get this exception.

Hope this helps.

Thanks,
Binti

bintisepaha bintisepaha
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

Andrey,

I finally have an error that might help. this happened again in production today for us.
Ignite-Console-3.zip

This is the last update that threw an error, after this error every update just times out.

The timeout=9990 in this error, none of our transactions have this timeout.
Do you think this is an ignite bug? If you look at the stack trace, this is not happening due to our code.

Although there is an error on marshaling. How can we further narrow it down?

Thanks,
Binti
bintisepaha bintisepaha
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

This is the actual error that looks like it is not coming from our code

109714 Feb 22, 2017 3:46:17 PM org.apache.ignite.logger.java.JavaLogger error
109715 SEVERE: <Position> Failed to acquire lock for request: GridNearLockRequest [topVer=AffinityTopologyVersion [topVer=3153, minorTopVer=32], miniId=acdced25a51-c5e64ee6-1079-4b90-bb7b-5ec14032a859, implicitTx=f       alse, implicitSingleTx=false, onePhaseCommit=false, dhtVers=[null], subjId=f6663b00-24fc-4515-91ac-20c3b47d90ec, taskNameHash=0, hasTransforms=false, syncCommit=true, accessTtl=-1, retVal=true, firstClientRe       q=false, filter=null, super=GridDistributedLockRequest [nodeId=f6663b00-24fc-4515-91ac-20c3b47d90ec, nearXidVer=GridCacheVersion [topVer=98913254, time=1487796367155, order=1487785731866, nodeOrder=7], threa       dId=57, futId=8cdced25a51-c5e64ee6-1079-4b90-bb7b-5ec14032a859, timeout=9990, isInTx=true, isInvalidate=false, isRead=true, isolation=REPEATABLE_READ, retVals=[true], txSize=0, flags=0, keysCnt=1, super=Grid       DistributedBaseMessage [ver=GridCacheVersion [topVer=98913254, time=1487796367155, order=1487785731866, nodeOrder=7], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheMessage [msgId=2688455, de       pInfo=null, err=null, skipPrepare=false, cacheId=812449097, cacheId=812449097]]]]
109716 class org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException: Failed to acquire lock within provided timeout for transaction [timeout=9990, tx=org.apache.ignite.internal.processors.cache.dis       tributed.dht.GridDhtTxLocalAdapter$1@7f2a8c8a]
109717         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3924)
109718         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3874)
109719         at org.apache.ignite.internal.util.future.GridEmbeddedFuture$2.applyx(GridEmbeddedFuture.java:91)
109720         at org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:297)
109721         at org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:290)
109722         at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:263)
109723         at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:251)
109724         at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:381)
109725         at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:347)
109726         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onComplete(GridDhtLockFuture.java:752)
109727         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.access$600(GridDhtLockFuture.java:79)
109728         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$LockTimeoutObject.onTimeout(GridDhtLockFuture.java:1116)
109729         at org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:159)
109730         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
109731         at java.lang.Thread.run(Thread.java:745)
109732
109733 Feb 22, 2017 3:46:17 PM org.apache.ignite.logger.java.JavaLogger error
109716 class org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException: Failed to acquire lock within provided timeout for transaction [timeout=9990, tx=org.apache.ignite.internal.processors.cache.dis       tributed.dht.GridDhtTxLocalAdapter$1@7f2a8c8a]
109717         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3924)
109718         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3874)
109719         at org.apache.ignite.internal.util.future.GridEmbeddedFuture$2.applyx(GridEmbeddedFuture.java:91)
109720         at org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:297)
109721         at org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:290)
109722         at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:263)
109723         at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:251)
109724         at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:381)
109725         at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:347)
109726         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onComplete(GridDhtLockFuture.java:752)
109727         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.access$600(GridDhtLockFuture.java:79)
109728         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$LockTimeoutObject.onTimeout(GridDhtLockFuture.java:1116)
109729         at org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:159)
109730         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
109731         at java.lang.Thread.run(Thread.java:745)
109732
109733 Feb 22, 2017 3:46:17 PM org.apache.ignite.logger.java.JavaLogger error
109734 SEVERE: <Position> Future execution resulted in error: GridDhtEmbeddedFuture [super=GridEmbeddedFuture [embedded=GridEmbeddedFuture [embedded=GridDhtLockFuture [nearNodeId=f6663b00-24fc-4515-91ac-20c3b47d90e       c, nearLockVer=GridCacheVersion [topVer=98913254, time=1487796367155, order=1487785731866, nodeOrder=7], topVer=AffinityTopologyVersion [topVer=3153, minorTopVer=32], threadId=57, futId=7e221e25a51-8c03dfe0-       0f6e-4200-aed7-2233c421e0a3, lockVer=GridCacheVersion [topVer=98913254, time=1487796367159, order=1487785731867, nodeOrder=14], read=true, err=null, timedOut=true, timeout=9990, tx=GridDhtTxLocal [nearNodeId       =f6663b00-24fc-4515-91ac-20c3b47d90ec, nearFutId=8cdced25a51-c5e64ee6-1079-4b90-bb7b-5ec14032a859, nearMiniId=acdced25a51-c5e64ee6-1079-4b90-bb7b-5ec14032a859, nearFinFutId=null, nearFinMiniId=null, nearXidV       er=GridCacheVersion [topVer=98913254, time=1487796367155, order=1487785731866, nodeOrder=7], super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=[], dhtNodes=[], explicitLock=false, super=Ign       iteTxLocalAdapter [completedBase=null, sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl [activeCacheIds=GridLongList [idx=1, arr=[812449097]], txMap={IgniteTxKey [key=KeyCacheObjectImpl        [val=PositionId [fundAbbrev=BVI, clearBrokerId=10032, insIid=681871, strategy=, traderId=5142, valueDate=19000101], hasValBytes=true], cacheId=812449097]=IgniteTxEntry [key=KeyCacheObjectImpl [val=PositionId        [fundAbbrev=BVI, clearBrokerId=10032, insIid=681871, strategy=, traderId=5142, valueDate=19000101], hasValBytes=true], cacheId=812449097, partId=-1, txKey=IgniteTxKey [key=KeyCacheObjectImpl [val=PositionId        [fundAbbrev=BVI, clearBrokerId=10032, insIid=681871, strategy=, traderId=5142, valueDate=19000101], hasValBytes=true], cacheId=812449097], val=[op=NOOP, val=null], prevVal=[op=NOOP, val=null], entryProcesso       rsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=[], filtersPassed=false, filtersSet=false, entry=GridDhtColocatedCacheEntry [super=GridDhtCacheEntry [rdrs=       [], locPart=GridDhtLocalPartition [id=961, map=o.a.i.i.processors.cache.GridCacheConcurrentMapImpl@4cd4bfee, rmvQueue=GridCircularBuffer [sizeMask=127, idxGen=0], cntr=1181, state=OWNING, reservations=0, emp       ty=false, createTime=02/18/2017 10:01:50], super=GridDistributedCacheEntry [super=GridCacheMapEntry [key=KeyCacheObjectImpl [val=PositionId [fundAbbrev=BVI, clearBrokerId=10032, insIid=681871, strategy=, tra       derId=5142, valueDate=19000101], hasValBytes=true], val=CacheObjectImpl [val=Position [positionId=PositionId [fundAbbrev=BVI, clearBrokerId=10032, insIid=681871, strategy=, traderId=5142, valueDate=19000101]       , quantity=104864.0], hasValBytes=true], startVer=1487430990114, ver=GridCacheVersion [topVer=98913252, time=1487796336445, order=1487785730286, nodeOrder=14], hash=534941633, extras=GridCacheMvccEntryExtras        [mvcc=GridCacheMvcc [locs=[GridCacheMvccCandidate [nodeId=985b44be-ea4a-4d0b-a2a5-ab2aa441ac62, ver=GridCacheVersion [topVer=98913253, time=1487796361851, order=1487785731568, nodeOrder=14], timeout=9990, t       s=1487796361850, threadId=76, id=1817320, topVer=AffinityTopologyVersion [topVer=3152, minorTopVer=41], reentry=null, otherNodeId=14cb9e24-0902-48b7-a016-f572db04838f, otherVer=GridCacheVersion [topVer=98913       253, time=1487796349651, order=1487785731495, nodeOrder=13], mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [val=PositionId [fundAbbrev=BVI, clearBrokerId=100       32, insIid=681871, strategy=, traderId=5142, valueDate=19000101], hasValBytes=true], masks=local=1|owner=1|ready=1|reentry=0|used=0|tx=1|single_implicit=0|dht_local=1|near_local=0|removed=0, prevVer=null, ne       xtVer=null], GridCacheMvccCandidate [nodeId=985b44be-ea4a-4d0b-a2a5-ab2aa441ac62, ver=GridCacheVersion [topVer=98913254, time=1487796367159, order=1487785731867, nodeOrder=14], timeout=9990, ts=1487796367150       , threadId=57, id=1817427, topVer=AffinityTopologyVersion [topVer=3153, minorTopVer=32], reentry=null, otherNodeId=f6663b00-24fc-4515-91ac-20c3b47d90ec, otherVer=GridCacheVersion [topVer=98913254, time=14877       96367155, order=1487785731866, nodeOrder=7], mappedDhtNodes=null, mappedNearNodes=null, ownerVer=GridCacheVersion [topVer=98913253, time=1487796361851, order=1487785731568, nodeOrder=14], serOrder=null, key=       KeyCacheObjectImpl [val=PositionId [fundAbbrev=BVI, clearBrokerId=10032, insIid=681871, strategy=, traderId=5142, valueDate=19000101], hasValBytes=true], masks=local=1|owner=0|ready=1|reentry=0|used=0|tx=1|s       ingle_implicit=0|dht_local=1|near_local=0|removed=0, prevVer=null, nextVer=null]], rmts=null]], flags=0]]]], prepared=0, locked=false, nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, f       lags=0, partUpdateCntr=0, serReadVer=null, xidVer=GridCacheVersion [topVer=98913254, time=1487796367159, order=1487785731867, nodeOrder=14]]}], super=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=98913254       , time=1487796367159, order=1487785731867, nodeOrder=14], writeVer=null, implicit=false, loc=true, threadId=57, startTime=1487796367150, nodeId=985b44be-ea4a-4d0b-a2a5-ab2aa441ac62, startVer=GridCacheVersion        [topVer=98913254, time=1487796367159, order=1487785731867, nodeOrder=14], endVer=null, isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=9990, sysInvalidate=false, sys=false, plc=2, commitVer=null       , finalizing=NONE, preparing=false, invalidParts=null, state=MARKED_ROLLBACK, timedOut=false, topVer=AffinityTopologyVersion [topVer=3153, minorTopVer=32], duration=10008ms, onePhaseCommit=false], size=1]]],        mapped=false, trackable=true, accessTtl=-1, needReturnVal=true, skipStore=false, keepBinary=false, innerFuts=[], pendingLocks=[KeyCacheObjectImpl [val=PositionId [fundAbbrev=BVI, clearBrokerId=10032, insIid       =681871, strategy=, traderId=5142, valueDate=19000101], hasValBytes=true]], super=GridCompoundIdentityFuture [super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=0, lsnrCalls=0, done=true, cancelled=f       alse, err=null, futs=[]]]]]]]
109735 class org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException: Failed to acquire lock within provided timeout for transaction [timeout=9990, tx=org.apache.ignite.internal.processors.cache.dis       tributed.dht.GridDhtTxLocalAdapter$1@7f2a8c8a]
109736         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3924)
109737         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3874)
109738         at org.apache.ignite.internal.util.future.GridEmbeddedFuture$2.applyx(GridEmbeddedFuture.java:91)
109739         at org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:297)
109740         at org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:290)
109741         at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:263)
109742         at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:251)
109743         at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:381)
109744         at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:347)
109745         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onComplete(GridDhtLockFuture.java:752)
109746         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.access$600(GridDhtLockFuture.java:79)
109747         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$LockTimeoutObject.onTimeout(GridDhtLockFuture.java:1116)
109748         at org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:159)
109749         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
109750         at java.lang.Thread.run(Thread.java:745)
109751
109752 Feb 22, 2017 3:46:17 PM org.apache.ignite.logger.java.JavaLogger error
109753 SEVERE: Failed to initialize job [jobId=73652976a51-2c4ae429-3284-4588-a96e-7e487f6653a1, ses=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=com.tudor.datagridI.server.tradegen.OrderHolderSaveRunn       able, dep=LocalDeployment [super=GridDeployment [ts=1487430097277, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, clsLdrId=1051eb15a51-985b44be-ea4a-4d0b-a2a5-ab2aa441ac62, userVer=0,        loc=true, sampleClsName=java.lang.String, pendingUndeploy=false, undeployed=false, usage=0]], taskClsName=com.tudor.datagridI.server.tradegen.OrderHolderSaveRunnable, sesId=63652976a51-2c4ae429-3284-4588       -a96e-7e487f6653a1, startTime=1487796377472, endTime=9223372036854775807, taskNodeId=2c4ae429-3284-4588-a96e-7e487f6653a1, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, closed=false, cpSpi=null, fail       Spi=null, loadSpi=null, usage=1, fullSup=false, subjId=2c4ae429-3284-4588-a96e-7e487f6653a1, mapFut=IgniteFuture [orig=GridFutureAdapter [resFlag=0, res=null, startTime=1487796377491, endTime=0, ignoreIn       terrupts=false, state=INIT]]], jobId=73652976a51-2c4ae429-3284-4588-a96e-7e487f6653a1]]
109754 class org.apache.ignite.IgniteCheckedException: Failed to deserialize object with given class loader: sun.misc.Launcher$AppClassLoader@18b4aac2
109755         at org.apache.ignite.marshaller.optimized.OptimizedMarshaller.unmarshal(OptimizedMarshaller.java:250)
109756         at org.apache.ignite.internal.processors.job.GridJobWorker.initialize(GridJobWorker.java:409)
109757         at org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1089)
109758         at org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1766)
109759         at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1238)
109760         at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:866)
109761         at org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:106)
109762         at org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:829)
109763         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
109764         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
109765         at java.lang.Thread.run(Thread.java:745)
109766 Caused by: java.io.EOFException: Attempt to read beyond the end of the stream [pos=2216, more=1, max=2215]
109767         at org.apache.ignite.internal.util.io.GridUnsafeDataInput.offset(GridUnsafeDataInput.java:171)
109768         at org.apache.ignite.internal.util.io.GridUnsafeDataInput.readByte(GridUnsafeDataInput.java:398)
109769         at org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride(OptimizedObjectInputStream.java:197)
109770         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:367)
109771         at org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readFields(OptimizedObjectInputStream.java:491)
109772         at org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readSerializable(OptimizedObjectInputStream.java:579)
109773         at org.apache.ignite.marshaller.optimized.OptimizedClassDescriptor.read(OptimizedClassDescriptor.java:841)
109774         at org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride(OptimizedObjectInputStream.java:324)
109775         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:367)
109776         at org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readFields(OptimizedObjectInputStream.java:491)
109777         at org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readSerializable(OptimizedObjectInputStream.java:579)
109778         at org.apache.ignite.marshaller.optimized.OptimizedClassDescriptor.read(OptimizedClassDescriptor.java:841)
109779         at org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride(OptimizedObjectInputStream.java:324)
109780         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:367)
109781         at org.apache.ignite.marshaller.optimized.OptimizedMarshaller.unmarshal(OptimizedMarshaller.java:247)
109782         ... 10 more
109783
agura agura
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

In reply to this post by bintisepaha
Hi

Timeout with value 9990 is ok because it is just the next step of
transaction and 10 ms was spend already for the previous steps of
transactions.

On Thu, Feb 23, 2017 at 8:48 AM, bintisepaha <[hidden email]> wrote:

> Andrey,
>
> I finally have an error that might help. this happened again in production
> today for us.
> Ignite-Console-3.zip
> <http://apache-ignite-users.70518.x6.nabble.com/file/n10828/Ignite-Console-3.zip>
>
> This is the last update that threw an error, after this error every update
> just times out.
>
> The timeout=9990 in this error, none of our transactions have this timeout.
> Do you think this is an ignite bug? If you look at the stack trace, this is
> not happening due to our code.
>
> Although there is an error on marshaling. How can we further narrow it down?
>
> Thanks,
> Binti
>
>
>
> --
> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p10828.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
agura agura
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

In reply to this post by bintisepaha
I still don't see any cause of timeout in logs. But there is one
suspiciously long pause between two log records:

[INFO ] 2017-02-22 15:45:52.807 [pub-#1%DataGridServer-Production%]
OrderHolderSaveRunnable - Begin TXN for orderHolder save, saving 1
orders
[ERROR] 2017-02-22 15:46:01.406 [pub-#2%DataGridServer-Production%]
OrderHolderSaveRunnable - javax.cache.CacheException: class
org.apache.ignite.transactions.TransactionTimeoutException: Cache
transaction time
d out (was rolled back automatically)

It would be great to get thread dumps during this pause.

Did you try Apache Ignite 1.8 release?


On Thu, Feb 23, 2017 at 5:03 PM, bintisepaha <[hidden email]> wrote:

> This is the actual error that looks like it is not coming from our code
>
> 109714 Feb 22, 2017 3:46:17 PM org.apache.ignite.logger.java.JavaLogger
> error
> 109715 SEVERE: <Position> Failed to acquire lock for request:
> GridNearLockRequest [topVer=AffinityTopologyVersion [topVer=3153,
> minorTopVer=32], miniId=acdced25a51-c5e64ee6-1079-4b90-bb7b-5ec14032a859,
> implicitTx=f       alse, implicitSingleTx=false, onePhaseCommit=false,
> dhtVers=[null], subjId=f6663b00-24fc-4515-91ac-20c3b47d90ec, taskNameHash=0,
> hasTransforms=false, syncCommit=true, accessTtl=-1, retVal=true,
> firstClientRe       q=false, filter=null, super=GridDistributedLockRequest
> [nodeId=f6663b00-24fc-4515-91ac-20c3b47d90ec, nearXidVer=GridCacheVersion
> [topVer=98913254, time=1487796367155, order=1487785731866, nodeOrder=7],
> threa       dId=57, futId=8cdced25a51-c5e64ee6-1079-4b90-bb7b-5ec14032a859,
> timeout=9990, isInTx=true, isInvalidate=false, isRead=true,
> isolation=REPEATABLE_READ, retVals=[true], txSize=0, flags=0, keysCnt=1,
> super=Grid       DistributedBaseMessage [ver=GridCacheVersion
> [topVer=98913254, time=1487796367155, order=1487785731866, nodeOrder=7],
> committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheMessage
> [msgId=2688455, de       pInfo=null, err=null, skipPrepare=false,
> cacheId=812449097, cacheId=812449097]]]]
> 109716 class
> org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException:
> Failed to acquire lock within provided timeout for transaction
> [timeout=9990, tx=org.apache.ignite.internal.processors.cache.dis
> tributed.dht.GridDhtTxLocalAdapter$1@7f2a8c8a]
> 109717         at
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3924)
> 109718         at
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3874)
> 109719         at
> org.apache.ignite.internal.util.future.GridEmbeddedFuture$2.applyx(GridEmbeddedFuture.java:91)
> 109720         at
> org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:297)
> 109721         at
> org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:290)
> 109722         at
> org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:263)
> 109723         at
> org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:251)
> 109724         at
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:381)
> 109725         at
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:347)
> 109726         at
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onComplete(GridDhtLockFuture.java:752)
> 109727         at
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.access$600(GridDhtLockFuture.java:79)
> 109728         at
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$LockTimeoutObject.onTimeout(GridDhtLockFuture.java:1116)
> 109729         at
> org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:159)
> 109730         at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
> 109731         at java.lang.Thread.run(Thread.java:745)
> 109732
> 109733 Feb 22, 2017 3:46:17 PM org.apache.ignite.logger.java.JavaLogger
> error
> 109716 class
> org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException:
> Failed to acquire lock within provided timeout for transaction
> [timeout=9990, tx=org.apache.ignite.internal.processors.cache.dis
> tributed.dht.GridDhtTxLocalAdapter$1@7f2a8c8a]
> 109717         at
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3924)
> 109718         at
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3874)
> 109719         at
> org.apache.ignite.internal.util.future.GridEmbeddedFuture$2.applyx(GridEmbeddedFuture.java:91)
> 109720         at
> org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:297)
> 109721         at
> org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:290)
> 109722         at
> org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:263)
> 109723         at
> org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:251)
> 109724         at
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:381)
> 109725         at
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:347)
> 109726         at
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onComplete(GridDhtLockFuture.java:752)
> 109727         at
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.access$600(GridDhtLockFuture.java:79)
> 109728         at
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$LockTimeoutObject.onTimeout(GridDhtLockFuture.java:1116)
> 109729         at
> org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:159)
> 109730         at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
> 109731         at java.lang.Thread.run(Thread.java:745)
> 109732
> 109733 Feb 22, 2017 3:46:17 PM org.apache.ignite.logger.java.JavaLogger
> error
> 109734 SEVERE: <Position> Future execution resulted in error:
> GridDhtEmbeddedFuture [super=GridEmbeddedFuture [embedded=GridEmbeddedFuture
> [embedded=GridDhtLockFuture [nearNodeId=f6663b00-24fc-4515-91ac-20c3b47d90e
> c, nearLockVer=GridCacheVersion [topVer=98913254, time=1487796367155,
> order=1487785731866, nodeOrder=7], topVer=AffinityTopologyVersion
> [topVer=3153, minorTopVer=32], threadId=57, futId=7e221e25a51-8c03dfe0-
> 0f6e-4200-aed7-2233c421e0a3, lockVer=GridCacheVersion [topVer=98913254,
> time=1487796367159, order=1487785731867, nodeOrder=14], read=true, err=null,
> timedOut=true, timeout=9990, tx=GridDhtTxLocal [nearNodeId
> =f6663b00-24fc-4515-91ac-20c3b47d90ec,
> nearFutId=8cdced25a51-c5e64ee6-1079-4b90-bb7b-5ec14032a859,
> nearMiniId=acdced25a51-c5e64ee6-1079-4b90-bb7b-5ec14032a859,
> nearFinFutId=null, nearFinMiniId=null, nearXidV       er=GridCacheVersion
> [topVer=98913254, time=1487796367155, order=1487785731866, nodeOrder=7],
> super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=[],
> dhtNodes=[], explicitLock=false, super=Ign       iteTxLocalAdapter
> [completedBase=null, sndTransformedVals=false, depEnabled=false,
> txState=IgniteTxStateImpl [activeCacheIds=GridLongList [idx=1,
> arr=[812449097]], txMap={IgniteTxKey [key=KeyCacheObjectImpl
> [val=PositionId [fundAbbrev=BVI, clearBrokerId=10032, insIid=681871,
> strategy=, traderId=5142, valueDate=19000101], hasValBytes=true],
> cacheId=812449097]=IgniteTxEntry [key=KeyCacheObjectImpl [val=PositionId
> [fundAbbrev=BVI, clearBrokerId=10032, insIid=681871, strategy=,
> traderId=5142, valueDate=19000101], hasValBytes=true], cacheId=812449097,
> partId=-1, txKey=IgniteTxKey [key=KeyCacheObjectImpl [val=PositionId
> [fundAbbrev=BVI, clearBrokerId=10032, insIid=681871, strategy=,
> traderId=5142, valueDate=19000101], hasValBytes=true], cacheId=812449097],
> val=[op=NOOP, val=null], prevVal=[op=NOOP, val=null], entryProcesso
> rsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null,
> explicitVer=null, dhtVer=null, filters=[], filtersPassed=false,
> filtersSet=false, entry=GridDhtColocatedCacheEntry [super=GridDhtCacheEntry
> [rdrs=       [], locPart=GridDhtLocalPartition [id=961,
> map=o.a.i.i.processors.cache.GridCacheConcurrentMapImpl@4cd4bfee,
> rmvQueue=GridCircularBuffer [sizeMask=127, idxGen=0], cntr=1181,
> state=OWNING, reservations=0, emp       ty=false, createTime=02/18/2017
> 10:01:50], super=GridDistributedCacheEntry [super=GridCacheMapEntry
> [key=KeyCacheObjectImpl [val=PositionId [fundAbbrev=BVI,
> clearBrokerId=10032, insIid=681871, strategy=, tra       derId=5142,
> valueDate=19000101], hasValBytes=true], val=CacheObjectImpl [val=Position
> [positionId=PositionId [fundAbbrev=BVI, clearBrokerId=10032, insIid=681871,
> strategy=, traderId=5142, valueDate=19000101]       , quantity=104864.0],
> hasValBytes=true], startVer=1487430990114, ver=GridCacheVersion
> [topVer=98913252, time=1487796336445, order=1487785730286, nodeOrder=14],
> hash=534941633, extras=GridCacheMvccEntryExtras        [mvcc=GridCacheMvcc
> [locs=[GridCacheMvccCandidate [nodeId=985b44be-ea4a-4d0b-a2a5-ab2aa441ac62,
> ver=GridCacheVersion [topVer=98913253, time=1487796361851,
> order=1487785731568, nodeOrder=14], timeout=9990, t       s=1487796361850,
> threadId=76, id=1817320, topVer=AffinityTopologyVersion [topVer=3152,
> minorTopVer=41], reentry=null,
> otherNodeId=14cb9e24-0902-48b7-a016-f572db04838f, otherVer=GridCacheVersion
> [topVer=98913       253, time=1487796349651, order=1487785731495,
> nodeOrder=13], mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null,
> serOrder=null, key=KeyCacheObjectImpl [val=PositionId [fundAbbrev=BVI,
> clearBrokerId=100       32, insIid=681871, strategy=, traderId=5142,
> valueDate=19000101], hasValBytes=true],
> masks=local=1|owner=1|ready=1|reentry=0|used=0|tx=1|single_implicit=0|dht_local=1|near_local=0|removed=0,
> prevVer=null, ne       xtVer=null], GridCacheMvccCandidate
> [nodeId=985b44be-ea4a-4d0b-a2a5-ab2aa441ac62, ver=GridCacheVersion
> [topVer=98913254, time=1487796367159, order=1487785731867, nodeOrder=14],
> timeout=9990, ts=1487796367150       , threadId=57, id=1817427,
> topVer=AffinityTopologyVersion [topVer=3153, minorTopVer=32], reentry=null,
> otherNodeId=f6663b00-24fc-4515-91ac-20c3b47d90ec, otherVer=GridCacheVersion
> [topVer=98913254, time=14877       96367155, order=1487785731866,
> nodeOrder=7], mappedDhtNodes=null, mappedNearNodes=null,
> ownerVer=GridCacheVersion [topVer=98913253, time=1487796361851,
> order=1487785731568, nodeOrder=14], serOrder=null, key=
> KeyCacheObjectImpl [val=PositionId [fundAbbrev=BVI, clearBrokerId=10032,
> insIid=681871, strategy=, traderId=5142, valueDate=19000101],
> hasValBytes=true], masks=local=1|owner=0|ready=1|reentry=0|used=0|tx=1|s
> ingle_implicit=0|dht_local=1|near_local=0|removed=0, prevVer=null,
> nextVer=null]], rmts=null]], flags=0]]]], prepared=0, locked=false,
> nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, f
> lags=0, partUpdateCntr=0, serReadVer=null, xidVer=GridCacheVersion
> [topVer=98913254, time=1487796367159, order=1487785731867, nodeOrder=14]]}],
> super=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=98913254       ,
> time=1487796367159, order=1487785731867, nodeOrder=14], writeVer=null,
> implicit=false, loc=true, threadId=57, startTime=1487796367150,
> nodeId=985b44be-ea4a-4d0b-a2a5-ab2aa441ac62, startVer=GridCacheVersion
> [topVer=98913254, time=1487796367159, order=1487785731867, nodeOrder=14],
> endVer=null, isolation=REPEATABLE_READ, concurrency=PESSIMISTIC,
> timeout=9990, sysInvalidate=false, sys=false, plc=2, commitVer=null       ,
> finalizing=NONE, preparing=false, invalidParts=null, state=MARKED_ROLLBACK,
> timedOut=false, topVer=AffinityTopologyVersion [topVer=3153,
> minorTopVer=32], duration=10008ms, onePhaseCommit=false], size=1]]],
> mapped=false, trackable=true, accessTtl=-1, needReturnVal=true,
> skipStore=false, keepBinary=false, innerFuts=[],
> pendingLocks=[KeyCacheObjectImpl [val=PositionId [fundAbbrev=BVI,
> clearBrokerId=10032, insIid       =681871, strategy=, traderId=5142,
> valueDate=19000101], hasValBytes=true]], super=GridCompoundIdentityFuture
> [super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=0, lsnrCalls=0,
> done=true, cancelled=f       alse, err=null, futs=[]]]]]]]
> 109735 class
> org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException:
> Failed to acquire lock within provided timeout for transaction
> [timeout=9990, tx=org.apache.ignite.internal.processors.cache.dis
> tributed.dht.GridDhtTxLocalAdapter$1@7f2a8c8a]
> 109736         at
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3924)
> 109737         at
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:3874)
> 109738         at
> org.apache.ignite.internal.util.future.GridEmbeddedFuture$2.applyx(GridEmbeddedFuture.java:91)
> 109739         at
> org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:297)
> 109740         at
> org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:290)
> 109741         at
> org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:263)
> 109742         at
> org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:251)
> 109743         at
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:381)
> 109744         at
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:347)
> 109745         at
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onComplete(GridDhtLockFuture.java:752)
> 109746         at
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.access$600(GridDhtLockFuture.java:79)
> 109747         at
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$LockTimeoutObject.onTimeout(GridDhtLockFuture.java:1116)
> 109748         at
> org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:159)
> 109749         at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
> 109750         at java.lang.Thread.run(Thread.java:745)
> 109751
> 109752 Feb 22, 2017 3:46:17 PM org.apache.ignite.logger.java.JavaLogger
> error
> 109753 SEVERE: Failed to initialize job
> [jobId=73652976a51-2c4ae429-3284-4588-a96e-7e487f6653a1,
> ses=GridJobSessionImpl [ses=GridTaskSessionImpl
> [taskName=com.tudor.datagridI.server.tradegen.OrderHolderSaveRunn
> able, dep=LocalDeployment [super=GridDeployment [ts=1487430097277,
> depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2,
> clsLdrId=1051eb15a51-985b44be-ea4a-4d0b-a2a5-ab2aa441ac62, userVer=0,
> loc=true, sampleClsName=java.lang.String, pendingUndeploy=false,
> undeployed=false, usage=0]],
> taskClsName=com.tudor.datagridI.server.tradegen.OrderHolderSaveRunnable,
> sesId=63652976a51-2c4ae429-3284-4588       -a96e-7e487f6653a1,
> startTime=1487796377472, endTime=9223372036854775807,
> taskNodeId=2c4ae429-3284-4588-a96e-7e487f6653a1,
> clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, closed=false, cpSpi=null,
> fail       Spi=null, loadSpi=null, usage=1, fullSup=false,
> subjId=2c4ae429-3284-4588-a96e-7e487f6653a1, mapFut=IgniteFuture
> [orig=GridFutureAdapter [resFlag=0, res=null, startTime=1487796377491,
> endTime=0, ignoreIn       terrupts=false, state=INIT]]],
> jobId=73652976a51-2c4ae429-3284-4588-a96e-7e487f6653a1]]
> 109754 class org.apache.ignite.IgniteCheckedException: Failed to deserialize
> object with given class loader: sun.misc.Launcher$AppClassLoader@18b4aac2
> 109755         at
> org.apache.ignite.marshaller.optimized.OptimizedMarshaller.unmarshal(OptimizedMarshaller.java:250)
> 109756         at
> org.apache.ignite.internal.processors.job.GridJobWorker.initialize(GridJobWorker.java:409)
> 109757         at
> org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1089)
> 109758         at
> org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1766)
> 109759         at
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1238)
> 109760         at
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:866)
> 109761         at
> org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:106)
> 109762         at
> org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:829)
> 109763         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 109764         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 109765         at java.lang.Thread.run(Thread.java:745)
> 109766 Caused by: java.io.EOFException: Attempt to read beyond the end of
> the stream [pos=2216, more=1, max=2215]
> 109767         at
> org.apache.ignite.internal.util.io.GridUnsafeDataInput.offset(GridUnsafeDataInput.java:171)
> 109768         at
> org.apache.ignite.internal.util.io.GridUnsafeDataInput.readByte(GridUnsafeDataInput.java:398)
> 109769         at
> org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride(OptimizedObjectInputStream.java:197)
> 109770         at
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:367)
> 109771         at
> org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readFields(OptimizedObjectInputStream.java:491)
> 109772         at
> org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readSerializable(OptimizedObjectInputStream.java:579)
> 109773         at
> org.apache.ignite.marshaller.optimized.OptimizedClassDescriptor.read(OptimizedClassDescriptor.java:841)
> 109774         at
> org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride(OptimizedObjectInputStream.java:324)
> 109775         at
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:367)
> 109776         at
> org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readFields(OptimizedObjectInputStream.java:491)
> 109777         at
> org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readSerializable(OptimizedObjectInputStream.java:579)
> 109778         at
> org.apache.ignite.marshaller.optimized.OptimizedClassDescriptor.read(OptimizedClassDescriptor.java:841)
> 109779         at
> org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride(OptimizedObjectInputStream.java:324)
> 109780         at
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:367)
> 109781         at
> org.apache.ignite.marshaller.optimized.OptimizedMarshaller.unmarshal(OptimizedMarshaller.java:247)
> 109782         ... 10 more
> 109783
>
>
>
>
> --
> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p10833.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
bintisepaha bintisepaha
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

Andrey, thanks for getting back.
The long pause is between 2 different threads, so isn't that normal?

Also the 9990 ms and 10 ms used earlier for some previous step in the tn, is this how ignite breaks out the time? we have always seen the timeouts for the value in ms that we set in our code. Also the stack trace does not come from our code which it usually does on genuine timeouts that do not leave the key locked.

We are trying 1.8 in UAT environment and will release to production soon. Unfortunately this issue does not happen in UAT and we have no way of reproducing it.

Is there any way we can force release a locked key without restarting the whole cluster?

Thanks,
Binti

agura agura
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

> The long pause is between 2 different threads, so isn't that normal?

Usually it is not normal. So need some investigation in order to
understand why this pause happens.

> Also the 9990 ms and 10 ms used earlier for some previous step in the tn, is
this how ignite breaks out the time?

When Ignite starts transaction on initial node it pass timeout to
tranasction. After mapping keys on nodes of cluster Ignite starts
transactions on remote nodes and pass remaining transaction time to
it. So it is possible that it is your case.

> Is there any way we can force release a locked key without restarting the
whole cluster?

There is no such way.

On Mon, Feb 27, 2017 at 5:39 PM, bintisepaha <[hidden email]> wrote:

> Andrey, thanks for getting back.
> The long pause is between 2 different threads, so isn't that normal?
>
> Also the 9990 ms and 10 ms used earlier for some previous step in the tn, is
> this how ignite breaks out the time? we have always seen the timeouts for
> the value in ms that we set in our code. Also the stack trace does not come
> from our code which it usually does on genuine timeouts that do not leave
> the key locked.
>
> We are trying 1.8 in UAT environment and will release to production soon.
> Unfortunately this issue does not happen in UAT and we have no way of
> reproducing it.
>
> Is there any way we can force release a locked key without restarting the
> whole cluster?
>
> Thanks,
> Binti
>
>
>
>
>
> --
> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p10910.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
bintisepaha bintisepaha
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

Andrey, Could you please tell us a little bit more about the deadlock detection feature in 1.8?

https://issues.apache.org/jira/browse/IGNITE-2969

How would we able to know that there is a deadlock? Do you think that is the case for us now in 1.7 but we can't be sure because we have no such feature?

Thanks,
Binti
agura agura
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

Hi,

you could to know more about deadlock detection here [1]

Mentioned ticket about optiomistic transactions. Deadlock detection
for pessimistic transactions (your case) is implemented in 1.7
release.

I don't think that it is your case because in case of
TransactionDeadlockException transaction should be rolled back and all
locks should be released. But it is possible that there is bug in this
place.

By the way, do you have the same issue on 1.8?

[1] https://apacheignite.readme.io/docs/transactions#deadlock-detection

On Mon, Mar 6, 2017 at 5:16 PM, bintisepaha <[hidden email]> wrote:

> Andrey, Could you please tell us a little bit more about the deadlock
> detection feature in 1.8?
>
> https://issues.apache.org/jira/browse/IGNITE-2969
>
> How would we able to know that there is a deadlock? Do you think that is the
> case for us now in 1.7 but we can't be sure because we have no such feature?
>
> Thanks,
> Binti
>
>
>
> --
> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Pessimistic-TXN-did-not-release-lock-on-a-key-all-subsequent-txns-failed-tp10536p11038.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
bintisepaha bintisepaha
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Pessimistic TXN did not release lock on a key, all subsequent txns failed

Could you please tell us what the logs would say if there was a deadlock?
we rollback the txn on any exception explicitly.

We are running 1.8 in UAT environment this week. It is not a simple upgrade to production, we need to let it burn in. But in UAT we never saw this issue with 1.7 either. It already happened in production, so far 3 times.
123
Loading...