Rebalancing issue in Ignite 2.7

Kamlesh.Joshi

Rebalancing issue in Ignite 2.7

Hi Igniters,

I am facing some issues with rebalancing in Ignite 2.7. Below are the rebalancing and server configuration parameters:

walMode=BACKGROUND

walFlushFrequency=30000

rebalanceThreadPoolSize=8

rebalanceThrottle=100

rebalanceBatchSize=#{16 * 1024 * 1024}

 

a. Tried adding a new node to the existing cluster; it fails.

b. Tried removing one of the server nodes from the cluster; it still fails. Below is the stack trace for reference:

 

[2019-02-07T12:44:18,910][INFO ][wal-file-archiver%EDIFCustomer-#152%EDIFCustomer%][FileWriteAheadLogManager] Starting to copy WAL segment [absIdx=476, segIdx=6, origFile=/app/tibco/Ignite/datastore/wal/node00-818d836d-47a6-4a8e-9c0b-8837b04d72a8/0000000000000006.wal, dstFile=/app/tibco/Ignite/datastore/archive/node00-818d836d-47a6-4a8e-9c0b-8837b04d72a8/0000000000000476.wal]

[2019-02-07T12:44:18,913][ERROR][wal-write-worker%EDIFCustomer-#154%EDIFCustomer%][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.StorageException: Failed to write buffer.]]

org.apache.ignite.internal.processors.cache.persistence.StorageException: Failed to write buffer.

        at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.writeBuffer(FileWriteAheadLogManager.java:3484) [ignite-core-2.7.0.jar:2.7.0]

        at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.body(FileWriteAheadLogManager.java:3301) [ignite-core-2.7.0.jar:2.7.0]

        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.7.0.jar:2.7.0]

        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]

Caused by: java.nio.channels.ClosedChannelException

        at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110) ~[?:1.8.0_151]

        at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:253) ~[?:1.8.0_151]

        at org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIO.position(RandomAccessFileIO.java:48) ~[ignite-core-2.7.0.jar:2.7.0]

        at org.apache.ignite.internal.processors.cache.persistence.file.FileIODecorator.position(FileIODecorator.java:41) ~[ignite-core-2.7.0.jar:2.7.0]

        at org.apache.ignite.internal.processors.cache.persistence.file.AbstractFileIO.writeFully(AbstractFileIO.java:111) ~[ignite-core-2.7.0.jar:2.7.0]

        at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.writeBuffer(FileWriteAheadLogManager.java:3477) ~[ignite-core-2.7.0.jar:2.7.0]

        ... 3 more

[2019-02-07T12:44:18,918][WARN ][wal-write-worker%EDIFCustomer-#154%EDIFCustomer%][FailureProcessor] No deadlocked threads detected.

[2019-02-07T12:44:21,509][WARN ][jvm-pause-detector-worker][IgniteKernal%EDIFCustomer] Possible too long JVM pause: 2560 milliseconds.

[2019-02-07T12:44:21,560][INFO ][wal-file-archiver%EDIFCustomer-#152%EDIFCustomer%][FileWriteAheadLogManager] Copied file [src=/app/tibco/Ignite/datastore/wal/node00-818d836d-47a6-4a8e-9c0b-8837b04d72a8/0000000000000006.wal, dst=/app/tibco/Ignite/datastore/archive/node00-818d836d-47a6-4a8e-9c0b-8837b04d72a8/0000000000000476.wal]

[2019-02-07T12:44:21,615][WARN ][wal-write-worker%EDIFCustomer-#154%EDIFCustomer%][FailureProcessor] Thread dump at 2019/02/07 12:44:21 IST

…..

[2019-02-07T12:44:21,625][ERROR][wal-write-worker%EDIFCustomer-#154%EDIFCustomer%][] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.StorageException: Failed to write buffer.]]

 

Any help or pointers on how to correct this would be appreciated.

Thanks in advance!

Kamlesh Joshi

 


"Confidentiality Warning: This message and any attachments are intended only for the use of the intended recipient(s), are confidential and may be privileged. If you are not the intended recipient, you are hereby notified that any review, re-transmission, conversion to hard copy, copying, circulation or other use of this message and any attachments is strictly prohibited. If you are not the intended recipient, please notify the sender immediately by return email and delete this message and any attachments from your system.

Virus Warning: Although the company has taken reasonable precautions to ensure no viruses are present in this email, the company cannot accept responsibility for any loss or damage arising from the use of this email or attachment."

ilya.kasnacheev

Re: Rebalancing issue in Ignite 2.7

Hello!

ClosedChannelException is usually caused by a thread being interrupted while it is using the channel, so you should avoid interrupting Ignite threads. It's hard to say exactly why it happened here.

In this case, I think you should try restarting the problematic threads and hope that you can rely on PDS (the persistent data store). Note that it might not be intact due to the interruption.

Regards,
--
Ilya Kasnacheev
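
For readers following the stack trace, the mechanism described in the reply can be reproduced with plain JDK classes, without Ignite. The following is only a minimal sketch and not Ignite code: the class name InterruptClosesChannel, the method demo() and the temp-file prefix are made up for illustration. It sets the interrupt flag on the current thread, attempts a write on a FileChannel (the JDK then closes the channel and throws ClosedByInterruptException), and afterwards calls position() on the same channel, which fails with ClosedChannelException, similar to the "Caused by" frames in the log.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; it only illustrates JDK channel behaviour.
class InterruptClosesChannel {

    // Returns the exception names seen by the interrupted write and by the later position() call.
    static String[] demo() {
        String first = "none";
        String second = "none";
        try {
            Path tmp = Files.createTempFile("wal-demo", ".bin");
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                // Simulate another thread interrupting the writer.
                Thread.currentThread().interrupt();
                try {
                    ch.write(ByteBuffer.wrap(new byte[] {1}));
                } catch (ClosedByInterruptException e) {
                    // The JDK has closed the channel as a side effect of the interrupt.
                    first = e.getClass().getSimpleName();
                } finally {
                    // Clear the interrupt flag so the rest of the demo runs normally.
                    Thread.interrupted();
                }
                try {
                    // Any later use of the same channel now fails.
                    ch.position();
                } catch (ClosedChannelException e) {
                    second = e.getClass().getSimpleName();
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String[] {first, second};
    }

    public static void main(String[] args) {
        String[] seen = demo();
        System.out.println("interrupted write: " + seen[0]);
        System.out.println("later position():  " + seen[1]);
    }
}
```

The point of the sketch is that the thread reporting ClosedChannelException need not be the one that was interrupted: once an interrupt closes a shared channel, every later operation on it fails. This is why the log alone may not show where the interrupt came from.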

