WAL iteration exceptions

classic Classic list List threaded Threaded
4 messages Options
Raymond Wilson Raymond Wilson
Reply | Threaded
Open this post in threaded view
|

WAL iteration exceptions

We see instances of the below error appearing in our logs (using Ignite.Net v2.6 with persistent storage) on server node startup:

 

2018-10-04 08:38:24:0079             2827983                2018-10-04 08:38:24,046 [1] INFO  MutableCacheComputeServer. Stopping WAL iteration due to an exception: Failed to read WAL record at position: 31062042, ptr=FileWALPointer [idx=33, fileOff=31062042, len=0]

 

The type of the log message is INFO, but it appears to indicate an issue with the WAL files in our persistent store. All our data regions are configured with WalMode = Fsync, so I’m a little surprised to be seeing what looks like WAL integrity errors.

 

The next message in the log is this, which indicates Ignite was not affected:

 

2018-10-04 08:39:02,116 [6] INFO  MutableCacheComputeServer. Resuming logging to WAL segment [file=[… very long path…]\0000000000000003.wal, offset=31062042, ver=2]

A few lines later there is a repeat of the initial exception related message:

 

2018-10-04 08:39:03,142 [6] INFO  MutableCacheComputeServer. Stopping WAL iteration due to an exception: Failed to read WAL record at position: 31190597, ptr=FileWALPointer [idx=33, fileOff=31190597, len=0]

Then this line appears:

 

2018-10-04 08:39:03,813 [6] INFO  MutableCacheComputeServer. Finished applying WAL changes [updatesApplied=146, time=1272ms]

The node appears to initialize correctly after this logging.

 

Does anyone else see this behaviour? Is it something to be concerned about or normal logging to be expected after termination of running server instances?

 

The

Thanks,

Raymond.

 

Mikael Mikael
Reply | Threaded
Open this post in threaded view
|

Re: WAL iteration exceptions

Hi!

I got a few of them before with 2.5, have not seen any yet with 2.6, it was actually a warning before but was changed to info, I was told it's not a problem and nothing to worry about, as far as I can tell no data was ever lost.

Mikael


Den 2018-10-03 kl. 21:48, skrev Raymond Wilson:

We see instances of the below error appearing in our logs (using Ignite.Net v2.6 with persistent storage) on server node startup:

 

2018-10-04 08:38:24:0079             2827983                2018-10-04 08:38:24,046 [1] INFO  MutableCacheComputeServer. Stopping WAL iteration due to an exception: Failed to read WAL record at position: 31062042, ptr=FileWALPointer [idx=33, fileOff=31062042, len=0]

 

The type of the log message is INFO, but it appears to indicate an issue with the WAL files in our persistent store. All our data regions are configured with WalMode = Fsync, so I’m a little surprised to be seeing what looks like WAL integrity errors.

 

The next message in the log is this, which indicates Ignite was not affected:

 

2018-10-04 08:39:02,116 [6] INFO  MutableCacheComputeServer. Resuming logging to WAL segment [file=[… very long path…]\0000000000000003.wal, offset=31062042, ver=2]

A few lines later there is a repeat of the initial exception related message:

 

2018-10-04 08:39:03,142 [6] INFO  MutableCacheComputeServer. Stopping WAL iteration due to an exception: Failed to read WAL record at position: 31190597, ptr=FileWALPointer [idx=33, fileOff=31190597, len=0]

Then this line appears:

 

2018-10-04 08:39:03,813 [6] INFO  MutableCacheComputeServer. Finished applying WAL changes [updatesApplied=146, time=1272ms]

The node appears to initialize correctly after this logging.

 

Does anyone else see this behaviour? Is it something to be concerned about or normal logging to be expected after termination of running server instances?

 

The

Thanks,

Raymond.

 


Dmitry Pavlov Dmitry Pavlov
Reply | Threaded
Open this post in threaded view
|

Re: WAL iteration exceptions

Hi,

It is totally ok if some WAL segment causes such exception in case of the end of a file was reached.

Messages may disappear because some newer version has special records showing the end of a segment, so WAL reader almost always knows that the end of file reached.

Sincerely,
Dmitriy Pavlov

ср, 3 окт. 2018 г. в 23:38, Mikael <[hidden email]>:

Hi!

I got a few of them before with 2.5, have not seen any yet with 2.6, it was actually a warning before but was changed to info, I was told it's not a problem and nothing to worry about, as far as I can tell no data was ever lost.

Mikael


Den 2018-10-03 kl. 21:48, skrev Raymond Wilson:

We see instances of the below error appearing in our logs (using Ignite.Net v2.6 with persistent storage) on server node startup:

 

2018-10-04 08:38:24:0079             2827983                2018-10-04 08:38:24,046 [1] INFO  MutableCacheComputeServer. Stopping WAL iteration due to an exception: Failed to read WAL record at position: 31062042, ptr=FileWALPointer [idx=33, fileOff=31062042, len=0]

 

The type of the log message is INFO, but it appears to indicate an issue with the WAL files in our persistent store. All our data regions are configured with WalMode = Fsync, so I’m a little surprised to be seeing what looks like WAL integrity errors.

 

The next message in the log is this, which indicates Ignite was not affected:

 

2018-10-04 08:39:02,116 [6] INFO  MutableCacheComputeServer. Resuming logging to WAL segment [file=[… very long path…]\0000000000000003.wal, offset=31062042, ver=2]

A few lines later there is a repeat of the initial exception related message:

 

2018-10-04 08:39:03,142 [6] INFO  MutableCacheComputeServer. Stopping WAL iteration due to an exception: Failed to read WAL record at position: 31190597, ptr=FileWALPointer [idx=33, fileOff=31190597, len=0]

Then this line appears:

 

2018-10-04 08:39:03,813 [6] INFO  MutableCacheComputeServer. Finished applying WAL changes [updatesApplied=146, time=1272ms]

The node appears to initialize correctly after this logging.

 

Does anyone else see this behaviour? Is it something to be concerned about or normal logging to be expected after termination of running server instances?

 

The

Thanks,

Raymond.

 


Raymond Wilson Raymond Wilson
Reply | Threaded
Open this post in threaded view
|

RE: WAL iteration exceptions

I think you are saying if I had a persistent store using Ignite 2.5 and upgraded to 2.6 I might see those errors?

 

In my case the grid in question was first created with 2.6.

 

From: Dmitriy Pavlov <[hidden email]>
Sent: Friday, October 5, 2018 9:46 AM
To: [hidden email]
Subject: Re: WAL iteration exceptions

 

Hi,

 

It is totally ok if some WAL segment causes such exception in case of the end of a file was reached.

 

Messages may disappear because some newer version has special records showing the end of a segment, so WAL reader almost always knows that the end of file reached.

 

Sincerely,

Dmitriy Pavlov

 

ср, 3 окт. 2018 г. в 23:38, Mikael <[hidden email]>:

Hi!

I got a few of them before with 2.5, have not seen any yet with 2.6, it was actually a warning before but was changed to info, I was told it's not a problem and nothing to worry about, as far as I can tell no data was ever lost.

Mikael

 

Den 2018-10-03 kl. 21:48, skrev Raymond Wilson:

We see instances of the below error appearing in our logs (using Ignite.Net v2.6 with persistent storage) on server node startup:

 

2018-10-04 08:38:24:0079             2827983                2018-10-04 08:38:24,046 [1] INFO  MutableCacheComputeServer. Stopping WAL iteration due to an exception: Failed to read WAL record at position: 31062042, ptr=FileWALPointer [idx=33, fileOff=31062042, len=0]

 

The type of the log message is INFO, but it appears to indicate an issue with the WAL files in our persistent store. All our data regions are configured with WalMode = Fsync, so I’m a little surprised to be seeing what looks like WAL integrity errors.

 

The next message in the log is this, which indicates Ignite was not affected:

 

2018-10-04 08:39:02,116 [6] INFO  MutableCacheComputeServer. Resuming logging to WAL segment [file=[… very long path…]\0000000000000003.wal, offset=31062042, ver=2]

A few lines later there is a repeat of the initial exception related message:

 

2018-10-04 08:39:03,142 [6] INFO  MutableCacheComputeServer. Stopping WAL iteration due to an exception: Failed to read WAL record at position: 31190597, ptr=FileWALPointer [idx=33, fileOff=31190597, len=0]

Then this line appears:

 

2018-10-04 08:39:03,813 [6] INFO  MutableCacheComputeServer. Finished applying WAL changes [updatesApplied=146, time=1272ms]

The node appears to initialize correctly after this logging.

 

Does anyone else see this behaviour? Is it something to be concerned about or normal logging to be expected after termination of running server instances?

 

The

Thanks,

Raymond.