WAL and WAL Archive volume size recommendation

facundo.maldonado

Hi everyone, I'm running a POC on a small deployment in a Kubernetes environment, and after a few minutes of load testing the data node fails with this message:

ss o.a.i.i.processors.cache.persistence.StorageException: Failed to archive WAL segment [srcFile=/opt/work/wal/node00-ef1e49d3-1c67-4527-9a24-bae580a5ed91/0000000000000005.wal, dstFile=/opt/work/walarchive/node00-ef1e49d3-1c67-4527-9a24-bae580a5ed91/0000000000000065.wal.tmp]]]
org.apache.ignite.internal.processors.cache.persistence.StorageException: Failed to archive WAL segment [srcFile=/opt/work/wal/node00-ef1e49d3-1c67-4527-9a24-bae580a5ed91/0000000000000005.wal, dstFile=/opt/work/walarchive/node00-ef1e49d3-1c67-4527-9a24-bae580a5ed91/0000000000000065.wal.tmp]
.....
Caused by: java.nio.file.FileSystemException: /opt/work/wal/node00-ef1e49d3-1c67-4527-9a24-bae580a5ed91/0000000000000005.wal -> /opt/work/walarchive/node00-ef1e49d3-1c67-4527-9a24-bae580a5ed91/0000000000000065.wal.tmp: No space left on device


I have one data node with a cache and persistence enabled, and I have three PVCs, one each for storage, WAL, and WAL archive.
I load data from a Kafka topic using a Kafka streamer running in a different pod.
Incoming load (at the topic) is about 5K records per second.
Average record size is 1.8 KB.

Data region: maxSize of 5 GB
Storage volume: 10 GB
WAL volume: 2 GB
WAL archive volume: 2 GB (also tried 3 and 4)

The rest of the settings (page size, WAL segment size, etc.) are left at their default values.
Ignite version is 2.9.0.
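
For a sense of scale, the figures above can be turned into a rough fill-time estimate. This is only a back-of-the-envelope sketch: it assumes the WAL grows at the raw payload rate (real WAL records carry extra overhead, so the actual rate should be higher), it uses the default 64 MB segment size, and the class and method names are made up for the illustration.

```java
public class WalFillEstimate {
    /** Raw payload rate in bytes per second (ignores any WAL record overhead). */
    static double payloadBytesPerSecond(double recordsPerSecond, double avgRecordBytes) {
        return recordsPerSecond * avgRecordBytes;
    }

    /** Seconds until a segment or volume of the given capacity fills at the given rate. */
    static double secondsToFill(long capacityBytes, double bytesPerSecond) {
        return capacityBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;

        // 5K records per second at 1.8 KB each, as reported above.
        double rate = payloadBytesPerSecond(5_000, 1.8 * 1024);

        System.out.printf("payload rate: %.1f MB/s%n", rate / MB);
        System.out.printf("one 64 MB segment fills in: %.1f s%n", secondsToFill(64 * MB, rate));
        System.out.printf("a 2 GB volume fills in: %.0f s%n", secondsToFill(2048 * MB, rate));
    }
}
```

At roughly 9 MB/s, a 64 MB segment fills in about 7 seconds and a 2 GB volume in under 4 minutes if nothing is deleted from it, which is in line with the failure showing up after a few minutes of load.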

My question is: is there a recommendation for how large these volumes should be relative to the storage size, record size, or some other factor?
Maybe the WAL segment size? If I increase the WAL segment size from 64 MB (the default) to, let's say, 512 MB, how much should I increase the WAL and WAL archive volumes?

Thanks,
--
Facundo Maldonado
facundo.maldonado

Re: WAL and WAL Archive volume size recommendation

Well, I found some useful numbers spread across a few pages of the documentation.

"By default, there are 10 active segments."
(WAL reference: <https://ignite.apache.org/docs/latest/persistence/native-persistence#write-ahead-log>)

"The number of segments kept in the archive is such that the total size of
all segments does not exceed the specified size of the WAL archive.
By default, the maximum size of the WAL archive (total space it occupies on
disk) is defined as 4 times the size of the checkpointing buffer."
(WAL archive reference: <https://ignite.apache.org/docs/latest/persistence/native-persistence#wal-archive>)

"The default buffer size is calculated as a function of the data region
size:

Data Region Size          Default Checkpointing Buffer Size
< 1 GB                    MIN (256 MB, Data_Region_Size)
between 1 GB and 8 GB     Data_Region_Size / 4
> 8 GB                    2 GB"
(Checkpointing buffer size reference: <https://ignite.apache.org/docs/latest/persistence/persistence-tuning#adjusting-checkpointing-buffer-size>)

So, if I have:
data region max size: 5 GB
storage vol size: 10Gi
I can set:
WAL vol size: 1 GB  # 10 active segments * 64 MB default segment size = 640 MB
WAL archive vol size: 5Gi
# 4 times the checkpointing buffer size
# region between 1 GB and 8 GB: buffer is region/4 --> WAL archive size equals the region size
# region > 8 GB: buffer is 2 GB --> WAL archive is at least 4 * 2 GB = 8 GB
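
The rule of thumb above can be written down as a small calculation. This is only an illustration of the documentation excerpts quoted in this post, not Ignite's actual code: the class and method names are invented for the example, and the treatment of the exact 1 GB and 8 GB boundaries is an assumption.

```java
public class WalVolumeSizing {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    /** Default checkpointing buffer size, following the table quoted from the docs. */
    static long defaultCheckpointBufferBytes(long dataRegionBytes) {
        if (dataRegionBytes < GB)
            return Math.min(256 * MB, dataRegionBytes);
        if (dataRegionBytes <= 8 * GB)   // boundary handling is an assumption
            return dataRegionBytes / 4;
        return 2 * GB;
    }

    /** Default maximum WAL archive size: 4 times the checkpointing buffer. */
    static long defaultWalArchiveBytes(long dataRegionBytes) {
        return 4 * defaultCheckpointBufferBytes(dataRegionBytes);
    }

    /** Space taken by the active WAL segments (work directory). */
    static long activeWalBytes(int segments, long segmentBytes) {
        return segments * segmentBytes;
    }

    public static void main(String[] args) {
        long region = 5 * GB; // data region maxSize from this thread

        System.out.printf("checkpoint buffer: %d MB%n", defaultCheckpointBufferBytes(region) / MB);
        System.out.printf("WAL archive:       %d MB%n", defaultWalArchiveBytes(region) / MB);
        System.out.printf("WAL, 64 MB segs:   %d MB%n", activeWalBytes(10, 64 * MB) / MB);
        System.out.printf("WAL, 512 MB segs:  %d MB%n", activeWalBytes(10, 512 * MB) / MB);
    }
}
```

Under the same rule, raising the segment size to 512 MB with 10 active segments means 5 GB of active WAL files, so the WAL volume would have to grow to at least that. Any extra headroom on top of these figures is a judgment call; the excerpts quoted here do not give a number for it.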

With those settings, I can keep the test running for a while longer, but the pod
keeps crashing.
At least it seems that I'm not getting the same error as before.





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
dmagda

Re: WAL and WAL Archive volume size recommendation

Hello Facundo, 

Just go ahead and disable the WAL archive. You only need the archive for the point-in-time recovery feature, which is supported by GridGain. I'll check with the community in a separate discussion why we have the archive enabled by default.

-
Denis
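
For reference, a minimal sketch of what disabling the archive can look like in Java configuration. The Ignite documentation describes disabling WAL archiving by pointing the WAL path and the WAL archive path at the same directory. The sketch assumes ignite-core 2.9.0 on the classpath; the class name, the storage path, and the 1 GB cap are illustrative assumptions, while the WAL path is the one from the error message earlier in the thread.

```java
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class NoWalArchiveConfig {
    /** Builds a node configuration with persistence on and WAL archiving off. */
    static IgniteConfiguration build() {
        DataStorageConfiguration storage = new DataStorageConfiguration();

        // Persistent default data region, 5 GB max as in this thread.
        storage.getDefaultDataRegionConfiguration()
            .setPersistenceEnabled(true)
            .setMaxSize(5L * 1024 * 1024 * 1024);

        // Hypothetical mount point for the storage PVC.
        storage.setStoragePath("/opt/work/storage");

        // Same directory for WAL and WAL archive: segments are no longer
        // copied to a separate archive location.
        storage.setWalPath("/opt/work/wal");
        storage.setWalArchivePath("/opt/work/wal");

        // Illustrative cap (an assumption, not a recommendation) on how much
        // WAL is kept before old segments are removed.
        storage.setMaxWalArchiveSize(1024L * 1024 * 1024);

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}
```

With this layout only one WAL volume is needed, and it has to be large enough for whatever cap is chosen plus the active segments.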


On Thu, Nov 5, 2020 at 11:37 AM facundo.maldonado <[hidden email]> wrote:
> Well, I found some useful numbers between two pages in the documentation.
> [...]
Mahesh Renduchintala

Re: WAL and WAL Archive volume size recommendation

Denis,

"The WAL archive is used to store WAL segments that may be needed to recover the node after a crash. The number of segments kept in the archive is such that the total size of all segments does not exceed the specified size of the WAL archive"

Given the above statement in the documentation, if we disable the WAL archive as described in the docs, will we have trouble recovering the data in the work folder when the node reboots?

regards
mahesh
facundo.maldonado

Re: WAL and WAL Archive volume size recommendation

In reply to this post by dmagda
Ok, will do that.

It's not clear why, at least to me.



