I deployed a 5-node Ignite 2.9.0 cluster on Kubernetes with the following configuration:
Total RAM per instance: 64 GB
JVM: 32 GB
Default data region: 12 GB
Persistence storage: 500 GB volume
WAL + WAL archive: 30 GB volume
After this I started ingesting data into the 3 tables I had created, using
basic JDBC batch inserts.
After around 14 hours, roughly 100 GB of persistence data had accumulated on
each node across the 3 tables (each with 1 backup).
But suddenly 2 pods crashed, and when I checked the logs *there were errors
saying no space left on the storage volume* configured for WAL + WAL archive.
I'm not sure what exactly caused this issue, and I couldn't recover from the
pod crash on K8s because I cannot expand the volume attached to the Ignite pods.
The only operation I ran when the pods crashed was select count(*) from table;
and there were around 21 crore (210 million) records in that table.
Is the WAL archive really needed? How can I avoid this kind of issue, which
leaves the cluster in an unusable state?
I tried disabling the WAL archive by pointing the WAL path and the WAL archive
path to the same directory:
<property name="storagePath" value="/opt/ignite/persistence/"/>
<property name="walPath" value="/opt/ignite/wal/"/>
<property name="walArchivePath" value="/opt/ignite/wal/"/>
<property name="walMode" value="FSYNC"/>
But again I faced the same issue. It ran fine for 10 hours; at that point I
connected to Visor to check the number of records (around 200 million in
total), and after disconnecting from Visor I connected to sqlline and ran
select count(*) from table;. I'm not sure which of these caused the WAL disk
to fill up completely.
I deployed this Ignite cluster with Java 11.
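For what it's worth, here is what I'm considering trying next, based on my
reading of the DataStorageConfiguration docs: keep the archive in a separate
directory but cap its total size with maxWalArchiveSize, and checkpoint more
often so old WAL segments can be reclaimed sooner. A sketch only; the archive
path, the 10 GB cap, and the 60 s checkpoint frequency are illustrative values
I picked, not recommendations:

```xml
<!-- Sketch: cap the WAL archive instead of disabling it.
     All values below are illustrative. -->
<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="walPath" value="/opt/ignite/wal/"/>
        <!-- Hypothetical separate archive directory. -->
        <property name="walArchivePath" value="/opt/ignite/walarchive/"/>
        <!-- Upper bound on total archive size, in bytes (10 GB here). -->
        <property name="maxWalArchiveSize" value="#{10L * 1024 * 1024 * 1024}"/>
        <!-- Checkpoint every 60 s (default is 180 s) so WAL segments
             become eligible for cleanup sooner. -->
        <property name="checkpointFrequency" value="60000"/>
        <property name="walMode" value="FSYNC"/>
    </bean>
</property>
```

Would this be a sane way to bound WAL disk usage, or is there a better
approach for this workload?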