Ignite Performance Issues when seeding data from Spark

kellan

Ignite Performance Issues when seeding data from Spark

I'm trying to seed about 500 million rows from a Spark DataFrame into a clean Ignite database, but I'm running into serious performance issues once Ignite runs out of durable memory. I'm running 4 Ignite nodes on a Kubernetes cluster backed by AWS i3.2xl instances (8 CPUs per node, 60 GB memory, 2 TB local SSD storage). My configuration parameters per node are as follows (a rough code sketch of this setup follows the list):

- 40 GB of available memory
- 20 GB of durable memory
- 8 GB of Java heap
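
In Java terms, the storage part of each node's configuration looks roughly like the sketch below; the region name is just a placeholder and I'm paraphrasing rather than pasting my actual config:

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    // Rough sketch of the per-node storage setup described above.
    DataStorageConfiguration storageCfg = new DataStorageConfiguration();

    DataRegionConfiguration regionCfg = new DataRegionConfiguration();
    regionCfg.setName("defaultRegion");                 // placeholder name
    regionCfg.setMaxSize(20L * 1024 * 1024 * 1024);     // 20 GB of durable memory
    regionCfg.setPersistenceEnabled(true);              // persist to the local SSDs

    storageCfg.setDefaultDataRegionConfiguration(regionCfg);

    IgniteConfiguration cfg = new IgniteConfiguration();
    cfg.setDataStorageConfiguration(storageCfg);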

While the nodes still have free durable memory, they write to the cache at a rate of about 35k-40k entries per second. I expected to take a performance hit once durable memory ran out, but as soon as persistence kicks in, write throughput drops to about 15k/sec and steadily decreases over time, while the number of writes to disk steadily increases. CPU usage also starts to fall, until Ignite is writing fewer than 5k entries per second and the CPU drops below half a core. I haven't let it keep degrading to see what happens, but it shows no sign of stabilizing.

I've tried all of Ignite's performance suggestions, including:

- separating the WAL and storage onto different disks
- using local storage instead of EBS
- checkpoint throttling
- write throttling
- adjusting the checkpoint page buffer size
- adjusting the number of threads
- adjusting swappiness

In the end, I get the same results no matter what I do. Is write performance
really this bad with persistence, and if not, what kind of performance
should I be expecting and what can I do to improve it?

Is there an alternative way to seed data that doesn't rely on the
DataStreamers?
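
For context, the ingest path from Spark looks roughly like the sketch below; I'm writing through Ignite's Spark DataFrame integration, which as I understand it uses a DataStreamer under the hood. The paths, table name, and key field are placeholders, and I'm quoting the option constants and save mode from memory:

    import org.apache.ignite.spark.IgniteDataFrameSettings;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder().appName("ignite-seed").getOrCreate();
    Dataset<Row> df = spark.read().parquet("s3://my-bucket/input"); // ~500M rows (placeholder path)

    df.write()
      .format(IgniteDataFrameSettings.FORMAT_IGNITE())
      .option(IgniteDataFrameSettings.OPTION_CONFIG_FILE(), "/opt/ignite/config/node.xml") // placeholder
      .option(IgniteDataFrameSettings.OPTION_TABLE(), "my_table")                          // placeholder
      .option(IgniteDataFrameSettings.OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS(), "id")      // placeholder
      .mode(SaveMode.Append)
      .save();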



ilya.kasnacheev

Re: Ignite Performance Issues when seeding data from Spark

Hello!

This is somewhat expected, since memory grids are very fast: a disk-based database can't match a memory grid's in-memory performance numbers.

Still, it should not degrade endlessly. Which version are you on? You may see improvements in the upcoming Apache Ignite 2.7.

The DataStreamer should be very fast when used properly, so there isn't much point in seeking an alternative. A typical bulk-load setup looks roughly like the sketch below.
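
This is only an illustration of how a streamer is usually tuned for an initial load; the cache name, value type, and numbers are examples, not taken from your setup:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteDataStreamer;
    import org.apache.ignite.Ignition;

    Ignite ignite = Ignition.start();               // or an already-running node/client

    try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("myCache")) {
        streamer.allowOverwrite(false);             // plain inserts are cheaper than upserts
        streamer.perNodeBufferSize(1024);           // batch more entries per request
        streamer.perNodeParallelOperations(8);      // keep every data node busy

        for (long key = 0; key < 1_000_000; key++)  // stand-in for the real data source
            streamer.addData(key, "value-" + key);
    } // close() flushes whatever is still buffered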

Please also note that your cloud provider may start throttling IOPS once you write continuously: you get a burst period, and then throughput is scaled down. The solution here is to use an instance type with local disks and/or a dedicated instance; I'm not sure whether that applies to the one you're using.

Regards,
--
Ilya Kasnacheev


kellan

Re: Ignite Performance Issues when seeding data from Spark

I'm using 2.6 on AWS. As I mentioned, my Ignite cluster is running on i3 instances, which have local storage, so IOPS bursting shouldn't be a problem.

The trend I've noticed is that the number of disk writes per second increases while the size of each write decreases, and the number of PUT operations per second and CPU usage both decrease.

I don't know if this is applicable here, but I ran into similar problems with Kafka when I didn't leave enough memory for the OS page cache; I'm not sure whether that's a consideration with Ignite. I'm following Ignite's performance guidelines and dedicating less than 70% of my available memory to Ignite durable memory and the Java heap.



ilya.kasnacheev

Re: Ignite Performance Issues when seeding data from Spark

Hello!

Let's talk about persistence performance.

First of all, you have to understand how checkpointing works: it writes out a whole 4 KB page whenever any object on that page is added or updated, so a lot of scattered updates mean a lot of disk writes.

When you have a write-heavy process, you can often mitigate this with a large checkpointPageBufferSize and infrequent checkpoints. checkpointPageBufferSize can be as large as 1/3 to 1/2 of your durable memory size, at the expense of the data region itself, so you can decrease your data region and increase the checkpoint page buffer. You should also set the checkpoint frequency to something like 300000 ms (5 minutes). For example:
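
A rough sketch of those two settings in code; the exact sizes are only an example split of your 20 GB region, not a recommendation:

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;

    DataStorageConfiguration storageCfg = new DataStorageConfiguration();
    storageCfg.setCheckpointFrequency(300_000L);                      // checkpoint every 5 minutes

    DataRegionConfiguration regionCfg = new DataRegionConfiguration();
    regionCfg.setPersistenceEnabled(true);
    regionCfg.setMaxSize(14L * 1024 * 1024 * 1024);                   // e.g. shrink the region to 14 GB
    regionCfg.setCheckpointPageBufferSize(6L * 1024 * 1024 * 1024);   // e.g. 6 GB checkpoint page buffer

    storageCfg.setDefaultDataRegionConfiguration(regionCfg);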

When checkpoints are infrequent, there is less chance that the same page will be written to disk multiple times.

I hope you have already set WALMode to LOG_ONLY, haven't you? Moreover, you can try disableWal() before populating your database and enableWal() after data ingestion is complete. This isn't suitable for steady-state operation, but it works nicely for a burst of writes. For example:
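
A minimal sketch of both pieces; the cache name is a placeholder for whichever cache you are preloading:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.WALMode;

    // In the node configuration:
    DataStorageConfiguration storageCfg = new DataStorageConfiguration();
    storageCfg.setWalMode(WALMode.LOG_ONLY);

    // Around the bulk load ('ignite' is a started node or client):
    ignite.cluster().disableWal("myCache");
    // ... run the Spark ingestion ...
    ignite.cluster().enableWal("myCache");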

Regards,
--
Ilya Kasnacheev

