We are trying to load and transform a large amount of data with IgniteDataStreamer and a custom StreamReceiver. We'd like this to run much faster, but we cannot find anything that is close to saturated except the data-streamer threads and queues. This is Ignite 2.5, with Ignite persistence, and enough memory to fit all the data.
I was hoping to turn some knob, such as the size of a thread pool, to increase throughput, but I can't find any bottleneck. If I turn up the demand, throughput does not increase and per-transaction latency rises, which indicates a bottleneck somewhere.
The application has loaded about 900 million records of type A at this point, and now we would like to load 2.5 billion records of type B. Records of type A have a key and a unique ID. Records of type B have a different key type, plus a foreign field holding A's unique ID. The key we use in Ignite for a B record is (B's key, A's key as the affinity key). We also maintain caches to map A's ID back to its key, and something similar for B.
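In case it helps to picture the layout, the B-record key is shaped roughly like the sketch below. Only the (B key, A key as affinity key) structure comes from the description above; the field names and types are placeholders.

import org.apache.ignite.cache.affinity.AffinityKeyMapped;

// Composite key for a B record: B's own key plus A's key as the affinity key,
// so each B record is collocated with the A record it references.
public class RecordBKey {
    private final String bKey;   // B's own key (type is an assumption)

    @AffinityKeyMapped
    private final String aKey;   // A's key, used for affinity/collocation

    public RecordBKey(String bKey, String aKey) {
        this.bKey = bKey;
        this.aKey = aKey;
    }

    // equals/hashCode must cover both fields so the object works as a cache key.
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RecordBKey)) return false;
        RecordBKey k = (RecordBKey)o;
        return bKey.equals(k.bKey) && aKey.equals(k.aKey);
    }

    @Override public int hashCode() {
        return 31 * bKey.hashCode() + aKey.hashCode();
    }
}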
For each record, the stream receiver starts a pessimistic transaction; we end up with 1 local get, 2-3 gets with no affinity (i.e. 50% local on two nodes), and 2-4 puts before we commit the transaction (FULL_SYNC caches). Several fields have indices.
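The receiver is roughly shaped like the following sketch; the cache names, value types, and lookups are placeholders rather than our actual code.

import java.util.Collection;
import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.stream.StreamReceiver;
import org.apache.ignite.transactions.Transaction;

import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC;
import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ;

// Rough shape of the transactional StreamReceiver described above.
public class RecordBReceiver implements StreamReceiver<RecordBKey, Object> {
    @Override public void receive(IgniteCache<RecordBKey, Object> cache,
        Collection<Map.Entry<RecordBKey, Object>> entries) {
        Ignite ignite = Ignition.localIgnite();

        for (Map.Entry<RecordBKey, Object> e : entries) {
            try (Transaction tx = ignite.transactions().txStart(PESSIMISTIC, REPEATABLE_READ)) {
                // One local get, plus a few non-affinity gets against the ID-mapping caches.
                Object existing = cache.get(e.getKey());

                // ... lookups against ignite.cache("aIdToKey") etc. would go here ...

                // 2-4 puts into the B cache and the mapping caches, then commit.
                cache.put(e.getKey(), e.getValue());

                tx.commit();
            }
        }
    }
}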
I've simplified this down to two nodes, with 4 caches each with one backup, all with WAL logging disabled. The two nodes have 256 GB of memory, 32 CPUs, and unmirrored local SSDs (i3.8xlarge on AWS). The network is supposed to be 10 Gb. The dataset is basically in memory, and with the WAL disabled there is very little I/O.
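For reference, the cache and persistence setup is along these lines; the cache name and types are placeholders, and only one of the caches is shown.

import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Transactional FULL_SYNC caches with one backup, on top of native persistence.
public class ClusterConfig {
    public static IgniteConfiguration config() {
        DataStorageConfiguration storage = new DataStorageConfiguration();
        storage.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        CacheConfiguration<RecordBKey, Object> recordB = new CacheConfiguration<>("recordB");
        recordB.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
        recordB.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
        recordB.setBackups(1);

        return new IgniteConfiguration()
            .setDataStorageConfiguration(storage)
            .setCacheConfiguration(recordB);
    }
}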
Disabling WAL logging only pushed the transaction rate from about 1,750 to about 2,000 TPS.
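The WAL is toggled per cache around the load, roughly like the sketch below (IgniteCluster.disableWal/enableWal are available since Ignite 2.4; the cache name is a placeholder).

import org.apache.ignite.Ignite;

// Sketch: disable the WAL for one cache for the duration of the bulk load.
public class WalToggle {
    public static void loadWithWalDisabled(Ignite ignite, Runnable bulkLoad) {
        ignite.cluster().disableWal("recordB");
        try {
            bulkLoad.run();
        }
        finally {
            // Re-enabling the WAL triggers a checkpoint so the loaded data becomes durable.
            ignite.cluster().enableWal("recordB");
        }
    }
}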
The CPU doesn't get above 20%, the network bandwidth is only about 6 MB/s from each node, and only about 1,500 packets per second per node. The read wait time on the SSDs is only enough to tie up a single thread, and there are no writes except during checkpoints.
When I look at thread dumps, there is no obvious bottleneck except for the DataStreamer threads. Doubling the number of DataStreamer threads from the current 64 to 128 has no effect on throughput.
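For completeness, these are the knobs I mean; the values shown are just illustrative, not recommendations.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StreamerTuning {
    public static IgniteConfiguration withStreamerPool(IgniteConfiguration cfg) {
        // Server-side pool that runs the StreamReceiver; this is the pool that shows up saturated.
        return cfg.setDataStreamerThreadPoolSize(64);
    }

    public static IgniteDataStreamer<RecordBKey, Object> tunedStreamer(Ignite ignite) {
        IgniteDataStreamer<RecordBKey, Object> streamer = ignite.dataStreamer("recordB");

        // Client-side buffering and parallelism; raising these increases demand, not server throughput.
        streamer.perNodeBufferSize(1024);
        streamer.perNodeParallelOperations(16);

        return streamer;
    }
}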
Looking via MXBeans (where I have a fix for IGNITE-7616), the DataStreamer pool is saturated; the StripedExecutor is not. With the WAL enabled, the StripedExecutor shows some bursty load; with it disabled, the active thread count and queue stay low. The work is distributed across the StripedExecutor threads. The non-DataStreamer thread pools all frequently go to 0 active threads, while the DataStreamer pool stays backed up.
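The pool metrics are read over JMX, roughly like the sketch below. The "Thread Pools" group and the "ActiveCount"/"QueueSize" attribute names are my assumptions about how the thread-pool MBeans are registered, so they may need adjusting for your version.

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Dump active-thread and queue counts for the Ignite thread-pool MBeans on the local JVM.
public class PoolMetricsDump {
    public static void main(String[] args) throws Exception {
        MBeanServer srv = ManagementFactory.getPlatformMBeanServer();

        for (ObjectName name : srv.queryNames(null, null)) {
            if (!"Thread Pools".equals(name.getKeyProperty("group")))
                continue;

            try {
                Object active = srv.getAttribute(name, "ActiveCount");
                Object queued = srv.getAttribute(name, "QueueSize");

                System.out.println(name.getKeyProperty("name") + ": active=" + active + ", queued=" + queued);
            }
            catch (Exception ignore) {
                // Skip MBeans in this group that don't expose these attributes.
            }
        }
    }
}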
With the WAL on and 64 DataStreamer threads, there tended to be about 53 "owner transactions" on the node.
A snapshot of transactions outstanding follows.
Is there another place to look? The DataStreamer threads tend to be waiting on futures, and the other threads are consistent with the relatively light load elsewhere.
It looks like most of the time, transactions in the receiver are waiting for locks. Any lock adds serialization to parallel code, and in your case I don't think it's possible to tune throughput with settings, because ten transactions could be waiting while one finishes. You need to change the algorithm.
The most effective way would be to stream data with the DataStreamer with allowOverwrite disabled and without any transactions. You need to stream the data independently if possible, avoiding serial code and non-local cache operations.
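For example, something along these lines (a minimal sketch; the cache name, key class, and data source are placeholders):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

// Plain streaming: allowOverwrite left at its default (false), no receiver and no transactions.
public class PlainStreamLoad {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start("ignite-config.xml");
             IgniteDataStreamer<RecordBKey, Object> streamer = ignite.dataStreamer("recordB")) {

            streamer.allowOverwrite(false);   // initial-load mode, no per-entry overwrite checks

            // for (RecordB rec : source) {
            //     streamer.addData(new RecordBKey(rec.bKey(), rec.aKey()), rec);
            // }

            streamer.flush();                 // push any buffered entries before close
        }
    }
}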