Effective way to pre-load data around 10 TB

Naveen
Effective way to pre-load data around 10 TB

Hi,

We are using Ignite 2.6.

As we already know, after a cluster restart every GET call reads data from
disk the first time and loads it into RAM; subsequent calls read from RAM
only. First-time GET calls are 10 times slower than reads from RAM, which we
want to avoid by pre-loading the entire data set into RAM after the cluster
restart.

So I am exploring efficient ways to read the entire data set once, so that it
is pre-loaded into RAM and GET calls from clients are much faster.

Would running a ScanQuery on all partitions of the cache be a good way to
read the data quickly, or is there a better way to achieve this?


Thanks
Naveen



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Stanislav Lukyanov
RE: Effective way to pre-load data around 10 TB

Hi,

Currently the best option is the IgniteCache::preloadPartition method, added in
https://issues.apache.org/jira/browse/IGNITE-8873.
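For reference, once a build containing that fix is available, preloadPartition can be driven for every partition of a cache from a compute task. A minimal sketch of the idea — the cache name "myCache" and the config path are placeholders, and note that preloadPartition is not in Ignite 2.6 (it ships with the IGNITE-8873 fix), so this assumes a newer version or a backport:

```java
// Sketch only: run IgniteCache#preloadPartition for each partition on the
// node that owns it, so the disk read stays local. "myCache" and the config
// file name are assumptions.
import java.util.Collections;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class PreloadAll {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start("client-config.xml");
        String cacheName = "myCache";

        int parts = ignite.affinity(cacheName).partitions();
        for (int p = 0; p < parts; p++) {
            final int part = p;
            // Execute on the primary node for this partition.
            ignite.compute().affinityRun(Collections.singletonList(cacheName), part,
                () -> Ignition.localIgnite().cache(cacheName).preloadPartition(part));
        }
    }
}
```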

 

There is a JIRA ticket to allow pre-loading data before the node joins the cluster:
https://issues.apache.org/jira/browse/IGNITE-10152.

Stan

Naveen

RE: Effective way to pre-load data around 10 TB

Thanks Stan. That may take a little longer to implement, and we are in a
hurry to build this pre-loading functionality.

Can someone suggest how to improve our pre-load process?

This is how we are pre-loading.

1. Send an async request for all partitions with the code below; this loop is
repeated for every cache we have:

        for (int i = 0; i < affinity.partitions(); i++) {
            List<String> cacheList = Arrays.asList(cacheName);
            affinityRunAsync = compute.affinityRunAsync(cacheList, i,
                    new DataPreloadTask(cacheList, i));
        }
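A side note on the loop above: the future returned by affinityRunAsync is overwritten on every iteration, so nothing ever waits for the tasks to complete or observes their failures. The usual fix is to collect the futures and call get() on each one after the submission loop. A minimal sketch of that pattern, using plain java.util.concurrent rather than Ignite's API so it stands alone (the submitted tasks are stand-ins for DataPreloadTask):

```java
// Sketch only: keep every future and wait for all of them. Ignite's
// IgniteFuture offers the same blocking get(), so the shape is identical.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AwaitAll {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        int partitions = 16; // stand-in for affinity.partitions()

        // Collect the futures instead of overwriting one variable.
        List<Future<Integer>> futures = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            final int part = p;
            futures.add(pool.submit(() -> part)); // stand-in for the preload task
        }

        // Block until every task is done; get() also rethrows task failures.
        int done = 0;
        for (Future<Integer> f : futures) {
            f.get();
            done++;
        }
        System.out.println("completed " + done + " of " + partitions);
        pool.shutdown();
    }
}
```

With Ignite the same shape applies: add each IgniteFuture to a list during submission and call get() on all of them afterwards.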
                       
2. Inside DataPreloadTask, which runs on the Ignite node, I just execute a
scan query for the given partition and iterate through the cursor, not doing
anything else:

        IgniteCache<Object, Object> igniteCache = localIgnite.cache(cacheName);
        try (QueryCursor<Cache.Entry<K, V>> cursor = igniteCache.query(
                new ScanQuery().setPartition(partitionNo))) {
            for (Cache.Entry<K, V> entry : cursor) {
            }
        }

However, this seems quite slow: it takes more than 3 hours to read one cache
with 400 M records, and we have 30 such caches to load, so we are not finding
it very efficient.

Can we improve this? We have very powerful machines (128 CPUs, 2 TB RAM, HDD
storage), and CPU utilization is not high while we are pre-loading.
Would changing the thread pool size have any impact on this read?

Thanks
Naveen



Stanislav Lukyanov

RE: Effective way to pre-load data around 10 TB

The problem might be the HDD not performing fast enough, and also suffering
from random reads (IgniteCache::preloadPartition at least tries to read
sequentially).

Also, do you have enough RAM to store all data? If not, you shouldn’t preload all the data, just the amount that fits into RAM.
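On the sizing point: the amount of data kept in RAM is capped by the data region's maxSize, so with ~10 TB of data and 2 TB of RAM per machine only part of it can ever be resident. A configuration sketch — the region name and the 1.5 TB figure are illustrative assumptions, not recommendations:

```java
// Configuration fragment only; sizes and names are illustrative.
DataStorageConfiguration storageCfg = new DataStorageConfiguration();

DataRegionConfiguration regionCfg = new DataRegionConfiguration();
regionCfg.setName("bigRegion");              // assumed name
regionCfg.setPersistenceEnabled(true);       // data beyond maxSize stays on disk
regionCfg.setMaxSize(1536L * 1024 * 1024 * 1024); // 1.5 TB per node

storageCfg.setDefaultDataRegionConfiguration(regionCfg);

IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setDataStorageConfiguration(storageCfg);
```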

 

Anyway, I think your best option is to implement the same thing that
https://issues.apache.org/jira/browse/IGNITE-8873 does. For example, you
could try to backport that commit on top of 2.6.

Stan
