Increase the indexing speed while loading the cache from an RDBMS

classic Classic list List threaded Threaded
6 messages Options
p.srikanta p.srikanta
Reply | Threaded
Open this post in threaded view
|

Increase the indexing speed while loading the cache from an RDBMS

Currently I'm using Apache Ignite v2.8.1 to preload a cache from the RDBMS. There are two tables with each 27M rows. The index is defined on a single column of type String in 1st table and Integer in the 2nd table. Together the total size of the two tables is around 120GB.

The preloading process (triggered using loadCacheAsync() from within a Java app) takes about 45hrs. The cache is persistence enabled and a common EBS volume (SSD) is being used for both the WAL and other locations. 

I'm unable to figure out the bottleneck for increasing the speed. 

Apart from defining a separate path for WAL and the persistence, is there any other way to load the cache faster (with indexing enabled) ?


Thanks,
Srikanta
Mikhail Mikhail
Reply | Threaded
Open this post in threaded view
|

Re: Increase the indexing speed while loading the cache from an RDBMS

Hi Srikanta,
 Have you tried to load data without indexto compare time?

I think Wal disabling for data load can save some hours for you:

Thanks,
Mike.

On Wed, Aug 26, 2020, 8:47 AM Srikanta Patanjali <[hidden email]> wrote:
Currently I'm using Apache Ignite v2.8.1 to preload a cache from the RDBMS. There are two tables with each 27M rows. The index is defined on a single column of type String in 1st table and Integer in the 2nd table. Together the total size of the two tables is around 120GB.

The preloading process (triggered using loadCacheAsync() from within a Java app) takes about 45hrs. The cache is persistence enabled and a common EBS volume (SSD) is being used for both the WAL and other locations. 

I'm unable to figure out the bottleneck for increasing the speed. 

Apart from defining a separate path for WAL and the persistence, is there any other way to load the cache faster (with indexing enabled) ?


Thanks,
Srikanta
akurbanov akurbanov
Reply | Threaded
Open this post in threaded view
|

Re: Increase the indexing speed while loading the cache from an RDBMS

In reply to this post by p.srikanta
Hi,

Which API are you using to load the entries and what is the node
configuration? I would recommend to share the configs and try utilizing the
data streamer.

https://apacheignite.readme.io/docs/data-streamers

I would recommend recording a JFR to find where the VM spends most time.

Best regards,
Anton



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
mcherkasov mcherkasov
Reply | Threaded
Open this post in threaded view
|

Re: Increase the indexing speed while loading the cache from an RDBMS

btw, might be bottleneck is your RDBMS, it can just stream data slowly, slower then ignite can save, if make sense to check this version too.

On Thu, Aug 27, 2020 at 10:46 AM akurbanov <[hidden email]> wrote:
Hi,

Which API are you using to load the entries and what is the node
configuration? I would recommend to share the configs and try utilizing the
data streamer.

https://apacheignite.readme.io/docs/data-streamers

I would recommend recording a JFR to find where the VM spends most time.

Best regards,
Anton



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


--
Thanks,
Mikhail.
p.srikanta p.srikanta
Reply | Threaded
Open this post in threaded view
|

Re: Increase the indexing speed while loading the cache from an RDBMS

Thanks Anton & Mikhail for your responses.

I'll try to disable the WAL and test the cache preload performance. Also will execute a JFR to capture some info.

@Mikhail I've checked the RDBMS perf metrics. I do not see any spike in the CPU or memory usage. Is there anything in particular that could cause a bottleneck from the DB perspective ? The DB in use is a Postgres DB.

@Anton Is there a sample/best practice reference code for Using Datastreamers to directly query the DB to preload the cache ? My intention is to use Spring Data via IgniteRepositories. In order to use an Ignite repository, I would have to preload the cache. And now, to preload the cache, I would have to query the DB. A bit of a chicken and egg problem. 

Regards,
Srikanta

On Thu, Aug 27, 2020 at 7:54 PM Mikhail Cherkasov <[hidden email]> wrote:
btw, might be bottleneck is your RDBMS, it can just stream data slowly, slower then ignite can save, if make sense to check this version too.

On Thu, Aug 27, 2020 at 10:46 AM akurbanov <[hidden email]> wrote:
Hi,

Which API are you using to load the entries and what is the node
configuration? I would recommend to share the configs and try utilizing the
data streamer.

https://apacheignite.readme.io/docs/data-streamers

I would recommend recording a JFR to find where the VM spends most time.

Best regards,
Anton



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


--
Thanks,
Mikhail.
Mikhail Mikhail
Reply | Threaded
Open this post in threaded view
|

Re: Increase the indexing speed while loading the cache from an RDBMS

>I've checked the RDBMS perf metrics. I do not see any spike in the CPU or memory usage. 
>Is there anything in particular that could cause a bottleneck from the DB perspective ? The DB in use is a Postgres DB.

if you use cache store, each node will read the full table you want to load, but only records which belong to a node will be saved(based on data affinity), so Ignite will do N parallel read of the same table, where N is number of nodes.
So if you have 50 nodes, then you should use data streamer, so one node will read data and stream them to the cluster.
However if you have relatively small amount of nodes, it's fine to use CacheStore to load data.

>Is there a sample/best practice reference code for Using Datastreamers to directly query the DB to preload the cache ?
> My intention is to use Spring Data via IgniteRepositories. In order to use an Ignite repository, I would have to preload the cache. 
>And now, to preload the cache, I would have to query the DB. A bit of a chicken and egg problem. 
I would say, use JDBC and Ignite Data Streamer, it's one time action that you don't need to repeat often, you can even java separate app to do this.

Thanks,
Mike.

чт, 27 авг. 2020 г. в 14:01, Srikanta Patanjali <[hidden email]>:
Thanks Anton & Mikhail for your responses.

I'll try to disable the WAL and test the cache preload performance. Also will execute a JFR to capture some info.

@Mikhail I've checked the RDBMS perf metrics. I do not see any spike in the CPU or memory usage. Is there anything in particular that could cause a bottleneck from the DB perspective ? The DB in use is a Postgres DB.

@Anton Is there a sample/best practice reference code for Using Datastreamers to directly query the DB to preload the cache ? My intention is to use Spring Data via IgniteRepositories. In order to use an Ignite repository, I would have to preload the cache. And now, to preload the cache, I would have to query the DB. A bit of a chicken and egg problem. 

Regards,
Srikanta

On Thu, Aug 27, 2020 at 7:54 PM Mikhail Cherkasov <[hidden email]> wrote:
btw, might be bottleneck is your RDBMS, it can just stream data slowly, slower then ignite can save, if make sense to check this version too.

On Thu, Aug 27, 2020 at 10:46 AM akurbanov <[hidden email]> wrote:
Hi,

Which API are you using to load the entries and what is the node
configuration? I would recommend to share the configs and try utilizing the
data streamer.

https://apacheignite.readme.io/docs/data-streamers

I would recommend recording a JFR to find where the VM spends most time.

Best regards,
Anton



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


--
Thanks,
Mikhail.