read-through tutorial for a big table

vtchernyi

read-through tutorial for a big table

Hi Igniters,

My question is about a well-done tutorial. Recently on the devlist there was a topic, "Read load balancing, read-though, ttl and optimistic serializable transactions". It says an Ignite cache sitting on top of an RDBMS is the most frequent use case. I tried to implement read-through for a big table of over 1,100 million rows just from scratch, and the result was poor. A little model example works fine, but moving to production is not simple. It seems I should avoid some pitfalls.

Do we have a tutorial to guide a newbie like me?

Vladimir Tchernyi
Magnit Retail network
--

aealexsandrov

Re: read-through tutorial for a big table

Hi,

You can read the documentation articles:

https://apacheignite.readme.io/docs/3rd-party-store

If you are going to load the cache from a 3rd-party store (an RDBMS), the
default implementation of CacheJdbcPojoStore can take a lot of time to load
the data, because it uses a single JDBC connection inside (not a pool of
connections).

Probably you should implement your own version of CacheStore that reads
data from the RDBMS in several threads, e.g. using a JDBC connection pool.
The sources are open, so you can copy the existing implementation and
modify it:

https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/cache/store/jdbc/CacheJdbcPojoStore.java
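
For illustration, here is a minimal sketch of such a multi-threaded store.
This is not the real CacheJdbcPojoStore: the Person type, the PERSON table,
the id-range arguments and the pool wiring are assumptions, and any
DataSource-backed pool (e.g. HikariCP) would do:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import javax.cache.Cache;
import javax.cache.integration.CacheLoaderException;
import javax.sql.DataSource;
import org.apache.ignite.cache.store.CacheStoreAdapter;
import org.apache.ignite.lang.IgniteBiInClosure;

// Hypothetical value type for the sketch.
class Person {
    final long id;
    final String name;
    Person(long id, String name) { this.id = id; this.name = name; }
}

public class ParallelJdbcStore extends CacheStoreAdapter<Long, Person> {
    private final DataSource pool; // injected JDBC connection pool
    private static final int THREADS = 8;

    public ParallelJdbcStore(DataSource pool) { this.pool = pool; }

    // Splits the [minId, maxId] key range into slices and loads each slice
    // on its own thread over its own pooled connection.
    @Override public void loadCache(IgniteBiInClosure<Long, Person> clo, Object... args) {
        long minId = (Long)args[0], maxId = (Long)args[1];
        long slice = (maxId - minId) / THREADS + 1;
        ExecutorService exec = Executors.newFixedThreadPool(THREADS);

        for (int i = 0; i < THREADS; i++) {
            long lo = minId + i * slice, hi = Math.min(lo + slice, maxId + 1);

            exec.submit(() -> {
                try (Connection conn = pool.getConnection();
                     PreparedStatement st = conn.prepareStatement(
                         "SELECT id, name FROM PERSON WHERE id >= ? AND id < ?")) {
                    st.setLong(1, lo);
                    st.setLong(2, hi);
                    try (ResultSet rs = st.executeQuery()) {
                        while (rs.next())
                            clo.apply(rs.getLong(1), new Person(rs.getLong(1), rs.getString(2)));
                    }
                }
                catch (Exception e) {
                    throw new CacheLoaderException(e);
                }
            });
        }

        exec.shutdown();
        try {
            exec.awaitTermination(1, TimeUnit.HOURS);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Read-through/write-through methods omitted; a real store implements them too.
    @Override public Person load(Long key) { throw new UnsupportedOperationException(); }
    @Override public void write(Cache.Entry<? extends Long, ? extends Person> e) { /* no-op */ }
    @Override public void delete(Object key) { /* no-op */ }
}

It would be invoked with cache.loadCache(null, minId, maxId).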

Alternatively, you can do the initial data loading using some streaming tools:

1) Spark integration with Ignite -
https://apacheignite-fs.readme.io/docs/ignite-data-frame
2) Kafka integration with Ignite -
https://apacheignite-mix.readme.io/docs/kafka-streamer
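
If neither Spark nor Kafka fits, the plain IgniteDataStreamer API can also
bulk-load from JDBC much faster than individual put operations. A minimal
sketch; the "personCache" name, the PERSON table, the config file and the
connection string are illustrative assumptions:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class StreamerLoad {
    public static void main(String[] args) throws Exception {
        try (Ignite ignite = Ignition.start("client-config.xml"); // assumed client config
             IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("personCache");
             Connection conn = DriverManager.getConnection("jdbc:sqlserver://host;databaseName=db");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT id, name FROM PERSON")) {
            streamer.perNodeBufferSize(4096);      // entries per batch, per node
            streamer.perNodeParallelOperations(8); // batches in flight, per node

            while (rs.next())
                streamer.addData(rs.getLong(1), rs.getString(2));

            // close() via try-with-resources flushes the remaining buffered entries.
        }
    }
}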

BR,
Andrei



ezhuravlev

Re: read-through tutorial for a big table

When you say the result was poor, do you mean that data preloading took too much time, or is it about get operations?

Evgenii

vtchernyi

Re: read-through tutorial for a big table

Andrei, Evgenii, thanks for the answers.

As far as I can see, there is no ready-to-use tutorial. I managed to write a multi-threaded cache load procedure; the out-of-the-box loadCache method is extremely slow.

I spent about a month studying write-through topics, and finally got the same numbers the "capacity planning" page predicts: a 0.8 GB MSSQL table on disk expands to 2.3 GB in RAM, i.e. 2.875 times bigger.

Is it beneficial to use BinaryObject instead of a user POJO? If so, how do I create a BinaryObject without a POJO definition and deserialize it back to a POJO?
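
For reference, a minimal sketch of the BinaryObjectBuilder route as I
understand it; the Person type and its fields are hypothetical, and
deserialize() only works where a matching class is on the classpath:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.binary.BinaryObjectBuilder;

public class BinaryExample {
    public static class Person {
        public long id;
        public String name;
    }

    static void demo(Ignite ignite) {
        // Build a binary object by type name; no compiled class is needed at
        // this point. Using the class name keeps deserialize() below working.
        BinaryObjectBuilder builder = ignite.binary().builder(Person.class.getName());
        builder.setField("id", 1L);
        builder.setField("name", "John");
        BinaryObject bo = builder.build();

        // Work with the cache in binary form, no deserialization on the server:
        IgniteCache<Long, BinaryObject> cache =
            ignite.cache("personCache").withKeepBinary();
        cache.put(1L, bo);

        String name = cache.get(1L).field("name"); // field access, still binary

        // Back to the POJO only where the Person class is available:
        Person p = cache.get(1L).deserialize();
    }
}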
It would be great to have a kind of advanced GitHub example like this one:

https://github.com/dmagda/MicroServicesExample

It helped a lot with understanding. The current documentation links do not help to build a real solution; they are mostly a reference, with no option to compile and debug.

Vladimir

dmagda

Re: read-through tutorial for a big table

Hello Vladimir,

Just to clarify: are you suggesting that we create a tutorial for data-loading scenarios where the data resides in an external database?

-
Denis


vtchernyi

Re: read-through tutorial for a big table

Hello Denis,

That is possible; my writing activities should continue. The only question is getting my local project into production, since there is no sense in writing another model example. So I hope there will be progress in the near future.

Vladimir

vtchernyi

Re: read-through tutorial for a big table

Hi Denis,

Some progress has happened, and I have some material to share with the community. I think it will be interesting to newbies. It is about loading big tables from an RDBMS and creating cache entries based on the table data. This approach was tested in production and showed good timing paired with MSSQL, on tables from tens to hundreds of millions of rows.

The loading jar process (a condensed sketch follows the list):
* starts an Ignite client node;
* creates user POJOs according to the business logic;
* converts the POJOs to BinaryObjects;
* uses the affinity function to create a separate key-value HashMap for every cache partition;
* uses ComputeTaskAdapter/ComputeJobAdapter to place the HashMaps on the corresponding data nodes.
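
A condensed sketch of the idea (illustrative only, since the real code is
under NDA: the "personCache" name and plain Long/String entries stand in
for the actual keys and BinaryObject values):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.ignite.Ignite;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.compute.ComputeJob;
import org.apache.ignite.compute.ComputeJobAdapter;
import org.apache.ignite.compute.ComputeJobResult;
import org.apache.ignite.compute.ComputeTaskAdapter;
import org.apache.ignite.resources.IgniteInstanceResource;

public class PartitionLoadTask
    extends ComputeTaskAdapter<Map<Integer, Map<Long, String>>, Void> {
    @IgniteInstanceResource
    private transient Ignite ignite;

    // Client side: group loaded entries into one HashMap per cache partition.
    static Map<Integer, Map<Long, String>> groupByPartition(Ignite ignite, Map<Long, String> all) {
        Affinity<Long> aff = ignite.affinity("personCache");
        Map<Integer, Map<Long, String>> byPart = new HashMap<>();
        for (Map.Entry<Long, String> e : all.entrySet())
            byPart.computeIfAbsent(aff.partition(e.getKey()), p -> new HashMap<>())
                .put(e.getKey(), e.getValue());
        return byPart;
    }

    // One job per partition, mapped to that partition's primary node.
    @Override public Map<? extends ComputeJob, ClusterNode> map(
        List<ClusterNode> subgrid, Map<Integer, Map<Long, String>> byPart) {
        Affinity<Long> aff = ignite.affinity("personCache");
        Map<ComputeJob, ClusterNode> jobs = new HashMap<>();
        for (Map.Entry<Integer, Map<Long, String>> e : byPart.entrySet())
            jobs.put(new PutAllJob(e.getValue()), aff.mapPartitionToNode(e.getKey()));
        return jobs;
    }

    @Override public Void reduce(List<ComputeJobResult> results) { return null; }

    static class PutAllJob extends ComputeJobAdapter {
        @IgniteInstanceResource
        private transient Ignite ignite;
        private final Map<Long, String> entries;

        PutAllJob(Map<Long, String> entries) { this.entries = entries; }

        // Runs on the data node that owns the partition, so putAll stays local.
        @Override public Object execute() {
            ignite.<Long, String>cache("personCache").putAll(entries);
            return null;
        }
    }
}

The client would call:
ignite.compute().execute(new PartitionLoadTask(), groupByPartition(ignite, loadedEntries));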

I would like to publish a tutorial, say an English version on the GridGain website and a Russian version on habr.com.

WDYT?

Alex Panchenko

Re: read-through tutorial for a big table

Hello Vladimir,

I'm building a high-load service to handle intensive read-write operations
using Apache Ignite. I need exactly the same thing: "loading big tables from
an RDBMS (Postgres) and creating cache entries based on the table data".

Could you please share your experience and the materials you mentioned in
this thread? I'd much appreciate it. I think it would help me and other
Ignite users.

BTW, regarding "This approach was tested in production and showed good
timing paired with MSSQL, on tables from tens to hundreds of millions of
rows": is it possible to see some test results and/or performance metrics
from before and after using Ignite?



Thanks!



vtchernyi

Re: read-through tutorial for a big table

Hi Alex,

There is an NDA covering my work, so direct sharing is not an option. I see that a tutorial post of that kind would be relevant, so I should start working on it. Please wait some time; right now I have nothing to share.

About production: the first thing I faced was turtle-slow insertion of values into the cache. After some effort, SQL queries now take longer than cache inserts, but my work got into production only after it became fast. That was a must. So I have no "before and after" state, only the "after" one.

Vladimir

dmagda

Re: read-through tutorial for a big table

Hello Vladimir, 

Sounds interesting, thanks for reaching out. Let me introduce you to [hidden email] who can help with the publication process.

-
Denis


Kseniya Romanova

Re: read-through tutorial for a big table

Hi Vladimir! I will be absolutely happy to help.
Let's discuss it in Telegram.

Alex Panchenko

Re: read-through tutorial for a big table

Hello Vladimir,

Are there some key things you can share with us? Maybe a checklist of the
most important configuration parameters, or things we need to review and
check? Anything would be helpful.

I've been playing with Ignite for the last few months, and performance is
still low. I have to decide whether to switch from Ignite to another
solution or improve the performance ASAP.

Thanks


