Replicated cache initialization

diopek

Replicated cache initialization

This post has NOT been accepted by the mailing list yet.
In my use case, I need to replicate certain reference data to all cluster nodes.
I am assuming that, among all these cluster nodes, only one node needs to initialize this reference data (using some REST service) during start-up.

What is the best practice for coordinating such a cache population task among the cluster nodes?

During node start-up, for example within the Spring InitializingBean::afterPropertiesSet(...) method, I could add a check like the following:

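// Only the oldest node in the cluster populates the cache.
// Node IDs are compared with equals() rather than comparing object references with ==.
if (cluster.localNode().id().equals(cluster.forOldest().node().id()))
{
      populateCache(...);
      // Tell the other nodes that the reference data is ready.
      broadCastEvent(XYZ_CACHE_POPULATED);
}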
The other nodes would then listen for a custom broadcast event, XYZ_CACHE_POPULATED.
I am also considering distributing further tasks from the same leader (oldest) node.
 
Please let me know if you have any other suggestions or sample code that I can leverage.
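
For illustration, the oldest-node check described above can be modeled in plain Java. This is only a toy sketch and uses no Ignite classes: it assumes each node knows the join order of every node in the cluster and that the "oldest" node is the one with the lowest join order. The names OldestNodeSketch, leaderOrder and shouldPopulate are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: plain Java, no Ignite classes are used here.
class OldestNodeSketch {
    // Assumption: the leader is the node with the smallest join order (the oldest node).
    static long leaderOrder(List<Long> joinOrders) {
        return Collections.min(joinOrders);
    }

    // A node populates the cache only if it is the leader.
    static boolean shouldPopulate(long localOrder, List<Long> joinOrders) {
        return localOrder == leaderOrder(joinOrders);
    }

    public static void main(String[] args) {
        List<Long> cluster = Arrays.asList(3L, 1L, 2L);
        for (long order : cluster) {
            System.out.println("node " + order + " populates: " + shouldPopulate(order, cluster));
        }
    }
}
```

Because every node evaluates the same rule over the same membership list, exactly one node decides to populate. If that node leaves, the next-oldest node becomes the leader under the same rule.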
vkulichenko

Re: Replicated cache initialization

Hi,

There are two effective ways to load a large amount of data on cache initialization: via IgniteDataStreamer and via CacheStore. In both cases the process is coordinated automatically, so you don't have to register any custom listeners or generate custom events. Please refer to this documentation page: https://apacheignite.readme.io/docs/data-loading

Let us know if you have more questions.

-Val
diopek

Re: Replicated cache initialization

This post has NOT been accepted by the mailing list yet.
One point I need to clarify: my cluster nodes will all have the same code base.
Each node will hold the same replica of the reference data and, in addition, its own partition of the positions data. Basically, each node will run a similar computation using the same reference data on different positions. When I bring up all the nodes (let's say 20), they will all run the same initializing bean. Will using IgniteDataStreamer or CacheStore on each node automatically avoid pounding the same database table or REST URL to load the same data into each local cache?
Thanks,
dsetrakyan

Re: Replicated cache initialization

diopek wrote
One point I need to clarify: my cluster nodes will all have the same code base.
Each node will hold the same replica of the reference data and, in addition, its own partition of the positions data. Basically, each node will run a similar computation using the same reference data on different positions. When I bring up all the nodes (let's say 20), they will all run the same initializing bean. Will using IgniteDataStreamer or CacheStore on each node automatically avoid pounding the same database table or REST URL to load the same data into each local cache?
Thanks,
If you want to make sure that "populateCache(...)" is called from only one place (in which case you should use the IgniteDataStreamer approach), then I would recommend deploying this logic as a singleton service in the service grid. More on singleton services and leader election here:

https://apacheignite.readme.io/docs/cluster-singletons
https://apacheignite.readme.io/docs/leader-election
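
To make the "called from only one place" idea concrete, here is a toy sketch in plain Java. It does not use IgniteDataStreamer or the service grid; it only models a single loader that reads the source once and pushes the rows in batches to every replica. The class and method names (SingleLoaderSketch, batches, replicate) and the batch size are hypothetical, and batchSize is assumed to be positive.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: plain Java, no Ignite classes are used here.
class SingleLoaderSketch {
    // Splits the rows into batches of at most batchSize, preserving order.
    static List<List<String>> batches(List<String> rows, int batchSize) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            out.add(new ArrayList<>(rows.subList(i, Math.min(i + batchSize, rows.size()))));
        }
        return out;
    }

    // One loader walks the source once and appends each batch to every replica.
    // Returns the number of batches sent to each replica.
    static int replicate(List<String> rows, int batchSize, List<List<String>> replicas) {
        List<List<String>> chunks = batches(rows, batchSize);
        for (List<String> chunk : chunks) {
            for (List<String> replica : replicas) {
                replica.addAll(chunk);
            }
        }
        return chunks.size();
    }
}
```

In this model the database table or REST URL is read once, no matter how many replicas receive the data.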

If you are using the CacheStore approach, then the data will be loaded in parallel from the underlying store. In this case you are right: each node will concurrently load the same data from the database, but it will accept only the data it is responsible for and automatically ignore the rest.
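
The "load everything, keep only your own part" behavior can be sketched as a toy model in plain Java. No Ignite classes are used, and the ownership rule (key modulo node count) is purely an assumption for illustration, not how the real affinity function works. The names LoadFilterSketch, owns and loadOnNode are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: plain Java, no Ignite classes are used here.
class LoadFilterSketch {
    // Hypothetical ownership rule: a key belongs to node (key mod nodeCount).
    static boolean owns(int nodeIndex, int nodeCount, int key) {
        return Math.floorMod(key, nodeCount) == nodeIndex;
    }

    // Each node scans the full source but keeps only the entries it owns.
    static Map<Integer, String> loadOnNode(int nodeIndex, int nodeCount, Map<Integer, String> source) {
        Map<Integer, String> local = new TreeMap<>();
        for (Map.Entry<Integer, String> e : source.entrySet()) {
            if (owns(nodeIndex, nodeCount, e.getKey())) {
                local.put(e.getKey(), e.getValue());
            }
        }
        return local;
    }

    public static void main(String[] args) {
        Map<Integer, String> source = new TreeMap<>();
        for (int k = 0; k < 6; k++) {
            source.put(k, "row-" + k);
        }
        for (int n = 0; n < 3; n++) {
            System.out.println("node " + n + " keeps " + loadOnNode(n, 3, source).keySet());
        }
    }
}
```

Note that every node still reads the whole source in this model; only what it stores is filtered. For a fully replicated cache the ownership test would presumably be true on every node, so all 20 nodes would read and keep the full data set.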