Cache preloading

avk

I'm looking for some expert advice/best practices with regard to cache preloading from the cache store. Specifically:
1) who triggers cache preload?
    decentralized approach -- each node individually, or
    centralized -- a specific node (for example, the most senior node), or...?
2) when should a preload be triggered?
    as nodes join the grid (relevant for the decentralized approach), or
    after the topology has been stable for a certain amount of time, or
    manually via an admin console, or...?
3) once preload has completed, how to ensure that I'm not missing any data?

For example, for partitioned caches, a node's cache loader will only accept data for the partitions for which that node is primary and discard the rest. In this scenario, what is the correct way of handling node crashes during a preload? Specifically, would the partition-to-node assignment change if a node crashes while the preload is in progress? Also, how can one ensure that the node gets assigned the same partitions after a restart? What about the situation when the crashed node is restarted on a different physical box?

Finally, do replicated caches also preload only the node's primary partitions, just as the partitioned caches do? Or would running a preload on any single node be sufficient?

Thanks
Andrey
vkulichenko

Re: Cache preloading

Andrey,

Loading from the cache store is usually used only for the initial data load, so it's safe to do it only after the nodes have started and before any other cache operations (puts, transactions, etc.) are performed. The topology should also be stable: if a node joins or fails during the process, you will most likely have to restart the load to make sure the data is consistent. It works this way because loading from the store is supposed to run only once. After it has completed, you can safely change the topology: rebalancing will guarantee data consistency even if it happens concurrently with cache updates.
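A minimal sketch of the "run once, on a stable topology" rule, assuming you can read a topology version that only grows and changes whenever a node joins or leaves. In Ignite, `ignite.cluster().topologyVersion()` and `cache.loadCache(null)` are plausible candidates for the two roles, but here they are replaced by plain `LongSupplier` and `Runnable` parameters so the logic runs on its own. `StableTopologyLoader` and `loadOnStableTopology` are hypothetical names, not Ignite API; the helper simply reruns the whole load if the version moved while it was running.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper (not Ignite API): reruns a one-shot load until it finishes on an unchanged topology. */
final class StableTopologyLoader {
    private StableTopologyLoader() {}

    /**
     * @param topologyVersion current topology version; assumed to increase on every join/leave
     * @param load            performs the full load from the store
     * @param maxAttempts     upper bound on how many times the load is run
     * @return the number of attempts that were needed
     * @throws IllegalStateException if every attempt overlapped a topology change
     */
    static int loadOnStableTopology(LongSupplier topologyVersion, Runnable load, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long before = topologyVersion.getAsLong();
            load.run();
            long after = topologyVersion.getAsLong();
            if (before == after) {
                return attempt; // no node joined or left while loading
            }
            // A node joined or failed mid-load: the data may be inconsistent, so run the load again.
        }
        throw new IllegalStateException("Topology changed during every load attempt");
    }
}
```

Whether a rerun needs the cache cleared first is not covered here; the sketch only shows when to rerun.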
If loading from the store this way doesn't work for you, I would recommend using IgniteDataStreamer instead and loading the data from a designated client node. This approach is more centralized and allows you to load the data concurrently with updates and topology changes.
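To illustrate the centralized alternative, here is a toy model (not Ignite's API) of one designated loader that reads rows and pushes them into the cache in batches while the application keeps writing. The hypothetical `ToyStreamer` uses `ConcurrentHashMap.putIfAbsent`, so a value the application wrote during the load is not replaced by older store data. That non-overwriting behavior is a design choice of this sketch; the real `IgniteDataStreamer` has an `allowOverwrite` flag that governs whether existing entries are overwritten, so check its documentation rather than relying on this model.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical toy stand-in for a data streamer driven from one designated client. */
final class ToyStreamer<K, V> implements AutoCloseable {
    private final ConcurrentHashMap<K, V> cache;
    private final int batchSize;
    private final List<Map.Entry<K, V>> buffer = new ArrayList<>();

    ToyStreamer(ConcurrentHashMap<K, V> cache, int batchSize) {
        this.cache = cache;
        this.batchSize = batchSize;
    }

    /** Buffers one row read from the store and flushes once the batch is full. */
    void addData(K key, V value) {
        buffer.add(new AbstractMap.SimpleImmutableEntry<>(key, value));
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Pushes buffered rows, keeping any value the application already wrote for a key. */
    void flush() {
        for (Map.Entry<K, V> e : buffer) {
            cache.putIfAbsent(e.getKey(), e.getValue());
        }
        buffer.clear();
    }

    /** Flushes the last partial batch. */
    @Override
    public void close() {
        flush();
    }
}
```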

To trigger preloading from the store, simply call the IgniteCache.loadCache() method; it initiates the process on all nodes.

The process itself is the same for partitioned and replicated caches. When loading data, a node stores the entries for which it is primary OR backup. Since a replicated cache keeps a copy of every partition on every node in the topology, nothing is discarded, while with a partitioned cache a node saves only part of the data.
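The loading behavior described above can be sketched as a toy model, assuming each node's store reads every row and a simplified round-robin affinity (Ignite's actual affinity function is hash-based and places partitions differently). `ToyLoadModel`, `owns`, `loadCache`, and the `key mod partitions` mapping are hypothetical and only for illustration. One call visits every node, and each node keeps a row only if it is primary or backup for that row's partition; setting `backups` to `nodes - 1` models a replicated cache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of loading from a store; not Ignite's implementation. */
final class ToyLoadModel {
    private ToyLoadModel() {}

    /** Hypothetical affinity: partition p's primary is node p % nodes, backups are the next nodes. */
    static boolean owns(int node, int partition, int nodes, int backups) {
        int primary = partition % nodes;
        int distance = Math.floorMod(node - primary, nodes);
        return distance <= backups; // 0 = primary, 1..backups = backup copies
    }

    /** Runs the "load" on every node and returns what each node keeps. */
    static List<Map<Integer, String>> loadCache(Map<Integer, String> store, int nodes,
                                                int partitions, int backups) {
        List<Map<Integer, String>> perNode = new ArrayList<>();
        for (int node = 0; node < nodes; node++) {
            Map<Integer, String> local = new HashMap<>();
            for (Map.Entry<Integer, String> row : store.entrySet()) {
                int partition = Math.floorMod(row.getKey(), partitions); // hypothetical key-to-partition mapping
                if (owns(node, partition, nodes, backups)) {
                    local.put(row.getKey(), row.getValue()); // primary or backup: keep it
                }
                // otherwise this node discards the row
            }
            perNode.add(local);
        }
        return perNode;
    }
}
```

With 3 nodes, `backups = 0` leaves each row on exactly one node, `backups = 1` on two, and `backups = 2` (the replicated case) on all three.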

Makes sense?

-Val
dsetrakyan

Re: Cache preloading



On Mon, Aug 31, 2015 at 5:56 PM, vkulichenko <[hidden email]> wrote:
> Andrey,
>
> Loading from the cache store is usually used only for the initial data
> load, so it's safe to do it only after the nodes have started and before
> any other cache operations (puts, transactions, etc.) are performed. The
> topology should also be stable: if a node joins or fails during the
> process, you will most likely have to restart the load to make sure the
> data is consistent. It works this way because loading from the store is
> supposed to run only once. After it has completed, you can safely change
> the topology: rebalancing will guarantee data consistency even if it
> happens concurrently with cache updates.
> If loading from the store this way doesn't work for you, I would
> recommend using IgniteDataStreamer instead and loading the data from a
> designated client node. This approach is more centralized and allows you
> to load the data concurrently with updates and topology changes.

Valya, it would be great if you added a blurb in the documentation about this. Perhaps a Note or a Tip paragraph.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Cache-preloading-tp1206p1231.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.