I'm in the process of creating a distributed cache that will load a few million records from a database that is under significant load. The purpose of the cache is to allow complex queries to be run over historic data without further degrading the database performance.
So far I've created an Ignite process that loads all the records from the database on startup. When I run a second instance of this process, it loads all the data from the DB again, and the caches correctly refuse to store duplicate keys.
The problem I'm facing is that when this is released to the production environment, all node processes will be started simultaneously, and each will execute this large query to initialise the cache at the same time.
Can you advise me on whether there is a 'correct' way to limit which processes will perform the initial cache load?
I've considered a few options but they all feel like hacks.
I think an Ignite cluster singleton service is a good solution: Ignite guarantees that exactly one instance of the service runs in the cluster, and it redeploys the service on another node if the owning node fails.
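As a rough sketch of what that could look like (the cache name `historicData`, the service name `cacheLoader`, and the `loadFromDatabase` step are placeholders for your own setup, not anything prescribed by Ignite):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.resources.IgniteInstanceResource;
import org.apache.ignite.services.Service;
import org.apache.ignite.services.ServiceContext;

// Cluster-singleton service: Ignite runs at most one instance of this in the
// whole cluster, and moves it to another node if the current owner dies.
public class CacheLoaderService implements Service {
    @IgniteInstanceResource
    private Ignite ignite; // injected by Ignite on the node that runs the service

    @Override public void init(ServiceContext ctx) {
        // nothing to prepare in this sketch
    }

    @Override public void execute(ServiceContext ctx) {
        IgniteCache<Long, Object> cache = ignite.cache("historicData"); // assumed cache name
        // Placeholder for your existing bulk query, e.g.:
        // cache.putAll(loadFromDatabase());
    }

    @Override public void cancel(ServiceContext ctx) {
        // optionally abort an in-flight load here
    }
}
```

Every node can then call `ignite.services().deployClusterSingleton("cacheLoader", new CacheLoaderService());` at startup; the deployment is idempotent, so only one node actually performs the load.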
You may also wish to store a flag in some distributed cache stating "loading is finished". That way, if a server dies before it finishes loading, another server can still complete the loading process even though `cache.size()` is already greater than 0.
I am aware that other users have taken advantage of Ignite's distributed CountDownLatch for this purpose. While the loading was taking place, all cluster members waited on the latch. When the loading completes, the count-down happens and all cluster members proceed.
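The combination of the two ideas (a "loader elected" flag plus a latch) can be illustrated in a single JVM, with threads standing in for cluster nodes. In real Ignite code you would use `ignite.countDownLatch(...)` and a `putIfAbsent` flag in a replicated cache; here `java.util.concurrent` stand-ins make the coordination logic visible, so this is a sketch of the pattern rather than of the Ignite API:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class LatchPattern {
    // Stand-in for a putIfAbsent flag in a replicated cache.
    public static final AtomicBoolean loaderElected = new AtomicBoolean(false);
    // Stand-in for Ignite's distributed CountDownLatch.
    public static final CountDownLatch loadFinished = new CountDownLatch(1);
    // Counts how many times the expensive load actually ran.
    public static final AtomicInteger loadRuns = new AtomicInteger(0);

    // What each "node" does at startup.
    public static void nodeStartup() throws InterruptedException {
        // The first node to flip the flag performs the load; the rest wait.
        if (loaderElected.compareAndSet(false, true)) {
            loadRuns.incrementAndGet(); // pretend: run the big DB query here
            loadFinished.countDown();   // signal the cluster that data is ready
        } else {
            loadFinished.await();       // block until the loader finishes
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate four nodes starting simultaneously.
        Thread[] nodes = new Thread[4];
        for (int i = 0; i < nodes.length; i++) {
            nodes[i] = new Thread(() -> {
                try {
                    nodeStartup();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            nodes[i].start();
        }
        for (Thread t : nodes) t.join();
        System.out.println("load executed " + loadRuns.get() + " time(s)");
    }
}
```

However many "nodes" start at once, exactly one executes the load, and none proceed until it has finished.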