I have been using affinity keys with a PARTITIONED cache and then using those
keys to send computations to the nodes that have the data, which all works as
intended. I wanted to test LOCAL mode for performance, but I found that no
calculations are now sent to the nodes.
Is that expected behavior?
How can I send calculations to nodes with a LOCAL cache so that those
calculations only work off the data in the local cache? (That is the reason
I was using affinity to begin with: I only ever want my calculations to work
off data that is on that node.)
If there is a way for a compute function to access only the keys on the node
without using affinity, that would work for me too.
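For reference, the affinity pattern I'm using looks roughly like this (the cache name "rows" and the RowAction class are made up for illustration, and this assumes the Ignite.NET compute API):

```csharp
using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Compute;

[Serializable]
class RowAction : IComputeAction
{
    public int Key { get; set; }

    public void Invoke()
    {
        // Runs on the primary node for Key, so reading that key from
        // the cache here never goes over the network.
    }
}

class Program
{
    static void Main()
    {
        using (IIgnite ignite = Ignition.Start())
        {
            var cache = ignite.GetOrCreateCache<int, string>("rows");
            cache.Put(1, "row-1");

            // Route the closure to whichever node owns key 1.
            ignite.GetCompute().AffinityRun("rows", 1, new RowAction { Key = 1 });
        }
    }
}
```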
Actually maybe I do not understand what a LOCAL cache is.
I create a LOCAL cache in my client, and I have 2 server nodes. I send a
compute function to each node to load some data into the cache. I assumed
that, as the cache is LOCAL, the nodes would load the data into their own
local caches.
However, when I send the affinity compute, I notice those compute functions
are actually sent to my client application and not to the server nodes, as
would happen in PARTITIONED mode.
But my client app has ClientMode=TRUE, so it has no data. If I change it to
ClientMode=FALSE so that it participates in the data load, then it is able to
retrieve the records it added to the cache, but not the ones the servers
added.
So can I assume there is no concept of a distributed LOCAL cache?
Actually, the code is adding data to each node's individual LOCAL cache; it's
just that the affinity is not working as expected: all affinity jobs are run
on the node which invokes them (in my case the client node) rather than on
the node where the actual data exists.
A cache in LOCAL mode is never distributed between nodes, by design. Other
nodes don't even know that it exists. You can quickly test this by starting
two server nodes with simple code that creates a LOCAL cache with the same
name on each node, puts a different value for the same key, and gets that key
in a while(true) loop: you will receive the value that was put locally, no
matter what was executed on the other node.
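A rough sketch of that experiment in Ignite.NET (cache name and the sleep interval are arbitrary); run the same program in two terminals, passing a different argument to each:

```csharp
using System;
using System.Threading;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Cache.Configuration;

class Program
{
    static void Main(string[] args)
    {
        using (IIgnite ignite = Ignition.Start())
        {
            var cache = ignite.CreateCache<int, string>(new CacheConfiguration
            {
                Name = "localCache",
                CacheMode = CacheMode.Local
            });

            // Each process puts its own value under the same key.
            cache.Put(1, "value-from-" + args[0]);

            while (true)
            {
                // Always prints this process's own value, regardless of
                // what the other node put under key 1.
                Console.WriteLine(cache.Get(1));
                Thread.Sleep(1000);
            }
        }
    }
}
```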
The concept of affinity is completely inapplicable in this case: you don't
need a function that maps keys to nodes, as a LOCAL cache exists only on a
single node.
Could you please clarify the use case a bit: which API are you going to use?
Initially I would assume that a broadcast that utilizes local primary
partition scans, or local SQL queries, over a PARTITIONED cache is what you
are looking for.
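Something along these lines, assuming Ignite.NET and a hypothetical cache named "rows": broadcast a closure to all nodes, and inside it run a scan query with Local = true so each node only iterates over its own data.

```csharp
using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Cache;
using Apache.Ignite.Core.Cache.Query;
using Apache.Ignite.Core.Compute;
using Apache.Ignite.Core.Resource;

[Serializable]
class LocalScanAction : IComputeAction
{
    // Ignite injects the local node's IIgnite instance here.
    [InstanceResource]
    private readonly IIgnite _ignite = null;

    public void Invoke()
    {
        var cache = _ignite.GetCache<int, string>("rows");

        // Local = true restricts the scan to this node's own data.
        foreach (ICacheEntry<int, string> entry in
                 cache.Query(new ScanQuery<int, string> { Local = true }))
        {
            // Process entry.Key / entry.Value here.
        }
    }
}

// Usage from any node or client:
//   ignite.GetCompute().Broadcast(new LocalScanAction());
```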
My use case is to match Spark DataFrame functionality using only C# if
possible, without using Spark.
Specifically, we have CSV files we wish to load into the cache, and then we
have compute functions that act on those rows, adding columns as they do, so
the cache will be heavy on read/write.
To try to improve the initial cache population from file (which can be
millions of rows), I distribute a job to the cluster so that each node reads
a piece of the file, to get some degree of upload parallelization.
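The load step looks roughly like this (file path, slicing scheme, and cache name are simplified placeholders; this sketch assumes the Ignite.NET data streamer API):

```csharp
using System;
using System.IO;
using System.Linq;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Compute;
using Apache.Ignite.Core.Resource;

[Serializable]
class LoadSliceAction : IComputeAction
{
    [InstanceResource]
    private readonly IIgnite _ignite = null;

    public int SliceIndex { get; set; }
    public int SliceCount { get; set; }

    public void Invoke()
    {
        using (var streamer = _ignite.GetDataStreamer<int, string>("rows"))
        {
            // Each node reads every SliceCount-th line, starting at its
            // own offset, so the file is split across the cluster.
            var lines = File.ReadLines("data.csv")
                .Where((line, i) => i % SliceCount == SliceIndex);

            int key = SliceIndex;
            foreach (var line in lines)
            {
                streamer.AddData(key, line);
                key += SliceCount;
            }
        }
    }
}
```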
I am using affinity keys so that the calculations only have to process the
data on the node they run on, which works fine. But then I thought
performance would probably improve on the cache-population step if I just
used LOCAL caches. It's the same end result: calculations working off only
the data they have on the node. I can maybe live with the downsides of a
local cache, which I assume include no fault tolerance or load balancing, if
the speed improvements make it worthwhile.
Anyway, basically, to get my desired functionality I have 2 options: either
use affinity keys and affinity compute, OR use local caches and broadcast.