Partitioned dataset join and large number of related keys

pgarg pgarg

Partitioned dataset join and large number of related keys

asked by mshah

I am referring to the following statement in the documentation: "join between two PARTITIONED data sets, then you must make sure that the keys you are joining on are collocated."

What does it mean for the keys of a partitioned data set to be collocated, specifically with regard to memory? Does collocating in the suggested manner limit how far the application can scale? For example, let's say I have one Product that has a million Orders. If I model the Product so that its Orders are collocated with it, does that mean all million Order objects will be on the same node as that one Product object, or are only the keys on that node while the detailed Order objects are distributed? If everything ends up on one node, does this not create some sort of upper limit?

Does the same apply when a Product object needs to be joined with two other types of data, let's say Orders and Returns?

-----
This post is migrated from now discontinued Apache Ignite forum at
http://apacheignite.readme.io/v1.0/discuss

pgarg pgarg

Re: Partitioned dataset join and large number of related keys

commented by yakov zhdanov

Basically you are right: collocation can be a limiting factor. The only solution here is to choose the collocation strategy wisely. For example, do you have so many Products that you need to store them in a partitioned cache? If the data set is mostly read-only and not that large, it can be stored in a replicated cache, and you will not need to set up collocation at all. Another, more complex option is to choose a different collocation key: for example, artificially split a Product into multiple Products (i.e. store it multiple times under different keys) and randomize its Orders across these collocation keys.
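
To make the key-splitting idea concrete, here is a minimal sketch in plain Java. It is a toy model, not Ignite's API: the class name, the node count, the shard count, the `productId#shard` key format, and the hash-modulo affinity function are all assumptions made for illustration. It also spreads Orders by order id modulo the shard count rather than truly at random, which keeps the result reproducible.

```java
import java.util.HashMap;
import java.util.Map;

public class CollocationSketch {
    static final int NODES = 4;   // hypothetical cluster size
    static final int SHARDS = 8;  // hypothetical number of artificial copies of one Product

    // Toy affinity function: maps a collocation key to a node index.
    static int nodeFor(String collocationKey) {
        return Math.floorMod(collocationKey.hashCode(), NODES);
    }

    // Plain collocation: every Order is collocated with its Product id.
    static String plainKey(String productId, long orderId) {
        return productId;
    }

    // Split collocation: the Product is stored under SHARDS different keys,
    // and each Order is assigned to one of them.
    static String splitKey(String productId, long orderId) {
        return productId + "#" + Math.floorMod(orderId, (long) SHARDS);
    }

    // Counts how many Orders of one Product land on each node.
    static Map<Integer, Integer> countPerNode(String productId, int orders, boolean split) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (long orderId = 0; orderId < orders; orderId++) {
            String key = split ? splitKey(productId, orderId) : plainKey(productId, orderId);
            counts.merge(nodeFor(key), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("plain: " + countPerNode("product-42", 1_000_000, false));
        System.out.println("split: " + countPerNode("product-42", 1_000_000, true));
    }
}
```

With the plain key, all Orders of the Product map to a single node; with the split key, they spread over the nodes that own the artificial copies. The price is that a join for that Product has to visit every copy and merge the partial results, and any change to the Product has to be written to all of its copies.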

pgarg pgarg

Re: Partitioned dataset join and large number of related keys

commented by dmitriy setrakyan

I also want to add that there are only 2 ways to solve this problem efficiently:

1. Use REPLICATED caches to store some of the data. This way you can do joins between replicated and partitioned caches at will.
2. Choose a proper collocation strategy for the partitioned caches.

A much slower alternative would be to move large data sets around between nodes to do the joins, but Ignite specifically avoids this, as it would be very slow and would offer no performance advantage over disk-based databases.
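
As a rough illustration of why the first option needs no collocation, here is a toy model in plain Java. It does not use Ignite's API: the class names, the node count, and the modulo partitioning of Orders are assumptions made for illustration. Every node holds a full copy of the small Products data set plus its own partition of Orders, so each node can join locally and no Order has to move between nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplicatedJoinSketch {
    static final int NODES = 3; // hypothetical cluster size

    static class Node {
        // Replicated data: every node holds the full Products map (id -> name).
        final Map<Integer, String> products = new HashMap<>();
        // Partitioned data: this node's share of Orders (orderId -> productId).
        final Map<Integer, Integer> orders = new HashMap<>();

        // Joins the local Orders against the local Products copy.
        List<String> localJoin() {
            List<String> rows = new ArrayList<>();
            for (Map.Entry<Integer, Integer> e : orders.entrySet()) {
                String name = products.get(e.getValue());
                if (name != null) {
                    rows.add("order " + e.getKey() + " -> " + name);
                }
            }
            return rows;
        }
    }

    // Copies Products to every node and partitions Orders by order id.
    static List<Node> buildCluster(Map<Integer, String> products, Map<Integer, Integer> orders) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < NODES; i++) {
            Node node = new Node();
            node.products.putAll(products);
            nodes.add(node);
        }
        for (Map.Entry<Integer, Integer> e : orders.entrySet()) {
            nodes.get(Math.floorMod(e.getKey(), NODES)).orders.put(e.getKey(), e.getValue());
        }
        return nodes;
    }

    // The full join result is the union of the per-node local joins.
    static List<String> join(List<Node> nodes) {
        List<String> all = new ArrayList<>();
        for (Node node : nodes) {
            all.addAll(node.localJoin());
        }
        return all;
    }

    public static void main(String[] args) {
        Map<Integer, String> products = new HashMap<>();
        products.put(1, "widget");
        products.put(2, "gadget");
        Map<Integer, Integer> orders = new HashMap<>();
        for (int orderId = 0; orderId < 9; orderId++) {
            orders.put(orderId, 1 + orderId % 2);
        }
        join(buildCluster(products, orders)).forEach(System.out::println);
    }
}
```

The trade-off in this model is memory and write cost: the replicated data set is stored once per node and every update to it touches all nodes, which is why it suits small, mostly read-only data.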
