questions

narges saleh

questions

Hello All,

I'd appreciate your answers to my questions.

1) Assuming I use an affinity key across 4 caches and they all end up on the same Ignite node: what happens when there is an overflow? Does the overflow data end up on a newly joined node? How do I keep the related data from all the caches close together when the volume of data exceeds a single node?

2) Is there a concept of cluster affinity, meaning having a cluster group defined based on some affinity key? For example, if I have two departments A and B, can I have a cluster group for department A and another for department B?

Thanks,
Narges
ilya.kasnacheev

Re: questions

Hello!

1) When there is an overflow, either page eviction kicks in or, if it is disabled, you get an IgniteOOM, after which the node is no longer usable. Please avoid overflowing any data regions, since there's no graceful handling currently.
2) I don't think so. You can't easily confine half of a cache's data to one cluster group and the other half to another group.

Such scenarios are not recommended. We expect all partitions to hold the same amount of data, not a few gargantuan partitions that don't fit on a single node.
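The overflow behaviour described above can be sketched in miniature. This is an illustrative Python model, not the Ignite API; the region size and page ids are made up. With eviction enabled, the coldest pages are silently dropped; with it disabled, a full region is a hard failure:

```python
from collections import OrderedDict

class DataRegion:
    """Toy model of a fixed-size data region with optional page eviction."""
    def __init__(self, max_pages, eviction_enabled):
        self.max_pages = max_pages
        self.eviction_enabled = eviction_enabled
        self.pages = OrderedDict()  # page id -> payload, oldest first

    def put(self, page_id, payload):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # touched pages become "hot"
        elif len(self.pages) >= self.max_pages:
            if not self.eviction_enabled:
                # Analogue of IgniteOutOfMemoryException: the region is full,
                # cannot shrink, and the node is no longer usable.
                raise MemoryError("data region overflow (IgniteOOM analogue)")
            self.pages.popitem(last=False)    # evict the coldest page
        self.pages[page_id] = payload

region = DataRegion(max_pages=3, eviction_enabled=True)
for i in range(5):
    region.put(i, f"row-{i}")
print(sorted(region.pages))   # → [2, 3, 4]: only the 3 newest pages survive
```

Note that eviction drops data: neither mode spills the overflow to a joined node, which is the point of Ilya's warning.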

Regards,
--
Ilya Kasnacheev


narges saleh

Re: questions

Thanks, Ilya, for the replies.
1) Doesn't Ignite rebalance the data if additional nodes are available and the data doesn't fit on the cache's current Ignite node? Consider a scenario where I have 100 pods on a physical node, assuming pod = Ignite node.
2) I am not sure what you mean by confining half of a cache to one cluster group and the other half to another. If my affinity key is department id, why can't I have department A on a partitioned cache, with one partition on a node in cluster A and another partition on a node in another cluster?

I might be misunderstanding the whole picture, and I'd appreciate clarification.

ilya.kasnacheev

Re: questions

Hello!

1) No. Ignite only rebalances data when nodes are joining or leaving the cluster.
2) Ignite's affinity is not really well suited to such fine-grained manual assignment. It is assumed that your cache has a large number of partitions (e.g. 1024) and that data is distributed evenly among them. Having department as the affinity key is suboptimal because there aren't many departments and they usually vary in size. That's exactly the kind of distribution you want to avoid.
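The skew being warned about here is easy to demonstrate. In this sketch (illustrative Python, not the Ignite API; 1024 is Ignite's default partition count, the key names are made up), an affinity function maps keys to partitions by hashing. Many small keys cover nearly all partitions; ten department keys can never touch more than ten:

```python
import hashlib

PARTITIONS = 1024  # Ignite's default number of partitions

def partition(affinity_key):
    # Stand-in for the affinity function: affinity key -> partition number.
    return int(hashlib.sha1(affinity_key.encode()).hexdigest(), 16) % PARTITIONS

# 100,000 rows keyed by user id land on essentially all 1024 partitions,
# so up to 1024 nodes can share the load evenly.
user_parts = {partition(f"user-{i}") for i in range(100_000)}

# The same rows keyed by one of 10 departments occupy at most 10
# partitions: at most 10 nodes ever do any work, and the biggest
# department's partition is as large as that whole department.
dept_parts = {partition(f"dept-{d}") for d in range(10)}

print(len(user_parts), len(dept_parts))
```

However the cluster grows, the department-keyed cache can never spread wider than the number of distinct affinity-key values.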

Regards,
--
Ilya Kasnacheev


narges saleh

Re: questions

I am not sure you can find real-world examples where caches can be evenly partitioned if the partitioning factor is an affinity key. I am comparing this with partitioning in relational databases, say partitioning based on month of the year. I definitely don't have hundreds of departments, but I do have tens of them, and the departments are very disproportionate in size.
As for the rebalancing case, pods will be added to the system as the volume increases, so I'd assume that would prompt Ignite to rebalance.

ilya.kasnacheev

Re: questions

Hello!

Partitioning based on, let's say, user id is usually fair, because there are usually hundreds of thousands of users and none of them owns a disproportionate amount of data.

Partitioning by month is especially bad, since in any given month all partitions will be basically idle save for one, and there will be a lot of contention.
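The month case is a hotspot in time as well as in space, which a small sketch makes concrete (illustrative Python, not the Ignite API; partition count and key format are assumptions). A year of data occupies at most 12 partitions, and all writes during any one month target a single partition, i.e. a single primary node:

```python
import hashlib

PARTITIONS = 1024

def partition(affinity_key):
    # Stand-in for the affinity function: affinity key -> partition number.
    return int(hashlib.sha1(affinity_key.encode()).hexdigest(), 16) % PARTITIONS

# A whole year of month-keyed data can occupy at most 12 of 1024 partitions...
year_parts = {partition(f"2019-{m:02d}") for m in range(1, 13)}

# ...and during August, every single insert hashes to the same partition,
# so one node absorbs 100% of the write load while the rest sit idle.
august_writes = {partition("2019-08") for _ in range(10_000)}

print(len(year_parts), len(august_writes))
```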

Regards,
--
Ilya Kasnacheev


narges saleh

Re: questions

Hello Ilya,
I agree with you that partitioning based on month was a bad example, because most partitions would be idle. Country or customer are better examples of my case. There are a limited number of them, they are disproportionate in size, and they are always active. Let's take the country example. I need to search and aggregate the volume of sales by city and by country, and I have a couple of hundred countries.
Let me ask a basic question. If my queries/aggregations are based on cities and countries, do I need to partition based on countries (or even cities)? I want to avoid network hops for my searches and aggregations as much as possible (I don't want to slow writes either, but I am aware of the trade-off between reads/writes and replication and partitioning). What do I define my affinity key on, and what do I partition on?

thanks again for your help.

ilya.kasnacheev

Re: questions

Hello!

I don't think partitioning by country or city is a good idea, since the distribution will be very uneven.

There are different ways of minimizing network hops while keeping the distributed nature of your database. A database isn't really distributed if, for a given city's query, one node takes all the load while the rest sit idle.
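One standard way to cut network hops, while keeping a fine-grained affinity key like city, is to send the computation to the data rather than pulling rows to the caller; in Ignite this is the affinityRun/affinityCall pattern. A minimal sketch of the idea (plain Python, hypothetical node layout and storage, not the Ignite API):

```python
import hashlib

PARTITIONS = 16
NODES = ["node-A", "node-B", "node-C", "node-D"]

def partition(affinity_key):
    # Stand-in for the affinity function: affinity key -> partition number.
    return int(hashlib.sha1(affinity_key.encode()).hexdigest(), 16) % PARTITIONS

def owner(part):
    # Stand-in for the partition -> primary node mapping.
    return NODES[part % len(NODES)]

# Per-node storage: sales rows collocated by city (the affinity key),
# so all rows for one city live on one node.
storage = {n: [] for n in NODES}
for city, amount in [("paris", 10), ("paris", 5), ("oslo", 7), ("paris", 1)]:
    storage[owner(partition(city))].append((city, amount))

def affinity_run(city, task):
    """Run `task` on the node that owns `city`'s partition: the
    aggregation sees only local data, so no rows cross the network."""
    node = owner(partition(city))
    return task(storage[node])

total = affinity_run("paris", lambda rows: sum(a for c, a in rows if c == "paris"))
print(total)  # → 16: aggregated where the data lives, result shipped back
```

Only the final aggregate travels over the network; the per-row work stays local to the owning node.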

Regards,
--
Ilya Kasnacheev


narges saleh

Re: questions

Hello Ilya,

There are parallel streams inserting data for all the countries into different nodes (and caches), and there are parallel queries against the distributed database for different countries, aggregating the data, in some cases inserting data back, in others returning results. Yes, for a given query, only one or two caches might get hit. But if the volume of data for a given city is too big, the query might hit multiple caches; hence my question. How do I keep these caches as close as possible to each other?

What would be some ways to minimize the network hops? How can I keep data with the same affinity as close as possible, preferably on the same physical node or on neighboring nodes (but across multiple Ignite nodes and caches)?

Thanks, and I am sorry for dragging this out.


ilya.kasnacheev

Re: questions

Hello!

It's impossible to answer this question without going into the specifics of your use case, which I don't have. Maybe you have a concrete case to show?

Regards,
--
Ilya Kasnacheev

