Fair queue polling policy?

Peter
Fair queue polling policy?

Hello,

My aim is a queue for load balancing that is described in the documentation: create an "ideally balanced system where every node only takes the number of jobs it can process, and not more."

I'm using JDK 8 and Ignite 2.6.0. I have successfully set up a two-node Ignite cluster where node1 has the same CPU count (8) and the same amount of RAM as node2, but a slightly slower CPU (virtual vs. dedicated). I created one unbounded queue in this system (no collection configuration, and no cluster configuration except a TcpDiscoveryVmIpFinder).
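For reference, the setup described above can be sketched roughly like this (Ignite 2.6 API; the queue name "jobs" and the discovery addresses are illustrative placeholders, and error handling is omitted):

```java
import java.util.Arrays;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteQueue;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CollectionConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class NodeSetup {
    public static void main(String[] args) {
        // Static IP finder with placeholder addresses for the two nodes.
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Arrays.asList("10.0.0.1:47500", "10.0.0.2:47500"));

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(ipFinder));

        Ignite ignite = Ignition.start(cfg);

        // Capacity 0 creates an unbounded queue; default CollectionConfiguration.
        IgniteQueue<String> queue =
            ignite.queue("jobs", 0, new CollectionConfiguration());

        queue.put("job-1");
    }
}
```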

I call queue.put on both nodes at an equal rate and have one non-Ignite thread per node that calls queue.take(). I expected both machines to go equally toward 100% CPU usage, as both machines poll at their best frequency. But what I observe is that the slower node (node1) gets approx. 5 times more items via queue.take than node2. This leads to 10% CPU usage on node2 and 100% CPU usage on node1, and I never had a case where it was equal.
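The put/take pattern above can be illustrated with a local java.util.concurrent queue standing in for the distributed one (a sketch only; the real code would use the IgniteQueue instance in place of the LinkedBlockingQueue):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class TakeLoop {
    // Runs one consumer thread against a local queue (stand-in for the
    // IgniteQueue), puts n items, and returns how many the consumer took.
    static int drainCount(int n) throws InterruptedException {
        BlockingQueue<Integer> q = new LinkedBlockingQueue<>();
        AtomicInteger taken = new AtomicInteger();

        // One consumer per node: blocks on take() and counts items.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    q.take();              // blocks until an item is available
                    taken.incrementAndGet();
                }
            } catch (InterruptedException ignored) {
                // exit on interrupt
            }
        });
        consumer.setDaemon(true);
        consumer.start();

        // Producer side: put items at a steady rate.
        for (int i = 0; i < n; i++)
            q.put(i);

        // Wait until the consumer has drained everything.
        while (taken.get() < n)
            Thread.sleep(5);

        consumer.interrupt();
        return taken.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("taken=" + drainCount(100)); // prints taken=100
    }
}
```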

What could be the reason? Is there a fair polling configuration or some anti-affinity setting? Or is it required to call queue.take() inside a Runnable submitted via ignite.compute().something?

I also played with CollectionConfiguration.setCacheMode but the problem persists. Any pointers are appreciated.

Kind Regards
Peter

Peter

Re: Fair queue polling policy?

Hello,

I have found this discussion about the same topic, and indeed the example there works and the queues poll fairly.

And when I tweak the sleeps after put and take so that the queue stays mostly empty all the time, I can reproduce the unfair behaviour!

I'm not sure if this is a bug, as it should be the responsibility of the client to avoid overloading itself. In my case this happened because I allowed too many threads for the tasks on the polling side, leading to too-frequent polling and hence a mostly empty queue.

But IMO it should be clarified in the documentation, as one expects round-robin behaviour even for empty queues. And in low-latency environments and/or environments with many clients this could cause problems. I have created an issue about it here: https://issues.apache.org/jira/browse/IGNITE-10496

Kind Regards
Peter

On 30.11.18 at 01:44, Peter wrote:


Stanislav Lukyanov

RE: Fair queue polling policy?

I think what you’re talking about isn’t fairness, it’s round-robin behaviour.

You can’t distribute a single piece of work among multiple nodes fairly – one node gets it and the others don’t.

Yes, it could use a different node each time, but I don’t really see a use case for that.

The queue itself isn’t a load-balancer implementation; it doesn’t even need to care about fairness.

All it needs is to implement the queue interface efficiently.

 

I think I can explain why one node gets the data most of the time.

It’s probably because the first value (when the queue is empty) always has the same key – and so it always ends up on the same node.

So the behavior is not that the same client gets the value – it’s that the same server always stores the first (second, third) value.

When all the servers try to get and remove the same value, the one closest to it (i.e. the one storing it) wins.

We could probably randomize the distribution – but it would cost us in code complexity and, maybe, performance.
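A toy model of this explanation (the key-to-node mapping below is made up for illustration; Ignite's real affinity function is different): if the queue drains to empty between operations, each new head element gets the same key and therefore maps to the same node, whereas with a backlog the head keys vary and spread across nodes.

```java
import java.util.HashSet;
import java.util.Set;

public class HeadAffinity {
    // Hypothetical deterministic key-to-node mapping, for illustration only.
    static int nodeFor(long key, int nodes) {
        return Math.floorMod(Long.hashCode(key * 31L + 17L), nodes);
    }

    // Counts how many distinct nodes ever host the head-of-queue element.
    // If the queue is always empty when an item arrives, the head key
    // restarts at 0 every time; with a backlog, head keys keep increasing.
    static int distinctHeadNodes(int rounds, int nodes, boolean queueStaysEmpty) {
        Set<Integer> seen = new HashSet<>();
        long nextKey = 0;
        for (int r = 0; r < rounds; r++) {
            long headKey = queueStaysEmpty ? 0 : nextKey++;
            seen.add(nodeFor(headKey, nodes));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctHeadNodes(100, 2, true));  // prints 1: one node hosts every head
        System.out.println(distinctHeadNodes(100, 2, false)); // prints 2: heads spread across both
    }
}
```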

 

Overall, I don’t think it’s a bug in Ignite, and we would need a solid justification to change the behavior.

 

Do you have a use case when a random distribution is important?

 

Stan

 

From: [hidden email]
Sent: 30 November 2018 17:30
To: [hidden email]
Subject: Re: Fair queue polling policy?

Peter

Re: Fair queue polling policy?

Hello Stan,

Thanks for your detailed answer on this topic.

I think you are right that this is not a bug, and currently I do not see a problem with it, except that the documentation is a bit misleading: "Given this approach, threads on remote nodes will only start working on the next job when they have completed the previous one, hence creating ideally balanced system where every node only takes the number of jobs it can process, and not more."

I think that without round robin the claim "creating ideally balanced system" is not 100% true.

Allow me a further question, as I have throughput problems for objects in the KB size range. What overhead can be expected from the distributed queue? Can I assume roughly the same numbers as when sending these objects via HTTP (round robin), or can it be several times slower, as I currently observe in my test environment?

Regards
Peter

On 03.12.18 at 19:02, Stanislav Lukyanov wrote:


Peter

Re: Fair queue polling policy?

Hello Stan,

I have thought about this a bit more. The problem is that I populate the distributed queue on node1 and node2 in round-robin fashion, BUT queue.take is still not called equally from both servers, which means that Ignite is artificially increasing network traffic, IMO.

> It’s probably due to that the first value (when the queue is empty) always has the same key

Shouldn't the implementation prefer polling clients that are local to the "put" if the queue is empty?

Regards
Peter

On 04.12.18 at 19:07, Peter wrote:


-- 
GraphHopper.com - fast and flexible route planning