Job Stealing node not stealing jobs

dothething dothething


Hi there,

I asked this question before, but under a different, already resolved topic, so I am reposting it under a more suitable title. I hope that's OK.

We have tried to configure two compute server nodes, one of which runs on a weaker machine. The node on the more powerful machine always finishes its tasks long before the weaker node and then sits idle.

The idle node is not even sending a steal request, so I must have configured something wrong.

I have attached the code for both nodes. If you could kindly point out what I am missing, I would really appreciate it!



Attachments: ComputeNode.txt (1K), CompNodeJobStealer.txt (1K)
aealexsandrov aealexsandrov

Re: Job Stealing node not stealing jobs

Hi,

Some remarks about the job stealing SPI:

1) You have some nodes that can process the jobs of a compute task.
2) Jobs are executed in the public thread pool by default:
https://apacheignite.readme.io/docs/thread-pools#section-public-pool
3) If a node's thread pool is busy, some jobs of the compute task can be
executed on another node.

It will not work in the following cases:

1) You choose a specific node for your compute task.
2) You make an affinity call (the same as above, but the node is
chosen by affinity mapping).
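
For reference, here is a minimal sketch of a node configured for job stealing. It assumes `ignite-core` is on the classpath and is not taken from the attached files. The class name and all threshold values are illustrative only. The Ignite documentation states that JobStealingCollisionSpi must be used together with JobStealingFailoverSpi, so both are set here.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.collision.jobstealing.JobStealingCollisionSpi;
import org.apache.ignite.spi.failover.jobstealing.JobStealingFailoverSpi;

// Hypothetical class name; the values below are placeholders, not recommendations.
public class JobStealingNode {
    public static void main(String[] args) {
        JobStealingCollisionSpi collisionSpi = new JobStealingCollisionSpi();

        // Maximum number of jobs allowed to execute in parallel on this node.
        collisionSpi.setActiveJobsThreshold(4);
        // Waiting-queue threshold that controls when this node starts stealing
        // (see the SPI Javadoc for the exact semantics).
        collisionSpi.setWaitJobsThreshold(10);
        // Stealing must be enabled for this node to send steal requests.
        collisionSpi.setStealingEnabled(true);
        // How many times a single job may be stolen.
        collisionSpi.setMaximumStealingAttempts(10);
        // Expiration time of steal messages, in milliseconds.
        collisionSpi.setMessageExpireTime(1000);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCollisionSpi(collisionSpi);
        // Stolen jobs are rerouted through the failover SPI, so it has to be set too.
        cfg.setFailoverSpi(new JobStealingFailoverSpi());

        Ignite ignite = Ignition.start(cfg);
    }
}
```

Every node that should take part in stealing would need an equivalent configuration. The thresholds may differ per machine.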

Regarding your case:

It is not clear to me what exactly you are trying to do. Possibly job
stealing did not kick in because your weak node had already started
executing its jobs in the public pool and simply took longer than the
faster node.

Could you please share your full reproducer for investigation?

BR,
Andrei

On 9/3/2019 1:43 PM, Pascoe Scholle wrote:

> HI there,
>
> I have asked this question, however I asked it under a different and
> resolved topic, so I posted the quest under a more suitable title. I
> hope thats ok
>
> We have tried to configure two compute server nodes one of which is
> running on a weaker machine. The node running on the more powerful
> machine always finished its tasks far before
> the weaker node and then sits idle.
>
> The node is not even sending a steal request, so I must have
> configured something wrong.
>
> I have attached the code for both nodes if you could kindly point out
> what I am missing , I would really appreciate it!
>
>
dothething dothething

Re: Job Stealing node not stealing jobs

Hi,

I have attached a small Scala project. After building and compiling it with sbt, just set the build path to src.

We want to execute processes that run outside the JVM. These processes can be extremely memory-intensive, which is why I am limiting the number of parallel jobs that can be executed on a machine.

I have one desktop that has much more memory available and can thus execute more jobs in parallel. As all jobs take roughly the same amount of time, this machine completes its jobs much sooner. Once it has completed all its jobs, I want it to take jobs from the nodes running on weaker machines.

Does that make sense?

Hope this helps.

BR,
Pascoe
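
To make the expected benefit concrete, the scenario described above can be modelled with a small tick-based simulation. This is a toy model in plain Java, not Ignite's implementation. The class name, the method name and the assumption that every job takes exactly one tick are all hypothetical. Each node runs at most `slots` jobs per tick from its own queue, and with stealing enabled an idle slot takes a waiting job from another node.

```java
import java.util.Arrays;

// Toy model only: it mimics the idea of job stealing, not Ignite's SPI logic.
final class StealingSketch {

    /**
     * Returns the number of ticks needed to drain all queues.
     * slots[n]  = parallel job limit of node n (assumed to be positive)
     * queued[n] = jobs initially waiting on node n (each job takes one tick)
     */
    static int makespan(int[] slots, int[] queued, boolean stealing) {
        int[] waiting = queued.clone();
        int ticks = 0;
        while (Arrays.stream(waiting).sum() > 0) {
            int[] free = new int[slots.length];
            // Each node first fills its slots from its own waiting queue.
            for (int n = 0; n < slots.length; n++) {
                int run = Math.min(slots[n], waiting[n]);
                waiting[n] -= run;
                free[n] = slots[n] - run;
            }
            // Nodes with spare slots then take waiting jobs from other nodes.
            if (stealing) {
                for (int thief = 0; thief < slots.length; thief++) {
                    for (int victim = 0; victim < slots.length && free[thief] > 0; victim++) {
                        if (victim == thief) continue;
                        int stolen = Math.min(free[thief], waiting[victim]);
                        waiting[victim] -= stolen;
                        free[thief] -= stolen;
                    }
                }
            }
            ticks++;
        }
        return ticks;
    }
}
```

With a strong node (4 slots) and a weak node (1 slot), each starting with 8 queued jobs, `makespan(new int[]{4, 1}, new int[]{8, 8}, false)` returns 8 because the weak node works through its queue alone. With stealing enabled the same call returns 4.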

On Tue, 3 Sep 2019 at 17:29, Andrei Aleksandrov <[hidden email]> wrote:
Hi,

Some remarks about job stealing SPI:

1)You have some nodes that can proceed the tasks of some compute job.
2)Tasks will be executed in public thread pool by default:
https://apacheignite.readme.io/docs/thread-pools#section-public-pool
3)If some node thread pool is busy then some task of compute job can be
executed on other node.

In next cases it will not work:

1)In case if you choose specific node for your compute task
2)In case if you do affinity call (the same as above but node will be
choose by affinity mapping)

According to your case:

It's not clear for me what exactly you try to do. Possible job stealing
didn't work because of your weak node began executions of some tasks in
public pool but just do it longer then faster one.

Could you please share your full reproducer for investigation?

BR,
Andrei


Attachment: JobStealTesting.tar.gz (18K)
dothething dothething

Re: Job Stealing node not stealing jobs

Hello,

Is there any update on this?

We have not been able to resolve this issue.

Kind regards


On Wed, 04 Sep 2019 at 07:44, Pascoe Scholle <[hidden email]> wrote:
Hi,

attached a small scala project. Just set the build path to src after building and compiling with sbt.

We want to execute processes that happen outside the JVM. These processes can be extremely memory intensive which is why I am limiting the
number of parallel jobs that can be executed on a machine.

I have one desktop that has a lot more memory available and can thus execute more jobs in parallel. As all jobs take roughly the same amount of time, this machine will have completed its jobs much faster. I want it to then take jobs from the nodes started on weaker machines once it has completed all its tasks.

Does that make sense?

Hope this helps.

BR,
Pascoe

stephendarlington stephendarlington

Re: Job Stealing node not stealing jobs

I don’t know the answer to your job stealing question, but I do wonder whether job stealing is the right configuration for your requirements. Why not use the weighted load balancer (https://apacheignite.readme.io/docs/load-balancing)? It is designed for cases where nodes are of differing sizes.
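
A minimal sketch of what that configuration could look like, assuming `ignite-core` is on the classpath. The class name, the helper method and the weight passed in are hypothetical. Each node would be started with its own weight, with a larger value for the stronger machine.

```java
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.loadbalancing.weightedrandom.WeightedRandomLoadBalancingSpi;

// Hypothetical helper; the weight is chosen per machine by whoever starts the node.
public class WeightedNodeConfig {
    static IgniteConfiguration weighted(int nodeWeight) {
        WeightedRandomLoadBalancingSpi lbSpi = new WeightedRandomLoadBalancingSpi();
        // Take node weights into account when picking a node for a job.
        lbSpi.setUseWeights(true);
        // Weight of the local node; a higher weight attracts more jobs.
        lbSpi.setNodeWeight(nodeWeight);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setLoadBalancingSpi(lbSpi);
        return cfg;
    }
}
```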

Regards,
Stephen

On 10 Sep 2019, at 10:19, Pascoe Scholle <[hidden email]> wrote:

Hello,

is there any update on this?

We have not been able to resolve this issue

Kind regards


dothething dothething

Re: Job Stealing node not stealing jobs

Thanks for the prompt response. I have looked at the WeightedRandomLoadBalancingSpi. It does not look like one can set the number of parallel jobs with it, though, and this is a big requirement. Also, given the nature of the jobs that will be deployed, it is inevitable that some nodes will sit idle, and the job stealer seems like the perfect solution. Regardless, I have used the code provided for the job stealing SPI on the docs page and it isn't functioning as intended.
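
On the parallel-jobs requirement: in Ignite, load balancing and collision resolution are separate SPIs, so a cap on concurrent jobs is set on the collision SPI rather than on the load balancer. Below is a minimal sketch of such a cap using FifoQueueCollisionSpi, assuming `ignite-core` is on the classpath. The class and method names are hypothetical. This SPI only limits concurrency and does not steal jobs, so it does not address the idle-node problem by itself.

```java
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.collision.fifoqueue.FifoQueueCollisionSpi;

// Hypothetical helper showing a per-node cap on concurrently running jobs.
public class ParallelJobsCapConfig {
    static IgniteConfiguration capped(int parallelJobs) {
        FifoQueueCollisionSpi collisionSpi = new FifoQueueCollisionSpi();
        // At most this many jobs run at once on the node; the rest wait in FIFO order.
        collisionSpi.setParallelJobsNumber(parallelJobs);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCollisionSpi(collisionSpi);
        return cfg;
    }
}
```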


On Tue, 10 Sep 2019 at 11:34, Stephen Darlington <[hidden email]> wrote:
I don’t know the answer to your jon stealing question, but I do wonder if that’s the right configuration for your requirements. Why not use the weighted load balancer (https://apacheignite.readme.io/docs/load-balancing)? That’s designed to work in cases where nodes are of differing sizes.

Regards,
Stephen
