Load balancing and executing python processes

4 messages
dothething

Load balancing and executing python processes

Hi,

I have a question regarding the workings of the load balancer and running processes which are outside the JVM.

We have a Python command-line tool which is used for processing big data. The tool is highly optimized and is able to load data into RAM almost instantly. Some data sets can be as large as 20 GB.

When sending multiple jobs, each triggering its own Python process, I do not want them all to land on the same machine. Is there any way we can use the load balancer to ensure that all jobs are evenly distributed, or to restrict certain jobs to a node running on a machine we know has enough memory available?

For example, I have two machines, one with 32 GB of RAM and one with 64 GB, with one Ignite node per machine. Three jobs are sent using ComputeTaskContinuousMapper. The 32 GB machine received two tasks and, predictably, froze up. Having some way to ensure that two of the jobs go to the machine with more memory would be really helpful.
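To make the intent concrete, here is roughly what I have in mind (the attribute name and value "node.ram" -> "64g" are placeholders I made up, not anything Ignite defines):

```scala
// Sketch: tag the big-memory node with a user attribute and route heavy
// jobs to a cluster group built from that attribute.
import org.apache.ignite.{Ignite, Ignition}
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.lang.IgniteRunnable
import scala.collection.JavaConverters._

// On the 64 GB machine, start the node with a marker attribute:
val cfg = new IgniteConfiguration()
cfg.setUserAttributes(Map[String, AnyRef]("node.ram" -> "64g").asJava)
val ignite: Ignite = Ignition.start(cfg)

// On the submitting side, restrict the compute call to big-memory nodes:
val bigMem = ignite.cluster().forAttribute("node.ram", "64g")
ignite.compute(bigMem).run(new IgniteRunnable {
  override def run(): Unit = {
    // launch the external Python process here
  }
})
```
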

Thanks and kind regards,
Pascoe
ilya.kasnacheev

Re: Load balancing and executing python processes

Hello!

I think you will have to do it semi-manually: on every node you should know how many resources are available, and not allow jobs to over-consume.

You could try to use LoadBalancingSpi but as far as I have heard, it is not trivial.

Ignite is not a scheduler, so it doesn't have any built-ins for resource control.
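If you do go the LoadBalancingSpi route, a weighted setup could look something like the sketch below (the weight values are illustrative only, and I haven't verified this exact configuration):

```scala
// Sketch: WeightedRandomLoadBalancingSpi routes jobs to nodes with a
// probability proportional to each node's advertised weight.
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spi.loadbalancing.weightedrandom.WeightedRandomLoadBalancingSpi

val lb = new WeightedRandomLoadBalancingSpi()
lb.setUseWeights(true)
// Set a higher weight in the config of the 64 GB node (e.g. 4)
// and a lower one on the 32 GB node (e.g. 2).
lb.setNodeWeight(4)

val cfg = new IgniteConfiguration()
cfg.setLoadBalancingSpi(lb)
```
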

Regards,
--
Ilya Kasnacheev


dothething

Re: Load balancing and executing python processes

Hi Ilya,

So I have exciting news. Like you say, we have to configure some of the nodes manually, and I have done this using the CollisionSpi. It works great!

The nodes on more powerful PCs, which also have more RAM available, are configured to allow more jobs to run in parallel, while weaker machines are limited to fewer. Really awesome feature.

I now have another issue.

Say I have nodes A and B, both with the attribute "compute", so all jobs deployed on the cluster are executed only on these two nodes.

Node A runs on the weaker machine, so I limit its number of parallel jobs to 2.

Node B runs on the stronger machine and allows 4 parallel jobs at the same time.

Say I send off 100 jobs to the cluster and they are evenly distributed between these two nodes. Node B will obviously finish its queue much faster.

So I tried to set up a JobStealingCollisionSpi, but it's not stealing any jobs from Node A.

Here is some code showing the configuration; can you point out what is missing?

This is the job stealer node that has job stealing enabled:

""""""""""""""""""""""""""""""""""""""""""""""""""""""

  import scala.collection.JavaConverters._

  val jobStealer = new JobStealingCollisionSpi();
  jobStealer.setStealingEnabled(true);

  jobStealer.setStealingAttributes(Map("compute.node" -> "true").asJava);

  /* set the number of parallel jobs allowed */
  jobStealer.setActiveJobsThreshold(10);

  // Configure the number of stealing attempts.
  jobStealer.setMaximumStealingAttempts(10);

  val failoverSpi = new JobStealingFailoverSpi();

  val cfg = new IgniteConfiguration();

  val cacheConfig = new CacheConfiguration("myCache");

  /* partition the cache over the entire cluster */
  cacheConfig.setCacheMode(CacheMode.PARTITIONED);

  cfg.setCacheConfiguration(cacheConfig);

  cfg.setMetricsUpdateFrequency(10);

  cfg.setFailoverSpi(failoverSpi);

  cfg.setPeerClassLoadingEnabled(true);

  cfg.setCollisionSpi(jobStealer);

  /* jobs are sent to this node for computation */
  cfg.setUserAttributes(Map("compute.node" -> true).asJava);

""""""""""""""""""""""""""""""""""""""""""""

Next, the configuration for Node B. Job stealing is disabled, so that other nodes can take jobs from this node:

"""""""""""""""""""""""""""""""

  import scala.collection.JavaConverters._

  val jobStealer = new JobStealingCollisionSpi();

  jobStealer.setStealingEnabled(false);

  /* set the number of parallel jobs allowed */
  jobStealer.setActiveJobsThreshold(2);

  jobStealer.setStealingAttributes(Map("compute.node" -> "true").asJava);

  val failoverSpi = new JobStealingFailoverSpi();

  val cfg = new IgniteConfiguration();

  val cacheConfig = new CacheConfiguration("myCache");

  cacheConfig.setCacheMode(CacheMode.PARTITIONED);

  cfg.setCacheConfiguration(cacheConfig);

  cfg.setPeerClassLoadingEnabled(true);

  cfg.setFailoverSpi(failoverSpi);

  cfg.setCollisionSpi(jobStealer);

  cfg.setUserAttributes(Map("compute.node" -> true).asJava);

""""""""""""""""""""""""""""""""""""""""""

I also have a third node reserved for running services with a different attribute.

Any pointers on why the JobStealer node is not stealing any jobs?

Thanks!

dothething

Re: Load balancing and executing python processes

Just some more info.

I use a ComputeTaskContinuousMapper within the task that sends off these jobs. Could this be the reason?
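For context, the task sends jobs roughly like this (heavily simplified; the class names and the job body are placeholders, not the actual production code):

```scala
// Sketch: a task that dispatches jobs through an injected continuous
// mapper instead of returning a static job-to-node map.
import org.apache.ignite.cluster.ClusterNode
import org.apache.ignite.compute.{ComputeJob, ComputeJobAdapter, ComputeJobResult,
  ComputeTaskAdapter, ComputeTaskContinuousMapper}
import org.apache.ignite.resources.TaskContinuousMapperResource

class MyTask extends ComputeTaskAdapter[Seq[String], Void] {
  // Ignite injects the continuous mapper into the task instance.
  @TaskContinuousMapperResource
  private var mapper: ComputeTaskContinuousMapper = _

  override def map(subgrid: java.util.List[ClusterNode], args: Seq[String]):
      java.util.Map[ComputeJob, ClusterNode] = {
    // Send each job via the mapper; the configured load balancer
    // picks the target node for each one.
    for (arg <- args)
      mapper.send(new ComputeJobAdapter() {
        override def execute(): Object = {
          // launch the external Python process here
          null
        }
      })
    null // returning null is fine: all jobs were sent via the mapper
  }

  override def reduce(results: java.util.List[ComputeJobResult]): Void = null
}
```
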
