Determine Node Health?

Chris Berry

Hi,

We are using an Ignite ComputeGrid, and it is mostly working nicely.

Recently we had a Node with "Noisy Neighbors" in AWS that wreaked havoc in
our ComputeGrid.
Even though that Node was quite slow, it was never removed from the
map/reduce, slowing down all computes.

We have already built a system that allows us to add/subtract Nodes to the
ComputeGrid based on when they are actually “ready to compute”,
because our Nodes take considerable time to be truly ready for computation
(i.e. quite a bit of preparation is required).
So, to accomplish this, we use a dynamic Ignite ClusterGroup when we create
the compute.

```
ClusterGroup readyNodes =
readyForComputeMonitor.getNodesReadyForCompute(ignite.cluster());
log.debug(dumpClusterGroup(readyNodes));
return ignite.compute(readyNodes);
```

So, my question:
Does Ignite keep any information that we can use to determine whether a Node is
healthy?
That is, is there some way we can locate outliers in the ComputeGrid?

For example, the Node in our recent incident was at 100% CPU and was much,
much slower in the reduce phase.

Any help/advice would be much appreciated.

Thanks,
-- Chris

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Alex Plehanov

Re: Determine Node Health?

Hello Chris, 

There is no such metric as "node is healthy" at the moment, but each node provides a lot of low-level metrics, such as CPU usage, memory usage, job execution/waiting time, etc., which you can combine to define your own criteria for a "healthy node". These metrics are available cluster-wide and contain information for each node; see the ClusterGroup#metrics() and ClusterNode#metrics() methods.
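To make the "combine the metrics yourself" idea concrete, here is a minimal sketch under stated assumptions: a node counts as unhealthy if its CPU load is at or above a fixed limit, or if its average job execution time exceeds some multiple of the cluster median. The `NodeHealth`/`NodeSample` names, the thresholds and the median rule are hypothetical choices, not an Ignite feature. In a real cluster the two numbers would be read from each node's `ClusterNode#metrics()` (for example `getCurrentCpuLoad()` and `getAverageJobExecuteTime()` on `ClusterMetrics`), but the sketch takes plain values so it runs without Ignite.

```java
import java.util.ArrayList;
import java.util.List;

final class NodeHealth {

    /** Hypothetical snapshot of the two metrics this rule looks at for one node. */
    public static final class NodeSample {
        final String nodeId;
        final double cpuLoad;       // assumed to be a fraction, 0.0 .. 1.0
        final double avgJobExecMs;  // average job execution time in milliseconds

        public NodeSample(String nodeId, double cpuLoad, double avgJobExecMs) {
            this.nodeId = nodeId;
            this.cpuLoad = cpuLoad;
            this.avgJobExecMs = avgJobExecMs;
        }
    }

    /** Median of the average job execution times across all samples (0 if empty). */
    static double medianExecMs(List<NodeSample> samples) {
        double[] times = samples.stream().mapToDouble(s -> s.avgJobExecMs).sorted().toArray();
        int n = times.length;
        if (n == 0) {
            return 0.0;
        }
        return (n % 2 == 1) ? times[n / 2] : (times[n / 2 - 1] + times[n / 2]) / 2.0;
    }

    /**
     * Returns the ids of nodes that look unhealthy: CPU load at or above cpuLimit,
     * or an average job time more than slowFactor times the cluster median.
     */
    static List<String> unhealthy(List<NodeSample> samples, double cpuLimit, double slowFactor) {
        double median = medianExecMs(samples);
        List<String> result = new ArrayList<>();
        for (NodeSample s : samples) {
            boolean cpuHot = s.cpuLoad >= cpuLimit;
            boolean slow = median > 0 && s.avgJobExecMs > slowFactor * median;
            if (cpuHot || slow) {
                result.add(s.nodeId);
            }
        }
        return result;
    }
}
```

Comparing against the median rather than a fixed time lets the rule adapt to how heavy the jobs are, so a single slow node (like the one at 100% CPU in Chris's incident) stands out without hand-tuning an absolute limit. The flagged IDs could then be left out when building the ClusterGroup for the compute.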


(In reply to Chris Berry <[hidden email]>, Wed, 5 Sep 2018, 0:39)
vgrigorev

Re: Determine Node Health?

I would propose periodically calling each node, one by one,
with some simple remote function.
Measure each node's response time, and if it is too slow for some node
according to your needs, avoid using that node for some period.
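A minimal sketch of the "avoid using this node for some period" part, assuming probe latencies are already being measured: any node whose probe exceeds a latency limit is benched until a cooldown expires. `NodeQuarantine`, its parameters and the fixed-cooldown policy are hypothetical; the current time is passed in explicitly so the logic is easy to test.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: keeps slow nodes out of use until a cooldown has passed. */
final class NodeQuarantine {
    private final long maxLatencyMillis;
    private final long cooldownMillis;
    // nodeId -> time (ms) until which the node should not be used
    private final Map<String, Long> benchedUntil = new HashMap<>();

    NodeQuarantine(long maxLatencyMillis, long cooldownMillis) {
        this.maxLatencyMillis = maxLatencyMillis;
        this.cooldownMillis = cooldownMillis;
    }

    /** Records one probe result; a slow response benches the node for the cooldown period. */
    synchronized void record(String nodeId, long latencyMillis, long nowMillis) {
        if (latencyMillis > maxLatencyMillis) {
            benchedUntil.put(nodeId, nowMillis + cooldownMillis);
        }
        // A fast probe does not lift an existing bench early.
    }

    /** True if the node is not currently benched. */
    synchronized boolean isUsable(String nodeId, long nowMillis) {
        Long until = benchedUntil.get(nodeId);
        if (until == null) {
            return true;
        }
        if (nowMillis >= until) {
            benchedUntil.remove(nodeId); // cooldown is over
            return true;
        }
        return false;
    }
}
```

In use, `nowMillis` would be `System.currentTimeMillis()`, and `isUsable` could become one more condition in the node filter that builds the compute group (for example through `ClusterGroup#forPredicate`; how it is wired in is an assumption). Keeping a benched node out for the full cooldown, even if a later probe is fast, stops a flapping node from bouncing in and out.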

Here is how to choose the nodes to call (a single node or many):

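import java.util.Collection;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCompute;
import org.apache.ignite.lang.IgniteCallable;
import org.apache.ignite.resources.IgniteInstanceResource;

        // nodeIds: placeholder for a Collection<UUID> of the node(s) to call, one or many
        IgniteCompute compute = ignite.compute(ignite.cluster().forNodeIds(nodeIds));
        final Collection<String> nodeConsistentIds = compute.broadcast(
                new IgniteCallable<String>() {
                    // Inject the Ignite instance of the node running the job.
                    @IgniteInstanceResource
                    private Ignite ignite;

                    @Override
                    public String call() throws Exception {
                        // consistentId() returns Object, so convert it to a String
                        String id = String.valueOf(ignite.cluster().localNode().consistentId());
                        log.debug(" DIAGNOSTICS: node is `{}`", id);
                        return id;
                    }
                });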
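The snippet above returns each node's consistent ID but does not time anything. One hedged, library-agnostic way to add the timing: run one probe per node, in turn, with a deadline, and record the elapsed milliseconds. `ProbeTimer`, `timeEach` and `FAILED` are hypothetical names, and node IDs are plain strings here. With Ignite, the per-node `Callable` might wrap a synchronous `call(...)` on a single-node group such as `ignite.cluster().forNodeId(uuid)`; that wiring is an assumption, not tested code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

final class ProbeTimer {
    /** Latency recorded for a probe that threw or missed its deadline. */
    static final long FAILED = Long.MAX_VALUE;

    /**
     * Probes the nodes one at a time and returns elapsed milliseconds per node
     * (in input order), or FAILED if the probe threw or timed out.
     */
    static Map<String, Long> timeEach(List<String> nodeIds,
                                      Function<String, Callable<?>> probeFor,
                                      long timeoutMillis) {
        Map<String, Long> latencies = new LinkedHashMap<>();
        // A cached pool, so a stuck probe does not delay the next node's probe.
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            for (String nodeId : nodeIds) {
                long start = System.nanoTime();
                Future<?> result = pool.submit(probeFor.apply(nodeId));
                try {
                    result.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    latencies.put(nodeId, TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
                } catch (TimeoutException | ExecutionException e) {
                    result.cancel(true); // best effort: interrupt a probe that is still running
                    latencies.put(nodeId, FAILED);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop probing, keep what was measured
        } finally {
            pool.shutdownNow();
        }
        return latencies;
    }
}
```

Probing nodes individually matters here: a single `broadcast` returns only once every node has answered, so it cannot tell you which node was the slow one.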

Chris Berry

Re: Determine Node Health?

Thanks to you both!

Jason.G

Re: Determine Node Health?

In reply to this post by vgrigorev
Hi vgrigorev,

I used your suggestion to do a health check for each node, but I got a memory
leak and the process exited with an OOM error: Java heap space.

Below is my example code:

// I create one bean to collect the info I want (IP, hostname,
// create time) and then return it as a JSON string.
IgniteHealthCheckEntity healthCheck = new IgniteHealthCheckEntity();
ClusterNode node = ignite.cluster().localNode();
// addresses() and hostNames() return a Collection, not necessarily a List,
// so take the first element via the iterator instead of casting.
String ip = node.addresses().iterator().next();
String hostname = node.hostNames().iterator().next();

healthCheck.setServerIp(ip);
healthCheck.setStatus(0);
healthCheck.setServerHostname(hostname);
healthCheck.setMonitorTime(monitorTime);
healthCheck.setClientIp(clientIp);
String cacheName = "test_monitor_" + ipStr + "_" + new Date().getTime();

IgniteCache<String, String> putCache = ignite.createCache(cacheName);
putCache.put("test", "test");
String value = putCache.get("test");
if (!"test".equals(value)) {
        message = "Ignite (" + ip + ") get/put value failed";
} else {
        message = "OKOKOK";
}
healthCheck.setMessage(message);
return JSONObject.fromObject(healthCheck).toString();

Stanislav Lukyanov

RE: Determine Node Health?

You’re creating a new cache on each health check call and never
destroying them; of course, that leads to a memory leak. It’s also awful for performance.

Don’t create a new cache each time. If you really want to check that cache operations work,
use the same one every time.

Thanks,
Stan
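Along the lines of Stan's advice, a minimal sketch of a check that is wired to one cache up front and reuses it on every call, writing a single node-specific key that is overwritten each time so the stored data stays bounded. `CacheHealthCheck` is a hypothetical name. To keep it runnable without Ignite, the cache is passed in as a put function and a get function; with Ignite, the wiring would presumably happen once at startup from `ignite.getOrCreateCache(...)`, as sketched in the code comment (the cache name there is made up, and the wiring is an assumption).

```java
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical health check bound to ONE cache, reused for every call. */
final class CacheHealthCheck {
    private final BiConsumer<String, String> put;
    private final Function<String, String> get;

    // With Ignite this might be wired once, e.g. (assumption):
    //   IgniteCache<String, String> c = ignite.getOrCreateCache("health_check");
    //   CacheHealthCheck check = new CacheHealthCheck(c::put, c::get);
    CacheHealthCheck(BiConsumer<String, String> put, Function<String, String> get) {
        this.put = put;
        this.get = get;
    }

    /** Writes a node-specific key and reads it back; true if the round trip works. */
    boolean check(String nodeId) {
        String key = "health_" + nodeId; // one entry per node, overwritten on each check
        put.accept(key, nodeId);
        return nodeId.equals(get.apply(key));
    }
}
```

Because every check for a given node writes the same key, repeated checks leave one entry per node rather than one new cache per call.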

From: [hidden email]
Sent: 10 October 2018 8:49
To: [hidden email]
Subject: Re: Determine Node Health?