Is there a way to get cache metrics for all the nodes in cluster combined

vinshar

Is there a way to get cache metrics for all the nodes in cluster combined

Hi,


Is there a way to get combined cache metrics for a whole cluster, or metrics for selected nodes in a cluster?
For example, I want to check the health of a cache on all nodes, or log combined metrics from a client node.

Regards,
Vinay Sharma
vinshar

Re: Is there a way to get cache metrics for all the nodes in cluster combined

I found a way:

cache.metrics(ClusterGroup grp)
vinshar

Re: Is there a way to get cache metrics for all the nodes in cluster combined

In reply to this post by vinshar
I started a partitioned cache on a cluster with 2 server nodes and 1 client node and executed 100,000 puts with keys from 1 to 100,000. Once the puts were done, I fetched all the entries one by one by key. The cache metrics for the cluster look a bit off to me, so I think I am missing something. Below are the cache configuration and the cache metrics for the cluster. I started the 2 server nodes with the cache configuration below in their config file and performed the puts and gets from the client node, where I got the cache instance through ignite.cache(<cache name>).

<bean class="org.apache.ignite.configuration.CacheConfiguration">
        <property name="name" value="TEST_PARTITIONED" />
        <property name="cacheMode" value="PARTITIONED" />
        <property name="managementEnabled" value="true" />
        <property name="statisticsEnabled" value="true" />
        <property name="evictionPolicy">
                <bean class="org.apache.ignite.cache.eviction.fifo.FifoEvictionPolicy">
                       
                        <property name="maxSize" value="20000" />
                </bean>
        </property>
        <property name="startSize" value="21000" />
</bean>


CLUSTER Cache Metrics = CacheMetricsSnapshot [reads=99104, puts=100000, hits=39104, misses=39104, txCommits=0, txRollbacks=0, evicts=60000, removes=0, putAvgTimeNanos=0.0, getAvgTimeNanos=0.0, rmvAvgTimeNanos=0.0, commitAvgTimeNanos=0.0, rollbackAvgTimeNanos=0.0, cacheName=TEST_PARTITIONED, overflowSize=0, offHeapGets=0, offHeapPuts=0, offHeapRemoves=0, offHeapEvicts=0, offHeapHits=0, offHeapMisses=0, offHeapEntriesCnt=0, offHeapPrimaryEntriesCnt=0, offHeapBackupEntriesCnt=0, offHeapAllocatedSize=0, offHeapMaxSize=-1, swapGets=0, swapPuts=0, swapRemoves=0, swapEntriesCnt=0, swapHits=0, swapMisses=0, swapSize=0, size=0, keySize=0, isEmpty=true, dhtEvictQueueCurrSize=0, txThreadMapSize=0, txXidMapSize=0, txCommitQueueSize=0, txPrepareQueueSize=0, txStartVerCountsSize=0, txCommittedVersionsSize=4, txRolledbackVersionsSize=4, txDhtThreadMapSize=-1, txDhtXidMapSize=-1, txDhtCommitQueueSize=0, txDhtPrepareQueueSize=0, txDhtStartVerCountsSize=0, txDhtCommittedVersionsSize=-1, txDhtRolledbackVersionsSize=-1, isWriteBehindEnabled=false, writeBehindFlushSize=-1, writeBehindFlushThreadCnt=-1, writeBehindFlushFreq=-1, writeBehindStoreBatchSize=-1, writeBehindTotalCriticalOverflowCnt=-1, writeBehindCriticalOverflowCnt=-1, writeBehindErrorRetryCnt=-1, writeBehindBufSize=-1, keyType=java.lang.Object, valType=java.lang.Object, isStoreByVal=true, isStatisticsEnabled=true, isManagementEnabled=true, isReadThrough=false, isWriteThrough=false]

What bothers me in the above result:
1) Why are reads 99104 when I performed 100,000 gets?
2) Hits=Misses=39104. Why is the total of hits and misses not equal to reads?
3) putAvgTimeNanos=getAvgTimeNanos=rmvAvgTimeNanos=0.0. Why are all the average times 0?
4) size=keySize=0 in the metrics, but cache.size() returned 40,000 (which is expected with eviction maxSize 20,000 for a partitioned cache on a cluster with 2 server nodes).
5) I got the cache instance through ignite.cache(<cache name here>). After the above process completed, I printed cache.metrics(), which was all empty, although cache.metrics(<cluster Grp>) prints the above metrics.
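
A quick arithmetic check may help separate the counters in the snapshot that are internally consistent from those that are not. The sketch below is plain Java with no Ignite dependency. The class and method names are hypothetical, and it assumes no backups and that each server node retains at most maxSize entries.

```java
/** Hypothetical helper (not part of Ignite): sanity-checks counters from the posted snapshot. */
public class MetricsSanityCheck {
    /** Entries expected to remain if each server node keeps at most maxSizePerNode entries (assumes no backups). */
    public static long expectedRetained(long puts, int serverNodes, long maxSizePerNode) {
        return Math.min(puts, serverNodes * maxSizePerNode);
    }

    /** Evictions expected once all puts have completed. */
    public static long expectedEvictions(long puts, int serverNodes, long maxSizePerNode) {
        return puts - expectedRetained(puts, serverNodes, maxSizePerNode);
    }

    /** Reads not accounted for by hits + misses; 0 means the counters agree. */
    public static long unaccountedReads(long reads, long hits, long misses) {
        return reads - (hits + misses);
    }

    public static void main(String[] args) {
        // Values copied from the snapshot and configuration above.
        long puts = 100_000, reads = 99_104, hits = 39_104, misses = 39_104;
        System.out.println("expected size      = " + expectedRetained(puts, 2, 20_000));
        System.out.println("expected evictions = " + expectedEvictions(puts, 2, 20_000));
        System.out.println("unaccounted reads  = " + unaccountedReads(reads, hits, misses));
    }
}
```

Under these assumptions, the reported evicts=60000 and the 40,000 returned by cache.size() agree with each other, while reads (99,104) exceed hits + misses (78,208) by 20,896, which is the gap raised in question 2.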

Regards,
Vinay Sharma
Denis Magda

Re: Is there a way to get cache metrics for all the nodes in cluster combined

Hi,

Metrics/statistics information is delivered from each node to the others across the cluster at the interval set with TcpDiscoverySpi.setHeartbeatFrequency. The default frequency is 2 seconds, so you should account for this delay before calling cache.metrics(<cluster Grp>) from any node. Please add a Thread.sleep() call and double-check the result returned by cache.metrics(<cluster Grp>).
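
The effect of that delay can be pictured with a toy model. This is plain Java, not Ignite code: the class and its methods are hypothetical, and it only assumes that remote nodes see a counter as of the last heartbeat rather than its live value.

```java
/** Toy model (not Ignite code): a remote view of a counter that refreshes only on heartbeats. */
public class HeartbeatMetricsModel {
    private long serverPuts;        // live counter on the server node
    private long lastPublishedPuts; // value carried by the most recent heartbeat

    /** The server records a put immediately. */
    public void recordPut() {
        serverPuts++;
    }

    /** A heartbeat delivers the current server value to the other nodes. */
    public void heartbeat() {
        lastPublishedPuts = serverPuts;
    }

    /** What a remote node (for example a client) sees between heartbeats. */
    public long remoteView() {
        return lastPublishedPuts;
    }

    public static void main(String[] args) {
        HeartbeatMetricsModel model = new HeartbeatMetricsModel();
        for (int i = 0; i < 100_000; i++) {
            model.recordPut();
        }
        System.out.println("before heartbeat: " + model.remoteView());
        model.heartbeat();
        System.out.println("after heartbeat:  " + model.remoteView());
    }
}
```

In this model a read taken right after the puts still shows the old value, which is why waiting at least one heartbeat interval before reading the cluster metrics is suggested.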

Regarding this:
5) I got cache instance through ignite.cache(<cache name here>). After completion of above process i printed cache.metrics() which was all empty although cache.metrics(<cluster Grp>) prints above metrics.

As I understand it, the cache.metrics() call is made from a client node that doesn't hold any cache data, so the result is expected.

Regards,
Denis


vinshar

Re: Is there a way to get cache metrics for all the nodes in cluster combined

Hi Denis,

Thanks for the input. I did another run with a Thread.sleep() before printing the metrics, and the numbers are good now, but issues 3 and 4 are still there. Is keeping these numbers at 0 in the combined cluster metrics intentional for some reason?

Vinay wrote
3) putAvgTimeNanos=getAvgTimeNanos=rmvAvgTimeNanos=0.0. Why all put times are 0?
4) size=keySize=0 in metrics but cache.size() returned 40,000 (which is expected with eviction maxSize 20,000 for partitioned cache on 2 server node cluster)

Regarding point 5:

Denis Magda wrote
In regards to this
5) I got cache instance through ignite.cache(<cache name here>). After completion of above process i printed cache.metrics() which was all empty although cache.metrics(<cluster Grp>) prints above metrics.

In my understanding cache.metrics() call is done from a client node that doesn't hold cache data at all. So the result is expected.

As a client, shouldn't there be a way to see the metrics of just my client node's cache? How can I determine the frequency of operations that my app's client node performs on a cache that already exists on the servers?
Let's say my client node starts facing a network issue that adds a 200 ms delay between the client node and the server node, so a cache put from the client node now takes a bit more than 200 ms. Will this be reflected in the combined cache metrics fetched by the client node through cache.metrics(<Cluster grp>)? Or will the combined metrics just report averages measured from the time the server node received the request from the client node?

The cache also exists on the client node, and the client app performs all of its operations on that cache. Although the client node cache internally sends all data to the server nodes, as a client app I am still performing all operations on the client node cache, and I might be interested in metrics for just that node.

Regards,
Vinay Sharma
dsetrakyan

Re: Is there a way to get cache metrics for all the nodes in cluster combined

In reply to this post by Denis Magda


On Mon, Jan 25, 2016 at 2:59 AM, Denis Magda <[hidden email]> wrote:
Hi,

Metrics/statistics information is delivered from each node to another across
the cluster with TcpDiscoverySpi.setHeartbeatFrequency. The default
frequency value is 2 seconds. So before acquiring cache.metrics(<cluster
Grp>) from some node you should consider this delay. Please make a
Thread.sleep() call and double check a result returned by
cache.metrics(<cluster Grp>).

In regards to this
5) I got cache instance through ignite.cache(<cache name here>). After
completion of above process i printed cache.metrics() which was all empty
although cache.metrics(<cluster Grp>) prints above metrics.

In my understanding cache.metrics() call is done from a client node that
doesn't hold cache data at all. So the result is expected.

Denis, the result may be expected, but not intuitive. Client node cache should provide metrics across all server caches, in my view. Any reason why we have not implemented it this way?


vkulichenko

Re: Is there a way to get cache metrics for all the nodes in cluster combined

In reply to this post by vinshar
Hi Vinay,

Here are my answers to your questions:

1. As Denis mentioned, metrics are collected from all nodes and sent in heartbeat messages. So it's possible that the metrics for the servers are not updated immediately on the client, but they will sync up eventually. If you add a sleep of 1-2 seconds before calling cache.metrics(), you will see the correct numbers.
2, 3, 4. This is something that needs to be fixed. I will create ticket(s).
5. Currently the metrics() method without parameters returns metrics for the local node only, and I agree this is counterintuitive. I think we should calculate numbers for the whole cluster instead.

-Val
vinshar

Re: Is there a way to get cache metrics for all the nodes in cluster combined

Hi Val,

What Denis mentioned was correct. I got the correct numbers for gets, puts, size, and evictions after I added a sleep, so question 1 is resolved.

You mentioned that 2, 3, and 4 are issues that have to be fixed and that a new ticket will be created. I look forward to the resolution of the ticket.

Regarding 5:

vkulichenko wrote
5. Currently metrics() method without parameters returns metrics for local node only, and I agree this is counterintuitive. I think we should calculate numbers for the whole cluster instead.

If I understood correctly, the idea is that cache.metrics() should behave like "cache.metrics(ignite.cluster())" when the cache has no data on the client node. Is that right?
Even if we start returning numbers for the whole cluster, there will be no way to see the numbers for just the client node cache.
I understand that most metrics attributes, like "getAverageTxCommitTime", will be zero on the client node cache, but shouldn't there be some way to get basic metrics attributes such as total gets and puts for these caches?

Regards,
Vinay

vkulichenko

Re: Is there a way to get cache metrics for all the nodes in cluster combined

Vinay,

To get metrics for a single node (server or client), you can always provide a cluster group that includes only this node.

But I believe that in most cases metrics collected from all nodes are the most useful, and this is what should be returned by default if a cluster group is not provided.
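
To make the idea of "metrics for a chosen set of nodes" concrete, here is a minimal sketch in plain Java. It is not Ignite's implementation: the NodeMetrics type, the node ids, and the sample figures are all invented for illustration, and the choice to weight average times by operation count is an assumption about what a sensible combined average could look like.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch (not Ignite's implementation): combining per-node counters for a selected node set. */
public class CombinedMetricsSketch {
    /** Minimal per-node snapshot: number of puts and average put time in nanoseconds. */
    public static final class NodeMetrics {
        final long puts;
        final double putAvgNanos;

        public NodeMetrics(long puts, double putAvgNanos) {
            this.puts = puts;
            this.putAvgNanos = putAvgNanos;
        }
    }

    /** Sum of puts over the selected nodes; unknown node ids are ignored. */
    public static long totalPuts(Map<String, NodeMetrics> byNode, Set<String> selected) {
        long total = 0;
        for (String id : selected) {
            NodeMetrics m = byNode.get(id);
            if (m != null) {
                total += m.puts;
            }
        }
        return total;
    }

    /** Average put time weighted by each node's put count; 0.0 when no puts were recorded. */
    public static double weightedPutAvgNanos(Map<String, NodeMetrics> byNode, Set<String> selected) {
        long total = totalPuts(byNode, selected);
        if (total == 0) {
            return 0.0;
        }
        double weighted = 0.0;
        for (String id : selected) {
            NodeMetrics m = byNode.get(id);
            if (m != null) {
                weighted += m.puts * m.putAvgNanos;
            }
        }
        return weighted / total;
    }

    public static void main(String[] args) {
        // Made-up sample figures, for illustration only.
        Map<String, NodeMetrics> byNode = new LinkedHashMap<>();
        byNode.put("server-1", new NodeMetrics(60_000, 1_000.0));
        byNode.put("server-2", new NodeMetrics(40_000, 2_000.0));
        byNode.put("client-1", new NodeMetrics(0, 0.0));

        Set<String> servers = Set.of("server-1", "server-2");
        System.out.println("puts on servers: " + totalPuts(byNode, servers));
        System.out.println("avg put nanos:   " + weightedPutAvgNanos(byNode, servers));
        System.out.println("puts on client:  " + totalPuts(byNode, Set.of("client-1")));
    }
}
```

Passing a one-element set corresponds to the single-node case described above, and passing every node id corresponds to the whole-cluster default being proposed.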

-Val
vkulichenko

Re: Is there a way to get cache metrics for all the nodes in cluster combined

I created a ticket for the issues discussed here: https://issues.apache.org/jira/browse/IGNITE-2483

-Val