Pending Requests queue bloating

gupabhi

Pending Requests queue bloating

Hello,
I'm using G1GC with a 24G heap on each of the 6 nodes in my grid. Today, while ingesting large amounts of data (using DataStreamers), I saw an issue where the old gen kept growing and GC pauses kept getting longer until the grid became unusable. Looking at the heap dump (attached) of one of the nodes, it seems the Pending Messages queue kept bloating to the point where the GC started to churn heavily.

Questions -
1. Given that the only operation occurring on the grid at the time was ingestion using the data streamer, is this queue basically made up of those messages?

2. What is the recommended solution to this problem?
a. The CPU usage on the server was very low throughout, so what could be causing this queue to bloat? (I'm not using any persistence)
b. Is there a way to throttle these requests on the server such that the clients feel back pressure and this queue doesn't fill up?

Anything else you can recommend?

Thanks,
Abhishek







[Attachment: heap dump775.png (154K)]
akurbanov

RE: Pending Requests queue bloating

Hello,

First of all, what is the exact version/build you are using?

It is hard to pinpoint the issue knowing only the retained sizes of some objects, but there are several possibilities for what may have happened in the cluster. This queue does not hold the cache entries inserted with the data streamer; those never go through the discovery RingMessageWorker, since they don't have to travel around the whole server ring.

There are a couple of issues in the Ignite JIRA related to memory consumption in ServerImpl/ClientImpl, but the one that most likely fits is https://issues.apache.org/jira/browse/IGNITE-11058, because the others are not related to the TcpDiscoveryCustomEventMessage class.

If you still have the heap dump available, check the data stored in these custom messages: what kind of messages are they?

Since there is significant BinaryMetadata/Holder heap consumption, my guess is that the queue contains something like MetadataUpdateProposedMessage; here is another ticket worth checking:

https://issues.apache.org/jira/browse/IGNITE-11531

Finally, the data streamer tuning knobs are described in the Javadoc; look at perNodeParallelOperations to throttle on the source side: https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/IgniteDataStreamer.html
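For illustration, a minimal sketch of source-side throttling with perNodeParallelOperations; the config file name, cache name and tuning values below are assumptions, not taken from this thread:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class ThrottledIngest {
    public static void main(String[] args) {
        // Hypothetical client config and cache name - adjust to your setup.
        try (Ignite ignite = Ignition.start("client-config.xml");
             IgniteDataStreamer<String, Object> streamer = ignite.dataStreamer("myCache")) {

            // Cap the number of parallel batches in flight per server node.
            // Lower values make the streamer wait sooner, acting as back pressure.
            streamer.perNodeParallelOperations(4);

            // Smaller buffers mean smaller, more frequent batches.
            streamer.perNodeBufferSize(512);

            for (int i = 0; i < 1_000_000; i++)
                streamer.addData("key-" + i, "value-" + i);

            streamer.flush();
        }
    }
}

With perNodeParallelOperations set low, the streamer waits once that many batches per node are in flight, so the sender is slowed down instead of flooding the servers faster than they can apply the updates.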

 

Regards,

Anton

 

gupabhi

RE: Pending Requests queue bloating

Thanks, Anton, for the response. I'm using 2.7.5. I think you correctly identified the issue - I do see MetadataUpdateProposedMessage objects inside.
What is not clear is what triggers this and what the workaround is. It would help if you could explain the minimal changes I need to make to patch 2.7.5, or how to work around it.

Thanks,
Abhishek


gupabhi

RE: Pending Requests queue bloating

I should have mentioned that I'm using String->BinaryObject in my cache. The binary object itself has a large number of field->value pairs (a few thousand). As I run my ingestion jobs using data streamers, new fields may be added to the binary object depending on the job type, but once all job types have run at least once, new fields are rarely added to the BinaryObjects.

Could this be part of the issue, i.e. having a large number of fields? Would it help if I simply stored a Map of key-value pairs instead of a BinaryObject with a few thousand fields?


Thanks,
Abhishek



ilya.kasnacheev

Re: Pending Requests queue bloating

Hello!

Yes, it can be a problem. Moreover, using a variable BinaryObject composition is not advised, since it may cause Ignite to track a large number of object schemas, and schema propagation is a blocking operation.

It is recommended to put all non-essential/changing fields into a Map.
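For example, a minimal sketch of that approach with a hypothetical type name "Record" and cache name "myCache" (neither is from this thread): the object keeps a small fixed set of fields, and everything job-specific lives in a single Map field, so all records share one schema.

import java.util.HashMap;
import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.binary.BinaryObjectBuilder;

public class StableSchemaIngest {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<String, BinaryObject> cache =
                ignite.getOrCreateCache("myCache").withKeepBinary();

            // Job-specific attributes go into one Map field instead of
            // thousands of top-level fields, so every record shares the
            // same small, fixed schema.
            Map<String, Object> attrs = new HashMap<>();
            attrs.put("price", 42.0);
            attrs.put("source", "jobA");

            BinaryObjectBuilder b = ignite.binary().builder("Record");
            b.setField("id", "record-1");   // stable field
            b.setField("attrs", attrs);     // everything that varies by job
            cache.put("record-1", b.build());
        }
    }
}

Reading it back works the same way: cache.get("record-1").field("attrs") returns the Map.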

Regards,
--
Ilya Kasnacheev


gupabhi

Re: Pending Requests queue bloating

Thanks, Ilya.
I've made the change so that the BinaryObject keeps only a Map, but I'm still seeing the heap bloat on one of my nodes. I believe it's the coordinator node whose heap is bloating, because at the time I observed its old gen to be very high there was no application ingestion happening. Another interesting observation is that at that very time all the other nodes received a lot of large messages and the coordinator sent a lot of large messages (I can see this in ClusterLocalNodeMetricsMXBeanImpl). Yet another observation is that the coordinator node's off-heap usage was significantly higher than that of the other nodes (100GB for the coordinator versus 35-40GB for the others). Why the huge difference?

It doesn't seem to be any application data - it looks like Ignite's internal control messages. Does this symptom fit the issue Anton pointed to? If it does, what is the workaround? It would also help if you could explain the minimal changes I need to make to patch 2.7.5.

If it's not that issue, what else could it be?



ilya.kasnacheev

Re: Pending Requests queue bloating

Hello!

I'm not really sure how to answer without looking at your heap dumps, or at least a histogram.

However, I'm pretty sure that you can bloat the heap with BinaryObject schemas if you are not careful. Please keep the number of different BinaryObject schemas (distinct sets of fields) to a minimum.
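As an illustration of how schemas multiply, a sketch with a hypothetical "Record" type: every distinct combination of fields written through the builder is registered as a separate binary schema whose metadata has to be propagated across the cluster, whereas the second variant always produces the same two-field schema.

import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.binary.BinaryObjectBuilder;

public class BinarySchemas {
    // Anti-pattern: the set of fields depends on the job type, so each
    // unique field combination of "Record" becomes its own schema.
    static BinaryObject variableSchema(Ignite ignite, String jobType, int i) {
        BinaryObjectBuilder b = ignite.binary().builder("Record");
        b.setField("id", "record-" + i);
        if ("jobA".equals(jobType))
            b.setField("priceA", 1.0);       // schema {id, priceA}
        else
            b.setField("quantityB", 2L);     // schema {id, quantityB}
        return b.build();
    }

    // Preferred: the same fields are always written, so only one schema
    // exists; job-specific values go into the "attrs" Map.
    static BinaryObject fixedSchema(Ignite ignite, Map<String, Object> attrs, int i) {
        return ignite.binary().builder("Record")
            .setField("id", "record-" + i)
            .setField("attrs", attrs)
            .build();
    }
}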

Regards,
--
Ilya Kasnacheev

