Questions related to check pointing

Raymond Wilson

Hi,

We have been investigating some issues which appear to be related to checkpointing. We are currently using Apache Ignite 2.8.1 with the C# client.

I have been trying to gain clarity on how certain aspects of the Ignite configuration relate to the checkpointing process:

1. Number of checkpointing threads. This defaults to 4, but I don't understand how it applies to the checkpointing process. Are more threads generally better (e.g. because the disk I/O is parallelised across the threads), or do they only have a positive effect if you have many data storage regions? Or something else? If this could be clarified in the documentation (or a pointer to it which Google has not yet found), that would be good.

2. Checkpoint frequency. This defaults to 180 seconds. I was thinking that reducing this time would result in smaller, less disruptive checkpoints. Setting it to 60 seconds seems pretty safe, but is there a practical lower limit for use cases where new data is constantly being added, e.g. 5 or 10 seconds?
 
3. Write exclusivity constraints during checkpointing. I understand that while a checkpoint is occurring, ongoing writes are still supported into the caches being checkpointed, and if those are writes to existing pages then those pages are duplicated into the checkpoint buffer. If this buffer becomes full or stressed then Ignite will throttle, and perhaps block, writes until the checkpoint is complete. If this is the case then Ignite will emit logging (warning or informational?) that writes are being throttled.
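To make these three settings concrete, here is a minimal sketch using the Java configuration API with the values at their documented defaults (we set the equivalent properties from the C# client; this is illustrative, not our production configuration):

    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class CheckpointSettingsSketch {
        // The three storage-level settings referred to in questions 1-3, at their defaults.
        public static IgniteConfiguration build() {
            DataStorageConfiguration storageCfg = new DataStorageConfiguration()
                .setCheckpointThreads(4)           // question 1: checkpointing threads (default 4)
                .setCheckpointFrequency(180_000L)  // question 2: checkpoint frequency in ms (default 180 s)
                .setWriteThrottlingEnabled(false); // question 3: write throttling (default false)

            return new IgniteConfiguration().setDataStorageConfiguration(storageCfg);
        }
    }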

We have cases where simple puts to caches (a few requests per second) are taking up to 90 seconds to execute when there is an active checkpoint occurring, where the checkpoint has been triggered by the checkpoint timer. When a checkpoint is not occurring the time to do this is usually in the milliseconds. The checkpoints themselves can take 90 seconds or longer and update up to 30,000-40,000 pages across a pair of data storage regions: one with 4 GB of in-memory space allocated (which should be ~1,000,000 pages at the standard 4 KB page size), and one small region with 128 MB. There is no 'throttling' logging being emitted that we can tell, so the checkpoint buffer (which should be 1 GB for the first data region and 256 MB for the second, smaller region in this case) does not appear to be filling up during the checkpoint.

It seems like the checkpoint is affecting the put operations, but I don't understand why that would be, given the documented checkpointing process, and the checkpoint itself (at least via informational logging) is not advertising any restrictions.

Thanks,
Raymond.

--

Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)

Ilya Kasnacheev
Re: Questions related to check pointing

Hello!

1. If we knew the specific circumstances in which a specific setting value yields the most benefit, we would have already set it to that value. Having a setting means you may tune it and get better results, or you may not; in general we can't promise you anything. I did see improvements from increasing this setting in a very specific setup, but in general you may leave it as is.

2. More frequent checkpoints mean increased write amplification, so reducing this value may overwhelm your system with load it was able to handle previously. You can set it to an arbitrarily small value, in which case checkpoints will run back to back without any pause between them.

3. I don't think the default throttling mechanism will emit any warnings. What do you see in thread dumps?

Regards,
--
Ilya Kasnacheev



Raymond Wilson
Re: Questions related to check pointing

Hi Ilya,

Regarding the throttling question, I have not yet looked at thread dumps - the observed behaviour has been seen in production metrics and logging. What would you expect a thread dump to show in this case?

Given my description of the sizes of the data regions and the number of pages being updated in a checkpoint, would you expect any throttling behaviour?

Thanks,
Raymond.

Raymond Wilson
Re: Questions related to check pointing

As another detail, we have the WriteThrottlingEnabled property left at its default value of 'false', so I would not ordinarily expect throttling, correct?

Zhenya Stanilovsky
Re: Questions related to check pointing

  1. In addition to Ilya's reply, you can check the vendor's persistence-tuning page for additional info; everything on that page applies to Ignite too [1]. Increasing the thread count increases concurrent I/O usage, so with something like NVMe it is up to you, but with SAS disks it may be better to reduce this parameter.
  2. The log will show you something like:
    Parking thread=%Thread name% for timeout(ms)= %time%
    and the corresponding:
    Unparking thread=
  3. No additional logging of checkpoint (cp) buffer usage is provided. The cp buffer needs to be more than 10% of the overall size of the persistent data regions.
  4. 90 seconds or longer: that looks like an I/O or system tuning problem; it is a very poor result.
[1] https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
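
For example, a minimal sketch (Java API; the sizes and region names are illustrative, not a recommendation) of giving each persistent region an explicit checkpoint page buffer that satisfies the 10% guideline above:

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;

    public class CheckpointBufferSketch {
        public static DataStorageConfiguration build() {
            DataRegionConfiguration mainRegion = new DataRegionConfiguration()
                .setName("Default")                               // assumed region name
                .setPersistenceEnabled(true)
                .setMaxSize(4L * 1024 * 1024 * 1024)              // 4 GB region
                .setCheckpointPageBufferSize(512L * 1024 * 1024); // 512 MB buffer, >10% of 4 GB

            DataRegionConfiguration ingestRegion = new DataRegionConfiguration()
                .setName("Ingest")                                // assumed region name
                .setPersistenceEnabled(true)
                .setMaxSize(128L * 1024 * 1024)                   // 128 MB region
                .setCheckpointPageBufferSize(64L * 1024 * 1024);  // 64 MB buffer

            return new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(mainRegion)
                .setDataRegionConfigurations(ingestRegion);
        }
    }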



 
Raymond Wilson
Re: Questions related to check pointing

Hi Zhenya,

1. We currently use AWS EFS for primary storage, with provisioned IOPS to provide sufficient IO. Our Ignite cluster currently tops out at ~10% usage (with at least 5 nodes writing to it, including WAL and WAL archive), so we are not saturating the EFS interface. We use the default page size (experiments with larger page sizes showed instability when checkpointing due to free page starvation, so we reverted to the default size). 

2. Thanks for the detail, we will look for that in thread dumps when we can create them.

3. We are using the default CP buffer size, which is max(256 MB, DataRegionSize / 4) according to the Ignite documentation, so there should be more than enough checkpoint buffer space to cope with writes. As additional information, the cache which is displaying very slow writes is in a data region with relatively low write traffic. There is a primary (default) data region with heavy write traffic, and the vast majority of pages being written in a checkpoint will be for that default data region.

4. Yes, this is very surprising. Anecdotally, from our logs it appears that write traffic into the low-write-traffic cache is blocked during checkpoints.

Thanks,
Raymond.
    


Raymond Wilson
Re: Questions related to check pointing

I noticed an entry in the Ignite 2.9.1 changelog:
  • Improved checkpoint concurrent behaviour

Perhaps this change may improve the checkpointing issue we are seeing?

Raymond.


Zhenya Stanilovsky
Re[2]: Questions related to check pointing


I don't think so; checkpointing worked perfectly well before this fix.
We need additional info to start digging into your problem. Can you share the Ignite logs somewhere?
 

Raymond Wilson
Re: Re[2]: Questions related to check pointing

I'm working on getting automatic JVM thread stack dumps captured when we detect long delays in put (PutIfAbsent) operations. Hopefully this will provide more information.
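
Roughly the idea, as a sketch (Java; the class name, threshold, and cache types here are placeholders rather than our actual code):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;

    import org.apache.ignite.IgniteCache;

    public class SlowPutDumper {
        private static final long THRESHOLD_MS = 5_000; // placeholder threshold

        // Wraps putIfAbsent and dumps all JVM thread stacks when the call is suspiciously slow.
        public static <K, V> boolean timedPutIfAbsent(IgniteCache<K, V> cache, K key, V val) {
            long start = System.nanoTime();
            boolean res = cache.putIfAbsent(key, val);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;

            if (elapsedMs > THRESHOLD_MS) {
                System.err.println("Slow putIfAbsent: " + elapsedMs + " ms, dumping threads");

                for (ThreadInfo ti : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true))
                    System.err.print(ti); // ThreadInfo.toString() includes a depth-limited stack trace
            }

            return res;
        }
    }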

Raymond Wilson
Re: Re[2]: Questions related to check pointing

The wiki page (https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood) mentions a dirty pages limit as one of the factors that can trigger checkpoints.

I also found this issue: http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html where "too many dirty pages" is a reason given for initiating a checkpoint.

After reviewing our logs I found this (one example):

2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28, startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573], checkpointBeforeLockTime=99ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms, walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms, splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty pages']  

This suggests we may have the issue where writes are frozen until the checkpoint is completed.

Looking at the AI 2.8.1 source code, the dirty page limit fraction appears to be 0.1 (10%), via this entry in GridCacheDatabaseSharedManager.java:

    /**
     * Threshold to calculate limit for pages list on-heap caches.
     * <p>
     * Note: When a checkpoint is triggered, we need some amount of page memory to store pages list on-heap cache.
     * If a checkpoint is triggered by "too many dirty pages" reason and pages list cache is rather big, we can get
     * {@code IgniteOutOfMemoryException}. To prevent this, we can limit the total amount of cached page list buckets,
     * assuming that checkpoint will be triggered if no more then 3/4 of pages will be marked as dirty (there will be
     * at least 1/4 of clean pages) and each cached page list bucket can be stored to up to 2 pages (this value is not
     * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take
     * more than 2 pages). Also some amount of page memory needed to store page list metadata.
     */
    private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;

This raises two questions: 

1. The data region where most writes are occurring has 4 GB allocated to it, though it is permitted to start at a much lower size. 4 GB should be ~1,000,000 pages, 10% of which would be ~100,000 dirty pages.

The 'limit holder' is calculated like this:

    /**
     * @return Holder for page list cache limit for given data region.
     */
    public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
        if (dataRegion.config().isPersistenceEnabled()) {
            return pageListCacheLimits.computeIfAbsent(dataRegion.config().getName(), name -> new AtomicLong(
                (long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages() * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
        }

        return null;
    }
 
... but I am unsure whether totalPages() refers to the current size of the data region or the size it is permitted to grow to, i.e. could the 'dirty page limit' be a sliding limit based on the growth of the data region? Is it better to set the initial and maximum sizes of data regions to the same number?
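
If so, pinning the region size would look something like this (a sketch using the Java API with an assumed region name; not our actual configuration):

    import org.apache.ignite.configuration.DataRegionConfiguration;

    public class RegionSizeSketch {
        static final long SIZE_4GB = 4L * 1024 * 1024 * 1024;

        // Initial size == max size, so any size-derived thresholds are fixed from the outset.
        static final DataRegionConfiguration REGION = new DataRegionConfiguration()
            .setName("Default")          // assumed region name
            .setPersistenceEnabled(true)
            .setInitialSize(SIZE_4GB)
            .setMaxSize(SIZE_4GB);
    }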

2. We have two data regions, one supporting inbound arrival of data (with low numbers of writes), and one supporting storage of processed results from the arriving data (with many more writes). 

The block on writes due to the number of dirty pages appears to affect all data regions, not just the one which has violated the dirty page limit. Is that correct? If so, is this something that can be improved?

Thanks,
Raymond.


Zhenya Stanilovsky
Re[4]: Questions related to check pointing

The relevant code runs from here:
if (checkpointReadWriteLock.getReadHoldCount() > 1 || safeToUpdatePageMemories() || checkpointer.runner() == null)
    break;
else {
    CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many dirty pages");
and nearby you can see that:

maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
    ? pool.pages() * 3L / 4
    : Math.min(pool.pages() * 2L / 3, cpPoolPages);
So if ¾ of the pages in the whole data region are dirty, this checkpoint will be triggered.
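
As a rough worked example of what those formulas give (assuming the pool is the full 4 GB region of 4 KB pages and that cpPoolPages is the checkpoint buffer's page count; both are assumptions here):

    public class DirtyPageLimitSketch {
        public static void main(String[] args) {
            long poolPages   = (4L * 1024 * 1024 * 1024) / 4096; // 1,048,576 pages in a 4 GB region
            long cpPoolPages = (1L * 1024 * 1024 * 1024) / 4096; //   262,144 pages in a 1 GB cp buffer

            long throttlingEnabled  = poolPages * 3L / 4;                        // 786,432 dirty pages
            long throttlingDisabled = Math.min(poolPages * 2L / 3, cpPoolPages); // 262,144 dirty pages

            System.out.println("maxDirtyPages, throttling enabled:  " + throttlingEnabled);
            System.out.println("maxDirtyPages, throttling disabled: " + throttlingDisabled);
        }
    }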
 

In (https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood), there is a mention of a dirty pages limit that is a factor that can trigger check points.
 
I also found this issue: http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html where "too many dirty pages" is a reason given for initiating a checkpoint.
 
After reviewing our logs I found this: (one example)
 
2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28, startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573], checkpointBeforeLockTime=99ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms, walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms, splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty pages']  
 
Which suggests we may have the issue where writes are frozen until the check point is completed.
 
Looking at the AI 2.8.1 source code, the dirty page limit fraction appears to be 0.1 (10%), via this entry in GridCacheDatabaseSharedManager.java:
 
    /**
     * Threshold to calculate limit for pages list on-heap caches.
     * <p>
     * Note: When a checkpoint is triggered, we need some amount of page memory to store pages list on-heap cache.
     * If a checkpoint is triggered by "too many dirty pages" reason and pages list cache is rather big, we can get
* {@code IgniteOutOfMemoryException}. To prevent this, we can limit the total amount of cached page list buckets,
     * assuming that checkpoint will be triggered if no more then 3/4 of pages will be marked as dirty (there will be
     * at least 1/4 of clean pages) and each cached page list bucket can be stored to up to 2 pages (this value is not
     * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take
     * more than 2 pages). Also some amount of page memory needed to store page list metadata.
     */
    private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
 
This raises two questions: 
 
1. The data region where most writes are occurring has 4Gb allocated to it, though it is permitted to start at a much lower level. 4Gb should be 1,000,000 pages, 10% of which should be 100,000 dirty pages.
 
The 'limit holder' is calculated like this:
 
    /**
     * @return Holder for page list cache limit for given data region.
     */
    public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
        if (dataRegion.config().isPersistenceEnabled()) {
            return pageListCacheLimits.computeIfAbsent(dataRegion.config().getName(), name -> new AtomicLong(
                (long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages() * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
        }
 
        return null;
    }
 
... but I am unsure if totalPages() is referring to the current size of the data region, or the size it is permitted to grow to. ie: Could the 'dirty page limit' be a sliding limit based on the growth of the data region? Is it better to set the initial and maximum sizes of data regions to be the same number?
 
2. We have two data regions, one supporting inbound arrival of data (with low numbers of writes), and one supporting storage of processed results from the arriving data (with many more writes). 
 
The block on writes due to the number of dirty pages appears to affect all data regions, not just the one which has violated the dirty page limit. Is that correct? If so, is this something that can be improved?
 
Thanks,
Raymond.
 
 
Zhenya Stanilovsky Zhenya Stanilovsky
Reply | Threaded
Open this post in threaded view
|

Re[4]: Questions related to check pointing

In reply to this post by Raymond Wilson

All write operations will be blocked for this timeout: checkpointLockHoldTime=32ms (the time the write lock is held). If you observe a huge number of messages with reason='too many dirty pages', maybe you need to store some data in non-persistent regions, for example, or reduce indexes (if you use them). And please attach the other part of the checkpoint message, starting with: Checkpoint finished.
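
For example, a non-persistent region and a cache assigned to it look roughly like this (a sketch only, using the Java API; the names are arbitrary). Pages in such a region are never written by a checkpoint, so they do not add to the dirty page count:

import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;

public class NonPersistentRegionSketch {
    public static DataStorageConfiguration storage() {
        // In-memory only region: not checkpointed, data is lost on restart.
        DataRegionConfiguration inMemory = new DataRegionConfiguration()
            .setName("InMemoryOnly")
            .setMaxSize(512L * 1024 * 1024)
            .setPersistenceEnabled(false);

        return new DataStorageConfiguration().setDataRegionConfigurations(inMemory);
    }

    public static CacheConfiguration<Long, byte[]> transientCache() {
        // A cache whose data can be rebuilt, or whose loss can be tolerated, goes to that region.
        return new CacheConfiguration<Long, byte[]>("transientData")
            .setDataRegionName("InMemoryOnly");
    }
}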


 
Raymond Wilson Raymond Wilson
Reply | Threaded
Open this post in threaded view
|

Re: Re[4]: Questions related to check pointing

Hi Zhenya,

The matching checkpoint finished log is this:

2020-12-15 19:07:39,253 [106] INF [MutableCacheComputeServer] Checkpoint finished [cpId=e2c31b43-44df-43f1-b162-6b6cefa24e28, pages=33421, markPos=FileWALPointer [idx=6339, fileOff=243287334, len=196573], walSegmentsCleared=0, walSegmentsCovered=[], markDuration=218ms, pagesWrite=1150ms, fsync=37104ms, total=38571ms] 

Regarding your comment that 3/4 of the pages in the whole data region need to be dirty to trigger this, can you confirm whether this is 3/4 of the maximum size of the data region, or of the currently used size? (e.g. if Min is 1Gb, Max is 4Gb, and 2Gb is in use, would 1.5Gb of dirty pages trigger this?)

Are data regions independently checkpointed, or are they checkpointed as a whole, so that a 'too many dirty pages' condition affects all data regions in terms of write blocking?

Can you comment on my query regarding whether we should set the Min and Max sizes of the data region to be the same? i.e. don't bother growing the data region's memory use on demand, just allocate the maximum up front?

In terms of the checkpoint lock hold time metric: of the checkpoints quoting 'too many dirty pages', there is one instance, apart from the one I provided earlier, that exceeds this limit, i.e.:

2020-12-17 18:56:39,086 [104] INF [MutableCacheComputeServer] Checkpoint started [checkpointId=e9ccf0ca-f813-4f91-ac93-5483350fdf66, startPtr=FileWALPointer [idx=7164, fileOff=389224517, len=196573], checkpointBeforeLockTime=276ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=39ms, walCpRecordFsyncDuration=254ms, writeCheckpointEntryDuration=32ms, splitAndSortCpPagesDuration=276ms, pages=77774, reason='too many dirty pages'] 

This is out of a population of 16 instances I can find. The remainder have lock times of 16-17ms.

Regarding writes of pages to the persistent store, does the checkpointing system parallelise writes across partitions to maximise throughput?
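
For reference, these are the two checkpointing settings I have been asking about, shown with their documented default values. This is only a sketch using the Java configuration API (we set the equivalents from the C# client, and the helper class name is just for illustration):

import org.apache.ignite.configuration.DataStorageConfiguration;

public class CheckpointSettingsSketch {
    public static DataStorageConfiguration storage() {
        DataStorageConfiguration cfg = new DataStorageConfiguration();

        // Pool of threads that write dirty pages to the page store during a checkpoint.
        cfg.setCheckpointThreads(4);

        // How often the timer-based checkpoint is triggered, in milliseconds (180 seconds).
        cfg.setCheckpointFrequency(180_000L);

        return cfg;
    }
}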

Thanks,
Raymond.

 

Raymond Wilson Raymond Wilson
Reply | Threaded
Open this post in threaded view
|

Re: Re[4]: Questions related to check pointing

In reply to this post by Zhenya Stanilovsky
Regarding this section of code:

            maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
                ? pool.pages() * 3L / 4
                : Math.min(pool.pages() * 2L / 3, cpPoolPages);

I think the applicable ratio will be 2/3 of pages, as we do not have a throttling policy defined. Correct?
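
For our two regions the thresholds work out roughly as follows. This is only back-of-the-envelope arithmetic, assuming the default 4kb page size and that pool.pages() reflects the full (pinned) region size:

public class DirtyPageThresholdSketch {
    public static void main(String[] args) {
        long pageSize = 4 * 1024; // default page size in bytes

        long resultsRegionPages = (4L * 1024 * 1024 * 1024) / pageSize; // 4Gb region  -> 1,048,576 pages
        long inboundRegionPages = (128L * 1024 * 1024) / pageSize;      // 128Mb region -> 32,768 pages

        // 3/4 threshold (throttling policy != DISABLED)
        System.out.println(resultsRegionPages * 3L / 4); // 786,432
        System.out.println(inboundRegionPages * 3L / 4); // 24,576

        // 2/3 threshold (throttling policy == DISABLED; the real code also caps this at cpPoolPages)
        System.out.println(resultsRegionPages * 2L / 3); // 699,050
        System.out.println(inboundRegionPages * 2L / 3); // 21,845
    }
}

If the 128Mb region's pool is what is being measured, its threshold is only in the low tens of thousands of pages.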

On Thu, Dec 31, 2020 at 12:49 AM Zhenya Stanilovsky <[hidden email]> wrote:
The code in question runs from here:
if (checkpointReadWriteLock.getReadHoldCount() > 1 || safeToUpdatePageMemories() || checkpointer.runner() == null)
    break;
else {
    CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many dirty pages");
and nearby you can see that:

maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
    ? pool.pages() * 3L / 4
    : Math.min(pool.pages() * 2L / 3, cpPoolPages);
Thus, if 3/4 of the pages of the whole DataRegion are dirty, this checkpoint will be raised.
 

ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: Re[4]: Questions related to check pointing

Hello!

I guess it's pool.pages() * 3L / 4
Since, counterintuitively, the default ThrottlingPolicy is not ThrottlingPolicy.DISABLED; it's CHECKPOINT_BUFFER_ONLY.
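
If you want a different policy, the switch I know of is the write throttling flag on the storage configuration; as far as I remember, enabling it selects the speed-based policy instead of CHECKPOINT_BUFFER_ONLY, but please verify against your version. A sketch:

import org.apache.ignite.configuration.DataStorageConfiguration;

public class ThrottlingSketch {
    public static DataStorageConfiguration storage() {
        DataStorageConfiguration cfg = new DataStorageConfiguration();

        // Default is false, which corresponds to CHECKPOINT_BUFFER_ONLY
        // (throttle only when the checkpoint buffer is close to full).
        cfg.setWriteThrottlingEnabled(true);

        return cfg;
    }
}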

Regards,

--
Ilya Kasnacheev


Raymond Wilson Raymond Wilson
Reply | Threaded
Open this post in threaded view
|

Re: Re[4]: Questions related to check pointing

I checked our code that creates the primary data region, and it does set the minimum and maximum to 4Gb, meaning there will be 1,000,000 pages in that region.

The secondary data region is much smaller, and is set to min/max = 128 Mb of memory.

The checkpoints with the "too many dirty pages" reason were quoting less than 100,000 dirty pages, so this must have been triggered on the size of the smaller data region.

Both of these data regions have persistence enabled, and I think this may have been a sub-optimal way to set it up. My aim was to provide a dedicated channel for queuing inbound data that was not impacted by updates from processing that data. I think it may be better to change this arrangement to use a single data region, to make the checkpointing process simpler and to reduce the cases where it decides there are too many dirty pages.
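
Concretely, I am thinking of something like the following. This is only a sketch using the Java API (our configuration is built from the C# client, and the cache names here are placeholders):

import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;

public class SingleRegionSketch {
    public static DataStorageConfiguration storage() {
        // A single persistent region sized for both workloads, min == max.
        return new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(new DataRegionConfiguration()
                .setName("Default")
                .setInitialSize(4L * 1024 * 1024 * 1024)
                .setMaxSize(4L * 1024 * 1024 * 1024)
                .setPersistenceEnabled(true));
    }

    public static CacheConfiguration<?, ?>[] caches() {
        // Both the inbound queue cache and the results cache use the one region,
        // so a single page pool and a single checkpoint buffer are involved.
        CacheConfiguration<Long, byte[]> inbound = new CacheConfiguration<Long, byte[]>("inboundQueue")
            .setDataRegionName("Default");

        CacheConfiguration<Long, byte[]> results = new CacheConfiguration<Long, byte[]>("processedResults")
            .setDataRegionName("Default");

        return new CacheConfiguration<?, ?>[] {inbound, results};
    }
}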

On Mon, Jan 4, 2021 at 11:39 PM Ilya Kasnacheev <[hidden email]> wrote:
Hello!

I guess it's pool.pages() * 3L / 4
Since, counter intuitively, the default ThrottlingPolicy is not ThrottlingPolicy.DISABLED. It's CHECKPOINT_BUFFER_ONLY.

Regards,

--
Ilya Kasnacheev


чт, 31 дек. 2020 г. в 04:33, Raymond Wilson <[hidden email]>:
Regards this section of code:

            maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
                ? pool.pages() * 3L / 4
                : Math.min(pool.pages() * 2L / 3, cpPoolPages);

I think the correct ratio will be 2/3 of pages as we do not have a throttling policy defined, correct?.

On Thu, Dec 31, 2020 at 12:49 AM Zhenya Stanilovsky <[hidden email]> wrote:
Correct code is running from here:
if (checkpointReadWriteLock.getReadHoldCount() > 1 || safeToUpdatePageMemories() || checkpointer.runner() == null)
    break;
else {
    CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many dirty pages");
and near you can see that :

maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
    ? pool.pages() * 3L / 4
    : Math.min(pool.pages() * 2L / 3, cpPoolPages);
Thus if ¾ pages are dirty from whole DataRegion pages — will raise this cp.
 

In (https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood), there is a mention of a dirty pages limit that is a factor that can trigger check points.
 
I also found this issue: http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html where "too many dirty pages" is a reason given for initiating a checkpoint.
 
After reviewing our logs I found this: (one example)
 
2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28, startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573], checkpointBeforeLockTime=99ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms, walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms, splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty pages']  
 
Which suggests we may have the issue where writes are frozen until the check point is completed.
 
Looking at the AI 2.8.1 source code, the dirty page limit fraction appears to be 0.1 (10%), via this entry in GridCacheDatabaseSharedManager.java:
 
    /**
     * Threshold to calculate limit for pages list on-heap caches.
     * <p>
     * Note: When a checkpoint is triggered, we need some amount of page memory to store pages list on-heap cache.
     * If a checkpoint is triggered by "too many dirty pages" reason and pages list cache is rather big, we can get
* {@code IgniteOutOfMemoryException}. To prevent this, we can limit the total amount of cached page list buckets,
     * assuming that checkpoint will be triggered if no more then 3/4 of pages will be marked as dirty (there will be
     * at least 1/4 of clean pages) and each cached page list bucket can be stored to up to 2 pages (this value is not
     * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take
     * more than 2 pages). Also some amount of page memory needed to store page list metadata.
     */
    private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
 
This raises two questions: 
 
1. The data region where most writes are occurring has 4Gb allocated to it, though it is permitted to start at a much lower level. 4Gb should be 1,000,000 pages, 10% of which should be 100,000 dirty pages.
 
The 'limit holder' is calculated like this:
 
    /**
     * @return Holder for page list cache limit for given data region.
     */
    public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
        if (dataRegion.config().isPersistenceEnabled()) {
            return pageListCacheLimits.computeIfAbsent(dataRegion.config().getName(), name -> new AtomicLong(
                (long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages() * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
        }
 
        return null;
    }
 
... but I am unsure if totalPages() is referring to the current size of the data region, or the size it is permitted to grow to. ie: Could the 'dirty page limit' be a sliding limit based on the growth of the data region? Is it better to set the initial and maximum sizes of data regions to be the same number?
 
2. We have two data regions, one supporting inbound arrival of data (with low numbers of writes), and one supporting storage of processed results from the arriving data (with many more writes). 
 
The block on writes due to the number of dirty pages appears to affect all data regions, not just the one which has violated the dirty page limit. Is that correct? If so, is this something that can be improved?
 
Thanks,
Raymond.
 
 
On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson <raymond_wilson@...> wrote:
I'm working on getting automatic JVM thread stack dumping occurring if we detect long delays in put (PutIfAbsent) operations. Hopefully this will provide more information.
 
On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky <arzamas123@...> wrote:

I don't think so; checkpointing worked perfectly well before this fix.
We need additional info to start digging into your problem. Can you share the Ignite logs somewhere?
 
 
I noticed an entry in the Ignite 2.9.1 changelog:
  • Improved checkpoint concurrent behaviour
 
Perhaps this change may improve the checkpointing issue we are seeing?
 
Raymond.
 
 
On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson <raymond_wilson@...> wrote:
Hi Zhenya,
 
1. We currently use AWS EFS for primary storage, with provisioned IOPS to provide sufficient IO. Our Ignite cluster currently tops out at ~10% usage (with at least 5 nodes writing to it, including WAL and WAL archive), so we are not saturating the EFS interface. We use the default page size (experiments with larger page sizes showed instability when checkpointing due to free page starvation, so we reverted to the default size). 
 
2. Thanks for the detail, we will look for that in thread dumps when we can create them.
 
3. We are using the default CP buffer size, which is max(256Mb, DataRegionSize / 4) according to the Ignite documentation, so there should be more than enough checkpoint buffer space to cope with writes. As additional information, the cache which is displaying very slow writes is in a data region with relatively slow write traffic. There is a primary (default) data region with large write traffic, and the vast majority of pages being written in a checkpoint will be for that default data region.
 
4. Yes, this is very surprising. Anecdotally from our logs it appears write traffic into the low write traffic cache is blocked during checkpoints.
 
Thanks,
Raymond.
    
 
 
On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky <arzamas123@...> wrote:
  1. In addition to Ilya's reply, you can check the vendor's page for additional info; everything on that page applies to Ignite too [1]. Increasing the thread count leads to concurrent IO usage, so if you have something like NVMe it is up to you, but in the case of SAS it may be better to reduce this parameter.
  2. The log will show you something like:
    Parking thread=%Thread name% for timeout(ms)= %time%
    and the corresponding:
    Unparking thread=
  3. No additional logging of checkpoint buffer usage is provided. The checkpoint buffer needs to be more than 10% of the overall persistent DataRegions size (a sizing sketch follows this list).
  4. 90 seconds or longer seems like a problem with IO or system tuning; it is a very bad score.
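As an aside on point 3, here is a minimal configuration sketch showing where the checkpoint page buffer can be set explicitly instead of relying on the defaults. It uses the standard Java configuration API; the region name and sizes are examples only, not taken from the cluster discussed in this thread.

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class CheckpointBufferSketch {
        public static void main(String[] args) {
            DataRegionConfiguration region = new DataRegionConfiguration()
                .setName("primary")                                // example name
                .setPersistenceEnabled(true)
                .setMaxSize(4L * 1024 * 1024 * 1024)               // 4 GiB region
                .setCheckpointPageBufferSize(1024L * 1024 * 1024); // explicit 1 GiB checkpoint buffer

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setDataStorageConfiguration(new DataStorageConfiguration()
                    .setDefaultDataRegionConfiguration(region));
            // cfg would then be passed to Ignition.start(cfg).
        }
    }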



 
ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: Re[4]: Questions related to check pointing

Hello!

I think it's a sensible explanation.

Regards,
--
Ilya Kasnacheev


On Wed, 6 Jan 2021 at 14:32, Raymond Wilson <[hidden email]> wrote:
I checked our code that creates the primary data region, and it does set the minimum and maximum to 4Gb, meaning there will be 1,000,000 pages in that region.

The secondary data region is much smaller, and is set to min/max = 128 Mb of memory.

The checkpoints with the "too many dirty pages" reason were quoting fewer than 100,000 dirty pages, so this must have been triggered by the size of the smaller data region.

Both these data regions have persistence enabled, and I think this may have been a sub-optimal way to set it up. My aim was to provide a dedicated channel for inbound data arriving to be queued that was not impacted by updates due to processing of that data. I think it may be better to change this arrangement to use a single data region, which makes the checkpointing process simpler and reduces the cases where it decides there are too many dirty pages.
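For illustration, a minimal sketch of that arrangement using the standard Java configuration API (the region name and the 4 GiB figure are examples only): a single persistent region with the initial and maximum sizes pinned to the same value, so the region is fully allocated up front and the question of a 'sliding' dirty-page limit does not arise.

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class SingleRegionSketch {
        public static void main(String[] args) {
            long fourGib = 4L * 1024 * 1024 * 1024;

            DataRegionConfiguration region = new DataRegionConfiguration()
                .setName("combined")     // example name for the merged region
                .setPersistenceEnabled(true)
                .setInitialSize(fourGib) // initial == max: no on-demand growth
                .setMaxSize(fourGib);

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setDataStorageConfiguration(new DataStorageConfiguration()
                    .setDefaultDataRegionConfiguration(region));
            // cfg would then be passed to Ignition.start(cfg).
        }
    }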

On Mon, Jan 4, 2021 at 11:39 PM Ilya Kasnacheev <[hidden email]> wrote:
Hello!

I guess it's pool.pages() * 3L / 4,
since, counter-intuitively, the default ThrottlingPolicy is not ThrottlingPolicy.DISABLED; it's CHECKPOINT_BUFFER_ONLY.

Regards,

--
Ilya Kasnacheev
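For context, this policy is tied to a single flag on DataStorageConfiguration. The sketch below (standard Java API) shows where it is set; the reading that leaving it false with persistence enabled yields CHECKPOINT_BUFFER_ONLY, while setting it true switches to speed-based throttling, follows from the comment above and should be treated as an interpretation of the code rather than documented behaviour.

    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class WriteThrottlingSketch {
        public static void main(String[] args) {
            DataStorageConfiguration storage = new DataStorageConfiguration()
                .setWriteThrottlingEnabled(true); // default is false

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setDataStorageConfiguration(storage);
            // cfg would then be passed to Ignition.start(cfg).
        }
    }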


Zhenya Stanilovsky Zhenya Stanilovsky
Reply | Threaded
Open this post in threaded view
|

Re: Questions related to check pointing

In reply to this post by Raymond Wilson

fsync=37104ms is too long for such a page count (pages=33421); please check how you can improve fsync performance on your storage.

 


------- Forwarded message -------
From: "Raymond Wilson" <raymond_wilson@...>
To: user <user@...>, "Zhenya Stanilovsky" <arzamas123@...>
Cc:
Subject: Re: Re[4]: Questions related to check pointing
Date: Thu, 31 Dec 2020 01:46:20 +0300
 
Hi Zhenya,
 
The matching checkpoint finished log is this:
 
2020-12-15 19:07:39,253 [106] INF [MutableCacheComputeServer] Checkpoint finished [cpId=e2c31b43-44df-43f1-b162-6b6cefa24e28, pages=33421, markPos=FileWALPointer [idx=6339, fileOff=243287334, len=196573], walSegmentsCleared=0, walSegmentsCovered=[], markDuration=218ms, pagesWrite=1150ms, fsync=37104ms, total=38571ms] 
 
Regarding your comment that 3/4 of the pages in the whole data region need to be dirty to trigger this, can you confirm whether this is 3/4 of the maximum size of the data region, or of the currently used size (eg: if Min is 1Gb, Max is 4Gb, and 2Gb is used, would 1.5Gb of dirty pages trigger this?)
 
Are data regions independently checkpointed, or are they checkpointed as a whole, so that a 'too many dirty pages' condition affects all data regions in terms of write blocking?
 
Can you comment on my query regarding whether we should set the Min and Max sizes of the data region to be the same? Ie: don't bother with growing the data region memory use on demand, just allocate the maximum.
 
In terms of the checkpoint lock hold time metric: of the checkpoints quoting 'too many dirty pages', there is one instance, apart from the one I provided earlier, that violates this limit, ie:
 
2020-12-17 18:56:39,086 [104] INF [MutableCacheComputeServer] Checkpoint started [checkpointId=e9ccf0ca-f813-4f91-ac93-5483350fdf66, startPtr=FileWALPointer [idx=7164, fileOff=389224517, len=196573], checkpointBeforeLockTime=276ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=39ms, walCpRecordFsyncDuration=254ms, writeCheckpointEntryDuration=32ms, splitAndSortCpPagesDuration=276ms, pages=77774, reason='too many dirty pages'] 
 
This is out of a population of 16 instances I can find. The remainder have lock times of 16-17ms.
 
Regarding writes of pages to the persistent store, does the check pointing system parallelise writes across partitions to maximise throughput?
 
Thanks,
Raymond.
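On the parallelism question, the two related knobs on DataStorageConfiguration are sketched below (standard Java API). This only shows where they are set; it is not a statement of how the checkpointer actually schedules partition files internally.

    import org.apache.ignite.configuration.CheckpointWriteOrder;
    import org.apache.ignite.configuration.DataStorageConfiguration;

    public class CheckpointWriteSketch {
        public static void main(String[] args) {
            DataStorageConfiguration storage = new DataStorageConfiguration()
                .setCheckpointThreads(4)                                   // checkpointer thread pool size (default 4)
                .setCheckpointWriteOrder(CheckpointWriteOrder.SEQUENTIAL); // default order; RANDOM is the alternative
        }
    }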
 
 
 
On Thu, Dec 31, 2020 at 1:17 AM Zhenya Stanilovsky <arzamas123@...> wrote:

All write operations will be blocked for this timeout: checkpointLockHoldTime=32ms (write lock holding). If you observe a huge number of messages with reason='too many dirty pages', maybe you need to store some data in non-persisted regions, for example, or reduce indexes (if you use them). And please attach the other part of the checkpoint message, starting with: Checkpoint finished.
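A minimal sketch of the 'non-persisted regions' suggestion, using the standard Java configuration API (the region name and 128 MiB size are examples only): an in-memory region for transient inbound data, whose pages are not written to disk and therefore do not contribute to checkpoint work. Note that anything stored in such a region is lost on restart.

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class InMemoryRegionSketch {
        public static void main(String[] args) {
            // Data placed in this region is kept in memory only and is lost on restart.
            DataRegionConfiguration inbound = new DataRegionConfiguration()
                .setName("inbound-transient") // example name
                .setPersistenceEnabled(false)
                .setInitialSize(128L * 1024 * 1024)
                .setMaxSize(128L * 1024 * 1024);

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setDataStorageConfiguration(new DataStorageConfiguration()
                    .setDataRegionConfigurations(inbound));
            // cfg would then be passed to Ignition.start(cfg).
        }
    }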


 
Zhenya Stanilovsky Zhenya Stanilovsky
Reply | Threaded
Open this post in threaded view
|

Re:Questions related to check pointing

In reply to this post by Raymond Wilson
https://docs.aws.amazon.com/efs/latest/ug/storage-classes.html
 
Raymond Wilson Raymond Wilson
Reply | Threaded
Open this post in threaded view
|

Re: Questions related to check pointing

Hi Zhenya,

Thanks for the pointers - I will look into them.

I have been doing some additional reading into this and discovered we are using a 4.0 NFS client, which seems to be the first 'no-no'; we will look at updating to use the 4.1 NFS client.

We have modified our default timer cadence for checkpointing from 3 minutes to 1 minute, which seems to be giving us better performance. We will continue to measure the impact that has.

Lastly, I'm planning to merge our two data regions into a single region to reduce 'too many dirty pages' checkpoints due to high write activity in a small region.

Would using larger page sizes (eg: 16kb) be useful with EFS?

Raymond.
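For reference, a sketch of the two settings discussed here, using the standard Java configuration API. Whether a 16kb page size actually helps on EFS is an open question in this thread, so the example keeps the default 4 KiB page size and only shortens the checkpoint interval.

    import org.apache.ignite.configuration.DataStorageConfiguration;

    public class CheckpointCadenceSketch {
        public static void main(String[] args) {
            DataStorageConfiguration storage = new DataStorageConfiguration()
                .setCheckpointFrequency(60_000)                        // 60 s instead of the 180 s default
                .setPageSize(DataStorageConfiguration.DFLT_PAGE_SIZE); // 4 KiB default page size
        }
    }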



--

Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
[hidden email]