Binary memory restoration

classic Classic list List threaded Threaded
8 messages Options
Raymond Wilson Raymond Wilson
Reply | Threaded
Open this post in threaded view
|

Binary memory restoration

Just a general query on Ignite server node startup sequence.

There are a pair of log lines emitted during node startup that look like this:

2020-09-27 20:48:43,191 [1] INF [] Starting binary memory restore for: [948695530, -1026154091, -2049124072, -552847479, -2024686410, -1820754607, -1907049797, -1029472048, 671082124, -493080817, -2100569601, 928822533]  

and

2020-09-27 20:49:13,025 [1] INF []   Binary memory state restored at node startup [restoredPtr=FileWALPointer [idx=1033, fileOff=485201080, len=0]]  

 This particular example required ~30 seconds to complete. 

We are running Ignite 2.8.1 using the C# client with persistence enabled. 

Do these messages indicate Ignite is reloading the memory state of the caches at the point it was shutdown? If not, what is this activity?

Thanks,
Raymond.

--

Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
<a href="tel:+64-21-2013317" style="background-color:transparent;color:rgb(54,53,69)" target="_blank">+64-21-2013317 Mobile
[hidden email]
Raymond Wilson Raymond Wilson
Reply | Threaded
Open this post in threaded view
|

Re: Binary memory restoration

There is a summary log line that also references the time taken by the 'restore binary memory' stage. It also references another stage called 'restore logical state'.

eg:

2020-09-27 20:50:31,423 [1] INF [] Node started : [stage="Configure system pool" (44 ms),stage="Start managers" (2422 ms),stage="Configure binary metadata" (487 ms),stage="Start processors" (1089 ms),stage="Init metastore" (726 ms),stage="Init and start regions" (21 ms),stage="Restore binary memory" (30289 ms),stage="Restore logical state" (76551 ms),stage="Finish recovery" (201 ms),stage="Join topology" (840 ms),stage="Await transition" (136 ms),stage="Await exchange" (211 ms),stage="Total time" (113017 ms)] 

The restore logical state stage takes 76 seconds in this example.

Currently we have around 10 caches with less than 20 Gb of data. Do these startup times seem reasonable for a node that is shut down gracefully?

Thanks,
Raymond.


On Tue, Sep 29, 2020 at 11:20 AM Raymond Wilson <[hidden email]> wrote:
Just a general query on Ignite server node startup sequence.

There are a pair of log lines emitted during node startup that look like this:

2020-09-27 20:48:43,191 [1] INF [] Starting binary memory restore for: [948695530, -1026154091, -2049124072, -552847479, -2024686410, -1820754607, -1907049797, -1029472048, 671082124, -493080817, -2100569601, 928822533]  

and

2020-09-27 20:49:13,025 [1] INF []   Binary memory state restored at node startup [restoredPtr=FileWALPointer [idx=1033, fileOff=485201080, len=0]]  

 This particular example required ~30 seconds to complete. 

We are running Ignite 2.8.1 using the C# client with persistence enabled. 

Do these messages indicate Ignite is reloading the memory state of the caches at the point it was shutdown? If not, what is this activity?

Thanks,
Raymond.

--

Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
<a href="tel:+64-21-2013317" style="background-color:transparent;color:rgb(54,53,69)" target="_blank">+64-21-2013317 Mobile
[hidden email]


--

Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
<a href="tel:+64-21-2013317" style="background-color:transparent;color:rgb(54,53,69)" target="_blank">+64-21-2013317 Mobile
[hidden email]
ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: Binary memory restoration

Hello!

I guess it is the time to iterate WAL after the last checkpoint to make sure all operations prior to node stop are accounted for.

Every DB will do that and the time of 76 seconds seems reasonable.

Regards,
--
Ilya Kasnacheev


вт, 29 сент. 2020 г. в 02:00, Raymond Wilson <[hidden email]>:
There is a summary log line that also references the time taken by the 'restore binary memory' stage. It also references another stage called 'restore logical state'.

eg:

2020-09-27 20:50:31,423 [1] INF [] Node started : [stage="Configure system pool" (44 ms),stage="Start managers" (2422 ms),stage="Configure binary metadata" (487 ms),stage="Start processors" (1089 ms),stage="Init metastore" (726 ms),stage="Init and start regions" (21 ms),stage="Restore binary memory" (30289 ms),stage="Restore logical state" (76551 ms),stage="Finish recovery" (201 ms),stage="Join topology" (840 ms),stage="Await transition" (136 ms),stage="Await exchange" (211 ms),stage="Total time" (113017 ms)] 

The restore logical state stage takes 76 seconds in this example.

Currently we have around 10 caches with less than 20 Gb of data. Do these startup times seem reasonable for a node that is shut down gracefully?

Thanks,
Raymond.


On Tue, Sep 29, 2020 at 11:20 AM Raymond Wilson <[hidden email]> wrote:
Just a general query on Ignite server node startup sequence.

There are a pair of log lines emitted during node startup that look like this:

2020-09-27 20:48:43,191 [1] INF [] Starting binary memory restore for: [948695530, -1026154091, -2049124072, -552847479, -2024686410, -1820754607, -1907049797, -1029472048, 671082124, -493080817, -2100569601, 928822533]  

and

2020-09-27 20:49:13,025 [1] INF []   Binary memory state restored at node startup [restoredPtr=FileWALPointer [idx=1033, fileOff=485201080, len=0]]  

 This particular example required ~30 seconds to complete. 

We are running Ignite 2.8.1 using the C# client with persistence enabled. 

Do these messages indicate Ignite is reloading the memory state of the caches at the point it was shutdown? If not, what is this activity?

Thanks,
Raymond.

--

Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
<a href="tel:+64-21-2013317" style="background-color:transparent;color:rgb(54,53,69)" target="_blank">+64-21-2013317 Mobile
[hidden email]


--

Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
<a href="tel:+64-21-2013317" style="background-color:transparent;color:rgb(54,53,69)" target="_blank">+64-21-2013317 Mobile
[hidden email]
Vladimir Pligin Vladimir Pligin
Reply | Threaded
Open this post in threaded view
|

Re: Binary memory restoration

It's possible that it happens because of
https://issues.apache.org/jira/browse/IGNITE-13068.
We need to scan the entire SQL primary index during startup in case you have
at least on query entity configured.
As far as I can see it's going to be a part of the Ignite 2.9 release.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Raymond Wilson Raymond Wilson
Reply | Threaded
Open this post in threaded view
|

Re: Binary memory restoration

Thanks for the thoughts Ilya and Vladimir. 

We'll do a comparison with 2.9 when it releases to see if that makes any difference.

One of the advantages with persistent storage is that it is effectively 'instant start'. Our WAL size is around 5Gb, perhaps this should be decreased to reduce system start-up time?

Thanks,
Raymond.

On Wed, Sep 30, 2020 at 7:31 AM Vladimir Pligin <[hidden email]> wrote:
It's possible that it happens because of
https://issues.apache.org/jira/browse/IGNITE-13068.
We need to scan the entire SQL primary index during startup in case you have
at least on query entity configured.
As far as I can see it's going to be a part of the Ignite 2.9 release.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


--

Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
<a href="tel:+64-21-2013317" style="background-color:transparent;color:rgb(54,53,69)" target="_blank">+64-21-2013317 Mobile
[hidden email]
ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: Binary memory restoration

Hello!

I think the real saver is in decreasing amount of time between checkpoints.

Regards,
--
Ilya Kasnacheev


вт, 6 окт. 2020 г. в 02:10, Raymond Wilson <[hidden email]>:
Thanks for the thoughts Ilya and Vladimir. 

We'll do a comparison with 2.9 when it releases to see if that makes any difference.

One of the advantages with persistent storage is that it is effectively 'instant start'. Our WAL size is around 5Gb, perhaps this should be decreased to reduce system start-up time?

Thanks,
Raymond.

On Wed, Sep 30, 2020 at 7:31 AM Vladimir Pligin <[hidden email]> wrote:
It's possible that it happens because of
https://issues.apache.org/jira/browse/IGNITE-13068.
We need to scan the entire SQL primary index during startup in case you have
at least on query entity configured.
As far as I can see it's going to be a part of the Ignite 2.9 release.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


--

Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
<a href="tel:+64-21-2013317" style="background-color:transparent;color:rgb(54,53,69)" target="_blank">+64-21-2013317 Mobile
[hidden email]
Raymond Wilson Raymond Wilson
Reply | Threaded
Open this post in threaded view
|

Re: Binary memory restoration

Ilya,

Does 2.9 have specific optimizations for checkpoints?

Thanks,
Raymond.

On Fri, Oct 9, 2020 at 1:23 AM Ilya Kasnacheev <[hidden email]> wrote:
Hello!

I think the real saver is in decreasing amount of time between checkpoints.

Regards,
--
Ilya Kasnacheev


вт, 6 окт. 2020 г. в 02:10, Raymond Wilson <[hidden email]>:
Thanks for the thoughts Ilya and Vladimir. 

We'll do a comparison with 2.9 when it releases to see if that makes any difference.

One of the advantages with persistent storage is that it is effectively 'instant start'. Our WAL size is around 5Gb, perhaps this should be decreased to reduce system start-up time?

Thanks,
Raymond.

On Wed, Sep 30, 2020 at 7:31 AM Vladimir Pligin <[hidden email]> wrote:
It's possible that it happens because of
https://issues.apache.org/jira/browse/IGNITE-13068.
We need to scan the entire SQL primary index during startup in case you have
at least on query entity configured.
As far as I can see it's going to be a part of the Ignite 2.9 release.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


--

Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
<a href="tel:+64-21-2013317" style="background-color:transparent;color:rgb(54,53,69)" target="_blank">+64-21-2013317 Mobile
[hidden email]


--

Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
<a href="tel:+64-21-2013317" style="background-color:transparent;color:rgb(54,53,69)" target="_blank">+64-21-2013317 Mobile
[hidden email]
ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: Binary memory restoration

Hello!

You do not need any optimizations here since the time between checkpoints is configurable.

More checkpoints - faster recovery but more data needs to be written to disk.
Fewer checkpoints - recovery will take more time but overall performance is somewhat higher.

Regards,
--
Ilya Kasnacheev


пт, 9 окт. 2020 г. в 07:05, Raymond Wilson <[hidden email]>:
Ilya,

Does 2.9 have specific optimizations for checkpoints?

Thanks,
Raymond.

On Fri, Oct 9, 2020 at 1:23 AM Ilya Kasnacheev <[hidden email]> wrote:
Hello!

I think the real saver is in decreasing amount of time between checkpoints.

Regards,
--
Ilya Kasnacheev


вт, 6 окт. 2020 г. в 02:10, Raymond Wilson <[hidden email]>:
Thanks for the thoughts Ilya and Vladimir. 

We'll do a comparison with 2.9 when it releases to see if that makes any difference.

One of the advantages with persistent storage is that it is effectively 'instant start'. Our WAL size is around 5Gb, perhaps this should be decreased to reduce system start-up time?

Thanks,
Raymond.

On Wed, Sep 30, 2020 at 7:31 AM Vladimir Pligin <[hidden email]> wrote:
It's possible that it happens because of
https://issues.apache.org/jira/browse/IGNITE-13068.
We need to scan the entire SQL primary index during startup in case you have
at least on query entity configured.
As far as I can see it's going to be a part of the Ignite 2.9 release.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


--

Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
<a href="tel:+64-21-2013317" style="background-color:transparent;color:rgb(54,53,69)" target="_blank">+64-21-2013317 Mobile
[hidden email]


--

Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
<a href="tel:+64-21-2013317" style="background-color:transparent;color:rgb(54,53,69)" target="_blank">+64-21-2013317 Mobile
[hidden email]