Ignite 2.7 Persistence

gweiske

Ignite 2.7 Persistence

I am using Ignite 2.7 with persistence enabled on a single VM with 128 GB RAM
in Azure, with separate external HDD drives for wal, walarchive and storage.
I loaded 20 GB of data (50,000,000 rows), then shut down Ignite, restarted
the hosting VM, started and activated Ignite, and ran a simple query that has
to go through all the data (SELECT DISTINCT <column> FROM ;). The query has
been running for hours now. Memory usage, instead of the expected ~42 GB, is
currently at 5.7 GB (*slowly* increasing). Any ideas why it might be that
slow?
The same scenario with SSD drives (this time one drive for wal and
walarchive, a second one for storage) finishes in about 5500 seconds (still
slow).



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ggwiebe

Re: Ignite 2.7 Persistence

I am new to Ignite, but as I understand it, after a cluster restart data is re-hydrated into memory as the nodes receive requests for their partitions' entries. So a first query would be as slow as a distributed disk-based query. Subsequent queries should find some of the data in memory (how much depends on the memory available) and thus be faster.

So, my question: is this the first query execution since startup?
Given that you have sufficient memory to hold this particular cache, I would expect subsequent query executions to take advantage of memory-resident query processing.

Additionally, I took a quick look at whether Ignite's in-memory caches store aggregates (like counts) that could be returned without reading the actual data, as in this case, but could not find anything.

Good luck!

Stanislav Lukyanov

RE: Ignite 2.7 Persistence

Hi,

That’s right, Ignite nodes restart “cold”, meaning that they become operational without the data in RAM.

This lets a node restart as quickly as possible, but the price is that the first operations have to load data from disk, so their performance will be much lower.

Here is a ticket for enabling a “hot restart” mode: https://issues.apache.org/jira/browse/IGNITE-10152.

There is also an improvement that allows the data of a specific partition to be loaded manually in an efficient way: https://issues.apache.org/jira/browse/IGNITE-8873. If you iterate over all partitions after the node starts, it may shorten the warmup period.

 

Stan

 

gweiske

RE: Ignite 2.7 Persistence

Thanks for the replies. Yes, subsequent queries are faster, but the time to
run the query the first time (i.e. to load the data into memory) after a
restart can be measured in hours and is significantly longer than loading
the data from a CSV file. That does not seem right.




Stanislav Lukyanov

RE: Ignite 2.7 Persistence

Running the query the first time isn’t really like loading all the data into memory and then running the query. I would assume that it is much less efficient: all kinds of locking and contention may be involved. Also, the reads are done via random disk access, whereas reading from a CSV file is sequential.
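
To put rough numbers on the random-versus-sequential point, here is a back-of-envelope sketch in Java. It is not a measurement of the poster's setup: the 150 random reads per second and the 100 MB/s sequential throughput are illustrative assumptions for a single HDD, the 4 KB page size is assumed to be Ignite's default, and the class name `ColdReadEstimate` is hypothetical. It also assumes one disk read per page and ignores read-ahead and caching.

```java
/** Back-of-envelope estimate; every rate used in main() is an assumption, not a measurement. */
final class ColdReadEstimate {
    private ColdReadEstimate() {}

    /** Number of fixed-size pages needed to hold dataBytes (rounded up). */
    static long pages(long dataBytes, int pageBytes) {
        return (dataBytes + pageBytes - 1) / pageBytes;
    }

    /** Seconds to fetch every page with one random read each, at the given reads per second. */
    static double randomReadSeconds(long dataBytes, int pageBytes, double readsPerSecond) {
        return pages(dataBytes, pageBytes) / readsPerSecond;
    }

    /** Seconds to stream dataBytes sequentially at the given throughput. */
    static double sequentialReadSeconds(long dataBytes, double bytesPerSecond) {
        return dataBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        long dataBytes = 20L * 1024 * 1024 * 1024;      // 20 GB, as in the original post
        int pageBytes = 4096;                           // assumed page size (4 KB)
        double hddReadsPerSecond = 150;                 // assumed random-read rate of one HDD
        double seqBytesPerSecond = 100.0 * 1024 * 1024; // assumed sequential throughput

        System.out.printf("random reads:    %.0f s%n",
                randomReadSeconds(dataBytes, pageBytes, hddReadsPerSecond));
        System.out.printf("sequential read: %.0f s%n",
                sequentialReadSeconds(dataBytes, seqBytesPerSecond));
    }
}
```

With these assumed figures the random-read path comes to roughly 35,000 seconds (close to ten hours), against about 205 seconds for one sequential pass. The gap is the same order of magnitude as the one between "hours" and a CSV load described in this thread.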

 

I assume that there are ways to make queries on cold storage more efficient.

One would probably need to spend a lot of time on that, collecting and analyzing JFRs and other profiling data.

On the other hand, the ability to do a hot restart will probably solve the issue for most users.

 

Stan

 

gweiske

RE: Ignite 2.7 Persistence

Is there a command that one can or needs to run to load the data into memory
after a restart of Ignite? The documentation suggests that, at least for 2.7,
that is not necessary, and I have not found a command that would start the
loading into memory from persistence. It looks like one can write some Java
code, but this seems like such basic functionality that I thought there
should be a shell command.



dilaz03

Re: Ignite 2.7 Persistence

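Some Java code that helps me on node startup:

import javax.cache.Cache;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.ScanQuery;

// Assumes an `ignite` field holding the started Ignite instance.
// Call for each partition in parallel.
private void preloadPartition(int partition) {
    IgniteCache<String, BinaryObject> cache = ignite
            .cache("test_cache")
            .withKeepBinary();

    // The filter rejects every entry, so nothing is returned to the caller,
    // but the scan still visits each entry of the partition.
    ScanQuery<String, BinaryObject> query = new ScanQuery<>(partition, (k, v) -> false);
    query.setLocal(true);

    try (QueryCursor<Cache.Entry<String, BinaryObject>> cursor = cache.query(query)) {
        for (@SuppressWarnings("unused") Cache.Entry<String, BinaryObject> row : cursor) {
            // empty: iterating the cursor is what drives the scan
        }
    }
}

// Call for each index
private void preloadIndex(String index) {
    // Run an SQL query that uses the index and has an always-false condition
}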

PS. My memory region is bigger than total data size.
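
The "call for each partition in parallel" comment above leaves the driver out. A minimal sketch of one way to do it follows, using a plain fixed thread pool. The class `PartitionWarmup` and its `preloadAll` method are hypothetical names, not part of Ignite, and the loader is passed in as a callback so the sketch runs without a cluster. In a real node the callback would presumably be `this::preloadPartition`, with the partition count taken from `ignite.affinity("test_cache").partitions()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

/** Hypothetical helper: runs a per-partition warm-up callback on a fixed thread pool. */
final class PartitionWarmup {
    private PartitionWarmup() {}

    /**
     * Calls loader.accept(p) once for every p in [0, partitions) using `threads`
     * worker threads, and blocks until all calls have finished.
     */
    static void preloadAll(int partitions, int threads, IntConsumer loader)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>(partitions);
            for (int p = 0; p < partitions; p++) {
                final int partition = p;
                pending.add(pool.submit(() -> loader.accept(partition)));
            }
            for (Future<?> f : pending) {
                f.get(); // surfaces a failure from any partition as ExecutionException
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in loader that only counts calls, so the sketch runs without Ignite.
        AtomicInteger visited = new AtomicInteger();
        preloadAll(1024, 4, p -> visited.incrementAndGet());
        System.out.println("partitions visited: " + visited.get());
    }
}
```

The thread count is a tuning knob: on HDDs a small pool may be enough, since extra threads mostly add more random seeks, while SSDs can usually absorb more concurrent reads.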
