Efficiency of key queries

seiferma

Dear all,

we are evaluating techniques for finding all keys contained in a cache. So
far, we have tried the simplest approaches mentioned on this mailing list,
such as iterating over all cache entries or creating a SqlFieldsQuery that
selects the _key column. While the results are correct, the performance is
not satisfactory when some of the cache content is not held in memory.

Our setup is as follows: we have a small Spring Boot application that runs
an embedded Ignite instance. We manually activated the cluster, which
consists of this single node. We create all caches in partitioned mode so
that we can modify indexed fields at runtime by issuing SQL statements. We
are aware that partitioned mode is pointless with a single node, but it
should not do any harm. We limited the application's heap memory to one
gigabyte and pushed more data into the cache than the heap can hold.
Therefore, some cache entries are written to the hard drive. In addition, we
enabled persistent storage so that the cache survives restarts.

What we see is that both the query and the iteration over cache entries take
a long time to complete. While the queries were running, we saw considerable
load on the hard drive. This makes sense for iterating over cache entries but
not so much for the query. We expected the query to just ask the database for
the content of the (indexed?) _key column, which should not require loading
the whole entity. This request should be fairly fast even if most of the
entities are not available in the heap. With less data, the requests are
faster than the smaller amount of data alone would explain.

Could you give us some hints about what we might have done wrong, or how we
could retrieve all keys used in a cache more efficiently? If you need more
information, I am happy to provide it.

Best regards
Stephan



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
aealexsandrov

Re: Efficiency of key queries

Hi,

On a single node, I don't think you can speed up this process. But you can
scale performance by adding more nodes.

This can be done using compute tasks together with local SQL queries or
local cache operations. That way you can run part of your logic on every
data node and send only the results to the client.

You can read more about compute tasks here:

https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/compute/ComputeTask.html

You can see an example here:

https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/computegrid/ComputeTaskMapExample.java

You can run a local SqlFieldsQuery inside a compute task by setting the following flag:

https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/cache/query/SqlFieldsQuery.html#setLocal-boolean-

You can perform local cache operations via the following method:

https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/IgniteCache.html#localPeek-K-org.apache.ignite.cache.CachePeekMode...-
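
The pattern described above (run the key collection where the data lives, return only the keys, merge on the caller) can be sketched in plain Java. This is a toy model under stated assumptions, not Ignite code: `Node`, `cluster`, and `allKeys` are hypothetical names, a thread pool stands in for the cluster, and a `HashMap` stands in for each node's share of the cache. In a real deployment the local step would be a compute task running a local query or local cache operation, as in the links above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy model of "scan locally on each node, merge on the client". Not Ignite code. */
public class LocalKeyScanModel {

    /** Hypothetical stand-in for one data node and its share of a partitioned cache. */
    static final class Node {
        final Map<Integer, String> localEntries = new HashMap<>();

        /** The local step: runs next to the data and returns keys only, never values. */
        List<Integer> localKeys() {
            return new ArrayList<>(localEntries.keySet());
        }
    }

    /** Spreads entries over nodes by key hash, loosely imitating partitioned mode. */
    static List<Node> cluster(int nodeCount, Map<Integer, String> data) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new Node());
        }
        for (Map.Entry<Integer, String> e : data.entrySet()) {
            int owner = Math.floorMod(e.getKey().hashCode(), nodeCount);
            nodes.get(owner).localEntries.put(e.getKey(), e.getValue());
        }
        return nodes;
    }

    /** Runs the local step on every node in parallel and merges the partial results. */
    static List<Integer> allKeys(List<Node> nodes) {
        // Assumes at least one node; a fixed pool of size 0 is not allowed.
        ExecutorService pool = Executors.newFixedThreadPool(nodes.size());
        try {
            List<Future<List<Integer>>> parts = new ArrayList<>();
            for (Node n : nodes) {
                Callable<List<Integer>> task = n::localKeys;
                parts.add(pool.submit(task));
            }
            List<Integer> merged = new ArrayList<>();
            for (Future<List<Integer>> f : parts) {
                merged.addAll(f.get());
            }
            Collections.sort(merged);
            return merged;
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException("local key scan failed", ex);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> data = new HashMap<>();
        for (int k = 0; k < 20; k++) {
            data.put(k, "value-" + k);
        }
        List<Node> nodes = cluster(3, data);
        System.out.println(allKeys(nodes).size() + " keys collected from " + nodes.size() + " nodes");
    }
}
```

Each node only scans its own entries, so the per-node work shrinks as nodes are added. With a single node the model does the same work as a plain scan, which matches the point that a single node cannot be sped up this way.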

BR,
Andrei

ilya.kasnacheev

Re: Efficiency of key queries

In reply to this post by seiferma
Hello!

- I don't think Ignite can read indexed columns' values from the index alone. It will load the key-value pair either way.
- Ignite stores key-value pairs together in pages, so when you iterate over _key, all pages will be loaded into off-heap memory.
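
To make the second point concrete, here is a toy model in plain Java. It is only a sketch of the reasoning, not Ignite's actual page format or API: `PageStore`, `Page`, and `allKeys` are hypothetical names, and the page capacity of four entries is arbitrary. It shows that when keys and values share pages, collecting just the keys still touches every page.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: keys and values stored together in pages. Not Ignite code. */
public class PageScanModel {

    /** One key-value pair stored inline in a page. */
    static final class Entry {
        final int key;
        final String value;

        Entry(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** A page holding whole entries, so a key never lives apart from its value. */
    static final class Page {
        final List<Entry> entries = new ArrayList<>();
    }

    /** Simulated disk-backed store that counts how many page reads happen. */
    static final class PageStore {
        private final List<Page> pages = new ArrayList<>();
        private final int entriesPerPage;
        private int pageLoads = 0;

        PageStore(int entriesPerPage) {
            this.entriesPerPage = entriesPerPage;
        }

        /** Appends an entry, opening a new page when the last one is full. */
        void put(int key, String value) {
            if (pages.isEmpty() || pages.get(pages.size() - 1).entries.size() == entriesPerPage) {
                pages.add(new Page());
            }
            pages.get(pages.size() - 1).entries.add(new Entry(key, value));
        }

        /** Every call stands for reading one page from disk into memory. */
        Page load(int index) {
            pageLoads++;
            return pages.get(index);
        }

        int pageCount() {
            return pages.size();
        }

        int pageLoads() {
            return pageLoads;
        }
    }

    /** Collects all keys; each page must be loaded because keys sit next to values. */
    static List<Integer> allKeys(PageStore store) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < store.pageCount(); i++) {
            for (Entry e : store.load(i).entries) {
                keys.add(e.key);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        PageStore store = new PageStore(4);
        for (int k = 0; k < 10; k++) {
            store.put(k, "value-" + k);
        }
        List<Integer> keys = allKeys(store);
        System.out.println(keys.size() + " keys read from "
                + store.pageLoads() + " of " + store.pageCount() + " pages");
    }
}
```

Running `main` prints `10 keys read from 3 of 3 pages`: the key-only scan reads as many pages as a full scan would, which is consistent with the disk load seen during the `_key` query.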

I think that there is no easy solution for your use case.

Regards,
--
Ilya Kasnacheev


Tue, 28 May 2019 at 12:09, seiferma <[hidden email]>: