Sql query performance with partitioned caches

classic Classic list List threaded Threaded
19 messages Options
Andrey Nestrogaev Andrey Nestrogaev
Reply | Threaded
Open this post in threaded view
|

Sql query performance with partitioned caches

Hi All,

Is any sql query, executed with partitioned cache, will always be executed on every node where cache is deployed, irrespectively which predicates are used or what indexes are created?
Sergi Vladykin Sergi Vladykin
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches

Correct.

Sergi

2016-01-25 18:39 GMT+03:00 Andrey Nestrogaev <[hidden email]>:
Hi All,

Is any sql query, executed with partitioned cache, will always be executed
on every node where cache is deployed, irrespectively which predicates are
used or what indexes are created?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Sql-query-performance-with-partitioned-caches-tp2700.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

dsetrakyan dsetrakyan
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches

Yes, this is correct, but this is not a performance issue, this is a performance gain. 

Your data set is split across the nodes and the indexes are also split in the same way. So, if you have 100GB of data and 10 nodes, then each node will end up with about 10GB of data. In this case, Ignite will execute the query across 10 nodes in parallel and each query will execute on 1/10 of the data. Quite often this results in almost linear performance gain.

On Mon, Jan 25, 2016 at 8:59 AM, Sergi Vladykin <[hidden email]> wrote:
Correct.

Sergi

2016-01-25 18:39 GMT+03:00 Andrey Nestrogaev <[hidden email]>:
Hi All,

Is any sql query, executed with partitioned cache, will always be executed
on every node where cache is deployed, irrespectively which predicates are
used or what indexes are created?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Sql-query-performance-with-partitioned-caches-tp2700.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Andrey Nestrogaev Andrey Nestrogaev
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches

Hi Dmitriy,

How about query that returns or scans less 1% of the rows (or only one row), and the rows that satisfy the query predicates all resides on one node? When we add nodes, such queries become slowly.

Are there any plans for implementing for sql queries any kind of partitions or/and nodes pruning based on predicates?

Or sql queries are only intended for analitic's queries?
vkulichenko vkulichenko
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches

Hi Andrey,

In case you know that all required data is stored on a single node, you can send a closure to that node that will execute a local query:

// Execute a callable on a node where myAffinityKey is mapped.
ignite.compute().affinityCall("my-cache", myAffinityKey, new IgniteCallable<List<List<?>>>() {
    @Override public List<List<?>> call() throws Exception {
        IgniteCache cache = Ignition.ignite().cache("my-cache");
       
        // Note the setLocal(true) flag.
        return cache.query(new SqlFieldsQuery("select * from ...").setLocal(true)).getAll();
    }
});

Will this work for you?

-Val
dsetrakyan dsetrakyan
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches

Good point, Valentin! I will add that if you don’t have an affinity key, but simply know which node owns the data, you can also simply create a cluster group for that node and send a computation to it which will run the local query.

On Mon, Jan 25, 2016 at 1:23 PM, vkulichenko <[hidden email]> wrote:
Hi Andrey,

In case you know that all required data is stored on a single node, you can
send a closure to that node that will execute a local query:

// Execute a callable on a node where myAffinityKey is mapped.
ignite.compute().affinityCall("my-cache", myAffinityKey, new
IgniteCallable<List&lt;List&lt;?>>>() {
    @Override public List<List&lt;?>> call() throws Exception {
        IgniteCache cache = Ignition.ignite().cache("my-cache");

        // Note the setLocal(true) flag.
        return cache.query(new SqlFieldsQuery("select * from
...").setLocal(true)).getAll();
    }
});

Will this work for you?

-Val



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Sql-query-performance-with-partitioned-caches-tp2700p2708.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Andrey Nestrogaev Andrey Nestrogaev
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches

In reply to this post by vkulichenko
Hi Valentin,

The proposed option is not suitable.

In most cases, the application does not know on which node and in which partition the data resides, as it is the physical layer of data model, which can change over time (application only knows the logical data model).

The application simply execute a query using a set of predicates. And it would be good if the sql engine could on the basis of data on the query predicate and some metadata model, build a list of nodes, and partitions to fulfill the request.
vkulichenko vkulichenko
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches

Andrey,

I'm not sure I understand the use case. What is physical and logical data model? How do they differ from each other? How will Ignite know where to execute the query if even the application doesn't know?

It would be great if you provide a small example of such a data model and a query that has to be executed on a subset of nodes.

-Val
Andrey Nestrogaev Andrey Nestrogaev
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches

Valentin,

Application knows that there is a cache with name "myCache" - this is a logical model.
Cache "myCache" can be replicated or partitioned, it can be deployed to one ore more nodes - this is physical model.
Application doesn't know physical model. It's only call cache.get(key), and ignite engine using metainfo and affinity functionality determines on wich node and partition the value is resides.

That's opportunity could be done for sql engine too (most likely it will require some revision of affinty's functionality).

Is it now clear or still need examples with sql queries?
vkulichenko vkulichenko
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches

Hi Andrey,

The cache is created by the application and it defines on which nodes it's deployed (all server nodes by default or according to provided node filter). Also Ignite provides Affinity API that gives you information about data locality, as well as the ways to collocate data with data and computations with the data (as in my example above). For more information refer to [1].

In other words, you have a lot of control on where the data is stored and where the computations are executed. And I can't imagine a use case where you don't need to broadcast the query, but existing affinity capabilities doesn't help. If you have such a use case in mind, the example would be useful.

[1] https://apacheignite.readme.io/docs/affinity-collocation

-Val
Andrey Nestrogaev Andrey Nestrogaev
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches

Hi Valentin,

1. We have partitioned cache "myCache" where store entities like Account(id, name)
2. key = id e.g. cache.put(1, new Account(1,"account 1"));
3. Assume I execute throuth jdbc query "select * from account where id=1", i want that query doesn't broadcast on every node.
Denis Magda Denis Magda
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches

Hi Andrey,

In terms of Ignite or any other in-memory product there is no reason to
use an SQL query there cause when you need to get an object(s) by key(s)
you should use cache.get(key)/cache.getKeys(key) that sends request(s)
only to primary node(s).
That's it and I wouldn't make SQL engine more complex just in order to
detect such queries that can be optimized because a developer can take
care of this.

Regards,
Denis

On 1/27/2016 10:15 AM, Andrey Nestrogaev wrote:

> Hi Valentin,
>
> 1. We have partitioned cache "myCache" where store entities like Account(id,
> name)
> 2. key = id e.g. cache.put(1, new Account(1,"account 1"));
> 3. Assume I execute throuth jdbc query "select * from account where id=1", i
> want that query doesn't broadcast on every node.
>
>
>
> --
> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Sql-query-performance-with-partitioned-caches-tp2700p2734.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Andrey Nestrogaev Andrey Nestrogaev
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches

Denis,

Do not take the examples as the real problem.
In real application sql can be more complex like this:

select a.*
  from account a
  join account_details b
    on (b.id = a.id)
 where b.type = 'TTT'
   and (select nvl(sum(amount),0) from account_sums s where s.id = a.id) > 0
   and a.id = 1

Do you recommend to replace all such queries with java code using native cache's api?

Thus it appears that ignite sql inginte with partitioned caches can't scale for OLTP queries (every query runs on every node), it only will good serves for DWH workload?

Denis Magda Denis Magda
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches

Andrey,

In my previous reply I didn't mean that all the queries should be
replaced with native cache.get() calls.
In opposite I wanted to show that there is no need to parse a simple
query on sender side trying to optimize it by sending only to particular
nodes because majority of queries are more complex (like in your example
below).

Exactly because in most the cases queries are complex they all are sent
to all the nodes where they're actually executed.

However, it doesn't mean that Ignite SQL doesn't suite for OLTP or DWH.
It costs almost nothing to send a query to a remote node.
The biggest factor here is cluster size, actual type of query, proper
indexes usage, result set that is returned from every node.

--
Denis

On 1/27/2016 3:38 PM, Andrey Nestrogaev wrote:

> Denis,
>
> Do not take the examples as the real problem.
> In real application sql can be more complex like this:
>
> select a.*
>    from account a
>    join account_details b
>      on (b.id = a.id)
>   where b.type = 'TTT'
>     and (select nvl(sum(amount),0) from account_sums s where s.id = a.id) > 0
>     and a.id = 1
>
> Do you recommend to replace all such queries with java code using native
> cache's api?
>
> Thus it appears that ignite sql inginte with partitioned caches can't scale
> for OLTP queries (every query runs on every node), it only will good serves
> for DWH workload?
>
>
>
>
>
> --
> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Sql-query-performance-with-partitioned-caches-tp2700p2742.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Andrey Nestrogaev Andrey Nestrogaev
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches

Denis,

I don't agree that extra newtork roundtreeps "costs almost nothing" for heavy OLTP load :)

But I get your point and think that question is closed.
vkulichenko vkulichenko
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches

Andrey,

I think Denis meant that if there is no data for a provided ID on the node, query execution time will be close to nothing (of course, assuming that there is a proper index). Network utilization is minimal for these requests as well, because there are no results going back. In other words, most likely this will not slow you down, because you will wait for the slowest node anyway (in this case that's the node that has data).

But even this can be optimized with the approach I described earlier. You can use IgniteCompute.affinityCall() to send a closure to the node that stores the data for the ID and execute local query there. This way you avoid meaningless network trips to other nodes.

Makes sense?

-Val
dsetrakyan dsetrakyan
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches

In reply to this post by Andrey Nestrogaev

On Wed, Jan 27, 2016 at 5:34 AM, Andrey Nestrogaev <[hidden email]> wrote:
Denis,

I don't agree that extra newtork roundtreeps "costs almost nothing" for
heavy OLTP load :)

I disagree, because many large companies, including some very well known banks, use Ignite exactly for OLTP workloads. If you look at the data grid market, then you will quickly realize that no other data grid vendor or project comes even close to the SQL capabilities provided by Ignite.

Again, Ignite already supports non-collocated SQL (will be released in version 1.6 as part of IGNITE-1232), but the performance of the non-collocated joins may be worse than for the collocated joins depending on the result sets.

In your design, you need to aim to achieve about +50% collocation for your queries. Generally, the more collocated queries you have, the better the performance will be. However, it is OK to have a smaller number of queries, or the queries that are not executed often, as non-collocated queries.

D.
Andrey Nestrogaev Andrey Nestrogaev
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches

Hi All,

"use Ignite exactly for OLTP workloads"
I mean SQL OLTP workloads.
I have no doubt that the OLTP workloads using direct cache API calls ignite scales perfectly.

But not to argue in vain, I will use the well-tested method - benchmarking.

Thank you for your responses.
dsetrakyan dsetrakyan
Reply | Threaded
Open this post in threaded view
|

Re: Sql query performance with partitioned caches



On Wed, Jan 27, 2016 at 11:09 PM, Andrey Nestrogaev <[hidden email]> wrote:
Hi All,

"use Ignite exactly for OLTP workloads"
I mean SQL OLTP workloads.
I have no doubt that the OLTP workloads using direct cache API calls ignite
scales perfectly.

I also meant SQL.  


But not to argue in vain, I will use the well-tested method - benchmarking.

We use Yardstick framework for benchmarking. You can take a look at some of the existing benchmarks here: https://ignite.apache.org/benchmarks/ignite-vs-hazelcast.html
 

Thank you for your responses.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Sql-query-performance-with-partitioned-caches-tp2700p2751.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.