Ignite query performance with lots of joins

classic Classic list List threaded Threaded
7 messages Options
spoutnik_be spoutnik_be
Reply | Threaded
Open this post in threaded view
|

Ignite query performance with lots of joins

Hi,

I posted a message related to  H2 & Ignite performances
<https://stackoverflow.com/questions/57902194/h2-ignite-query-performance>  
on stackoverflow.

As the main issue is about a difference in execution times between H2 &
Ignite so I thought it would be good to relay it here.

Thanks for your support,
L.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
spoutnik_be spoutnik_be
Reply | Threaded
Open this post in threaded view
|

Re: Ignite query performance with lots of joins

@Daniel Magda: following you comment on Stackoverflow, continuing the
discussion here.

This is most probably related to the number of joins, we'll evolve to a less
normalized model for querying from Ignite.

Couple of questions still remain:
- How many joins can Ignite/H2 handle without significant performance drop
(read something about H2 CBO out if more than 7 joins)?
- How to correctly interpret explain plan, especially with ANALYZE option
and its scanCount?
- Does order of query matters, Oracle CBO uses statistics to re-order joins
while this is not visible in H2 explain plans.

Any hints/info welcome ...

Thanks,
L.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
spoutnik_be spoutnik_be
Reply | Threaded
Open this post in threaded view
|

Re: Ignite query performance with lots of joins

In reply to this post by spoutnik_be
About the quote on Stackoverflow:
"Ignite/GridGain is optimized for multi-nodes deployments with RAM as a
primary storage. Don’t try to compare a single-node GridGain cluster to a
relational database that was optimized for such single-node configurations.
You should deploy a multi-node GridGain cluster with the whole copy of data
in RAM."

Not sure how to interpret the above statement. The support for SQL is an
attractive feature of Ignite/Gridgain, but if it doesn't perform on a single
node with little data I don't see how it will perform on a multi-node
cluster.

What would be then your recommendation? Should we implement a SQL converter
to translate queries into something else Ignite could run faster?






--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
dmagda dmagda
Reply | Threaded
Open this post in threaded view
|

Re: Ignite query performance with lots of joins

Ignite SQL engine cannot be considered as a competitor (in a single-node scenario) to old-good RDBMS like Postgres or MySQL. Those DBs have been being developed for decades and are optimized for single-machine deployments. While our efforts (Ignite community) were put into distributed optimizations when an application has to scale out and utilize RAM in the distributed fashion. That's why we have that recommendation to avoid any misunderstanding. Probably, it needs to be rewritten a bit for more clarity.


As for your specific case, I think that a primary bottleneck is a number of JOINs. It makes sense to rewrite this query first and then scale-out for bigger benefits. 

Ivan Pavluknin, Stan Lukyanov, Pavel Vinokurov, could you folks please check this thread out and this repo suggesting any optimizations?


-
Denis


On Mon, Sep 16, 2019 at 6:08 AM spoutnik_be <[hidden email]> wrote:
About the quote on Stackoverflow:
"Ignite/GridGain is optimized for multi-nodes deployments with RAM as a
primary storage. Don’t try to compare a single-node GridGain cluster to a
relational database that was optimized for such single-node configurations.
You should deploy a multi-node GridGain cluster with the whole copy of data
in RAM."

Not sure how to interpret the above statement. The support for SQL is an
attractive feature of Ignite/Gridgain, but if it doesn't perform on a single
node with little data I don't see how it will perform on a multi-node
cluster.

What would be then your recommendation? Should we implement a SQL converter
to translate queries into something else Ignite could run faster?






--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
spoutnik_be spoutnik_be
Reply | Threaded
Open this post in threaded view
|

Re: Ignite query performance with lots of joins

Unfortunately, I am nowhere near Silicon Valley these days ;-)

Any update on possible optimizations that could bring us somewhat closer
than H2 timings?

Thanks, L.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Ivan Pavlukhin Ivan Pavlukhin
Reply | Threaded
Open this post in threaded view
|

Re: Ignite query performance with lots of joins

Hi,

I checked provided test data. I was able to speedup a query execution
with Ignite about 2 times on my machine by using extra configuration
property System.setProperty("IGNITE_MAX_INDEX_PAYLOAD_SIZE", "256");
See a following documentation section about configuring index inline
size [1]. You can try the same in your environment. Shortly inline
size is needed to tune an indexed search speed. By default ignite
index pages can contain very limited pieces of indexed values (default
inline size is 10 bytes). If indexed values do not fit inline size
then actual values will be searched in another page (data page). It
can lead to a performance degradation.

> Not sure how to interpret the above statement. The support for SQL is an attractive feature of Ignite/Gridgain, but if it doesn't perform on a single node with little data I don't see how it will perform on a multi-node cluster.

Actually data distribution is a tradeoff. And usually it sounds as
"doing more work with more resources". And a gain here is not linear.
But as final result you can reach higher throughput by adding more
computational resources. Of course it depends on a particular
workload. Complex joins might be not good candidate here.

[1] https://apacheignite-sql.readme.io/docs/performance-and-debugging#section-increasing-index-inline-size

ср, 18 сент. 2019 г. в 11:03, spoutnik_be <[hidden email]>:

>
> Unfortunately, I am nowhere near Silicon Valley these days ;-)
>
> Any update on possible optimizations that could bring us somewhat closer
> than H2 timings?
>
> Thanks, L.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/



--
Best regards,
Ivan Pavlukhin
spoutnik_be spoutnik_be
Reply | Threaded
Open this post in threaded view
|

Re: Ignite query performance with lots of joins

Hi,

Bad news is that there is no short term solution, good news are that the
issue/case is acknowledged and that Ignite is moving forward.

See approach:
https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/