Performance problems while running SQL query involving JOINS and ORDER BY eventually leading to heap OOME.

classic Classic list List threaded Threaded
6 messages Options
gourav10041996 gourav10041996
Reply | Threaded
Open this post in threaded view
|

Performance problems while running SQL query involving JOINS and ORDER BY eventually leading to heap OOME.

To simplify our use-case, we created two caches using the SQL query and
loaded data consisting of about 4 million records and 60k records
approximately, in the respective caches with INDEX created on all the
columns. Ignite is set up to run on a single node, meaning all the data is
present on the same node. The query used for testing/the one we are facing
issue with is of the type -

SELECT * FROM CACHE1 C1, CACHE2 C2  WHERE  C1.JOINCol = C2.JOINCol AND
C1.COL1 = 'someValue' ORDER BY C1.COL2

The above query execution leads to the Ignite thread memory rising
extensively, eventually leading to heap OOME. When the heap memory was
increased to about 14GB,  we were able to get the results back, but the
processing time of the query was too long, about 2-4 minutes ( with CPUs
=2).

We ran an EXPLAIN for the above query and found out that INDEX was created
on COL1 for C1 cache and on JOINCol for C2 cache. There was no index on the
sorted column. We think the problem of 'slow querying and huge heap memory
requirement' is because of the absence of an index in the sorted column.
Whenever there is a condition present in the WHERE clause ( in our example
C1.COL1='someValue'), Ignite is using an INDEX for that column and there is
no INDEX being created on the ORDER BY column.

And for our use-case, it is imperative that we have a condition in the where
clause ( to filter out the data) and a join condition apart from the order
by clause.

 We tried the multiple column indexing strategy on the COL1, COL2 as per our
use case.

 In case of a composite index with the order as (COL1, COL2), INDEX was
created only for the COL1.

While for the composite index order as (COL2, COL1), INDEX was getting
created for both COL1 and COL2 and the results were index sorted. ( But only
in case of the absence of an INDEX for COL1, it looks for the ORDER BY
clause column and uses a composite index). But, if we don't have a separate
INDEX for COL1, it again poses a problem as COL1 is something which is
heavily used for filtering in all other queries. So an INDEX on COL1 is
necessary.

To summarize, In case there is a condition present in the WHERE clause,
Ignite uses the WHERE clause column for indexing, and therefore there is no
INDEX in the sorting column, resulting in severe query performance, which
can eventually lead us to our system going down.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: Performance problems while running SQL query involving JOINS and ORDER BY eventually leading to heap OOME.

Hello!

Frankly speaking I got lost in your verbal description of tables and indexes. Can you please provide actual cache configurations or CREATE TABLE statements (obfuscated if needed so)?

Otherwise I guess that query planner expects you won't have too much entries after WHERE so it's more important to choose the one used in WHERE (or in JOIN).

Regards,
--
Ilya Kasnacheev


чт, 24 янв. 2019 г. в 18:10, gourav10041996 <[hidden email]>:
To simplify our use-case, we created two caches using the SQL query and
loaded data consisting of about 4 million records and 60k records
approximately, in the respective caches with INDEX created on all the
columns. Ignite is set up to run on a single node, meaning all the data is
present on the same node. The query used for testing/the one we are facing
issue with is of the type -

SELECT * FROM CACHE1 C1, CACHE2 C2  WHERE  C1.JOINCol = C2.JOINCol AND
C1.COL1 = 'someValue' ORDER BY C1.COL2

The above query execution leads to the Ignite thread memory rising
extensively, eventually leading to heap OOME. When the heap memory was
increased to about 14GB,  we were able to get the results back, but the
processing time of the query was too long, about 2-4 minutes ( with CPUs
=2).

We ran an EXPLAIN for the above query and found out that INDEX was created
on COL1 for C1 cache and on JOINCol for C2 cache. There was no index on the
sorted column. We think the problem of 'slow querying and huge heap memory
requirement' is because of the absence of an index in the sorted column.
Whenever there is a condition present in the WHERE clause ( in our example
C1.COL1='someValue'), Ignite is using an INDEX for that column and there is
no INDEX being created on the ORDER BY column.

And for our use-case, it is imperative that we have a condition in the where
clause ( to filter out the data) and a join condition apart from the order
by clause.

 We tried the multiple column indexing strategy on the COL1, COL2 as per our
use case.

 In case of a composite index with the order as (COL1, COL2), INDEX was
created only for the COL1.

While for the composite index order as (COL2, COL1), INDEX was getting
created for both COL1 and COL2 and the results were index sorted. ( But only
in case of the absence of an INDEX for COL1, it looks for the ORDER BY
clause column and uses a composite index). But, if we don't have a separate
INDEX for COL1, it again poses a problem as COL1 is something which is
heavily used for filtering in all other queries. So an INDEX on COL1 is
necessary.

To summarize, In case there is a condition present in the WHERE clause,
Ignite uses the WHERE clause column for indexing, and therefore there is no
INDEX in the sorting column, resulting in severe query performance, which
can eventually lead us to our system going down.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
gourav10041996 gourav10041996
Reply | Threaded
Open this post in threaded view
|

Re: Performance problems while running SQL query involving JOINS and ORDER BY eventually leading to heap OOME.

We are facing a severe performance issue with Ignite SQL select JOIN queries
when using Order By clause.
This occurs when the column used for Order By is not one of the columns in
the where clause.  

SELECT *
FROM  CACHE1 C1, CACHE2 C2  
WHERE C1.JOINCol = C2.JOINCol AND C1.COL1 = 'someValue'
ORDER BY C1.COL2

In this situation, it looks like the index available on the order by column
(C1.COL2) is not used by SQL engine.




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: Performance problems while running SQL query involving JOINS and ORDER BY eventually leading to heap OOME.

Hello!

During query Ignite can only use single index per table. Meaning that if an index is already used for join, a different index will not be used for order by.

You can try setting up a multi-column index to do both join and order.

Regards,
--
Ilya Kasnacheev


вт, 5 февр. 2019 г. в 10:07, gourav10041996 <[hidden email]>:
We are facing a severe performance issue with Ignite SQL select JOIN queries
when using Order By clause.
This occurs when the column used for Order By is not one of the columns in
the where clause. 

SELECT *
FROM  CACHE1 C1, CACHE2 C2 
WHERE C1.JOINCol = C2.JOINCol AND C1.COL1 = 'someValue'
ORDER BY C1.COL2

In this situation, it looks like the index available on the order by column
(C1.COL2) is not used by SQL engine.




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
gourav10041996 gourav10041996
Reply | Threaded
Open this post in threaded view
|

Re: Performance problems while running SQL query involving JOINS and ORDER BY eventually leading to heap OOME.

Hi Ilya,

We have experimented with the multi-column index. In case of a composite
index on (COL1, COL2), the index is only getting applied for COL1. Thanks.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: Performance problems while running SQL query involving JOINS and ORDER BY eventually leading to heap OOME.

Hello!

Can you provide schema for tables as well as query so I could look?

Regards,
--
Ilya Kasnacheev


пт, 8 февр. 2019 г. в 11:19, gourav10041996 <[hidden email]>:
Hi Ilya,

We have experimented with the multi-column index. In case of a composite
index on (COL1, COL2), the index is only getting applied for COL1. Thanks.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/