Slow SQL query when joining on a REPLICATED Cache

classic Classic list List threaded Threaded
3 messages Options
wiesenfe wiesenfe
Reply | Threaded
Open this post in threaded view
|

Slow SQL query when joining on a REPLICATED Cache

Good afternoon,

I am facing a strange performance issue when doing SQL queries on my
cluster.
Here is the scenario (I cannot use real config etc because this source code
is protected):

I have 3 caches (a subset of a STAR schema).

CACHE1 is the fact table: EVENTS. It is a partitioned cache. It has an
affinityKey on USER ID.
CACHE2 is the user table: USERS. It is a partitioned cache. It has an
affinityKey on USER ID as well.
CACHE3 is the document table: DOCS. It is a replicated cache.

I also have the following config:

Every event from the event table has one USER ID and one DOCUMENT ID.
All the columns are indexed.
I run the query with setLocal(true) and setEnforceJoinOrder(true).



I would like to do the following:

1. SELECT *
2. FROM EVENTS

3.     INNER JOIN USERS ON EVENTS.USER_ID = USERS.USER_ID
4.     INNER JOIN DOCS  ON EVENTS.DOC_ID   = DOCS.DOC_ID

5. WHERE   DOCS.COL1 = 'some filter'
6. AND       USERS.COL2 = 'some other filter'


Here is what I observe:

When I run the query without line 5 (filter on documents), it is instant.
When I run the query with line 5 (both filters), it is 20X slower. (Even
though those filters are on indexed columns)- The EXPLAIN form the logs
indicates lots of scans.

If I run the query as so (syntax is not exact but the idea is there):

SELECT *
FROM (

          SELECT USERS.*, DOCS.*

          FROM EVENTS

               INNER JOIN USERS ON EVENTS.USER_ID = USERS.USER_ID
               INNER JOIN DOCS  ON EVENTS.DOC_ID   = DOCS.DOC_ID

          WHERE   DOCS.COL1 = 'some filter'

)

WHERE USERS.COL2 = 'some other filter'

I have the expected performances again.


It seems that the index on COL2 is being ignored when doing two joins and a
filter on each table.
What do you think about it ?


Thank yo very much !
Kind regards
Emmanuel










--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Taras Ledkov Taras Ledkov
Reply | Threaded
Open this post in threaded view
|

Re: Slow SQL query when joining on a REPLICATED Cache

Hi,

Please provide the plans for specified cases:
- original plan
- without condition
- with subquery (optional)

Also please provide the indexes schemas.

I suspect a not optimal index may be chosen.
e.g.
There are indexed:
idx_doc_col1  and idx_doc_doc_id

idx_doc_col1 - may be chosen because there is the EQ condition.

I guess you can try to create composite index for col1 and doc_id and
make the performance better then expected %)
e.g.:

CREATE INDEX IDX_DOCS_DOC_ID_COL1 ON DOCS(DOC_ID, COL1)

On 31.03.2021 17:39, wiesenfe wrote:

> Good afternoon,
>
> I am facing a strange performance issue when doing SQL queries on my
> cluster.
> Here is the scenario (I cannot use real config etc because this source code
> is protected):
>
> I have 3 caches (a subset of a STAR schema).
>
> CACHE1 is the fact table: EVENTS. It is a partitioned cache. It has an
> affinityKey on USER ID.
> CACHE2 is the user table: USERS. It is a partitioned cache. It has an
> affinityKey on USER ID as well.
> CACHE3 is the document table: DOCS. It is a replicated cache.
>
> I also have the following config:
>
> Every event from the event table has one USER ID and one DOCUMENT ID.
> All the columns are indexed.
> I run the query with setLocal(true) and setEnforceJoinOrder(true).
>
>
>
> I would like to do the following:
>
> 1. SELECT *
> 2. FROM EVENTS
>
> 3.     INNER JOIN USERS ON EVENTS.USER_ID = USERS.USER_ID
> 4.     INNER JOIN DOCS  ON EVENTS.DOC_ID   = DOCS.DOC_ID
>
> 5. WHERE   DOCS.COL1 = 'some filter'
> 6. AND       USERS.COL2 = 'some other filter'
>
>
> Here is what I observe:
>
> When I run the query without line 5 (filter on documents), it is instant.
> When I run the query with line 5 (both filters), it is 20X slower. (Even
> though those filters are on indexed columns)- The EXPLAIN form the logs
> indicates lots of scans.
>
> If I run the query as so (syntax is not exact but the idea is there):
>
> SELECT *
> FROM (
>
>            SELECT USERS.*, DOCS.*
>
>            FROM EVENTS
>
>                 INNER JOIN USERS ON EVENTS.USER_ID = USERS.USER_ID
>                 INNER JOIN DOCS  ON EVENTS.DOC_ID   = DOCS.DOC_ID
>
>            WHERE   DOCS.COL1 = 'some filter'
>
> )
>
> WHERE USERS.COL2 = 'some other filter'
>
> I have the expected performances again.
>
>
> It seems that the index on COL2 is being ignored when doing two joins and a
> filter on each table.
> What do you think about it ?
>
>
> Thank yo very much !
> Kind regards
> Emmanuel
>
>
>
>
>
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/

--
Taras Ledkov
Mail-To: [hidden email]

wiesenfe wiesenfe
Reply | Threaded
Open this post in threaded view
|

Re: Slow SQL query when joining on a REPLICATED Cache

Hello,

Thank you for your answer, creating a compound index worked perfectly.
Your assumption was correct !


Best regards,
Emmanuel



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/