how does group by works in ignite

classic Classic list List threaded Threaded
5 messages Options
David David
Reply | Threaded
Open this post in threaded view
|

how does group by works in ignite

Hi all,

I need to analyze a performance bottleneck in a complex environment.
So I decided to split down each component and split down the query to most
simple level.

*scenario:*
- only 1 table "person" with 2 columns
id (long), CURRENTCITYID(long) (both indexed)
10_000_000 rows

*Task: *
- make a group by of one indexed field.
select CURRENTCITYID, count(CURRENTCITYID) from
"worldPerson_AFF_CCID".person group by CURRENTCITYID;

Result:
CURRENTCITYID; COUNT(CURRENTCITYID)
1; 3000
2; 4000
...
12,799 rows selected (14.986 seconds)

Issue: a group by within a single table is very slow

Question: how to analyse and find bottleneck? any performance tuning hints?

Currently using 1 server/1 node
Also did same test with 10 servers/ 10 nodes, which imporved the
performance, but I would need 16 servers to reach the speed of one PG server

So I would like to understand better how ignite works for group opperations.
And what can be done on tuning side?


0: jdbc:ignite:thin://127.0.0.1/> explain select CURRENTCITYID,
count(CURRENTCITYID) from "worldPerson_AFF_CCID".person group by
CURRENTCITYID;
+--------------------------------+
|              PLAN              |
+--------------------------------+
| SELECT
    __Z0.CURRENTCITYID AS __C0_0,
    COUNT(__Z0.CURRENTCITYID) AS __C0_1
FROM "worldPerson_AFF_CCID".PERSON __Z0
    /* "worldPerson_AFF_CCID".PERSON_CURRENTCITYID_IDX */
GROUP BY __Z0.CURRE |
| SELECT
    __C0_0 AS CURRENTCITYID,
    CAST(SUM(__C0_1) AS BIGINT) AS __C0_1
FROM PUBLIC.__T0
    /* PUBLIC."merge_scan" */
GROUP BY __C0_0 |
+--------------------------------+
2 rows selected (0.008 seconds)




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: how does group by works in ignite

Hello!

It would help making group by column an affinity key. Then you can use collocated=true setting to make this faster.

Regards,
--
Ilya Kasnacheev


вт, 4 июн. 2019 г. в 15:15, David <[hidden email]>:
Hi all,

I need to analyze a performance bottleneck in a complex environment.
So I decided to split down each component and split down the query to most
simple level.

*scenario:*
- only 1 table "person" with 2 columns
id (long), CURRENTCITYID(long) (both indexed)
10_000_000 rows

*Task: *
- make a group by of one indexed field.
select CURRENTCITYID, count(CURRENTCITYID) from
"worldPerson_AFF_CCID".person group by CURRENTCITYID;

Result:
CURRENTCITYID; COUNT(CURRENTCITYID)
1; 3000
2; 4000
...
12,799 rows selected (14.986 seconds)

Issue: a group by within a single table is very slow

Question: how to analyse and find bottleneck? any performance tuning hints?

Currently using 1 server/1 node
Also did same test with 10 servers/ 10 nodes, which imporved the
performance, but I would need 16 servers to reach the speed of one PG server

So I would like to understand better how ignite works for group opperations.
And what can be done on tuning side?


0: jdbc:ignite:thin://127.0.0.1/> explain select CURRENTCITYID,
count(CURRENTCITYID) from "worldPerson_AFF_CCID".person group by
CURRENTCITYID;
+--------------------------------+
|              PLAN              |
+--------------------------------+
| SELECT
    __Z0.CURRENTCITYID AS __C0_0,
    COUNT(__Z0.CURRENTCITYID) AS __C0_1
FROM "worldPerson_AFF_CCID".PERSON __Z0
    /* "worldPerson_AFF_CCID".PERSON_CURRENTCITYID_IDX */
GROUP BY __Z0.CURRE |
| SELECT
    __C0_0 AS CURRENTCITYID,
    CAST(SUM(__C0_1) AS BIGINT) AS __C0_1
FROM PUBLIC.__T0
    /* PUBLIC."merge_scan" */
GROUP BY __C0_0 |
+--------------------------------+
2 rows selected (0.008 seconds)




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
David David
Reply | Threaded
Open this post in threaded view
|

Re: how does group by works in ignite

Hi Ilya,

thx for helping me

the column is collocated. Still I need around ~10+Servers to match one PG
Server speed for a group by.

What do you mean with use "collocated=true"?
I thought it is only needed in case you dont have collocated data and still
want to have full result set.
and its not needed when data are collocated. Do I understand it wrong?


btw side question:
when using sqlline.sh with the "distributedJoins=true"
all queries are x-times slower than without

which makes totaly sense in case for non collocated data.
But it is also the case for collocated data, which makes no sense.

sqlline.sh --verbose=true -u
jdbc:ignite:thin://127.0.0.1/?distributedJoins=true



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: how does group by works in ignite

Hello!

I'm not sure that Ignite is able to detect actual affinity 100% of the time. So that's why collocate=true may help and that's why distributedJoins may slow down a perfectly collocated query.

Other than that, I don't think Ignite is going to beat Pg on group by performance right now, when there's a lot of data to process. Maybe consider using task-split instead of SQL for such case, you will still need to scan all data.

Regards,
--
Ilya Kasnacheev


ср, 5 июн. 2019 г. в 05:37, David <[hidden email]>:
Hi Ilya,

thx for helping me

the column is collocated. Still I need around ~10+Servers to match one PG
Server speed for a group by.

What do you mean with use "collocated=true"?
I thought it is only needed in case you dont have collocated data and still
want to have full result set.
and its not needed when data are collocated. Do I understand it wrong?


btw side question:
when using sqlline.sh with the "distributedJoins=true"
all queries are x-times slower than without

which makes totaly sense in case for non collocated data.
But it is also the case for collocated data, which makes no sense.

sqlline.sh --verbose=true -u
jdbc:ignite:thin://127.0.0.1/?distributedJoins=true



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
dmagda dmagda
Reply | Threaded
Open this post in threaded view
|

Re: how does group by works in ignite

In reply to this post by David
Hi David,

Please check this thread to see if you need to stay with Postgres or switch to Ignite:

In general, it's expected that Ignite is to be used for relational DBs acceleration - when those can't process large data sets and an upgrade to expensive hardware is not the way to go. So, with Ignite the volume of data is usually measured in hundreds of GBs and more and the scalability is one of the keys to benefit.

-
Denis


On Wed, Jun 5, 2019 at 3:37 AM David <[hidden email]> wrote:
Hi Ilya,

thx for helping me

the column is collocated. Still I need around ~10+Servers to match one PG
Server speed for a group by.

What do you mean with use "collocated=true"?
I thought it is only needed in case you dont have collocated data and still
want to have full result set.
and its not needed when data are collocated. Do I understand it wrong?


btw side question:
when using sqlline.sh with the "distributedJoins=true"
all queries are x-times slower than without

which makes totaly sense in case for non collocated data.
But it is also the case for collocated data, which makes no sense.

sqlline.sh --verbose=true -u
jdbc:ignite:thin://127.0.0.1/?distributedJoins=true



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/