Group By Query is slow : Apache Ignite 2.3.0

classic Classic list List threaded Threaded
7 messages Options
indranil.db indranil.db
Reply | Threaded
Open this post in threaded view
|

Group By Query is slow : Apache Ignite 2.3.0

Hi,

    I am trying to execute a group by query on 86k items in a cache which is taking 70 ms which imo is quite slow

Can anyone please let me know what is that I may be doing wrong?

Model code:

@Data

@ToString

@QueryGroupIndex.List(@QueryGroupIndex(name = "idx1"))

public class A implements Serializable {

@QuerySqlField(index = true, orderedGroups = {@QuerySqlField.Group(name = "idx1", order = 5)})

    private long id;

    @QuerySqlField(index = true, orderedGroups = {@QuerySqlField.Group(name = "idx1", order = 4)})

    private long secondId;

    @QuerySqlField(index = true, orderedGroups = {@QuerySqlField.Group(name = "idx1", order = 3)})

    private long timestamp;

    @QuerySqlField(index = true, orderedGroups = {@QuerySqlField.Group(name = "idx1", order = 2)})

    private long thirdId;

    @QuerySqlField(index = true, orderedGroups = {@QuerySqlField.Group(name = "idx1", order = 1)})

    private long fourthId;

    @QuerySqlField(index = true, orderedGroups = {@QuerySqlField.Group(name = "idx1", order = 0)})

    private long mainId;    

}

Main class for query:

private void startCache(){
    

    CacheConfiguration<Long, A> cacheConfig = new CacheConfiguration<>("testcache");

    cacheConfig.setCacheMode(CacheMode.PARTITIONED);

    cacheConfig.setBackups(0);

    cacheConfig.setIndexedTypes(Long.class, A.class);

}

private void grouByQuery() {

        IgniteCache<Long, Bet> cache = Ignition.ignite().cache("testcache");

        SqlFieldsQuery qry = new SqlFieldsQuery("select mainId, count(*) from Bet group by mainId order by count(*) desc limit 20");

        Collection<List<?>> res = cache.query(qry).getAll();

    }


Thanks.
dkarachentsev dkarachentsev
Reply | Threaded
Open this post in threaded view
|

Re: Group By Query is slow : Apache Ignite 2.3.0

Hi,

How many nodes do you have and how do you measure that 70 ms? Is it first
query or average time? Please show your EXPLAIN of the query.

Thanks!
-Dmitry



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
indranil.db indranil.db
Reply | Threaded
Open this post in threaded view
|

Re: Group By Query is slow : Apache Ignite 2.3.0

Hi Dmitry,

   I have only 1 node at this moment. The way I am measuring is using stop watch kind of stuff before and after execution of the sql query.

This is the only query and I kept it running in a while loop with some sleep so that JIT compiler has done all its optimisation.

Please fine below the explain plan: 

[[SELECT

    __Z0.MAINID AS __C0_0,

    COUNT(*) AS __C0_1

FROM "testcache".A __Z0

    /* "testcache".A_MAINID_IDX */

GROUP BY __Z0.MAINID], [SELECT

    __C0_0 AS MAINID,

    CAST(SUM(__C0_1) AS BIGINT) AS __C0_1

FROM PUBLIC.__T0

    /* "testcache"."merge_scan" */

GROUP BY __C0_0

LIMIT 20]] 

Thanks and regards,

-- IB



On Wednesday 20 December 2017, 1:06:51 AM GMT+11, dkarachentsev <[hidden email]> wrote:


Hi,

How many nodes do you have and how do you measure that 70 ms? Is it first
query or average time? Please show your EXPLAIN of the query.


Thanks!

-Dmitry



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

dkarachentsev dkarachentsev
Reply | Threaded
Open this post in threaded view
|

Re: Group By Query is slow : Apache Ignite 2.3.0

Hi,

How many records your query returns without LIMIT? How long does it take to
select all records without grouping?

Thanks!
-Dmitry



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
indranil.db indranil.db
Reply | Threaded
Open this post in threaded view
|

Re: Group By Query is slow : Apache Ignite 2.3.0

Hi,

    There are 86486 records.
  
Time taken for select count(*) from A => 35 ms
Time taken for select mainId, count(*) from A group by mainId => 70 ms
Time taken for select * from => 0 ms

 I am doing a POC with apache ignite and I am very keen to use it in production for live streaming and real time in memory fast query.
Group by, top 100 are the 2 preferred queries.

I need to get the figures great to put up a case, hence any help will be appreciated.

Thanks and regards,

-- Indranil Basu



On Wednesday 20 December 2017, 11:14:33 PM GMT+11, dkarachentsev <[hidden email]> wrote:


Hi,

How many records your query returns without LIMIT? How long does it take to
select all records without grouping?
dkarachentsev dkarachentsev
Reply | Threaded
Open this post in threaded view
|

Re: Group By Query is slow : Apache Ignite 2.3.0

Hi Indranil,

These measurements are not fully correct, for example select count(*) might
use only index and in select * was not actually invoked, because you need to
run over cursor.
Also by default query is not parallelized on one node, and scan with
grouping is going sequentially in one thread.

Try to recheck your results on one node with enabled query parallelism:
CacheConfiguration.setQueryParallelism(8) [1].

And/or on 4 server nodes with 1 backup. You should get better numbers
because of spreading query over machines.

[1]
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/CacheConfiguration.html#setQueryParallelism(int)

Thanks!
-Dmitry



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
dsetrakyan dsetrakyan
Reply | Threaded
Open this post in threaded view
|

Re: Group By Query is slow : Apache Ignite 2.3.0

Hi,

Were you able to resolve the issue? If yes, it would be nice to share it with the community.

D.

On Thu, Dec 21, 2017 at 12:49 AM, dkarachentsev <[hidden email]> wrote:
Hi Indranil,

These measurements are not fully correct, for example select count(*) might
use only index and in select * was not actually invoked, because you need to
run over cursor.
Also by default query is not parallelized on one node, and scan with
grouping is going sequentially in one thread.

Try to recheck your results on one node with enabled query parallelism:
CacheConfiguration.setQueryParallelism(8) [1].

And/or on 4 server nodes with 1 backup. You should get better numbers
because of spreading query over machines.

[1]
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/CacheConfiguration.html#setQueryParallelism(int)

Thanks!
-Dmitry



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/