Range queries on indexed columns

classic Classic list List threaded Threaded
4 messages Options
Anil Anil
Reply | Threaded
Open this post in threaded view
|

Range queries on indexed columns

HI Team,

I have a table TEST with a indexed column COL_A. Does the following query works ?

select * from Test where COL_A > '1' and COL_A < '2' offset 10  ROWS FETCH NEXT 20 ROWS ONLY

As per my understanding of distributed systems, the query is sent to all nodes and gets the 10 records from each node and return 10 (whatever returns first)

as indexes are distributed, the above query may not return the records in paginated way without adding sort like below.

select * from Test where COL_A > '1' and COL_A < '2' order by COL_A offset 10  ROWS FETCH NEXT 20 ROWS ONLY

do you see any overhead of sort here ?

Does it work in following way ?

send the query to all nodes and get 10 (based on sorting) records and sort all results of each node at reducer and return final 10 .

Sort should not have any overhead here as sort and filter is done on indexed column. 

Please correct me if i am wrong. thanks.

Thanks

Anil Anil
Reply | Threaded
Open this post in threaded view
|

Re: Range queries on indexed columns

Hi Team,

Can some help in understanding the below ? Thanks.

On 13 June 2017 at 11:07, Anil <[hidden email]> wrote:
HI Team,

I have a table TEST with a indexed column COL_A. Does the following query works ?

select * from Test where COL_A > '1' and COL_A < '2' offset 10  ROWS FETCH NEXT 20 ROWS ONLY

As per my understanding of distributed systems, the query is sent to all nodes and gets the 10 records from each node and return 10 (whatever returns first)

as indexes are distributed, the above query may not return the records in paginated way without adding sort like below.

select * from Test where COL_A > '1' and COL_A < '2' order by COL_A offset 10  ROWS FETCH NEXT 20 ROWS ONLY

do you see any overhead of sort here ?

Does it work in following way ?

send the query to all nodes and get 10 (based on sorting) records and sort all results of each node at reducer and return final 10 .

Sort should not have any overhead here as sort and filter is done on indexed column. 

Please correct me if i am wrong. thanks.

Thanks


Andrew Mashenkov Andrew Mashenkov
Reply | Threaded
Open this post in threaded view
|

Re: Range queries on indexed columns

Hi Anil,

Yes, in your case map queries results already sorted and there is only merge on reduce side.
Sorting can be disabled on map side when e.g. aggregates is used.

On Wed, Jun 14, 2017 at 3:20 PM, Anil <[hidden email]> wrote:
Hi Team,

Can some help in understanding the below ? Thanks.

On 13 June 2017 at 11:07, Anil <[hidden email]> wrote:
HI Team,

I have a table TEST with a indexed column COL_A. Does the following query works ?

select * from Test where COL_A > '1' and COL_A < '2' offset 10  ROWS FETCH NEXT 20 ROWS ONLY

As per my understanding of distributed systems, the query is sent to all nodes and gets the 10 records from each node and return 10 (whatever returns first)

as indexes are distributed, the above query may not return the records in paginated way without adding sort like below.

select * from Test where COL_A > '1' and COL_A < '2' order by COL_A offset 10  ROWS FETCH NEXT 20 ROWS ONLY

do you see any overhead of sort here ?

Does it work in following way ?

send the query to all nodes and get 10 (based on sorting) records and sort all results of each node at reducer and return final 10 .

Sort should not have any overhead here as sort and filter is done on indexed column. 

Please correct me if i am wrong. thanks.

Thanks





--
Best regards,
Andrey V. Mashenkov
Regards,
Andrew.
Anil Anil
Reply | Threaded
Open this post in threaded view
|

Re: Range queries on indexed columns

Thanks Andrey.

On 14 June 2017 at 20:15, Andrey Mashenkov <[hidden email]> wrote:
Hi Anil,

Yes, in your case map queries results already sorted and there is only merge on reduce side.
Sorting can be disabled on map side when e.g. aggregates is used.

On Wed, Jun 14, 2017 at 3:20 PM, Anil <[hidden email]> wrote:
Hi Team,

Can some help in understanding the below ? Thanks.

On 13 June 2017 at 11:07, Anil <[hidden email]> wrote:
HI Team,

I have a table TEST with a indexed column COL_A. Does the following query works ?

select * from Test where COL_A > '1' and COL_A < '2' offset 10  ROWS FETCH NEXT 20 ROWS ONLY

As per my understanding of distributed systems, the query is sent to all nodes and gets the 10 records from each node and return 10 (whatever returns first)

as indexes are distributed, the above query may not return the records in paginated way without adding sort like below.

select * from Test where COL_A > '1' and COL_A < '2' order by COL_A offset 10  ROWS FETCH NEXT 20 ROWS ONLY

do you see any overhead of sort here ?

Does it work in following way ?

send the query to all nodes and get 10 (based on sorting) records and sort all results of each node at reducer and return final 10 .

Sort should not have any overhead here as sort and filter is done on indexed column. 

Please correct me if i am wrong. thanks.

Thanks





--
Best regards,
Andrey V. Mashenkov