Running a query in only current nodes partitions

Tolga Kavukcu

Running a query in only current nodes partitions

Hi everyone,

Basically, I have a simple cache with almost 5 million entries across 4 nodes, and I use the fair affinity function as its affinity function.

I also use the fair affinity function to send data to the nodes from external sources. My flow can be summarised as follows:

- Run a 4-node cluster with a partitioned cache.
- Run another host as a client.
- Use the affinity function on the client host to send data to the owner node.
- Preprocess the data and put it into the cache.

I have this flow because I need the previous value of a key to decide its next value, so the data has to be processed on the owner node.

My problem is that I have to query only local data on a time interval to decide whether a value should be changed. After processing the query result I also need the previous values of the keys, so I need to be sure that every key in the query result exists on the local node.

I have a solution, but it has poor performance.

First I run
primaryPartitions(instance.ignite.cluster().localNode());  // Returns an array of the current node's primary partition numbers
Then I loop over this array and run a scan query for each partition:
ScanQuery<String, ClassNameOfObject> scanQuery = new ScanQuery<>();
scanQuery.setLocal(true);
scanQuery.setFilter(filter);
scanQuery.setPartition(partition);

If anyone can provide a better solution, I would be very happy.

Thanks
Tolga KAVUKÇU

Denis Magda

Re: Running a query in only current nodes partitions

Hi Tolga,

Do you really need to iterate over all the keys when the data has to be updated? If so, you can parallelize the ScanQueries so that multiple local threads iterate over specific partitions on each node.
Please refer to this example for more details:
https://github.com/gridgain/gridgain-advanced-examples/blob/master/src/main/java/org/gridgain/examples/datagrid/query/ScanQueryExample.java
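
A minimal sketch of that idea, written independently of the linked example: it fans one scan per partition out to a fixed thread pool and merges the results. The class ParallelPartitionScan, the method scanAll, and the scanPartition callback are hypothetical names introduced here for illustration. The callback stands in for whatever runs a local ScanQuery restricted to one partition, so the snippet runs on plain Java without an Ignite cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

class ParallelPartitionScan {
    /**
     * Runs scanPartition once per partition on a fixed thread pool and
     * returns all results concatenated in the order of the partitions array.
     */
    static <T> List<T> scanAll(int[] partitions, IntFunction<List<T>> scanPartition, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<T>>> futures = new ArrayList<>();
            for (int partition : partitions) {
                // Hypothetical stand-in: with Ignite, this task would run a ScanQuery
                // configured with setLocal(true) and setPartition(partition).
                Callable<List<T>> task = () -> scanPartition.apply(partition);
                futures.add(pool.submit(task));
            }
            List<T> merged = new ArrayList<>();
            for (Future<List<T>> future : futures) {
                merged.addAll(future.get()); // blocks until that partition is done
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while scanning partitions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Partition scan failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Fake scan: every partition "contains" two keys derived from its number.
        List<String> keys = scanAll(new int[] {0, 1, 2},
            p -> Arrays.asList("p" + p + "-a", "p" + p + "-b"), 2);
        System.out.println(keys); // prints [p0-a, p0-b, p1-a, p1-b, p2-a, p2-b]
    }
}
```

The pool size is a tuning knob; a value near the number of available cores is a reasonable starting assumption, to be checked against real measurements.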

If you don't need to iterate over every key, what would your query look like if you used an SQL query to get a subset of the keys?

--
Denis

Tolga Kavukcu

Re: Running a query in only current nodes partitions

Hi Denis,

Thanks for the answer.

It is better if I provide more detail to make my point. Let's say I have cache1 and cache2.

- I would like to run a query on cache1. Then I should check whether each key exists in cache2, and then I will execute some logic.

I need to make sure that the keys are owned by the queried node so that I can properly check whether a key exists in the cache. (This can be achieved with ScanQuery by setting the partition and setLocal(true).)

So if I use an SQL query, my query would look like this:

SqlQuery sql = new SqlQuery(Person.class, "salary > ?");

I apply only one rule over one field.

But there is no setPartition() method in SqlQuery. Please correct me if I am wrong.

That's why I use ScanQuery to iterate over the cache and apply my rule.

I will try the multi-threaded approach; it could speed things up. I would also be happy if you could suggest a faster alternative.

Thanks.

--
Tolga KAVUKÇU

Denis Magda

Re: Running a query in only current nodes partitions

Hi Tolga,

You can probably use SqlQuery as is by getting the list of keys that have to be updated with "SELECT _key FROM cache2 WHERE ...", where "_key" is an Ignite-specific keyword saying that the entry's key has to be included in the result set.
However, since SqlQueries are broadcast to every node (if the cache mode is PARTITIONED), the following has to be considered (it can be ignored for REPLICATED caches):
- Indexes have to be properly configured for "cache2". The execution plan can be checked with "EXPLAIN SELECT ...".
- Frequency of such queries. If the query is executed too frequently, it may affect performance.
- Size of the result set. Returning tens or hundreds of keys in a result set is fine, but if the size is measured in thousands of rows, it may have a negative impact.

If the solution with SqlQuery works fine, you can then iterate over the keys locally, preparing updates for cache1, and use cache1.putAll to apply the changes.
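
That last step can be sketched in plain Java. This is only an illustration under stated assumptions: a java.util.Map stands in for cache1 (IgniteCache also offers get and putAll), the list of keys stands in for the result of the "SELECT _key ..." query, and BatchUpdate, prepareUpdates, and the rule callback are hypothetical names, not part of Ignite.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

class BatchUpdate {
    /**
     * Computes the next value for each key from its previous value and collects
     * only the entries that actually change, so they can be written in one putAll.
     */
    static <K, V> Map<K, V> prepareUpdates(List<K> keys, Map<K, V> cache, BiFunction<K, V, V> rule) {
        Map<K, V> updates = new LinkedHashMap<>();
        for (K key : keys) {
            V previous = cache.get(key);        // null if the key is absent
            V next = rule.apply(key, previous); // the rule decides the next value
            if (next != null && !next.equals(previous)) {
                updates.put(key, next);
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        Map<String, Integer> cache1 = new HashMap<>(); // stand-in for cache1
        cache1.put("a", 1);
        cache1.put("b", 2);

        // Keys as they might come back from the key-only SQL query.
        List<String> keys = Arrays.asList("a", "b", "c");

        Map<String, Integer> updates =
            prepareUpdates(keys, cache1, (key, prev) -> prev == null ? 0 : prev + 1);
        cache1.putAll(updates);                        // apply all changes at once
        System.out.println(updates); // prints {a=2, b=3, c=0}
    }
}
```

Collecting the changes first and writing them with a single putAll keeps the number of cache operations low, which matters more once the map is replaced by a distributed cache.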

Will this work for you?

--
Denis


Tolga Kavukcu

Re: Running a query in only current nodes partitions

Hi Denis,

Thanks for the response. I will try the SqlQuery approach; it seems like it could fit my needs.

I now get the idea of how queries are executed on partitioned caches.

Have a nice day.

--
Tolga KAVUKÇU