Retrieving multiple keys with filtering

avk

Hello,


I have a list of cache keys (up to a few hundred of them) and a filter predicate. I'd like to efficiently retrieve only those values that pass the filter. I'm pretty sure earlier versions of Ignite (or was it pre-Ignite?) used to have some "getAll()" methods that would take a set of keys and an instance of IgniteBiPredicate, or something like that, and return a map of keys to values. I can no longer find those in the current API...
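To pin down the semantics being asked for, here is a minimal plain-Java model. It is only a sketch: a `Map` stands in for the cache, `java.util.function.BiPredicate` stands in for IgniteBiPredicate, and the class and method names (`FilteredGetAll`, `getAll`) are hypothetical, not part of any Ignite API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Plain-Java model: a Map stands in for the cache; no Ignite classes are used.
final class FilteredGetAll {
    // Returns only the requested entries that exist and pass the filter.
    static <K, V> Map<K, V> getAll(Map<K, V> cache, Set<K> keys, BiPredicate<K, V> filter) {
        Map<K, V> result = new LinkedHashMap<>();
        for (K key : keys) {
            V value = cache.get(key);
            if (value != null && filter.test(key, value)) {
                result.put(key, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = Map.of("a", 1, "b", 20, "c", 30);
        Map<String, Integer> hits = getAll(cache, Set.of("a", "b", "x"), (k, v) -> v >= 10);
        System.out.println(hits); // only "b" passes: "a" is too small, "x" is absent
    }
}
```

Doing this on the client against a distributed cache would still pull every requested value over the network before filtering, which is why the question is about evaluating the predicate where the data lives.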


In any case, what's the recommended way of doing it with the current API?


Thanks

Andrey


slava.koptilin

Re: Retrieving multiple keys with filtering

avk

Re: Retrieving multiple keys with filtering

Slava,

I'd like to avoid scanning potentially millions of cache items just to retrieve a hundred. More importantly, I already have the cache keys that I want. Why would I scan the entire cache? All I need is to filter keys.

Any other suggestions?

Thanks
Andrey

_____________________________
From: slava.koptilin <[hidden email]>
Sent: Thursday, August 24, 2017 2:34 AM
Subject: Re: Retrieving multiple keys with filtering
To: <[hidden email]>


Hi Andrey,

It seems the IgniteCache#query(ScanQuery) method is what you are looking for.
https://ignite.apache.org/releases/2.1.0/javadoc/org/apache/ignite/IgniteCache.html

You can find an example here:
https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/datagrid/CacheQueryExample.java

Thanks!



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Retrieving-multiple-keys-with-filtering-tp16391p16392.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


slava.koptilin

Re: Retrieving multiple keys with filtering

Hi Andrey,

Yes, you are right. ScanQuery scans all entries.
Perhaps IgniteCache#invokeAll(keys, cacheEntryProcessor) with a custom processor will work for you.
https://ignite.apache.org/releases/2.1.0/javadoc/org/apache/ignite/IgniteCache.html#invokeAll(java.util.Set,%20org.apache.ignite.cache.CacheEntryProcessor,%20java.lang.Object...)

Thanks!
avk

Re: Retrieving multiple keys with filtering

Well, I believe invokeAll() has "update" semantics and using it for read-only filtering of cache entries is probably not going to be efficient or even appropriate.


I'm afraid the only viable option I'm left with is to use Ignite's Compute feature: 

- on the sender, group the keys by affinity.

- send each group along with the filter predicate to their affinity nodes using IgniteCompute.

- on each node, use getAll() to fetch the local keys and apply the filter.

- on the sender node, collect the results of the compute jobs into a map.
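The four steps above can be sketched in plain Java. This is a model under stated assumptions, not Ignite code: each "node" is a local `Map`, `ownerOf` is a hypothetical affinity function, and `runOnNode` stands in for the job that would be shipped through IgniteCompute. In a real cluster the grouping would come from Ignite's affinity API, and nothing here handles a topology change while the jobs are running.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Plain-Java model of the four steps; each "node" is just a local Map.
final class AffinityFilterSketch {
    // Hypothetical affinity function: maps a key to the index of the node that owns it.
    static int ownerOf(Object key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    static <K, V> Map<K, V> filteredGet(List<Map<K, V>> nodes, Set<K> keys, BiPredicate<K, V> filter) {
        // Step 1: group the keys by owning node.
        Map<Integer, List<K>> byNode = new HashMap<>();
        for (K key : keys) {
            byNode.computeIfAbsent(ownerOf(key, nodes.size()), n -> new ArrayList<>()).add(key);
        }
        Map<K, V> result = new HashMap<>();
        // Steps 2-4: "send" each group to its node, filter there, merge the survivors.
        for (Map.Entry<Integer, List<K>> group : byNode.entrySet()) {
            result.putAll(runOnNode(nodes.get(group.getKey()), group.getValue(), filter));
        }
        return result;
    }

    // Stands in for the job that would run on the remote node.
    static <K, V> Map<K, V> runOnNode(Map<K, V> local, List<K> keys, BiPredicate<K, V> filter) {
        Map<K, V> passed = new HashMap<>();
        for (K key : keys) {
            V value = local.get(key);
            if (value != null && filter.test(key, value)) {
                passed.put(key, value);
            }
        }
        return passed;
    }

    public static void main(String[] args) {
        int nodeCount = 3;
        List<Map<Integer, String>> nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) nodes.add(new HashMap<>());
        // Place each entry on the node that owns its key.
        for (int k = 0; k < 12; k++) nodes.get(ownerOf(k, nodeCount)).put(k, "v" + k);
        System.out.println(filteredGet(nodes, Set.of(1, 2, 3, 4), (k, v) -> k % 2 == 0));
    }
}
```

Only the entries that pass the filter travel back to the sender, so the cost is one round trip per owning node rather than one transferred value per key.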


It's unfortunate that Ignite dropped that original API. What used to be a single API call is now a non-trivial algorithm, and one has to worry about things like what happens if the grid topology changes while the compute jobs are executing, etc.


Can anyone think of any other less complex/more robust approach?

Thanks
Andrey


dsetrakyan

Re: Retrieving multiple keys with filtering

Andrey, 

Good to hear from you. Long time no talk.

I don't think invokeAll has only update semantics. You can definitely use it just to look at the keys and return a result. Also, as you mentioned, Ignite compute is a viable option as well.

The reason that predicates were removed from the get methods is that the API was becoming unwieldy, and also that JCache does not require them.

D.


avk

Re: Retrieving multiple keys with filtering

Dmitriy,


It's good to be back! 😃 Glad to find Ignite community as vibrant and thriving as ever!


Speaking of invokeAll(), even if we ignore for a moment the overhead associated with locking/unlocking a cache entry prior to passing it to the EntryProcessor, as well as the overhead associated with enlisting the touched entries in a transaction, the bigger problem with using invokeAll() for filtering is that EntryProcessor must return a value. I'm not aware of any way to make EntryProcessor drop the entry from the response. The only option is to use a null (or false) to indicate a filtered-out entry. In my specific case, I'll end up sending back a whole bunch of nulls in the result map, as I expect most of the keys to be rejected by the filter.

Overall, invokeAll() is not what one would call an *efficient* (the key word in my original question) way of filtering.

Thanks!
Andrey



dsetrakyan

Re: Retrieving multiple keys with filtering

Andrey,

I am not sure I understand. According to the EntryProcessor API [1], you can choose to return nothing.

Also, to my knowledge, you can still do parallel reads while executing the EntryProcessor. Perhaps other community members can elaborate on this.


D.




Semyon Boikov

Re: Retrieving multiple keys with filtering

Hi,

If the EntryProcessor returns null, then no mapping is added to the result map. But I agree that using invokeAll() will have a lot of unnecessary overhead. Perhaps we need to add a new getAll method to the API; otherwise the best alternative is to use a custom ComputeJob or affinityCall.

Thanks,
Semyon




avk

Re: Retrieving multiple keys with filtering

Ah, yes! Thank you, Semyon! According to invokeAll() javadocs "No mappings will be returned for EntryProcessors that return a null value for a key." I should read JCache javadocs more carefully next time. :) 
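The rule quoted from the javadocs can be modeled in a few lines of plain Java. This is a sketch, not the JCache or Ignite implementation: `InvokeAllModel` is a hypothetical name, a `BiFunction` stands in for the EntryProcessor, and the model returns the processor results directly, whereas the real `invokeAll()` wraps each one in an `EntryProcessorResult`. It also ignores the locking discussed in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;

// Plain-Java model of the quoted rule; this is not the JCache or Ignite implementation.
final class InvokeAllModel {
    // Applies the processor to each key and keeps a mapping only for non-null results.
    static <K, V, R> Map<K, R> invokeAll(Map<K, V> cache, Set<K> keys, BiFunction<K, V, R> processor) {
        Map<K, R> results = new HashMap<>();
        for (K key : keys) {
            R result = processor.apply(key, cache.get(key));
            if (result != null) {
                results.put(key, result);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = Map.of("a", 5, "b", 50, "c", 500);
        // A read-only "processor" acting as a filter: return the value to keep it, null to drop it.
        Map<String, Integer> kept = invokeAll(cache, Set.of("a", "b", "c"), (k, v) -> v >= 50 ? v : null);
        System.out.println(kept); // "a" has no mapping at all, not a null value
    }
}
```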


Still, the processor is invoked while a monitor is held on the cache entry being processed, which is of course unnecessary in a read-only case like the one we're discussing in this thread...


I guess I'm stuck with the Compute-based approach for now. :(


Thanks!
Andrey





dsetrakyan

Re: Retrieving multiple keys with filtering

Semyon, 

Can you please clarify this? Do we allow concurrent reads while invokeAll or invoke is executed?

D.





Semyon Boikov

Re: Retrieving multiple keys with filtering

Yes, a read can be executed without acquiring the entry lock. But you need to take into account that a request for a cache.get operation can be processed in the same stripe as cache.invoke.
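A toy model may help show why sharing a stripe matters even when the read itself takes no lock. It rests on assumptions: each stripe is modeled as a single-threaded FIFO executor and requests are routed by key hash. The thread does not say how Ignite actually assigns requests to stripes, and `StripeSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy striped pool: each stripe is a single thread that runs its requests in FIFO order.
final class StripeSketch {
    private final ExecutorService[] stripes;

    StripeSketch(int stripeCount) {
        stripes = new ExecutorService[stripeCount];
        for (int i = 0; i < stripeCount; i++) stripes[i] = Executors.newSingleThreadExecutor();
    }

    // Assumption for illustration: a request is routed to a stripe by its key's hash.
    Future<?> submit(Object key, Runnable request) {
        return stripes[Math.floorMod(key.hashCode(), stripes.length)].submit(request);
    }

    void shutdown() {
        for (ExecutorService stripe : stripes) stripe.shutdown();
    }

    // Submits a slow "invoke" and then a "get" for the same key; returns the completion order.
    static List<String> demo() {
        StripeSketch pool = new StripeSketch(4);
        List<String> log = new CopyOnWriteArrayList<>();
        try {
            Future<?> invoke = pool.submit("k1", () -> {
                try {
                    Thread.sleep(100); // simulate a slow entry processor
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                log.add("invoke done");
            });
            Future<?> get = pool.submit("k1", () -> log.add("get done"));
            invoke.get();
            get.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // the get takes no lock, yet it finishes after the invoke
    }
}
```

In this model the get waits only because it is queued behind the invoke on the same thread; a get routed to a different stripe would not be delayed.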

Semyon
