AffinityKey for co-location

syedmoiz

Hi,

My use case is to optimize certain aggregation queries on a very large, flat table.
The table schema looks like this:

EmployeeID, DepartmentID, <Employee Details ...>, <Department Details ...>, <Company Details>

Most of my queries group by <EmployeeID, DepartmentID> and are nested like this:

select ... (select ... group by EmployeeID) ... group by EmployeeID, DepartmentID

From my basic understanding of Ignite, keeping EmployeeID and DepartmentID co-located would help query performance. Assuming an EmployeeID is always linked to a single DepartmentID, I tried the following:

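1. Created an AffinityKey<EmployeeID, DepartmentID> in the POJO:

import java.io.Serializable;
import org.apache.ignite.cache.affinity.AffinityKey;
import org.apache.ignite.cache.query.annotations.QuerySqlField;

public class Model implements Serializable {

    @QuerySqlField(index = true)
    private String employeeID;

    @QuerySqlField(index = true)
    private String departmentID;

    ...

    // AffinityKey(key, affKey): here both the cache key and the
    // affinity key are employeeID.
    public AffinityKey<String> key() {
        if (key == null)
            key = new AffinityKey<>(employeeID, employeeID);
        return key;
    }

    ...
}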

2. Used a DataStreamer to load the cache as follows (inspired by CacheQueryExample.initialize):

CacheConfiguration<AffinityKey<String>, Model> cacheConfiguration =
    new CacheConfiguration<>(CACHE_NAME);

cacheConfiguration.setCacheMode(CacheMode.PARTITIONED);
// Register the key/value types so that Model's @QuerySqlField fields can be queried with SQL.
cacheConfiguration.setIndexedTypes(AffinityKey.class, Model.class);

try (IgniteCache<AffinityKey<String>, Model> cache =
         ignite.getOrCreateCache(cacheConfiguration)) {
    try (IgniteDataStreamer<AffinityKey<String>, Model> stmr =
             ignite.dataStreamer(CACHE_NAME)) {
        ...
        Model model = new Model();
        ...
        // Each row is streamed under the key built by Model.key().
        stmr.addData(model.key(), model);
        ...
    }
}

I'm doing something wrong here: when I load the data, I get only one record per key (EmployeeID, DepartmentID) in the cache.
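The symptom can be reproduced outside Ignite with a minimal sketch, assuming that a key-value cache behaves like a java.util.Map in keeping at most one value per key. A plain HashMap stands in for the cache here, and `Row` and its `rowId` column are hypothetical; they are not part of the original schema. Keying rows by employeeID alone, as Model.key() does, collapses all rows of one employee into a single entry, while a key that is unique per row keeps every row.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model: a HashMap stands in for the cache, since a key-value
// cache likewise stores at most one value per key.
public class KeyCollisionDemo {

    // Hypothetical row type; rowId is an assumed per-row unique column.
    public static final class Row {
        final String rowId;
        final String employeeID;
        final String departmentID;

        public Row(String rowId, String employeeID, String departmentID) {
            this.rowId = rowId;
            this.employeeID = employeeID;
            this.departmentID = departmentID;
        }
    }

    // Keys rows by employeeID only, like Model.key() in the post.
    public static Map<String, Row> loadByEmployee(List<Row> rows) {
        Map<String, Row> cache = new HashMap<>();
        for (Row r : rows) {
            cache.put(r.employeeID, r); // a later row replaces an earlier one
        }
        return cache;
    }

    // Keys rows by the hypothetical unique rowId, so every row is kept.
    public static Map<String, Row> loadByRowId(List<Row> rows) {
        Map<String, Row> cache = new HashMap<>();
        for (Row r : rows) {
            cache.put(r.rowId, r);
        }
        return cache;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("r1", "e1", "d1"),
                new Row("r2", "e1", "d1"),
                new Row("r3", "e2", "d1"));
        System.out.println(loadByEmployee(rows).size()); // prints 2
        System.out.println(loadByRowId(rows).size());    // prints 3
    }
}
```

With a HashMap the last write wins; which duplicate survives in an Ignite cache is not modeled here and may depend on how the data is loaded.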

Could you please tell me the correct way to use AffinityKey to achieve co-location, or whether I should try a different approach?

Regards,
Moiz
PS: "..." marks code I have omitted.

vkulichenko

Re: AffinityKey for co-location

Hi Moiz,

It looks like your data is already flattened in a single 'Model' type, so there is nothing to collocate.

Affinity collocation is required if you have two different types (e.g., Employee and Department) and you join them in SQL queries.
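To make the idea concrete, here is a simplified model in plain Java. It is not Ignite's actual affinity function: it assumes a partition is chosen by hashing the affinity key modulo a fixed partition count (1024 here, an assumed value). `AffKey` is a hypothetical stand-in for a composite key such as Ignite's AffinityKey(key, affKey): the entry is identified by `key` but placed according to `affinityKey`. An Employee keyed with its department id as the affinity key lands in the same partition as the Department itself, which is what lets a join between the two run locally on each node.

```java
import java.util.Objects;

// Simplified model of affinity collocation, NOT Ignite's real affinity
// function: the partition is assumed to be hash(affinityKey) mod PARTITIONS.
public class CollocationDemo {

    static final int PARTITIONS = 1024; // assumed partition count

    // Hypothetical composite key: identified by `key`, placed by `affinityKey`.
    public static final class AffKey {
        final Object key;
        final Object affinityKey;

        public AffKey(Object key, Object affinityKey) {
            this.key = key;
            this.affinityKey = affinityKey;
        }
    }

    // Maps an affinity key to a partition (simplified).
    public static int partition(Object affinityKey) {
        return Math.floorMod(Objects.hashCode(affinityKey), PARTITIONS);
    }

    // Composite keys are placed by their affinity part; plain keys by themselves.
    public static int partitionOf(Object cacheKey) {
        if (cacheKey instanceof AffKey) {
            return partition(((AffKey) cacheKey).affinityKey);
        }
        return partition(cacheKey);
    }

    public static void main(String[] args) {
        Object deptKey = "d1";                  // a Department keyed by its own id
        Object emp1 = new AffKey("e1", "d1");   // Employees carry the department id
        Object emp2 = new AffKey("e2", "d1");   // as their affinity key

        System.out.println(partitionOf(deptKey) == partitionOf(emp1)); // prints true
        System.out.println(partitionOf(deptKey) == partitionOf(emp2)); // prints true
    }
}
```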

Let me know if I'm missing something.

-Val

syedmoiz

Re: AffinityKey for co-location

Hi Val,

Got it. What I actually want is for all records with the same Employee and Department to be stored on the same node. Is there a way to set a custom partitioner? And if I do that, would the group-by queries involve less data movement between nodes when they are executed?

Regards,
Moiz

(In reply to vkulichenko, Sat, Jan 30, 2016 at 2:19 AM)

vkulichenko

Re: AffinityKey for co-location

Moiz,

There is no data movement between nodes. We execute SQL in a map-reduce fashion: the client broadcasts the map part of the query to the server nodes, gets the results back, and reduces them if needed.
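The map-reduce flow can be illustrated with a toy single-process model. This is a sketch under stated assumptions, not Ignite's query engine: each inner list stands for the department ids of the rows stored on one server node, `mapCount` is a hypothetical map step computing a per-node COUNT grouped by department, and `reduce` is the client-side merge. No row leaves its node; only the small per-node partial results are combined.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of "SELECT departmentID, COUNT(*) ... GROUP BY departmentID"
// executed map-reduce style. Each inner list plays one server node's local rows.
public class MapReduceGroupBy {

    // Map phase: a node aggregates only the rows it stores locally.
    public static Map<String, Long> mapCount(List<String> localDepartmentIds) {
        Map<String, Long> partial = new HashMap<>();
        for (String dept : localDepartmentIds) {
            partial.merge(dept, 1L, Long::sum);
        }
        return partial;
    }

    // Reduce phase: the client merges the per-node partial counts.
    public static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((dept, count) -> total.merge(dept, count, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> nodes = List.of(
                List.of("d1", "d1", "d2"),  // rows stored on node A
                List.of("d1", "d3"));       // rows stored on node B

        List<Map<String, Long>> partials = new ArrayList<>();
        for (List<String> localRows : nodes) {
            partials.add(mapCount(localRows)); // runs where the rows live
        }
        System.out.println(reduce(partials).get("d1")); // prints 3
    }
}
```

This works because COUNT (like SUM) can be merged directly from partial results; an aggregate such as AVG would need each node to return more information, for example both a sum and a count.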

The only case when you need to collocate the data for SQL queries is joins.

-Val