Cross join bug for partitioned caches

Andrey Nestrogaev

Cross join bug for partitioned caches

Hi all!

I am testing Ignite 1.5.

It seems that a cross join on partitioned caches returns an incorrect result (some of the rows are missing) when more than one server node is started.

When only one server node is started, or when the caches are in replicated mode, everything works correctly.

alexey.goncharuk

Re: Cross join bug for partitioned caches

Hi Andrey,

You need to properly collocate your data in order to have correct join results when using partitioned caches (it does not matter whether you join tables within one partitioned cache or join tables across different partitioned caches). Please refer to documentation [1] and example [2]. Note the usage of AffinityKey class for Person objects in the example.

An alternative to this approach is to use REPLICATED mode for one of the caches, provided that cache contains a relatively small data set (the so-called star schema). You can refer to [3] for further details.

As far as I know, this limitation is planned to be removed in ignite-1.6; however, such a distributed join will be significantly slower than a collocated join. Stay tuned!

Hope this helps,
AG

[2] org.apache.ignite.examples.datagrid.CacheQueryExample
[3] org.apache.ignite.examples.datagrid.starschema.CacheStarSchemaExample

2016-01-25 17:06 GMT+03:00 Andrey Nestrogaev <[hidden email]>:
Hi all!

I am testing Ignite 1.5.

It seems that a cross join on partitioned caches returns an incorrect result
(some of the rows are missing) when more than one server node is started.

When only one server node is started, or when the caches are in replicated
mode, everything works correctly.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Cross-join-bug-for-partitioned-caches-tp2694.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Andrey Nestrogaev

Re: Cross join bug for partitioned caches

Hi Alexey,

"Cross Join" doesn't imply predicates for table joins, so how can collocation help with this type of join?

The only workaround is to have all caches except one in replicated mode.

Sergi Vladykin

Re: Cross join bug for partitioned caches

Collocation means exactly "having all the joined entries on the same node". So basically you are right: for a cross join, it implies keeping all the data in replicated caches except for one partitioned cache.
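
To make the point above concrete, here is a minimal plain-Java sketch of what happens when each node joins only the rows it holds locally. It does not use the Ignite API: the class and method names (`CrossJoinSimulation`, `partition`, `replicate`, `crossJoin`) are hypothetical, and the round-robin split is only an assumed stand-in for how a PARTITIONED cache spreads entries across nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CrossJoinSimulation {
    /** Splits rows across nodes round-robin; an assumed stand-in for a PARTITIONED cache. */
    static List<List<String>> partition(List<String> rows, int nodes) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < nodes; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < rows.size(); i++) {
            parts.get(i % nodes).add(rows.get(i));
        }
        return parts;
    }

    /** Gives every node a full copy of the rows; an assumed stand-in for a REPLICATED cache. */
    static List<List<String>> replicate(List<String> rows, int nodes) {
        List<List<String>> copies = new ArrayList<>();
        for (int i = 0; i < nodes; i++) {
            copies.add(new ArrayList<>(rows));
        }
        return copies;
    }

    /** Each node cross-joins only its local rows; the per-node results are concatenated. */
    static List<String> crossJoin(List<List<String>> left, List<List<String>> right) {
        List<String> result = new ArrayList<>();
        for (int node = 0; node < left.size(); node++) {
            for (String l : left.get(node)) {
                for (String r : right.get(node)) {
                    result.add(l + "-" + r);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a1", "a2", "a3", "a4");
        List<String> b = Arrays.asList("b1", "b2", "b3", "b4");

        // Both sides partitioned over 2 nodes: pairs that span nodes are never produced.
        System.out.println(crossJoin(partition(a, 2), partition(b, 2)).size()); // prints 8

        // One side partitioned, the other replicated: every pair is produced exactly once.
        System.out.println(crossJoin(partition(a, 2), replicate(b, 2)).size()); // prints 16

        // Single node: everything is local, so the result is complete.
        System.out.println(crossJoin(partition(a, 1), partition(b, 1)).size()); // prints 16
    }
}
```

With two nodes and both sides partitioned, only 8 of the 16 expected pairs appear, which matches the "part of the rows is missing" symptom. Replicating one side restores the full result without duplicating any pair, because each row of the partitioned side still lives on exactly one node.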

Sergi

2016-01-25 18:08 GMT+03:00 Andrey Nestrogaev <[hidden email]>:
Hi Alexey,

"Cross Join" doesn't imply predicates for table joins, so how can
collocation help with this type of join?

The only workaround is to have all caches except one in replicated mode.




alexey.goncharuk

Re: Cross join bug for partitioned caches

In reply to this post by Andrey Nestrogaev
If you need the ability to run ad-hoc SQL, then you're right: you need to have one PARTITIONED cache, and all the others should be REPLICATED. However, if you know your SQL queries in advance, you can usually come up with a collocation strategy for multiple PARTITIONED caches.

I believe the community may suggest several options to you if you share your use-case.

Andrey Nestrogaev

Re: Cross join bug for partitioned caches

I am exploring the possibility of adapting SQL-based applications to use Ignite as a database.
Therefore, I need to understand which SQL can be used as is, what the limitations and consequences are, and what needs to be completely rewritten or replaced with native API calls.

vkulichenko

Re: Cross join bug for partitioned caches

Andrey,

There is only one limitation in the current implementation: you have to make sure that joined entries are collocated and are stored on the same nodes. There are two ways to achieve this:

1. Using affinity [1]. Take a look at the query example [2]: it joins the Person and Organization types, both stored in partitioned caches but collocated with the help of the AffinityKey class.
2. Moving one of the joined parties into a replicated cache. E.g., if you move the Organization table from the previous example to a replicated cache, all organizations will be available on all nodes, so you don't have to worry about affinity for persons.
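
The first option can be illustrated with a small plain-Java sketch. It is not the Ignite API and not the code from example [2]: `AffinityCollocationSketch`, `nodeFor`, and `localJoinCount` are hypothetical names, and the modulo mapping is only an assumed simplification of an affinity function. The sketch counts how many Person-Organization matches an equi-join on the organization id finds when each node sees only its local entries.

```java
class AffinityCollocationSketch {
    /** Maps a key to a node; an assumed, simplified stand-in for an affinity function. */
    static int nodeFor(int key, int nodes) {
        return Math.floorMod(key, nodes);
    }

    /**
     * Counts matches of a join on orgId when each node joins only local entries.
     * persons[i] = {personId, orgId}; organizations are always placed by orgId.
     * If collocate is true, a person is placed by orgId (the affinity key),
     * otherwise by its own personId.
     */
    static int localJoinCount(int[][] persons, int[] orgIds, int nodes, boolean collocate) {
        int matches = 0;
        for (int[] person : persons) {
            int personNode = nodeFor(collocate ? person[1] : person[0], nodes);
            for (int orgId : orgIds) {
                // A match is found only if the ids agree AND both entries sit on the same node.
                if (orgId == person[1] && nodeFor(orgId, nodes) == personNode) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int[][] persons = {{1, 10}, {2, 10}, {3, 11}, {4, 11}};
        int[] orgIds = {10, 11};

        // Persons placed by their own id: some land on a different node than their organization.
        System.out.println(localJoinCount(persons, orgIds, 2, false)); // prints 2

        // Persons placed by orgId: every person is on the same node as its organization.
        System.out.println(localJoinCount(persons, orgIds, 2, true)); // prints 4
    }
}
```

Placing persons by organization id only helps because the join condition is on that same id. A cross join has no such condition, so no choice of affinity key can bring every required pair onto one node.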

We also have plans to support ad-hoc queries without collocation. But this will always be slower due to potential data reshuffling, so it should be used only if neither of the options described above works.

Does this make sense to you?

[1] https://apacheignite.readme.io/docs/affinity-collocation
[2] https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/datagrid/CacheQueryExample.java

-Val

Andrey Nestrogaev

Re: Cross join bug for partitioned caches

Valentin,

As I wrote before, "collocated with the help of AffinityKey" doesn't make sense because a "Cross Join" does not imply join predicates.

The only workaround is to have in the query no more than one partitioned cache.

Thanks.

vkulichenko

Re: Cross join bug for partitioned caches

Andrey,

I just provided the general rules on how to design the domain model to get correct results for SQL queries.

I agree that for cross joins without a condition, only the second option works.

-Val