distributed sql join not working as mentioned in documentation

classic Classic list List threaded Threaded
4 messages Options
shivakumar shivakumar
Reply | Threaded
Open this post in threaded view
|

distributed sql join not working as mentioned in documentation

Hi all,
I am trying to do a simple cross join on two tables with non-collocated data (without affinity key), 
This non-collocated distributed join always fails with the error message:

"java.sql.SQLException: javax.cache.CacheException: Failed to prepare distributed join query: join condition does not use index "

If I create one of the tables in replicated mode and another one in partitioned mode this Join operation works but documentation mentions that Ignite supports non-collocated joins without any condition.
And we tried with 3 tables and 1 in replicated and other 2 in partitioned then we observed that it failed.
we are running the Join operations with distributedJoins=true.
We observed that if there are N tables in Join operation then (N-1) should be in replicated mode, is our understanding right?
If our understanding is correct then to do Join operation the dimensioning of cluster increases by many folds which can't be used in a production environment.
To reproduce:
Ignite with 4 node cluster with native persistence enabled.
create the following tables
CREATE TABLE City (
  id LONG PRIMARY KEY, name VARCHAR)
  WITH "backup=1";

CREATE TABLE Person (
  id LONG, name VARCHAR, city_id LONG, PRIMARY KEY (id, city_id))
  WITH "backups=1";

CREATE INDEX idx_city_name ON City (name);

CREATE INDEX idx_person_name ON Person (name);

INSERT INTO City (id, name) VALUES (1, 'Forest Hill');
INSERT INTO City (id, name) VALUES (2, 'Denver');
INSERT INTO City (id, name) VALUES (3, 'St. Petersburg');

INSERT INTO Person (id, name, city_id) VALUES (1, 'John Doe', 3);
INSERT INTO Person (id, name, city_id) VALUES (2, 'Jane Roe', 2);
INSERT INTO Person (id, name, city_id) VALUES (3, 'Mary Major', 1);
INSERT INTO Person (id, name, city_id) VALUES (4, 'Richard Miles', 2);

Query to be run:
select * from City c, Person p;
or 
SELECT * FROM City AS c CROSS join Person AS p; 



ezhuravlev ezhuravlev
Reply | Threaded
Open this post in threaded view
|

Re: distributed sql join not working as mentioned in documentation

Hi,

To make work this query, you can add one where clause or join condition in the query, for example: where c.id = city_id;. I don't really understand why do you want to run a fully distributed cross join on these tables - it doesn't make sense, moreover, it will lead to the a lot of data movement between nodes.

What are you trying to achieve?

Best Regards,
Evgenii

чт, 19 сент. 2019 г. в 16:18, Shiva Kumar <[hidden email]>:
Hi all,
I am trying to do a simple cross join on two tables with non-collocated data (without affinity key), 
This non-collocated distributed join always fails with the error message:

"java.sql.SQLException: javax.cache.CacheException: Failed to prepare distributed join query: join condition does not use index "

If I create one of the tables in replicated mode and another one in partitioned mode this Join operation works but documentation mentions that Ignite supports non-collocated joins without any condition.
And we tried with 3 tables and 1 in replicated and other 2 in partitioned then we observed that it failed.
we are running the Join operations with distributedJoins=true.
We observed that if there are N tables in Join operation then (N-1) should be in replicated mode, is our understanding right?
If our understanding is correct then to do Join operation the dimensioning of cluster increases by many folds which can't be used in a production environment.
To reproduce:
Ignite with 4 node cluster with native persistence enabled.
create the following tables
CREATE TABLE City (
  id LONG PRIMARY KEY, name VARCHAR)
  WITH "backup=1";

CREATE TABLE Person (
  id LONG, name VARCHAR, city_id LONG, PRIMARY KEY (id, city_id))
  WITH "backups=1";

CREATE INDEX idx_city_name ON City (name);

CREATE INDEX idx_person_name ON Person (name);

INSERT INTO City (id, name) VALUES (1, 'Forest Hill');
INSERT INTO City (id, name) VALUES (2, 'Denver');
INSERT INTO City (id, name) VALUES (3, 'St. Petersburg');

INSERT INTO Person (id, name, city_id) VALUES (1, 'John Doe', 3);
INSERT INTO Person (id, name, city_id) VALUES (2, 'Jane Roe', 2);
INSERT INTO Person (id, name, city_id) VALUES (3, 'Mary Major', 1);
INSERT INTO Person (id, name, city_id) VALUES (4, 'Richard Miles', 2);

Query to be run:
select * from City c, Person p;
or 
SELECT * FROM City AS c CROSS join Person AS p; 



shivakumar shivakumar
Reply | Threaded
Open this post in threaded view
|

Re: distributed sql join not working as mentioned in documentation

Hi Evgenii,
Even with where condition, I am getting the same error.
I have some use case where I can't collocate tables data, as Ignite doc says non-collocated distributed join or cross join is supported in Ignite I am trying to use that but getting this exception when I create tables in replicated mode.

regards,
shiva

On Mon, Sep 23, 2019 at 3:57 PM Evgenii Zhuravlev <[hidden email]> wrote:
Hi,

To make work this query, you can add one where clause or join condition in the query, for example: where c.id = city_id;. I don't really understand why do you want to run a fully distributed cross join on these tables - it doesn't make sense, moreover, it will lead to the a lot of data movement between nodes.

What are you trying to achieve?

Best Regards,
Evgenii

чт, 19 сент. 2019 г. в 16:18, Shiva Kumar <[hidden email]>:
Hi all,
I am trying to do a simple cross join on two tables with non-collocated data (without affinity key), 
This non-collocated distributed join always fails with the error message:

"java.sql.SQLException: javax.cache.CacheException: Failed to prepare distributed join query: join condition does not use index "

If I create one of the tables in replicated mode and another one in partitioned mode this Join operation works but documentation mentions that Ignite supports non-collocated joins without any condition.
And we tried with 3 tables and 1 in replicated and other 2 in partitioned then we observed that it failed.
we are running the Join operations with distributedJoins=true.
We observed that if there are N tables in Join operation then (N-1) should be in replicated mode, is our understanding right?
If our understanding is correct then to do Join operation the dimensioning of cluster increases by many folds which can't be used in a production environment.
To reproduce:
Ignite with 4 node cluster with native persistence enabled.
create the following tables
CREATE TABLE City (
  id LONG PRIMARY KEY, name VARCHAR)
  WITH "backup=1";

CREATE TABLE Person (
  id LONG, name VARCHAR, city_id LONG, PRIMARY KEY (id, city_id))
  WITH "backups=1";

CREATE INDEX idx_city_name ON City (name);

CREATE INDEX idx_person_name ON Person (name);

INSERT INTO City (id, name) VALUES (1, 'Forest Hill');
INSERT INTO City (id, name) VALUES (2, 'Denver');
INSERT INTO City (id, name) VALUES (3, 'St. Petersburg');

INSERT INTO Person (id, name, city_id) VALUES (1, 'John Doe', 3);
INSERT INTO Person (id, name, city_id) VALUES (2, 'Jane Roe', 2);
INSERT INTO Person (id, name, city_id) VALUES (3, 'Mary Major', 1);
INSERT INTO Person (id, name, city_id) VALUES (4, 'Richard Miles', 2);

Query to be run:
select * from City c, Person p;
or 
SELECT * FROM City AS c CROSS join Person AS p; 



ilya.kasnacheev ilya.kasnacheev
Reply | Threaded
Open this post in threaded view
|

Re: distributed sql join not working as mentioned in documentation

Hello!

This is an interesting situation to consider.

Have you tried SELECT * FROM City AS c join (SELECT * FROM Person) AS p; ?

Otherwise, I don't think we support joins without conditions, unless most of tables are replicated.

Regards,
--
Ilya Kasnacheev


чт, 26 сент. 2019 г. в 13:40, Shiva Kumar <[hidden email]>:
Hi Evgenii,
Even with where condition, I am getting the same error.
I have some use case where I can't collocate tables data, as Ignite doc says non-collocated distributed join or cross join is supported in Ignite I am trying to use that but getting this exception when I create tables in replicated mode.

regards,
shiva

On Mon, Sep 23, 2019 at 3:57 PM Evgenii Zhuravlev <[hidden email]> wrote:
Hi,

To make work this query, you can add one where clause or join condition in the query, for example: where c.id = city_id;. I don't really understand why do you want to run a fully distributed cross join on these tables - it doesn't make sense, moreover, it will lead to the a lot of data movement between nodes.

What are you trying to achieve?

Best Regards,
Evgenii

чт, 19 сент. 2019 г. в 16:18, Shiva Kumar <[hidden email]>:
Hi all,
I am trying to do a simple cross join on two tables with non-collocated data (without affinity key), 
This non-collocated distributed join always fails with the error message:

"java.sql.SQLException: javax.cache.CacheException: Failed to prepare distributed join query: join condition does not use index "

If I create one of the tables in replicated mode and another one in partitioned mode this Join operation works but documentation mentions that Ignite supports non-collocated joins without any condition.
And we tried with 3 tables and 1 in replicated and other 2 in partitioned then we observed that it failed.
we are running the Join operations with distributedJoins=true.
We observed that if there are N tables in Join operation then (N-1) should be in replicated mode, is our understanding right?
If our understanding is correct then to do Join operation the dimensioning of cluster increases by many folds which can't be used in a production environment.
To reproduce:
Ignite with 4 node cluster with native persistence enabled.
create the following tables
CREATE TABLE City (
  id LONG PRIMARY KEY, name VARCHAR)
  WITH "backup=1";

CREATE TABLE Person (
  id LONG, name VARCHAR, city_id LONG, PRIMARY KEY (id, city_id))
  WITH "backups=1";

CREATE INDEX idx_city_name ON City (name);

CREATE INDEX idx_person_name ON Person (name);

INSERT INTO City (id, name) VALUES (1, 'Forest Hill');
INSERT INTO City (id, name) VALUES (2, 'Denver');
INSERT INTO City (id, name) VALUES (3, 'St. Petersburg');

INSERT INTO Person (id, name, city_id) VALUES (1, 'John Doe', 3);
INSERT INTO Person (id, name, city_id) VALUES (2, 'Jane Roe', 2);
INSERT INTO Person (id, name, city_id) VALUES (3, 'Mary Major', 1);
INSERT INTO Person (id, name, city_id) VALUES (4, 'Richard Miles', 2);

Query to be run:
select * from City c, Person p;
or 
SELECT * FROM City AS c CROSS join Person AS p;