When to use REPLICATED vs PARTITIONED?

classic Classic list List threaded Threaded
4 messages Options
David David
Reply | Threaded
Open this post in threaded view
|

When to use REPLICATED vs PARTITIONED?

Hi all,

Could someone please explain, in what cases it makes sense to use REPLICATED
cache/table?

I thought I would use REPLICATED tables for small tables which contain
related information.
Like a city table, which maybe has a few 10k entries.

And use PARTITIONED for big tables, like persons table, with millions of
entries.

My hope was, that if the small tables are REPLICATED and so available on
each node.
It would avoid network trafic during table joins and would make queries
faster.

But surprisingly the opposite happend. With one table replicated the join
was slower than with 2 PARTITIONED tables.

Do I think in the wrong way?

small example

table 1 persons: big data set = PARTITIONED
"CREATE TABLE IF NOT EXISTS Persons" +
                        "(id BIGINT, " +
                        "firstname varchar(20), " +
                        "lastname varchar(20), " +
                        "age int, " +
                        "currentcityid int, " +
                        "borncityid int, " +
                        "gender varchar(1), " +
                        "PRIMARY KEY(id))" +
                        "WITH \"template=PARTITIONED, backups=0\"";



table 2 city: small data set=REPLICATED
"CREATE TABLE IF NOT EXISTS city " +
                            "(id int PRIMARY KEY, " +
                            "name varchar(40), " +
                            "lat double, " +
                            "lng double, " +
                            "country varchar(60), " +
                            "capital varchar(10)) " +
                            "WITH \"template=REPLICATED, backups=0\"";

query
SELECT count(*) FROM "cachePerson".person AS p left join "cityCache".city AS
c ON p.CURRENTCITYID=c.id where p.age <50;

is faster when both tables are PARTITIONED.
While my logic telles me it should faster if one is REPLICATED to avoid
network trafic during joins.

can someone help to understand.

br
  David






--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Vladimir Pligin Vladimir Pligin
Reply | Threaded
Open this post in threaded view
|

Re: When to use REPLICATED vs PARTITIONED?

Hi David,

As far as I know your understanding is correct in general.
Let's try to figure out what's happening in this particular case.
Could you please share execution plans of the query with both partitioned
and replicated templates.
And it also would be great if you share execution time for both.


Vladimir



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
David David
Reply | Threaded
Open this post in threaded view
|

Re: When to use REPLICATED vs PARTITIONED?

Hi Vladimir,

thx for help.


*Setup:*
2x server - one ignite node per server = 2 nodes

city table size: 13_000 entries
person table size: 10_000_000 entries


*Execution times:*
city table *PARTITIONED *+ person table PARTITIONED
0: jdbc:ignite:thin://127.0.0.1/> SELECT count(*) FROM "cachePerson".person
AS p left join "cityCache".city AS c ON p.CURRENTCITYID=c.id where p.age<50;
+--------------------------------+
|            COUNT(*)            |
+--------------------------------+
| 4949454                        |
+--------------------------------+
1 row selected (7.466 seconds)


city table *REPLICATED *+ person table PARTITIONED
0: jdbc:ignite:thin://127.0.0.1/> SELECT count(*) FROM "cachePerson".person
AS p left join "cityCache".city AS c ON p.CURRENTCITYID=c.id where p.age<50;
+--------------------------------+
|            COUNT(*)            |
+--------------------------------+
| 4947668                        |
+--------------------------------+
1 row selected (8.808 seconds)


=> both tables PARTITIONED = 16% faster compared to one table REPLICATED
while as I would understand should be faster not slower


*execution plan:*
city table *PARTITIONED *+ person table PARTITIONED
0: jdbc:ignite:thin://127.0.0.1/> explain SELECT count(*) FROM
"cachePerson".person AS p left join "cityCache".city AS c ON
p.CURRENTCITYID=c.id where p.age<50;
+--------------------------------+
|              PLAN              |
+--------------------------------+
| SELECT
    COUNT(*) AS __C0_0
FROM "cachePerson".PERSON P__Z0
    /* "cachePerson".PERSON.__SCAN_ */
    /* WHERE P__Z0.AGE < 50
    */
LEFT OUTER JOIN "cityCache".CITY C__Z1
    /* "cityCache".CITY_ID_ID |
| SELECT
    CAST(SUM(__C0_0) AS BIGINT) AS __C0_0
FROM PUBLIC.__T0
    /* PUBLIC."merge_scan" */ |
+--------------------------------+
2 rows selected (0.004 seconds)


city table *REPLICATED *+ person table PARTITIONED
0: jdbc:ignite:thin://127.0.0.1/> explain SELECT count(*) FROM
"cachePerson".person AS p left join "cityCache".city AS c ON
p.CURRENTCITYID=c.id where p.age<50;
+--------------------------------+
|              PLAN              |
+--------------------------------+
| SELECT
    COUNT(*) AS __C0_0
FROM "cachePerson".PERSON P__Z0
    /* "cachePerson".PERSON.__SCAN_ */
    /* WHERE P__Z0.AGE < 50
    */
LEFT OUTER JOIN "cityCache".CITY C__Z1
    /* "cityCache".CITY_ID_ID |
| SELECT
    CAST(SUM(__C0_0) AS BIGINT) AS __C0_0
FROM PUBLIC.__T0
    /* PUBLIC."merge_scan" */ |
+--------------------------------+
2 rows selected (0.005 seconds)


=> looks like they are equivalent


*visor cache output:*
city table *PARTITIONED *+ person table PARTITIONED
visor> cache
Time of the snapshot: 2019-03-19 14:52:01
+================================================================================================================================================================+
|     Name(@)      |    Mode     | Nodes | Total entries (Heap / Off-heap) |
Primary entries (Heap / Off-heap)  |   Hits    |  Misses   |   Reads   |
Writes   |
+================================================================================================================================================================+
| cachePerson(@c0) | PARTITIONED | 2     | 10000020 (0 / 10000020)         |
min: 4794924 (0 / 4794924)          | min: 0    | min: 0    | min: 0    |
min: 0    |
|                  |             |       |                                 |
avg: 5000010.00 (0.00 / 5000010.00) | avg: 0.00 | avg: 0.00 | avg: 0.00 |
avg: 0.00 |
|                  |             |       |                                 |
max: 5205096 (0 / 5205096)          | max: 0    | max: 0    | max: 0    |
max: 0    |
+------------------+-------------+-------+---------------------------------+-------------------------------------+-----------+-----------+-----------+-----------+
| cityCache(@c1)   | PARTITIONED | 2     | 12893 (0 / 12893)               |
min: 6188 (0 / 6188)                | min: 0    | min: 0    | min: 0    |
min: 0    |
|                  |             |       |                                 |
avg: 6446.50 (0.00 / 6446.50)       | avg: 0.00 | avg: 0.00 | avg: 0.00 |
avg: 0.00 |
|                  |             |       |                                 |
max: 6705 (0 / 6705)                | max: 0    | max: 0    | max: 0    |
max: 0    |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------+




city table *REPLICATED *+ person table PARTITIONED
visor> cache
Time of the snapshot: 2019-03-19 14:26:55
+================================================================================================================================================================+
|     Name(@)      |    Mode     | Nodes | Total entries (Heap / Off-heap) |
Primary entries (Heap / Off-heap)  |   Hits    |  Misses   |   Reads   |
Writes   |
+================================================================================================================================================================+
| cachePerson(@c0) | PARTITIONED | 2     | 10000020 (0 / 10000020)         |
min: 4794928 (0 / 4794928)          | min: 0    | min: 0    | min: 0    |
min: 0    |
|                  |             |       |                                 |
avg: 5000010.00 (0.00 / 5000010.00) | avg: 0.00 | avg: 0.00 | avg: 0.00 |
avg: 0.00 |
|                  |             |       |                                 |
max: 5205092 (0 / 5205092)          | max: 0    | max: 0    | max: 0    |
max: 0    |
+------------------+-------------+-------+---------------------------------+-------------------------------------+-----------+-----------+-----------+-----------+
| cityCache(@c1)   | REPLICATED  | 2     | 12893 (0 / 12893)               |
min: 6400 (0 / 6400)                | min: 0    | min: 0    | min: 0    |
min: 0    |
|                  |             |       |                                 |
avg: 6446.50 (0.00 / 6446.50)       | avg: 0.00 | avg: 0.00 | avg: 0.00 |
avg: 0.00 |
|                  |             |       |                                 |
max: 6493 (0 / 6493)                | max: 0    | max: 0    | max: 0    |
max: 0    |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------+




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Vladimir Pligin Vladimir Pligin
Reply | Threaded
Open this post in threaded view
|

Re: When to use REPLICATED vs PARTITIONED?

Hi again!

I'm sorry for being absent for so long.
Maybe the reason here that you data is collocated somehow using id of a
city.
Replicated cache suits well if you need to join to some static dictionary
and it's impossible or difficult to distribute other data in manner
corresponding to that dictionary. My assumption here is that Ignite joins
only subsets (about 1/2) of city cache in partitioned case. But when you
have replicated one the Ignite should join entire dataset. What do you think
does it make sense here?




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/