Simulating Graph Dependencies With Ignite

pragmaticbigdata

Simulating Graph Dependencies With Ignite

This post was updated on .
CONTENTS DELETED
The author has deleted this message.
Alexei Scherbakov

Re: Simulating Graph Dependencies With Ignite

Hi,

You can store the cell dependencies in an Ignite cache:

IgniteCache<Cell, Cell> cells = ...

where the relation between key and value is read as: the value depends on the key.

When a cell is updated, walk the chain of dependents and recalculate each one:

Cell cell = updated;
do {
    recalculate(cell); // re-evaluate this cell's formula
} while ((cell = cells.get(cell)) != null); // move on to the cell that depends on it

Does that help?
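
The loop above follows a single chain, so each cell can have only one dependent. A minimal sketch of the same walk for cells with several dependents is shown here, using a plain java.util.Map in place of the Ignite cache so that it runs without a cluster. The class name DependencyWalk, the method affectedCells and the string cell ids are assumptions made for illustration, not part of the thread or of Ignite.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: stands in for a walk over a dependency cache.
final class DependencyWalk {
    /**
     * Returns the updated cell followed by every cell reachable from it,
     * in breadth-first order. The map sends a cell id to the ids of the
     * cells whose formulas read it.
     */
    static List<String> affectedCells(Map<String, List<String>> dependents, String updated) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(updated);
        seen.add(updated);
        while (!queue.isEmpty()) {
            String cell = queue.poll();
            order.add(cell); // recalculate(cell) would be called here
            for (String next : dependents.getOrDefault(cell, List.of())) {
                // Set.add returns false for a cell already visited,
                // so a cyclic reference cannot loop forever.
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return order;
    }
}
```

Breadth-first order visits each affected cell once, but it does not guarantee that a cell is recalculated only after all of its inputs; a topological order would be needed for that.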





2016-05-27 16:49 GMT+03:00 pragmaticbigdata <[hidden email]>:
Hello,

I have started exploring Apache Ignite by following the introductory videos.
It looks quite promising, and I would like to understand whether it is well
suited to the use case outlined below. If so, I would be glad to hear how I
could approach it.

The use case is as follows.

We are trying to implement cell-level dependencies across multiple tables.
This is similar to what Excel offers, except that in our case the
dependencies can span multiple tables. Imagine multiple cells in an Excel
worksheet with interdependent formulas, where updating a value in one cell
causes another cell's value to change, that update causes yet another cell
to change, and so on. In effect, a graph of dependencies determines which
cell needs to be updated next.

With Apache Ignite, what APIs and data structures could I use to maintain a
graph of interdependencies between multiple tables? Note that these would
be metadata dependencies, not data dependencies.

I would appreciate any input on this.

Thanks,
Amit



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Simulating-Graph-Dependencies-With-Ignite-tp5282.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.



--

Best regards,
Alexei Scherbakov
Alexei Scherbakov

Re: Simulating Graph Dependencies With Ignite

Of course, you can store the result of a recalculation in a temporary variable if it is needed for the next recalculation.


--

Best regards,
Alexei Scherbakov
pragmaticbigdata

Re: Simulating Graph Dependencies With Ignite

This post was updated on .
CONTENTS DELETED
The author has deleted this message.
Alexei Scherbakov

Re: Simulating Graph Dependencies With Ignite

1. I don't understand "Table1.Column1 depends on Table2.Column2".
In Excel one cell depends on another cell, not on a column or row.
You can define one cache per table as follows:

IgniteCache<Integer, List<Cell>> table1 = ... // the key is the row number in the table

and keep a cell-dependency cache as described earlier: IgniteCache<Cell, Cell> deps = ...

Ignite fully supports ACID transactions [1], so keeping the data in sync is not a problem.
The formula should be stored inside the Cell.

2. I have already proposed a solution for resolving cell dependencies.

3. Ignite supports collocated processing. Refer to [2] for details.

[1] https://apacheignite.readme.io/docs/transactions
[2] https://apacheignite.readme.io/docs/affinity-collocation




2016-05-27 20:56 GMT+03:00 Amit Shah <[hidden email]>:
Thanks for the replies.

1. With IgniteCache<Cell, Cell> I guess you have data dependencies in mind, meaning that each Cell instance would contain actual data. Maintaining a graph of data dependencies the way Excel does would not be practical, because our use case could have millions of rows in a table and there could be many such tables. Creating and maintaining the graph would be one challenge; keeping it in sync with data updates would be another. That is why I mentioned a graph of metadata in my initial post. For example, the graph could say that Table1.Column1 depends on Table2.Column2. How and where would the formula be stored?

2. The other challenge is determining the next set of rows to update. Assume that 5 rows of table 1 cause an update to 50 rows of table 2. The row keys of those 5 rows of table 1 determine which rows of table 2 need to be updated. How do we handle this efficiently in Ignite?

3. How can I take advantage of collocated processing with Ignite? Assuming the next table to be updated is on the same node as the previous one, it would be a good optimization to ship the update query to that node.
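
The column-level metadata graph from point 1 can be sketched independently of Ignite. The example below is an illustration only: it keeps the graph in a plain Map and uses Kahn's algorithm to produce an order in which every column is recalculated after the columns it depends on. The class name ColumnGraph, the method recalculationOrder and the column Table3.Column5 are hypothetical; only Table1.Column1 and Table2.Column2 come from the thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper over column-level dependency metadata.
final class ColumnGraph {
    /**
     * edges maps a source column to the columns derived from it.
     * Returns all columns ordered so that sources precede dependents.
     */
    static List<String> recalculationOrder(Map<String, List<String>> edges) {
        // Count incoming edges per column; TreeMap keeps the result deterministic.
        Map<String, Integer> inDegree = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : edges.entrySet()) {
            inDegree.putIfAbsent(e.getKey(), 0);
            for (String target : e.getValue()) {
                inDegree.merge(target, 1, Integer::sum);
            }
        }
        // Columns with no inputs can be processed immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String column = ready.poll();
            order.add(column);
            for (String target : edges.getOrDefault(column, List.of())) {
                // A dependent becomes ready once all of its inputs are done.
                if (inDegree.merge(target, -1, Integer::sum) == 0) {
                    ready.add(target);
                }
            }
        }
        if (order.size() != inDegree.size()) {
            throw new IllegalStateException("dependency cycle detected");
        }
        return order;
    }
}
```

Because the graph holds one node per column rather than one per cell, its size is independent of the number of rows; mapping the changed row keys of one table to the affected rows of the next is a separate step that this sketch does not cover.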

Thank you,
Amit.


--

Best regards,
Alexei Scherbakov
pragmaticbigdata

Re: Simulating Graph Dependencies With Ignite

This post was updated on .
CONTENTS DELETED
The author has deleted this message.
Alexei Scherbakov

Re: Simulating Graph Dependencies With Ignite

Hi Amit.

1) The performance degradation may be caused by long GC pauses.
Please check [1] for hints on how to configure GC properly.
If you plan to have very large caches, I recommend using OFFHEAP_TIERED mode [2].

I assume you took the approach of IgniteCache<Integer, List<Cell>>.
How many cells do you have in a single row?

2) Try using cache.invokeAll on all keys at once.


2016-06-02 9:26 GMT+03:00 pragmaticbigdata <[hidden email]>:
Hi Alexei, I was able to implement this custom logic based on your guidance
below. Thanks for that. I am seeing a couple of performance issues, though.

1. With a cache holding 1 million entries, updating 11k entities 5 times
(cached entities are updated multiple times in the application) took 1 min.
The cluster configuration includes 5 server nodes with 16 CPU cores and 15
GB RAM. The cache is a partitioned cache with 0 backups. Can these timings
be improved?

2. I tried the compute collocation feature via affinityCall(), but that
seems to degrade performance. Making an affinity call by passing a
Callable performed worse than the default execution. The code does a
localPeek and updates that cache entry, returning one of the properties
of the cached object. Do you have any idea what the problem could be?
The code looks like this:

        for (String key : keyValues) {
            outputValues.add(ignite.compute().affinityCall(productCache.getName(), key,
                () -> updateCacheEntry(key, productCache.localPeek(key))));
        }

Kindly let me know your suggestions.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Simulating-Graph-Dependencies-With-Ignite-tp5282p5369.html



--

Best regards,
Alexei Scherbakov
pragmaticbigdata

Re: Simulating Graph Dependencies With Ignite

This post was updated on .
CONTENTS DELETED
The author has deleted this message.
Alexei Scherbakov

Re: Simulating Graph Dependencies With Ignite

1. What is the in-memory size of a ProductDetails object?

2. Possibly a coding error. Share the code and I'll take a look.

2016-06-02 13:55 GMT+03:00 pragmaticbigdata <[hidden email]>:
Thanks, Alexei, for the responses.

1. OK, I will try out the GC settings and off-heap memory usage.

I have a cache of IgniteCache<String, ProductDetails>, where ProductDetails
is my custom model. I have implemented the custom logic using directed acyclic
graphs.

2. I tried executing it with cache.invokeAll. The first run failed with an
NPE: the code that is executed remotely, on the node where the
cache entry resides, received a null cache entry. What could be wrong?
Also, isn't EntryProcessor
<http://apacheignite.gridgain.org/docs/affinity-collocation> slower
than the affinityCall() method, since it takes
locks on the keys before executing the custom code?

Thanks.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Simulating-Graph-Dependencies-With-Ignite-tp5282p5378.html



--

Best regards,
Alexei Scherbakov
pragmaticbigdata

Re: Simulating Graph Dependencies With Ignite

This post was updated on .
CONTENTS DELETED
The author has deleted this message.
pragmaticbigdata

Re: Simulating Graph Dependencies With Ignite

Alexei, what do you think about the object size and the affinity code?

Thanks,
Amit.
pragmaticbigdata

Re: Simulating Graph Dependencies With Ignite

I tuned the application by batching the cache updates and making the query use the index. Wasn't able to make affinity calls work.

Alexei, can you please provide your inputs on the affinity code?
Alexei Scherbakov

Re: Simulating Graph Dependencies With Ignite

Hi, Amit.

I think the object size is fine for performance.

As for the NPE, I think you should check whether the entry exists, in case you passed a key that is not in the cache:

if (mutableEntry.exists()) {
    mutableEntry.setValue(...);
}

Check the Javadoc of IgniteCache.invoke regarding surrogate entries.
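
The effect of that guard can be illustrated without a cluster. The sketch below is a plain-JDK analogy, not Ignite code: ConcurrentHashMap.computeIfPresent runs the update function only for keys that are present, which mirrors checking exists() before calling setValue(...). The class GuardedUpdate, the method addIfPresent and the integer values are assumptions made for the example.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an entry processor that updates a value in place.
final class GuardedUpdate {
    /**
     * Adds delta to the value stored under key, but only when the key is
     * present. Returns the new value, or null when the key was absent,
     * in which case the map is left untouched.
     */
    static Integer addIfPresent(ConcurrentHashMap<String, Integer> cache, String key, int delta) {
        // The lambda is never invoked for a missing key, so it cannot
        // dereference a null value.
        return cache.computeIfPresent(key, (k, current) -> current + delta);
    }
}
```

The point of the analogy is that the update code must tolerate keys that are not in the cache; without the guard, a missing key yields a null value and the update logic fails with an NPE.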

I don't see any "affinity code" in the provided sample.
Read here about affinity [1]

BTW, if you need to load many entries into cache and don't require transactions, you should use DataStreamer API [2]



2016-06-07 12:56 GMT+03:00 pragmaticbigdata <[hidden email]>:
I tuned the application by batching the cache updates and making the query
use the index. Wasn't able to make affinity calls work.

Alexei, can you please provide your inputs on the affinity code?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Simulating-Graph-Dependencies-With-Ignite-tp5282p5478.html



--

Best regards,
Alexei Scherbakov
pragmaticbigdata

Re: Simulating Graph Dependencies With Ignite

This post was updated on .
CONTENTS DELETED
The author has deleted this message.
Alexei Scherbakov

Re: Simulating Graph Dependencies With Ignite

Hi, Amit.

1. Usually we talk about affinity collocation in the context of multiple caches having related data on the same node.
There is nothing wrong with your understanding of how the EntryProcessor works; it is just a special case.

2. Try increasing the heap size.

3. Yes.

4. invoke operations are atomic by default: the value is either updated or not.

2016-06-08 21:00 GMT+03:00 pragmaticbigdata <[hidden email]>:
1. The code attempts to fetch the cache entry, update it and return an
attribute of that cache entry. Assuming it would be faster to perform this
operation on the node where the data resides, I was trying out affinity
collocation. Kindly correct me if my assumption is wrong.

2. I added the if check as you suggested, and the code executes successfully
if the cache is small. When I preload the cache with 1 million entries, one
of the nodes in the cluster crashes with a "java.lang.OutOfMemoryError: GC
overhead limit exceeded" error, and after that the node on which the main
thread was running also crashes. The log of the node that crashed with the
OOME is shared here: ignite-d5c3ec0c.log
<http://apache-ignite-users.70518.x6.nabble.com/file/n5538/ignite-d5c3ec0c.log>


3. "I don't see any "affinity code" in the provided sample."

I didn't follow what you meant to say here. Is my understanding of affinity
from point 1 correct?

4. The code that updates the cache needs to be executed in a transaction. I
was planning to add transactions as a second step in my POC after compute
collocation works.

Kindly let me know your inputs.

Thanks.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Simulating-Graph-Dependencies-With-Ignite-tp5282p5538.html



--

Best regards,
Alexei Scherbakov
pragmaticbigdata

Re: Simulating Graph Dependencies With Ignite

This post was updated on .
CONTENTS DELETED
The author has deleted this message.
Alexei Scherbakov

Re: Simulating Graph Dependencies With Ignite

1. An affinity function is always present.
By default it is RendezvousAffinityFunction with 1024 partitions.

2. An OutOfMemoryError can only occur when the heap is running out of free space
and the GC is unable to reclaim enough memory to satisfy an allocation request.

Make sure you tune the GC properly as described here [1].
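
How a key can be routed to a node without any user-supplied affinity function can be sketched with generic rendezvous (highest-random-weight) hashing. This is a simplified illustration under stated assumptions, not the code of RendezvousAffinityFunction: the hash functions used here (hashCode and Objects.hash) and the names RendezvousSketch, partition and owner are chosen for the example, and Ignite's real implementation differs in its hashing and in how it handles backups.

```java
import java.util.List;
import java.util.Objects;

// Simplified rendezvous hashing; not Ignite's actual implementation.
final class RendezvousSketch {
    static final int PARTITIONS = 1024; // default partition count named in the thread

    /** Key -> partition: depends only on the key, so any node can compute it. */
    static int partition(Object key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    /**
     * Partition -> node: every node gets a score for the partition and the
     * highest score wins. Ties are broken by node name so that the result
     * does not depend on list order. Returns null for an empty node list.
     */
    static String owner(int partition, List<String> nodes) {
        String best = null;
        int bestScore = 0;
        for (String node : nodes) {
            int score = Objects.hash(node, partition);
            if (best == null || score > bestScore
                    || (score == bestScore && node.compareTo(best) < 0)) {
                best = node;
                bestScore = score;
            }
        }
        return best;
    }
}
```

Because each node's score for a partition is independent of the other nodes, removing a node only moves the partitions that node owned; every other partition keeps its owner.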



2016-06-10 14:40 GMT+03:00 pragmaticbigdata <[hidden email]>:
Thanks, Alexei, for your inputs.

1. How does the EntryProcessor determine which node the data resides on, given
only the key? I ask because I have configured a PARTITIONED cache
for which I haven't set any affinity function. It partitions the cache
based on the hash of the cached object. I don't follow how
Ignite determines the partition from the cache key alone.

2. I doubt that the EntryProcessor code is failing because of insufficient
memory. I pulled the node statistics before starting the test and
verified that sufficient heap space is available. The
statistics are below.

visor> node
Select node from:
+===============================================================================+
| # | Node ID8(@), IP      | Node Type | Up Time  | CPUs | CPU Load | Free Heap |
+===============================================================================+
| 0 | 9B989E4C(@n0), <ip1> | Server    | 01:59:48 | 2    | 0.33 %   | 89.00 %   |
| 1 | 1ED58F00(@n2), <ip2> | Server    | 01:59:12 | 4    | 0.17 %   | 77.00 %   |
| 2 | 56214422(@n3), <ip3> | Server    | 01:57:42 | 2    | 0.33 %   | 92.00 %   |
| 3 | A57219B6(@n1), <ip4> | Server    | 01:57:25 | 4    | 0.50 %   | 87.00 %   |


Could you explain how EntryProcessors work? If they are supposed to
execute on the node where the data resides, why does it crash or take
exponentially more time than it would with any affinity code?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Simulating-Graph-Dependencies-With-Ignite-tp5282p5572.html



--

Best regards,
Alexei Scherbakov
Denis Magda

Re: Simulating Graph Dependencies With Ignite

Hi,

java.lang.OutOfMemoryError: GC overhead limit exceeded

This kind of OutOfMemoryError means that your application is spending almost all of its time garbage collecting the heap. The most likely cause is a small heap combined with a high object allocation rate. Please refer to the page Alexei pointed you to below and adjust your heap size accordingly.
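
A free-heap percentage taken before the test starts does not show how much time the JVM spends in GC while the test is running. One way to observe that from inside a node is the standard java.lang.management API, sketched below. The class GcShare and the method gcTimeFraction are names chosen for this example; the figure is a cumulative average since JVM start, which is an assumption of this sketch and not how the JVM itself decides to raise the error.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports how much of the JVM's uptime went to GC.
final class GcShare {
    /** Fraction of JVM uptime spent in garbage collection so far (0.0 when unknown). */
    static double gcTimeFraction() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : (double) gcMillis / uptimeMillis;
    }
}
```

Logging this value periodically during the load test, alongside the heap usage, would show whether the fraction climbs toward 1.0 shortly before the node fails.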

Denis

On Jun 10, 2016, at 4:46 PM, Alexei Scherbakov <[hidden email]> wrote:

1. Affinity function is always present.
By default it's RendezvousAffinityFunction with 1024 partitions.

2 OutOfMemory error is only possible when you are running out of free space in heap
and GC was not able to clean some on allocation request.

Make sure you tune the GC properly as described here [1]


