Heap memory estimation rules

hueb1
Heap memory estimation rules

Is there a rule-of-thumb metric for estimating how much heap space you'd need in order to store the contents of a file of size X?

For example, I am reading in a 1.1GB file of about 7 million lines, which would equate to 7 million total cache entries.  My small two-node cluster has a total of 4GB of heap space, but I hit the GC overhead limit exception during data streamer loading.  So is there a percentage of the heap we should expect to be used for things other than cache storage?  I would have thought 4GB would be enough to store 1.1GB of data in a distributed cache...
vkulichenko

Re: Heap memory estimation rules

Hi,

How many backups and indexes do you have?

You can roughly do the estimation like this (see the sketch after this list):
- Take your original data size and double it. The cache can potentially hold both the serialized and the deserialized form of each key and value, so the required memory doubles in the worst case.
- If you have backups, multiply by their number.
- Indexes also require memory. They usually add around 30%, but it depends on how many you have.
- Also add around 200-300MB per node for internals, plus some spare memory for GC to operate.
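As a rough sketch, here is that arithmetic in code. The class and method names are just illustrative, and the multipliers are only the worst-case figures from the list above:

/** Rough worst-case heap estimate based on the rules above (approximation only). */
public class HeapEstimate {
    public static long estimateBytes(long rawDataBytes, int backups, boolean hasIndexes, int nodes) {
        long perCopy = rawDataBytes * 2;              // serialized + deserialized form of each entry
        long allCopies = perCopy * (1 + backups);     // primary copy plus each backup copy
        long withIndexes = hasIndexes ? (long) (allCopies * 1.3) : allCopies; // indexes add ~30%
        long internals = nodes * 300L * 1024 * 1024;  // ~200-300MB per node for internals
        return withIndexes + internals;               // plus spare memory for GC on top of this
    }

    public static void main(String[] args) {
        // The case from the question: 1.1GB of data, no backups, no indexes, 2 nodes.
        long est = estimateBytes(1_100L * 1024 * 1024, 0, false, 2);
        System.out.printf("Estimated heap: ~%.1f GB%n", est / (1024.0 * 1024 * 1024));
    }
}

For the 1.1GB, two-node case this already comes to roughly 2.7GB in the worst case, which doesn't leave much of the 4GB total as spare memory for GC.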

I would recommend taking a look at off-heap memory: https://apacheignite.readme.io/docs/off-heap-memory. This mode is more compact because it doesn't store deserialized values, and it also doesn't depend on GC. In some cases, though, it can cause a ~20% performance degradation (this is true mostly for queries, because they involve a lot of index lookups).
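For reference, here is a minimal configuration sketch. It assumes the Ignite 1.x off-heap API described on that page (OFFHEAP_TIERED mode); the cache name and the 2GB limit are just placeholders:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMemoryMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class OffHeapCacheExample {
    public static void main(String[] args) {
        CacheConfiguration<Long, String> cfg = new CacheConfiguration<>("myCache");

        // Keep entries off-heap in serialized form; the Java heap is used only as a front tier.
        cfg.setMemoryMode(CacheMemoryMode.OFFHEAP_TIERED);

        // Cap off-heap usage for this cache at 2GB.
        cfg.setOffHeapMaxMemory(2L * 1024 * 1024 * 1024);

        try (Ignite ignite = Ignition.start()) {
            ignite.getOrCreateCache(cfg);
        }
    }
}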

Does this make sense to you?

-Val
yakov

Re: Heap memory estimation rules

I would say that memory consumption is individual and may vary depending on configuration parameters. I agree with Val's points, but I would also advise you to load your caches (at least with a smaller test data set) and take a look at heap dumps.
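For example (just one possible workflow; the cache name and sample values below are placeholders): stream a representative subset with IgniteDataStreamer, take a heap dump of the node's JVM, and extrapolate the per-entry footprint to the full 7 million entries.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class TestLoad {
    public static void main(String[] args) throws Exception {
        try (Ignite ignite = Ignition.start()) {
            ignite.getOrCreateCache("myCache");

            // Stream a small but representative subset (e.g. 100k of the 7M lines).
            try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("myCache")) {
                for (long i = 0; i < 100_000; i++)
                    streamer.addData(i, "sample line " + i);
            }

            // While the node is still running, take a heap dump from another shell, e.g.:
            //   jmap -dump:live,format=b,file=ignite-heap.hprof <pid>
            // then inspect the retained size per cache entry in a heap analyzer.
            System.in.read(); // keep the node alive until you press Enter
        }
    }
}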

--Yakov


hueb1

Re: Heap memory estimation rules

In reply to this post by vkulichenko
OK, thank you for the detailed explanation; it definitely helps.  I didn't have any backups or indexes.  I will look into off-heap memory.