Ignite DataStreamer Memory Problems

19 messages

kellan

Ignite DataStreamer Memory Problems

I seem to be running into some sort of memory issues with my DataStreamers
and I'd like to get a better idea of how they work behind the scenes to
troubleshoot my problem.

I have a cluster of 4 nodes, each of which is pulling files from S3 over an
extended period of time and loading the contents. Each new file opens up a new
DataStreamer, loads its contents and closes the DataStreamer. At most, each
node has 4 DataStreamers writing to 4 different caches simultaneously. A
new DataStreamer isn't created until the last one on that thread is closed.
I wait for the futures to complete, then close the DataStreamer. So far so
good.

After my nodes are running for a few hours, one or more inevitably ends up
crashing. Sometimes the Java heap overflows and Java exits, and sometimes
Java is killed by the kernel because of an OOM error.

Here are my specs per node:
Total Available Memory: 110GB
Memory Assigned to All Data Regions: 50GB
Total Checkpoint Page Buffers: 5GB
Java Heap: 25GB

Does DataStreamer.close block until data is loaded into the cache on remote
nodes (I'm assuming it doesn't), and if not, is there any way to monitor the
progress of loading data into the cache on the remote nodes/replicas, so I can
slow down my DataStreamers to keep pace?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ilya.kasnacheev

Re: Ignite DataStreamer Memory Problems

Hello!

DataStreamer.close() WILL block until all data is loaded into the caches.

The recommendation here is probably to reduce perNodeParallelOperations(), perNodeBufferSize() and perThreadBufferSize(), and to flush() your DataStreamer frequently so that data does not build up in the DataStreamer's temporary data structures. Or, if you have a few entries which are very large, you can just use the Cache API to populate those.
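
As a rough illustration of the flush-frequently advice, here is a minimal Java sketch that does not use the Ignite API at all: `BoundedBatcher` and its `sink` callback are hypothetical stand-ins for a streamer and its send step, and the batch size of 3 is arbitrary. It only shows the shape of the idea: a writer that hands off its buffer every N entries never holds more than N entries in temporary structures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a streamer: buffers entries and hands them off in bounded batches. */
final class BoundedBatcher<T> {
    private final int maxBuffered;          // assumption: tune to the size of your entries
    private final Consumer<List<T>> sink;   // hypothetical: receives each flushed batch
    private final List<T> buffer = new ArrayList<>();

    BoundedBatcher(int maxBuffered, Consumer<List<T>> sink) {
        if (maxBuffered <= 0) throw new IllegalArgumentException("maxBuffered must be positive");
        this.maxBuffered = maxBuffered;
        this.sink = sink;
    }

    /** Buffers one entry and flushes automatically once the bound is reached. */
    void add(T entry) {
        buffer.add(entry);
        if (buffer.size() >= maxBuffered) flush();
    }

    /** Hands the buffered entries to the sink and empties the buffer. */
    void flush() {
        if (buffer.isEmpty()) return;
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    /** Number of entries currently held in the temporary buffer. */
    int buffered() {
        return buffer.size();
    }

    public static void main(String[] args) {
        BoundedBatcher<String> batcher =
            new BoundedBatcher<>(3, batch -> System.out.println("flushed " + batch.size() + " entries"));
        for (int i = 0; i < 7; i++) batcher.add("entry-" + i);
        batcher.flush(); // final partial batch, comparable to flushing before close
    }
}
```

With a real IgniteDataStreamer the equivalent would presumably be calling flush() after a fixed number of addData() calls; a suitable N depends on entry size and would need to be measured.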

Regards,
--
Ilya Kasnacheev


In reply to kellan's message of Sun, Apr 14, 2019, 18:45.

kellan

Re: Ignite DataStreamer Memory Problems

I'm confused. If the DataStreamer blocks until all data is loaded into remote
caches and I'm only ever running a fixed number of DataStreamers (4 max),
each of which closes after reading a single file of more or less fixed length
(no more than 200MB, i.e. I shouldn't have more than 800MB plus additional
Ignite metadata in my DataStreamers at any point), I shouldn't be
seeing a gradual build-up of memory, but that's what I'm seeing.

Maybe I should have said before that this is a persistent cache and the
problem starts at some point after I've run out of memory in my data regions
(not immediately, but hours later).



ilya.kasnacheev

Re: Ignite DataStreamer Memory Problems

Hello!

I suggest collecting a heap dump and taking a close look at it.

Regards,
--
Ilya Kasnacheev


In reply to kellan's message of Mon, Apr 15, 2019, 15:35.

kellan

Re: Ignite DataStreamer Memory Problems

A heap dump won't address non-heap memory issues, which is what I'm most
often running into. Where in Ignite can memory build up outside of
Durable Memory and heap memory?



kellan

Re: Ignite DataStreamer Memory Problems

So I've done a heap dump and recorded heap metrics while running my
DataStreamers and the heap doesn't appear to be the problem here. Ignite
operates normally for several hours without the heap size ever reaching its
max. My durable memory also seems to be behaving as expected. While looking
at the output of top, however, I notice a gradual increase in memory above
the sum total of heap + durable memory, which continues to increase for
several hours until my Kubernetes pod hits its memory limit and is killed.
My guess is this is an NIO problem.

I suppose this could originate from the Avro files I'm loading from S3, and
I'm investigating this, but I'd like to rule out a problem on
the Ignite end. Do DataStreamers use NIO, and is there any way these could end
up "leaking" memory? If so, are there configuration parameters or best
practices I could use to prevent this?
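
One way to check the NIO guess from inside the JVM is to read the standard buffer-pool MXBeans, which report how much memory direct and mapped ByteBuffers currently hold. The sketch below uses only JDK classes; `BufferPoolReport` is a name made up for this example. Note the limitation: memory allocated outside ByteBuffer (for example through sun.misc.Unsafe or by native libraries) is not counted in these pools, so a flat reading here does not rule out off-heap growth elsewhere.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

final class BufferPoolReport {
    /** Returns bytes currently used per JVM buffer pool (on HotSpot typically "direct" and "mapped"). */
    static Map<String, Long> usedBytesByPool() {
        Map<String, Long> used = new LinkedHashMap<>();
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            used.put(pool.getName(), pool.getMemoryUsed());
        }
        return used;
    }

    public static void main(String[] args) {
        // Allocate 1 MiB of direct memory and keep it reachable so it shows up in the report.
        ByteBuffer held = ByteBuffer.allocateDirect(1 << 20);
        usedBytesByPool().forEach((name, bytes) -> System.out.println(name + " used=" + bytes + " B"));
        System.out.println("held capacity=" + held.capacity() + " B");
    }
}
```

Logging these values periodically next to the resident size reported by top would show whether the growth tracks the direct pool or comes from somewhere else.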



kellan

Re: Ignite DataStreamer Memory Problems

After doing additional tests to isolate the issue, it looks like Ignite is
having a problem releasing the Internal memory of cache objects passed into the
NIO ByteBuffers that back the DataStreamer objects. At first I thought this
might be on account of my Avro ByteBuffers, which get transformed into byte
arrays before being loaded into the Ignite DataStreamers, but I can run my
application without the DataStreamers (otherwise exactly the same) and there
is no memory leak.

I've posted more about it on StackOverflow:
https://stackoverflow.com/questions/55752357/possible-memory-leak-in-ignite-datastreamer

I'm trying to productionalize an Ignite Cluster in Kubernetes and can't move
forward until I can solve this problem. Is there anyone who's used
DataStreamers to do heavy write loads in a k8s environment who has any
insight into what would be causing this?



Denis Magda

Re: Ignite DataStreamer Memory Problems

Looping in the dev list.

Community, does this remind you of any memory leak already addressed in master? What do we need to get to the bottom of the issue?

Denis

In reply to kellan's message of Friday, April 19, 2019.

kellan

Re: Ignite DataStreamer Memory Problems

Update: I've been able to confirm a couple more details:

1. I'm experiencing the same leak with put and putAll as I am with the
DataStreamer
2. The problem is resolved when persistence is turned off



Denis Magda-2

Re: Ignite DataStreamer Memory Problems

Hello,

Copying Evgeniy and Stan, our community experts, who can guide you through this. In the meantime, please try to capture the OOM with this approach:

-
Denis


In reply to kellan's message of Sun, Apr 21, 2019, 8:49 AM.

Stanislav Lukyanov

Re: Ignite DataStreamer Memory Problems

I've put a full answer on SO - https://stackoverflow.com/questions/55752357/possible-memory-leak-in-ignite-datastreamer/55786023#55786023.

In short, so far it doesn't look like a memory leak to me - just a misconfiguration.
There is a memory pool in the JVM for direct memory buffers which is by default bounded by the value of `-Xmx`. Most applications use a minuscule amount of it, but in some it can grow - up to the size of the heap, making your total Java usage not roughly `heap + data region` but `heap * 2 + data region`.

Set walSegmentSize=64mb and -XX:MaxDirectMemorySize=256mb and I think it's going to be OK.
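
The arithmetic behind that estimate can be written out as a small Java sketch using the per-node figures from the first post (25GB heap, 50GB data regions, 5GB checkpoint page buffers). `FootprintEstimate` is a hypothetical helper, and the model is a deliberate simplification: it assumes an uncapped direct pool can grow to the heap size and ignores thread stacks, metaspace, GC structures and the page cache.

```java
final class FootprintEstimate {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    /**
     * Rough upper bound: heap + direct buffers + off-heap Ignite memory.
     * directCapBytes <= 0 means no -XX:MaxDirectMemorySize is set, in which case
     * the direct pool is assumed to be bounded by the heap size (-Xmx).
     */
    static long worstCaseBytes(long heapBytes, long directCapBytes, long offHeapBytes) {
        long direct = directCapBytes > 0 ? directCapBytes : heapBytes;
        return heapBytes + direct + offHeapBytes;
    }

    public static void main(String[] args) {
        long heap = 25 * GB;              // Java heap from the first post
        long offHeap = 50 * GB + 5 * GB;  // data regions + checkpoint page buffers
        System.out.println(worstCaseBytes(heap, 0, offHeap) / GB + " GB uncapped");         // 105 GB uncapped
        System.out.println(worstCaseBytes(heap, 256 * MB, offHeap) / MB + " MB capped");    // 82176 MB capped
    }
}
```

Under these assumptions the uncapped bound is 105GB, close to the 110GB available per node, while a 256MB cap brings it down to roughly 80GB.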

Stan

In reply to Denis Magda's message of Sun, Apr 21, 2019, 11:51 AM.

kellan

Re: Ignite DataStreamer Memory Problems

No luck with the changed configuration. Memory still continues to rise until
it reaches the Kubernetes limit (110GB), then the node crashes. This is the
output I pulled from jcmd at some point before the crash. I can post the
detailed memory report if that helps.

Total: reserved=84645150KB, committed=83359362KB
-                 Java Heap (reserved=25165824KB, committed=25165824KB)
                            (mmap: reserved=25165824KB, committed=25165824KB)
 
-                     Class (reserved=1121992KB, committed=80356KB)
                            (classes #11821)
                            (malloc=1736KB #20912)
                            (mmap: reserved=1120256KB, committed=78620KB)
 
-                    Thread (reserved=198099KB, committed=198099KB)
                            (thread #193)
                            (stack: reserved=197248KB, committed=197248KB)
                            (malloc=626KB #975)
                            (arena=225KB #380)
 
-                      Code (reserved=260571KB, committed=65571KB)
                            (malloc=10971KB #16284)
                            (mmap: reserved=249600KB, committed=54600KB)
 
-                        GC (reserved=1047369KB, committed=1047369KB)
                            (malloc=80713KB #57810)
                            (mmap: reserved=966656KB, committed=966656KB)
 
-                  Compiler (reserved=597KB, committed=597KB)
                            (malloc=467KB #1235)
                            (arena=131KB #7)
 
-                  Internal (reserved=56763248KB, committed=56763248KB)
                            (malloc=56763216KB #1063361)
                            (mmap: reserved=32KB, committed=32KB)
 
-                    Symbol (reserved=17245KB, committed=17245KB)
                            (malloc=14680KB #138104)
                            (arena=2565KB #1)
 
-    Native Memory Tracking (reserved=20852KB, committed=20852KB)
                            (malloc=453KB #6407)
                            (tracking overhead=20399KB)
 
-               Arena Chunk (reserved=201KB, committed=201KB)
                            (malloc=201KB)
 
-                   Unknown (reserved=49152KB, committed=0KB)
                            (mmap: reserved=49152KB, committed=0KB)
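
To see at a glance which categories dominate a report like the one above, the summary lines can be parsed and compared. The following Java sketch assumes the `jcmd <pid> VM.native_memory summary` layout shown here (category lines starting with `-`, sizes in KB); `NmtSummary` is a name invented for this example, and the regex has only been checked against this sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NmtSummary {
    // Matches category lines such as "-   Internal (reserved=56763248KB, committed=56763248KB)".
    private static final Pattern CATEGORY =
        Pattern.compile("^-\\s+(.+?)\\s+\\(reserved=(\\d+)KB, committed=(\\d+)KB\\)\\s*$");

    /** Returns committed KB per NMT category, in the order the categories appear in the report. */
    static Map<String, Long> committedKb(String report) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (String line : report.split("\\R")) {
            Matcher m = CATEGORY.matcher(line);
            if (m.matches()) out.put(m.group(1), Long.parseLong(m.group(3)));
        }
        return out;
    }

    public static void main(String[] args) {
        String sample =
            "-                 Java Heap (reserved=25165824KB, committed=25165824KB)\n" +
            "                            (mmap: reserved=25165824KB, committed=25165824KB)\n" +
            "-                  Internal (reserved=56763248KB, committed=56763248KB)\n";
        committedKb(sample).forEach((name, kb) -> System.out.println(name + ": " + kb / 1024 + " MB"));
    }
}
```

For this report, Java Heap (about 24GB) and Internal (about 54GB) account for nearly all of the roughly 79.5GB committed. Comparing that total with the resident size reported by top or docker stats would show how much of the growth NMT does not see.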
 



kellan

Re: Ignite DataStreamer Memory Problems

Any suggestions on where I can go from here? I'd like to find a way to
isolate this problem before I have to look into other storage/grid
solutions. A lot of work has gone into integrating Ignite into our platform,
and I'd really hate to start from scratch. I can provide as much information
as needed to help pinpoint this problem or do additional tests on my end.

Are there any projects out there that have successfully run Ignite on
Kubernetes with Persistence and a high-volume write load?

I've been looking into using third-party persistence but we require SQL
queries to fetch the bulk of our data and it seems like this isn't really
possible with Cassandra, et al, unless I can know in advance what data needs
to be loaded into memory. Is that a safe assumption to make?



Stanislav Lukyanov

Re: Ignite DataStreamer Memory Problems

Can you share your full configuration (Ignite config and JVM options) and the server logs of Ignite?

Which version of Ignite do you use?

Can you confirm that on this version and configuration simply disabling Ignite persistence removes the problem?
If yes, can you try running with walMode=NONE? It will help to rule out at least some possibilities.

Also, if you can share a reproducer for this problem, it should be easy for us to debug this.

Stan

In reply to kellan's message of Tue, Apr 23, 2019, 6:42 AM.

kellan

Re: Ignite DataStreamer Memory Problems

Ignite Version: 2.7.0

Ignite Config:
https://gist.github.com/kellanburket/73971d076a9b2d4f001b073d02e2343a

Java Process: /opt/jdk/bin/java -XX:+AggressiveOpts
-XX:NativeMemoryTracking=detail -Xms24G -Xmx24G
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps/oom.bin
-XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC
-XX:MaxDirectMemorySize=256M -Duser.timezone=GMT
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.port=49112
-Dcom.sun.management.jmxremote.rmi.port=49112
-Djava.rmi.server.hostname=127.0.0.1 -DIGNITE_WAL_MMAP=true
-Djdk.nio.maxCachedBufferSize=262144 -DIGNITE_QUIET=true
-DIGNITE_SUCCESS_FILE=/opt/ignite/apache-ignite-2.7.0-bin/work/ignite_success_77a36388-73e4-4de6-9988-27e62775c3fc
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=49112
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-DIGNITE_HOME=/opt/ignite/apache-ignite-2.7.0-bin
-DIGNITE_PROG_NAME=/opt/ignite/apache-ignite-2.7.0-bin/bin/ignite.sh -cp
/opt/ignite/apache-ignite-2.7.0-bin/libs/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/ignite-indexing/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/ignite-kubernetes/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/ignite-spark/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/ignite-spring/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/ignite-zookeeper/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/licenses/*
org.apache.ignite.startup.cmdline.CommandLineStartup
/opt/ignite/apache-ignite-2.7.0-bin/config/default-config.xml

I've already tried running with walMode=NONE, but I'll try it again just to
confirm.

I'll put together a shareable reproducer today.





kellan

Re: Ignite DataStreamer Memory Problems

Here is a reproducible example of the DataStreamer memory leak:
https://github.com/kellanburket/ignite-leak

I've also added a public image to DockerHub: miraco/ignite:leak

This can be run on a machine with at least 22GB of memory available to
Docker and probably 50GB of storage between WAL and persistent storage, just
to be safe.
I'm following the guidelines here:
https://apacheignite.readme.io/docs/durable-memory-tuning#section-share-ram

10GB of Durable Memory
4GB of Heap

with a 22GB memory limit in Docker that adds up to about 63% of overall RAM

Now run this container (adjust the CPUs as needed; I'm using AWS r4.4xl
nodes with 16 cores running Amazon Linux):

docker run -v $LOCAL_STORAGE:$CONTAINER_STORAGE -v $LOCAL_WAL:$CONTAINER_WAL \
  -m 22G --cpus=12 --memory-swappiness 0 --name ignite.leak -d \
  miraco/ignite:leak

I would expect memory usage to stabilize somewhere around 18-19GB (4GB Heap
+ 10GB Durable + 640M WAL + 2GB Checkpoint Buffer + 1-2GB JDK overhead), but
instead usage per docker stats rises to the container limit, forcing an OOM
kill. Feel free to increase the memory limit above 22GB. Results should be
the same, though it may take longer to get there.

Now this is interesting. If I replace the cache value type, which is
Array[Byte], with a Long and run it again, memory usage eventually stabilizes
at around 19-20GB:

docker run -v $LOCAL_STORAGE:$CONTAINER_STORAGE -v $LOCAL_WAL:$CONTAINER_WAL \
  -e VALUE_TYPE=ValueLong -m 22G --cpus=12 --memory-swappiness 0 --name \
  ignite.leak -d miraco/ignite:leak

Is there something I'm missing here, or is this a bug?



kellan

Re: Ignite DataStreamer Memory Problems

The issue seems to be with the @QueryTextField annotation. Unless Lucene
indexes are supposed to be eating up all this memory, in which case it might
be worth improving your documentation.



ezhuravlev

Re: Ignite DataStreamer Memory Problems

Hi,

Lucene indexes are stored on the heap, and I see that in the reproducer you've limited the heap size to 1gb. Are you sure that you used these JVM opts? Can you please share the logs from your run, so I can check the heap usage?

Best Regards,
Evgenii

In reply to kellan's message of Tue, Apr 30, 2019, 00:23.

kellan

Re: Ignite DataStreamer Memory Problems

At this point I've spent enough time on this problem and can move on with my
project without using @QueryTextField--I'm just letting anyone who's
concerned know what I've seen in case you want to probe into this issue any
further.

I've taken the time to write a reproducer that can be easily run on any
machine. Go ahead and run it based on my instructions and you can see
whatever logs you'd like to see for yourself. It runs with 4GB of heap
by default, not 1, though feel free to adjust it. With 10GB of durable memory,
4GB of heap and a 22GB memory limit on the container, it will consume
memory up to the limit, triggering an OOM kill in Docker.


