Ignite 2.8.0. Heap mem issue

dbutkovic dbutkovic

Ignite 2.8.0. Heap mem issue

Hello,
I recently installed Ignite 2.8.0 on one node for test purposes and noticed
a heap memory issue that I didn't have on 2.7.6.
The Ignite configuration of dataStorageConfiguration /
DataRegionConfiguration is identical.
In production I have two nodes (2.7.6); for testing (2.8.0) I have only one
node.
The application consists of Apache Flume, which inserts data into the cache,
and several Python scripts that read/write data from/into tables using SQL
queries.
Does anyone have experience with this behavior on Ignite 2.8.0?
I tested using pyignite 0.3.4 and pygridgain 1.1.0.
The same issue occurs on GridGain CE 8.7.12.
On Ignite 2.7.5 with the same application there are no heap memory issues.
The screenshots below show that on Ignite 2.7.6 heap memory always stays
within normal ranges.
The second screenshot shows Ignite 2.8.0: heap memory only grows, and the
application stops when the OOM happens. :-(

JVM_OPTS
-XX:+UseG1GC
-XX:+AlwaysPreTouch
-XX:+ScavengeBeforeFullGC
-XX:+DisableExplicitGC
-Xms512m -Xmx1g

Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=1ce47741, name=UMBOSS, uptime=00:46:08.050]
    ^-- H/N/C [hosts=1, nodes=2, CPUs=8]
    ^-- CPU [cur=52.07%, avg=12.77%, GC=24.1%]
    ^-- PageMemory [pages=115217]
    ^-- Heap [used=1001MB, free=2.15%, comm=1024MB]
    ^-- Off-heap [used=455MB, free=80.61%, comm=1480MB]
    ^--   sysMemPlc region [used=0MB, free=99.99%, comm=100MB]
    ^--   metastoreMemPlc region [used=0MB, free=99.95%, comm=0MB]
    ^--   PersistDataRegion region [used=440MB, free=56.95%, comm=1024MB]
    ^--   TxLog region [used=0MB, free=100%, comm=100MB]
    ^--   DefaultDataRegion region [used=14MB, free=98.59%, comm=256MB]
    ^-- Ignite persistence [used=479MB]
    ^--   sysMemPlc region [used=0MB]
    ^--   metastoreMemPlc region [used=0MB]
    ^--   PersistDataRegion region [used=479MB]
    ^--   TxLog region [used=0MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=1, idle=4, qSize=0]
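
To follow heap growth over time, the "Heap [...]" lines of these periodic metrics reports can be pulled out of the server log. The following is only a sketch: it assumes the line format stays exactly as in the report above, and the function name parse_heap_metrics is made up for this example.

```python
import re

# Matches the on-heap line of the metrics report. "Off-heap [...]" is not
# matched because the pattern is case-sensitive (capital "H").
HEAP_LINE = re.compile(r"Heap \[used=(\d+)MB, free=([\d.]+)%, comm=(\d+)MB\]")


def parse_heap_metrics(log_text):
    """Return one dict per 'Heap [...]' line found in log_text (hypothetical helper)."""
    samples = []
    for match in HEAP_LINE.finditer(log_text):
        samples.append({
            "used_mb": int(match.group(1)),
            "free_pct": float(match.group(2)),
            "comm_mb": int(match.group(3)),
        })
    return samples


# Two lines copied from the report above.
sample_log = """\
    ^-- Heap [used=1001MB, free=2.15%, comm=1024MB]
    ^-- Off-heap [used=455MB, free=80.61%, comm=1480MB]
"""
heap_samples = parse_heap_metrics(sample_log)
```

Run over a whole log file, the "used_mb" values in sequence should show whether the heap only grows or is periodically reclaimed.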



Best regards

Dren

<http://apache-ignite-users.70518.x6.nabble.com/file/t2557/Ignite_Heap_mem_issue.jpg>



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ezhuravlev ezhuravlev

Re: Ignite 2.8.0. Heap mem issue

Hi,

Can you share your configuration, or even the code that you use? It would be nice to have a reproducer for this.

Thanks,
Evgenii

In response to dbutkovic's message of Tue, 17 Mar 2020, 03:27.

akorensh akorensh

Re: Ignite 2.8.0. Heap mem issue

In reply to this post by dbutkovic
Dren,
  Can you please attach the Flume/Ignite configs you are using and the
relevant code?
  We will check this use case in 2.8.0. Please also attach your server log and
GC logs.

  How to collect GC logs:
https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/troubleshooting#detailed-gc-logs

  Attach a heap dump as well, if you can. This will help us diagnose the
issue faster.

  How to take a heap dump:
https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/troubleshooting#heap-dumps
  To take a dump of a running process (here with PID 12587):
    jmap -dump:live,format=b,file=/tmp/dump.hprof 12587
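
If the dump has to be taken repeatedly, for example from a monitoring script, the jmap invocation above can be wrapped in a few lines of Python. This is a sketch only: the helper names jmap_dump_command and take_heap_dump are invented for illustration, and it assumes jmap from the same JDK as the server is on the PATH.

```python
import subprocess


def jmap_dump_command(pid, dump_path="/tmp/dump.hprof", live_only=True):
    """Build the jmap argument list for a binary heap dump (hypothetical helper)."""
    options = ["live"] if live_only else []
    options += ["format=b", "file=%s" % dump_path]
    return ["jmap", "-dump:" + ",".join(options), str(pid)]


def take_heap_dump(pid, dump_path="/tmp/dump.hprof"):
    """Run jmap against a live JVM; raises CalledProcessError if jmap fails."""
    subprocess.run(jmap_dump_command(pid, dump_path), check=True)


# Only builds the command here; call take_heap_dump(pid) to actually run it.
command = jmap_dump_command(12587)
```

The "live" option makes jmap dump only reachable objects, which keeps the file smaller and usually forces a full GC first.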

   
Thanks, Alex



dbutkovic dbutkovic

Re: Ignite 2.8.0. Heap mem issue

In reply to this post by ezhuravlev
Hi Evgeny,

I can share pseudo code, but not all of the developed code and Python modules.
Syslog messages are inserted into the cache via Flume.
Python daemons process the messages (read from the cache) and run
insert/update/delete statements on a few Ignite tables via SQL.

The number of syslog messages and the Python code are the same on prod and test.

##############################################

# Scan the cache, fetch each key/value pair, then INSERT/UPDATE/DELETE via SQL.

import time
from socket import error as SocketError

while True:
    ig_connection = None  # defined up front so the except block can log it
    try:
        ig_connection = new_ignite_connection(config_params, logger)
        # The cache is created elsewhere in the code with get_or_create_cache.
        stream_cache = ig_connection.get_cache("XY_CACHE_NAME")

        CACHE_SIZE = stream_cache.get_size()

        if CACHE_SIZE > 0:
            # scan() returns a generator that yields two-tuples of key and value.
            cache_list = stream_cache.scan()

            for k, v in cache_list:
                # Parse k, v and run some INSERT/UPDATE statements on the tables.
                # Table ALERTS has ~50 columns.
                SQL_EVENT_INSERT_QUERY = "INSERT INTO ALERTS (%s) VALUES (%s)" % (
                    alerts_insert_column_names(), alerts_insert_parameters())

                ig_connection.sql(SQL_EVENT_INSERT_QUERY,
                                  query_args=var_EVENT_INSERT_QUERY)

                # Remove the processed entry from the cache.
                stream_cache.remove_key(k)

        ig_connection.close()

        time.sleep(POLL_INTERVAL)

    except (OSError, SocketError) as e:
        logger.error('ERROR in client connection: %s , client status: %s'
                     % (e, ig_connection))

##############################################

<http://apache-ignite-users.70518.x6.nabble.com/file/t2557/Ignite_prod_test_HLD.png>




dbutkovic dbutkovic

Re: Ignite 2.8.0. Heap mem issue

In reply to this post by akorensh
Hi Alex,

I ran another test and collected all the logs, GC logs, a heap dump, and a few
screenshots.

All the files are in one zip file. The file is too big to upload here, so please
download it from the Jumbo mail link.

https://jumboiskon.tportal.hr/download/eeab9848-2494-4ab7-a2cb-88766db0fafa

Thanks, Dren



ezhuravlev ezhuravlev

Re: Ignite 2.8.0. Heap mem issue

Hi,

I tried to reproduce the same problem using the provided information, but without success. In the log I see operations running on the cluster that were not described in the pseudo code, for example SQL queries with count. Are you running something else? Can you provide a full reproducer, or try running the operations one by one to find which one causes this behaviour?

Evgenii

In response to dbutkovic's message of Wed, 18 Mar 2020, 07:07.

Andrey Davydov Andrey Davydov

RE: Re: Ignite 2.8.0. Heap mem issue

In reply to this post by dbutkovic

From what I see in the heap dump, there are 29000+ connections stored here:

this     - value: org.apache.ignite.internal.processors.query.h2.H2ConnectionWrapper #1
<- key     - class: java.util.concurrent.ConcurrentHashMap$Node, value: org.apache.ignite.internal.processors.query.h2.H2ConnectionWrapper #1
  <- [41919]     - class: java.util.concurrent.ConcurrentHashMap$Node[], value: java.util.concurrent.ConcurrentHashMap$Node #32575
   <- table     - class: java.util.concurrent.ConcurrentHashMap, value: java.util.concurrent.ConcurrentHashMap$Node[] #11010
    <- detachedConns     - class: org.apache.ignite.internal.processors.query.h2.ConnectionManager, value: java.util.concurrent.ConcurrentHashMap #2109

But there are only 28 TCP connections in the system.

I have already reported one way memory can leak in ConnectionManager in another mail thread; this is one more. It seems there are more problems in the ConnectionManager class.

Andrey.

 

In response to Evgenii's message of 23 March 2020, 20:37.

 

dbutkovic dbutkovic

RE: Re: Ignite 2.8.0. Heap mem issue

Hi,

Attached is code that simulates the heap memory issue: Ignite_2.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t2557/Ignite_2.zip>

# create cache TEST and insert random data into cache TEST
test_insert_in_cache.py

# read data from cache TEST and insert/update data in table TEST_TABLE via SQL
read_cache_insert_update_table.py


Best regards

Dren




Andrey Davydov Andrey Davydov

RE: RE: Re: Ignite 2.8.0. Heap mem issue

In reply to this post by dbutkovic

It seems that a detached connection NEVER becomes attached to any thread other than the one in which it was created, because the borrow method always returns an object related to the caller thread. That is, all detached connections created in a thread that has since been joined can never be garbage-collected.

So a possible reproduction scenario: start a separate thread. In this thread, run some logic that creates a detached connection, then finish and join the thread. Drop the reference to the thread. Repeat.
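
The leak pattern described here can be illustrated with a toy model. To be clear, this is NOT Ignite's ConnectionManager and makes no claim about its real code: ToyConnectionManager, borrow and detached_count are names invented for this sketch, which only assumes what is stated above, namely that a connection is registered per caller thread and nothing removes it once that thread ends.

```python
import threading


class ToyConnectionManager:
    """Toy model of a per-thread connection registry (not Ignite code)."""

    def __init__(self):
        self._lock = threading.Lock()
        # Maps the owning Thread object to its "connection". Entries are
        # never removed, which is the assumed source of the leak.
        self._detached = {}

    def borrow(self):
        """Return the connection owned by the calling thread, creating it if needed."""
        owner = threading.current_thread()
        with self._lock:
            conn = self._detached.get(owner)
            if conn is None:
                conn = object()  # stand-in for a real connection
                self._detached[owner] = conn
            return conn

    def detached_count(self):
        with self._lock:
            return len(self._detached)


def run_short_lived_workers(manager, count):
    """Start a thread, let it borrow a connection, join it, drop it; repeat."""
    for _ in range(count):
        worker = threading.Thread(target=manager.borrow)
        worker.start()
        worker.join()


manager = ToyConnectionManager()
run_short_lived_workers(manager, 100)
leaked = manager.detached_count()
dead_owners = sum(1 for t in manager._detached if not t.is_alive())
```

After the loop, every registered connection belongs to a thread that has already finished, so no later borrow() call can ever return it, yet the registry still holds a reference to each one.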

 

Andrey.

 

Following up on Andrey's message of 23 March 2020, 21:37.

 

 

Taras Ledkov Taras Ledkov

Re: Ignite 2.8.0. Heap mem issue

In reply to this post by dbutkovic

Hi,

Thanks a lot for your reproducer.
It was not entirely clear to me, but it was very useful.


I've reproduced the problem and found the cause.
A new ticket [1] has been created and will be fixed ASAP.

For now, you can try these workarounds:
- use constant values in the INSERT command;
- insert several rows with one query,
   e.g.: INSERT INTO <tbl> VALUES (<fields values>), (<fields values>)
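
Both workarounds can be combined on the client side by building one multi-row INSERT with literal values. The sketch below is illustrative only: build_multirow_insert and sql_literal are made-up helpers, the columns ID and MSG are hypothetical, and only int, float, str and None values are handled. Because the values are inlined instead of bound as parameters, they must come from a trusted source.

```python
def sql_literal(value):
    """Render a Python value as an SQL literal (hypothetical helper, limited types)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        # bool is a subclass of int; reject it rather than guess a rendering.
        raise TypeError("bool values are not handled in this sketch")
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Standard SQL escaping: double any single quote inside the string.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError("unsupported value type: %r" % type(value))


def build_multirow_insert(table, columns, rows):
    """Build a single INSERT statement that carries several rows of constants."""
    if not rows:
        raise ValueError("at least one row is required")
    tuples = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("row length does not match column count")
        tuples.append("(" + ", ".join(sql_literal(v) for v in row) + ")")
    return "INSERT INTO %s (%s) VALUES %s" % (
        table, ", ".join(columns), ", ".join(tuples))


statement = build_multirow_insert(
    "TEST_TABLE", ["ID", "MSG"], [(1, "first"), (2, "it's second")])
```

The resulting string can then be passed to the client's sql() call without query_args, one call per batch of rows instead of one call per row.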


[1]. https://issues.apache.org/jira/browse/IGNITE-12848

In response to dbutkovic's message of 18.03.2020, 17:07.

-- 
Taras Ledkov
Mail-To: [hidden email]