OOME causing caches to be removed

classic Classic list List threaded Threaded
3 messages Options
djax djax
Reply | Threaded
Open this post in threaded view
|

OOME causing caches to be removed

This post was updated on .
Ignite version: 2.5
This is on a 2 node cluster in AWS with 32GB RAM and 8 cores each.

When executing the following SQL query, the query fills the JVM heap of each
node (specified at 10 GB each node) and exits afterwards. On restart, the
USER cache is removed from persistence (the cache on which the SELECT
statement is being performed)

This has happened to me before during similar queries. Any tips? Is this expected behavior?

SQL query:
INSERT INTO user_email
SELECT
email,max(sex),year(max(birthdate)),max(country),max(province),substr(email,position('@',email)+1)
FROM USER WHERE email IS NOT NULL GROUP BY email;

JVM options (excluding ignite libs and ignite home): -Xms18g -Xmx18g -server
-XX:+AggressiveOpts -XX:MaxMetaspaceSize=2048m
-Djava.net.preferIPv4Stack=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+UseTLAB -XX:NewSize=128m -XX:MaxNewSize=128m -XX:MaxTenuringThreshold=0
-XX:SurvivorRatio=1024 -XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=40 -XX:MaxGCPauseMillis=1000
-XX:InitiatingHeapOccupancyPercent=50 -XX:+UseCompressedOops
-XX:ParallelGCThreads=8 -XX:ConcGCThreads=8 -XX:+DisableExplicitGC
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/share/apache-ignite
-XX:+ExitOnOutOfMemoryError -DIGNITE_QUIET=false
-DIGNITE_SUCCESS_FILE=/usr/share/apache-ignite/work/ignite_success_ccedb5ad-5e37-4fee-aeae-19456d7da609
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=49124
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false

Config (AWS credentials removed):
<beans xmlns="http://www.springframework.org/schema/beans"                                                                                                                                                                                 
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"                                                                                                                                                                               
       xsi:schemaLocation="http://www.springframework.org/schema/beans                                                                                                                                                                     
        http://www.springframework.org/schema/beans/spring-beans.xsd">                                                                                                                                                                     
    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">                                                                                                                                                     
        <property name="peerClassLoadingEnabled" value="false"/>                                                                                                                                                                           
                                                                                                                                                                                                                                           
        <property name="dataStorageConfiguration">                                                                                                                                                                                         
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">                                                                                                                                                         
                                                                                                                                                                                                                                           
                <property name="defaultDataRegionConfiguration">                                                                                                                                                                           
                    <bean class="org.apache.ignite.configuration.DataRegionConfiguration">                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                         
                        <property name="persistenceEnabled" value="true"/>                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                             
                       <property name="checkpointPageBufferSize" value="#{4L * 1024 * 1024 * 1024}"/>                                                                                                                                                                               
                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                    </bean>                                                                                                                                                                                                                                                         
                </property>                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                   
              <property name="writeThrottlingEnabled" value="true"/>                                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                                                                           
              <property name="walMode" value="LOG_ONLY"/>                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                                                                           
           </bean>                                                                                                                                                                                                                                                                                                                         
        </property>                                                                                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                                                                           
        <property name="cacheConfiguration">                                                                                                                                                                                                                                                                                               
            <list>                                                                                                                                                                                                                                                                                                                         
                <bean class="org.apache.ignite.configuration.CacheConfiguration">                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                                                   
                    <property name="name" value="parallel*"/>                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                                                                           
                    <property name="atomicityMode" value="ATOMIC"/>                                                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                     
                    <property name="queryParallelism" value="8"/>                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                                                                           
                    <property name="backups" value="0"/>

                </bean>
            </list>

        </property>

<property name="failureDetectionTimeout" value="10000"/>

 
  <property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
      <property name="ipFinder">
        <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.s3.TcpDiscoveryS3IpFinder">
          <property name="awsCredentials" ref=""/>
          <property name="bucketName" value=""/>
        </bean>
      </property>
 
   
   
    <property name="NetworkTimeout" value="120000"/>
    <property name="SocketTimeout" value="50000"/>
    <property name="AckTimeout" value="50000"/>

    <property name="ReconnectCount" value="501"/>

     <property name="LocalAddress" value="10.2.1.33"/>

   </bean>
  </property>

 <property name="communicationSpi">
    <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
      <property name="slowClientQueueLimit" value="1000"/> 


     
      <property name="socketWriteTimeout" value="60000"/>

    </bean>
  </property>

   </bean>




Logs:
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=95dbf461, uptime=22:16:54.273]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=16]
    ^-- CPU [cur=0.03%, avg=3.28%, GC=0%]
    ^-- PageMemory [pages=1596816]
    ^-- Heap [used=2969MB, free=71%, comm=10239MB]
    ^-- Non heap [used=100MB, free=96.97%, comm=103MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=6, qSize=0]
[2018-07-09 09:35:47,893][INFO ][grid-timeout-worker-#23][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=95dbf461, uptime=22:17:54.275]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=16]
    ^-- CPU [cur=0.1%, avg=3.27%, GC=0%]
    ^-- PageMemory [pages=1596816]
    ^-- Heap [used=2975MB, free=70.94%, comm=10239MB]
    ^-- Non heap [used=100MB, free=96.97%, comm=103MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=6, qSize=0]
[2018-07-09 09:36:47,889][INFO ][grid-timeout-worker-#23][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=95dbf461, uptime=22:18:54.276]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=16]
    ^-- CPU [cur=0.03%, avg=3.24%, GC=0%]
    ^-- PageMemory [pages=1596816]
    ^-- Heap [used=2981MB, free=70.88%, comm=10239MB]
    ^-- Non heap [used=99MB, free=96.98%, comm=103MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=6, qSize=0]
[09:37:51,534][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 23756 milliseconds.
[2018-07-09 09:37:52,010][INFO ][grid-timeout-worker-#23][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=95dbf461, uptime=22:19:58.394]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=16]
    ^-- CPU [cur=100%, avg=3.26%, GC=107.4%]
    ^-- PageMemory [pages=1596816]
    ^-- Heap [used=7138MB, free=30.29%, comm=10239MB]
    ^-- Non heap [used=100MB, free=96.97%, comm=103MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=1, idle=4, qSize=0]
[09:38:31,285][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 30315 milliseconds.
[09:39:05,193][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 33678 milliseconds.
[2018-07-09 09:39:33,679][INFO ][grid-timeout-worker-#23][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=95dbf461, uptime=22:21:11.630]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=16]
    ^-- CPU [cur=100%, avg=3.27%, GC=258.6%]
    ^-- PageMemory [pages=1596816]
    ^-- Heap [used=10203MB, free=0.35%, comm=10239MB]
    ^-- Non heap [used=100MB, free=96.95%, comm=103MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=1, idle=0, qSize=0]
[09:39:33,680][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 28387 milliseconds.
[09:40:07,501][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 33771 milliseconds.
[2018-07-09 09:40:35,996][INFO ][db-checkpoint-thread-#48][GridCacheDatabaseSharedManager] Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, checkpointLockHoldTime=3ms, reason='timeout']
[09:40:35,997][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 28446 milliseconds.
[2018-07-09 09:40:36,085][INFO ][grid-timeout-worker-#23][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=95dbf461, uptime=22:22:42.475]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=16]
    ^-- CPU [cur=100%, avg=3.27%, GC=118.63%]
    ^-- PageMemory [pages=1596816]
    ^-- Heap [used=10226MB, free=0.14%, comm=10239MB]
    ^-- Non heap [used=100MB, free=96.96%, comm=103MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=1, idle=0, qSize=0]
[09:41:10,325][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 34191 milliseconds.
[09:41:39,082][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 28698 milliseconds.
[09:42:41,170][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 62047 milliseconds.
[09:43:09,709][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 28489 milliseconds.
[2018-07-09 09:43:09,710][INFO ][grid-timeout-worker-#23][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=95dbf461, uptime=22:25:16.100]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=16]
    ^-- CPU [cur=100%, avg=3.27%, GC=262.73%]
    ^-- PageMemory [pages=1596816]
    ^-- Heap [used=10239MB, free=0.01%, comm=10239MB]
    ^-- Non heap [used=100MB, free=96.96%, comm=103MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=1, qSize=0]
[2018-07-09 09:43:09,711][INFO ][tcp-disco-sock-reader-#15][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.2.1.191:55319, rmtPort=55319
[09:43:44,048][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 34289 milliseconds.
[09:44:12,611][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 28513 milliseconds.
[09:44:46,759][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 34098 milliseconds.
[2018-07-09 09:44:46,760][INFO ][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/10.2.1.191, rmtPort=52581]
[2018-07-09 09:44:46,760][INFO ][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/10.2.1.191, rmtPort=52581]
[2018-07-09 09:45:15,298][INFO ][tcp-disco-sock-reader-#16][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/10.2.1.191:52581, rmtPort=52581]
[09:45:15,298][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 28489 milliseconds.
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /usr/share/apache-ignite/java_pid13220.hprof ...
[09:46:08,712][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 53364 milliseconds.
[09:46:37,143][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 28381 milliseconds.
Heap dump file created [15357656017 bytes in 138.870 secs]
Terminating due to java.lang.OutOfMemoryError: Java heap space
[2018-07-09 09:48:27,581][INFO ][grid-timeout-worker-#23][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=95dbf461, uptime=22:28:43.534]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=16]
    ^-- CPU [cur=100%, avg=3.27%, GC=142.7%]
    ^-- PageMemory [pages=1596816]
    ^-- Heap [used=10239MB, free=0%, comm=10239MB]
    ^-- Non heap [used=100MB, free=96.96%, comm=103MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=0, qSize=0]




Restart log:


[2018-07-09 09:53:34,271][INFO ][main][IgniteKernal] Daemon mode: off
[2018-07-09 09:53:34,271][INFO ][main][IgniteKernal] OS: Linux 4.4.0-1062-aws amd64
[2018-07-09 09:53:34,271][INFO ][main][IgniteKernal] OS user: root
[2018-07-09 09:53:34,271][INFO ][main][IgniteKernal] PID: 28113
[2018-07-09 09:53:34,271][INFO ][main][IgniteKernal] Language runtime: Java Platform API Specification ver. 1.8
[2018-07-09 09:53:34,271][INFO ][main][IgniteKernal] VM information: OpenJDK Runtime Environment 1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11 Oracle Corporation OpenJDK 64-Bit Server VM 25.171-b11
[2018-07-09 09:53:34,272][INFO ][main][IgniteKernal] VM total memory: 10.0GB
[2018-07-09 09:53:34,272][INFO ][main][IgniteKernal] Remote Management [restart: on, REST: on, JMX (remote: on, port: 49123, auth: off, ssl: off)]
[2018-07-09 09:53:34,273][INFO ][main][IgniteKernal] Logger: Log4JLogger [quiet=false, config=/usr/share/apache-ignite/config/ignite-log4j.xml]
[2018-07-09 09:53:34,273][INFO ][main][IgniteKernal] IGNITE_HOME=/usr/share/apache-ignite
[2018-07-09 09:53:34,273][INFO ][main][IgniteKernal] VM arguments: [-Xms10g, -Xmx10g, -XX:+AggressiveOpts, -XX:MaxMetaspaceSize=2048m, -Djava.net.preferIPv4Stack=true, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -XX:+UseTLAB, -XX:NewSize
=128m, -XX:MaxNewSize=128m, -XX:MaxTenuringThreshold=0, -XX:SurvivorRatio=1024, -XX:+UseCMSInitiatingOccupancyOnly, -XX:CMSInitiatingOccupancyFraction=40, -XX:MaxGCPauseMillis=1000, -XX:InitiatingHeapOccupancyPercent=50, -XX:+UseCompres
sedOops, -XX:ParallelGCThreads=8, -XX:ConcGCThreads=8, -XX:+DisableExplicitGC, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/usr/share/apache-ignite, -XX:+ExitOnOutOfMemoryError, -DIGNITE_QUIET=false, -DIGNITE_SUCCESS_FILE=/usr/sha
re/apache-ignite/work/ignite_success_5233d3c6-a8fa-4659-ba1e-1e44799a5b34, -Dcom.sun.management.jmxremote, -Dcom.sun.management.jmxremote.port=49123, -Dcom.sun.management.jmxremote.authenticate=false, -Dcom.sun.management.jmxremote.ssl=
false, -DIGNITE_HOME=/usr/share/apache-ignite, -DIGNITE_PROG_NAME=bin/ignite.sh]
[2018-07-09 09:53:34,273][INFO ][main][IgniteKernal] System cache's DataRegion size is configured to 40 MB. Use DataStorageConfiguration.systemCacheMemorySize property to change the setting.
[2018-07-09 09:53:34,273][INFO ][main][IgniteKernal] Configured caches [in 'sysMemPlc' dataRegion: ['ignite-sys-cache'], in 'default' dataRegion: ['parallel*']]
[2018-07-09 09:53:34,281][INFO ][main][IgniteKernal] 3-rd party licenses can be found at: /usr/share/apache-ignite/libs/licenses
[2018-07-09 09:53:34,367][INFO ][main][IgnitePluginProcessor] Configured plugins:
[2018-07-09 09:53:34,367][INFO ][main][IgnitePluginProcessor]   ^-- None
[2018-07-09 09:53:34,367][INFO ][main][IgnitePluginProcessor]
[2018-07-09 09:53:34,368][INFO ][main][FailureProcessor] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0]]
[2018-07-09 09:53:34,420][INFO ][main][TcpCommunicationSpi] Successfully bound communication NIO server to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0, selectorsCnt=4, selectorSpins=0, pairedConn=false]
[2018-07-09 09:53:34,424][WARN ][main][TcpCommunicationSpi] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and r
eceiver sides.
[2018-07-09 09:53:34,443][WARN ][main][NoopCheckpointSpi] Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation)
[2018-07-09 09:53:34,467][WARN ][main][GridCollisionManager] Collision resolution is disabled (all jobs will be activated upon arrival).
[2018-07-09 09:53:34,468][INFO ][main][IgniteKernal] Security status [authentication=off, tls/ssl=off]
[2018-07-09 09:53:34,494][WARN ][main][TcpDiscoverySpi] Failure detection timeout will be ignored (one of SPI parameters has been set explicitly)
[2018-07-09 09:53:34,509][INFO ][main][TcpDiscoverySpi] Successfully bound to TCP port [port=47500, localHost=localhost/10.2.1.33, locNodeId=3d21865b-81ab-4c06-8b48-edd623154ed6]
[2018-07-09 09:53:34,522][INFO ][main][PdsFoldersResolver] Successfully locked persistence storage folder [/usr/share/apache-ignite/work/db/node00-761e5630-30c1-4637-8583-86cd37513ece]
[2018-07-09 09:53:34,523][INFO ][main][PdsFoldersResolver] Consistent ID used for local node is [761e5630-30c1-4637-8583-86cd37513ece] according to persistence data storage folders
[2018-07-09 09:53:34,523][INFO ][main][CacheObjectBinaryProcessorImpl] Resolved directory for serialized binary metadata: /usr/share/apache-ignite/work/binary_meta/node00-761e5630-30c1-4637-8583-86cd37513ece
[2018-07-09 09:53:34,799][INFO ][main][FilePageStoreManager] Resolved page store work directory: /usr/share/apache-ignite/work/db/node00-761e5630-30c1-4637-8583-86cd37513ece
[2018-07-09 09:53:34,800][INFO ][main][FileWriteAheadLogManager] Resolved write ahead log work directory: /usr/share/apache-ignite/work/db/wal/node00-761e5630-30c1-4637-8583-86cd37513ece
[2018-07-09 09:53:34,800][INFO ][main][FileWriteAheadLogManager] Resolved write ahead log archive directory: /usr/share/apache-ignite/work/db/wal/archive/node00-761e5630-30c1-4637-8583-86cd37513ece
[2018-07-09 09:53:34,815][INFO ][main][FileWriteAheadLogManager] Started write-ahead log manager [mode=LOG_ONLY]
[2018-07-09 09:53:34,857][INFO ][main][GridCacheDatabaseSharedManager] Read checkpoint status [startMarker=/usr/share/apache-ignite/work/db/node00-761e5630-30c1-4637-8583-86cd37513ece/cp/1531127399929-5676123d-d387-445d-8dda-d0edfc67dd2
0-START.bin, endMarker=/usr/share/apache-ignite/work/db/node00-761e5630-30c1-4637-8583-86cd37513ece/cp/1531127399929-5676123d-d387-445d-8dda-d0edfc67dd20-END.bin]
[2018-07-09 09:53:34,870][INFO ][main][PageMemoryImpl] Started page memory [memoryAllocated=100.0 MiB, pages=24808, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB]
[2018-07-09 09:53:34,871][INFO ][main][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=983, fileOff=33293209, len=49097], lastMarked=FileWALPointer [idx=983, fileOff=33293209, len=49097], lastCheckpointId=5676123d-d387-445d-8dda-d0edfc67dd20]
[2018-07-09 09:53:35,110][INFO ][main][FileWriteAheadLogManager] Stopping WAL iteration due to an exception: Failed to read WAL record at position: 33342306, ptr=FileWALPointer [idx=983, fileOff=33342306, len=0]
[2018-07-09 09:53:35,111][INFO ][main][GridCacheDatabaseSharedManager] Found last checkpoint marker [cpId=5676123d-d387-445d-8dda-d0edfc67dd20, pos=FileWALPointer [idx=983, fileOff=33293209, len=49097]]
[2018-07-09 09:53:35,141][INFO ][main][GridCacheDatabaseSharedManager] Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=983, fileOff=33293209, len=49097], lastCheckpointId=5676123d-d387-445d-8dda-d0edfc67dd20]
[2018-07-09 09:53:35,159][INFO ][main][FileWriteAheadLogManager] Stopping WAL iteration due to an exception: Failed to read WAL record at position: 33342306, ptr=FileWALPointer [idx=983, fileOff=33342306, len=0]
[2018-07-09 09:53:35,160][INFO ][main][GridCacheDatabaseSharedManager] Finished applying WAL changes [updatesApplied=0, time=20ms]
[2018-07-09 09:53:35,200][INFO ][main][GridClusterStateProcessor] Restoring history for BaselineTopology[id=0]
[2018-07-09 09:53:35,478][INFO ][main][ClientListenerProcessor] Client connector processor has started on TCP port 10800
[2018-07-09 09:53:35,525][INFO ][main][GridTcpRestProtocol] Command protocol successfully started [name=TCP binary, host=0.0.0.0/0.0.0.0, port=11211]
[2018-07-09 09:53:35,563][INFO ][main][IgniteKernal] Non-loopback local IPs: 10.2.1.33
2018-07-09 09:53:35,563][INFO ][main][IgniteKernal] Enabled local MACs: 06C53002E4F0
[2018-07-09 09:53:35,582][WARN ][main][TcpDiscoveryS3IpFinder] Amazon client configuration is not set (will use default).
[2018-07-09 09:53:37,338][INFO ][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/10.2.1.191, rmtPort=42252]
[2018-07-09 09:53:37,385][INFO ][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/10.2.1.191, rmtPort=42252]
[2018-07-09 09:53:37,385][INFO ][tcp-disco-sock-reader-#4][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/10.2.1.191:42252, rmtPort=42252]
[2018-07-09 09:53:37,592][INFO ][tcp-disco-msg-worker-#3][GridClusterStateProcessor] Sending activate request with BaselineTopology BaselineTopology [id=0, branchingHash=496576913, branchingType='Cluster activation', baselineNodes=[099d7fb5-3763-4ff8-bd55-f187aa204047, 761e5630-30c1-4637-8583-86cd37513ece]]
[2018-07-09 09:53:37,647][INFO ][tcp-disco-msg-worker-#3][GridClusterStateProcessor] Received activate request with BaselineTopology[id=0]
[2018-07-09 09:53:37,650][INFO ][tcp-disco-msg-worker-#3][GridClusterStateProcessor] Started state transition: true
[2018-07-09 09:53:37,683][INFO ][main][IgniteKernal] Performance suggestions for grid  (fix if possible)
[2018-07-09 09:53:37,683][INFO ][main][IgniteKernal] To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
[2018-07-09 09:53:37,683][INFO ][main][IgniteKernal]   ^-- Enable G1 Garbage Collector (add '-XX:+UseG1GC' to JVM options)
[2018-07-09 09:53:37,683][INFO ][main][IgniteKernal]   ^-- Set max direct memory size if getting 'OOME: Direct buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM options)
[2018-07-09 09:53:37,683][INFO ][main][IgniteKernal] Refer to this page for more performance suggestions: https://apacheignite.readme.io/docs/jvm-and-system-tuning
[2018-07-09 09:53:37,683][INFO ][main][IgniteKernal]
[2018-07-09 09:53:37,683][INFO ][main][IgniteKernal] To start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
[2018-07-09 09:53:37,684][INFO ][main][IgniteKernal]
[2018-07-09 09:53:37,684][INFO ][main][IgniteKernal]

>>> +----------------------------------------------------------------------+
>>> Ignite ver. 2.5.0#20180523-sha1:86e110c750a340dc9be2d396415f0b80d7ed8813
>>> +----------------------------------------------------------------------+
>>> OS name: Linux 4.4.0-1062-aws amd64
>>> CPU(s): 8
>>> Heap: 10.0GB
>>> VM name: 28113@localhost
>>> Local node [ID=3D21865B-81AB-4C06-8B48-EDD623154ED6, order=2, clientMode=false]
>>> Local node addresses: [localhost/10.2.1.33]
>>> Local ports: TCP:10800 TCP:11211 TCP:47100 TCP:47500

[2018-07-09 09:53:37,688][INFO ][main][GridDiscoveryManager] Topology snapshot [ver=2, servers=2, clients=0, CPUs=16, offheap=13.0GB, heap=20.0GB]
[2018-07-09 09:53:37,688][INFO ][main][GridDiscoveryManager]   ^-- Node [id=3D21865B-81AB-4C06-8B48-EDD623154ED6, clusterState=INACTIVE]
[2018-07-09 09:53:37,689][INFO ][main][GridDiscoveryManager]   ^-- Baseline [id=0, size=2, online=2, offline=0]
[2018-07-09 09:53:37,689][INFO ][main][GridDiscoveryManager]   ^-- All baseline nodes are online, will start auto-activation
[2018-07-09 09:53:37,689][INFO ][main][GridDiscoveryManager] Data Regions Configured:
[2018-07-09 09:53:37,689][INFO ][main][GridDiscoveryManager]   ^-- default [initSize=256.0 MiB, maxSize=6.3 GiB, persistenceEnabled=true]
[2018-07-09 09:53:37,702][INFO ][exchange-worker-#42][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=1], crd=false, evt=DISCOVERY_CUSTOM_EVT, evtNode=3d21865b-81ab-4c06-8b48-edd623154ed6, customEvt=ChangeGlobalStateMessage [id=996e57e7461-6b6a85bd-5258-44fe-ae07-88ea72d7d8fd, reqId=779ffcc0-0f7c-4ee9-87b0-eed18cacceda, initiatingNodeId=3d21865b-81ab-4c06-8b48-edd623154ed6, activate=true, baselineTopology=BaselineTopology [id=0, branchingHash=496576913, branchingType='Cluster activation', baselineNodes=[099d7fb5-3763-4ff8-bd55-f187aa204047, 761e5630-30c1-4637-8583-86cd37513ece]], forceChangeBaselineTopology=false, timestamp=1531130017592], allowMerge=false]
[2018-07-09 09:53:37,711][INFO ][exchange-worker-#42][GridDhtPartitionsExchangeFuture] Start activation process [nodeId=3d21865b-81ab-4c06-8b48-edd623154ed6, client=false, topVer=AffinityTopologyVersion [topVer=2, minorTopVer=1]]
[2018-07-09 09:53:37,711][INFO ][exchange-worker-#42][FilePageStoreManager] Resolved page store work directory: /usr/share/apache-ignite/work/db/node00-761e5630-30c1-4637-8583-86cd37513ece
[2018-07-09 09:53:37,712][INFO ][exchange-worker-#42][FileWriteAheadLogManager] Resolved write ahead log work directory: /usr/share/apache-ignite/work/db/wal/node00-761e5630-30c1-4637-8583-86cd37513ece
[2018-07-09 09:53:37,722][INFO ][exchange-worker-#42][FileWriteAheadLogManager] Resolved write ahead log archive directory: /usr/share/apache-ignite/work/db/wal/archive/node00-761e5630-30c1-4637-8583-86cd37513ece
[2018-07-09 09:53:37,735][INFO ][exchange-worker-#42][FileWriteAheadLogManager] Started write-ahead log manager [mode=LOG_ONLY]
[2018-07-09 09:53:37,784][INFO ][exchange-worker-#42][PageMemoryImpl] Started page memory [memoryAllocated=100.0 MiB, pages=24808, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB]
[2018-07-09 09:53:37,828][INFO ][exchange-worker-#42][PageMemoryImpl] Started page memory [memoryAllocated=6.3 GiB, pages=1596800, tableSize=124.2 MiB, checkpointBuffer=4.0 GiB]
[2018-07-09 09:53:37,830][INFO ][exchange-worker-#42][PageMemoryImpl] Started page memory [memoryAllocated=100.0 MiB, pages=24808, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB]
[2018-07-09 09:53:38,987][INFO ][exchange-worker-#42][GridCacheDatabaseSharedManager] Read checkpoint status [startMarker=/usr/share/apache-ignite/work/db/node00-761e5630-30c1-4637-8583-86cd37513ece/cp/1531127399929-5676123d-d387-445d-8dda-d0edfc67dd20-START.bin, endMarker=/usr/share/apache-ignite/work/db/node00-761e5630-30c1-4637-8583-86cd37513ece/cp/1531127399929-5676123d-d387-445d-8dda-d0edfc67dd20-END.bin]
[2018-07-09 09:53:38,987][INFO ][exchange-worker-#42][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=983, fileOff=33293209, len=49097], lastMarked=FileWALPointer [idx=983, fileOff=33293209, len=49097], lastCheckpointId=5676123d-d387-445d-8dda-d0edfc67dd20]
[2018-07-09 09:53:39,005][INFO ][exchange-worker-#42][FileWriteAheadLogManager] Stopping WAL iteration due to an exception: Failed to read WAL record at position: 33342306, ptr=FileWALPointer [idx=983, fileOff=33342306, len=0]
[2018-07-09 09:53:39,006][INFO ][exchange-worker-#42][GridCacheDatabaseSharedManager] Found last checkpoint marker [cpId=5676123d-d387-445d-8dda-d0edfc67dd20, pos=FileWALPointer [idx=983, fileOff=33293209, len=49097]]
[2018-07-09 09:53:39,020][INFO ][exchange-worker-#42][FileWriteAheadLogManager] Resuming logging to WAL segment [file=/usr/share/apache-ignite/work/db/wal/node00-761e5630-30c1-4637-8583-86cd37513ece/0000000000000003.wal, offset=33342306, ver=2]
[2018-07-09 09:53:39,035][INFO ][exchange-worker-#42][GridClusterStateProcessor] Writing BaselineTopology[id=0]
[2018-07-09 09:53:39,379][INFO ][exchange-worker-#42][GridCacheProcessor] Started cache [name=ignite-sys-cache, id=-2100569601, memoryPolicyName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, backups=2147483647]
[2018-07-09 09:53:39,430][INFO ][exchange-worker-#42][GridCacheProcessor] Started cache [name=SQL_PUBLIC_USER, id=-2066146608, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=0]
[2018-07-09 09:53:39,663][INFO ][exchange-worker-#42][GridCacheProcessor] Started cache [name=SQL_PUBLIC_ACTION, id=-1878918149, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=0]
[2018-07-09 09:53:39,667][INFO ][exchange-worker-#42][GridCacheProcessor] Started cache [name=template, id=-1321546630, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=0]
[2018-07-09 09:53:39,671][INFO ][exchange-worker-#42][GridCacheProcessor] Started cache [name=SQL_PUBLIC_CAMPAIGN_LINK, id=-1170769820, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=0]
[2018-07-09 09:53:39,675][INFO ][exchange-worker-#42][GridCacheProcessor] Started cache [name=SQL_PUBLIC_USER_EMAIL, id=-1053133491, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=0]
[2018-07-09 09:53:39,679][INFO ][exchange-worker-#42][GridCacheProcessor] Started cache [name=SQL_PUBLIC_TAG, id=-897935403, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=0]
[2018-07-09 09:53:39,683][INFO ][exchange-worker-#42][GridCacheProcessor] Started cache [name=SQL_PUBLIC_USERTEMP, id=-363434332, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=0]
[2018-07-09 09:53:39,686][INFO ][exchange-worker-#42][GridCacheProcessor] Started cache [name=SQL_PUBLIC_LINK_TAG, id=1097579066, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=0]
[2018-07-09 09:53:39,690][INFO ][exchange-worker-#42][GridCacheProcessor] Started cache [name=parallel, id=1171402247, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=0]
[2018-07-09 09:53:39,671][INFO ][exchange-worker-#42][GridCacheProcessor] Started cache [name=SQL_PUBLIC_CAMPAIGN_LINK, id=-1170769820, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=0]
[2018-07-09 09:53:39,675][INFO ][exchange-worker-#42][GridCacheProcessor] Started cache [name=SQL_PUBLIC_USER_EMAIL, id=-1053133491, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=0]
[2018-07-09 09:53:39,679][INFO ][exchange-worker-#42][GridCacheProcessor] Started cache [name=SQL_PUBLIC_TAG, id=-897935403, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=0]
[2018-07-09 09:53:39,683][INFO ][exchange-worker-#42][GridCacheProcessor] Started cache [name=SQL_PUBLIC_USERTEMP, id=-363434332, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=0]
[2018-07-09 09:53:39,686][INFO ][exchange-worker-#42][GridCacheProcessor] Started cache [name=SQL_PUBLIC_LINK_TAG, id=1097579066, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=0]
[2018-07-09 09:53:39,690][INFO ][exchange-worker-#42][GridCacheProcessor] Started cache [name=parallel, id=1171402247, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=0]
[2018-07-09 09:53:39,710][INFO ][exchange-worker-#42][GridDhtPartitionsExchangeFuture] Successfully activated caches [nodeId=3d21865b-81ab-4c06-8b48-edd623154ed6, client=false, topVer=AffinityTopologyVersion [topVer=2, minorTopVer=1]]
[2018-07-09 09:53:39,732][INFO ][exchange-worker-#42][GridDhtPartitionsExchangeFuture] Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=1], waitTime=0ms, futInfo=NA]
[2018-07-09 09:53:39,757][INFO ][grid-nio-worker-tcp-comm-0-#25][TcpCommunicationSpi] Established outgoing communication connection [locAddr=/10.2.1.33:50354, rmtAddr=/10.2.1.191:47100]
[2018-07-09 09:53:39,774][INFO ][exchange-worker-#42][GridDhtPartitionsExchangeFuture] Finished waiting for partitions release latch: ClientLatch [coordinator=TcpDiscoveryNode [id=ac19a8e6-1d47-42d6-a20b-7f98acb06853, addrs=[10.2.1.191], sockAddrs=[/10.2.1.191:47500, localhost/10.2.1.33:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1531130017419, loc=false, ver=2.5.0#20180523-sha1:86e110c7, isClient=false], ackSent=true, super=CompletableLatch [id=exchange, topVer=AffinityTopologyVersion [topVer=2, minorTopVer=1]]]
[2018-07-09 09:53:39,775][INFO ][exchange-worker-#42][GridDhtPartitionsExchangeFuture] Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=1], waitTime=0ms, futInfo=NA]
[2018-07-09 09:53:39,775][INFO ][exchange-worker-#42][GridCacheDatabaseSharedManager] Read checkpoint status [startMarker=/usr/share/apache-ignite/work/db/node00-761e5630-30c1-4637-8583-86cd37513ece/cp/1531127399929-5676123d-d387-445d-8dda-d0edfc67dd20-START.bin, endMarker=/usr/share/apache-ignite/work/db/node00-761e5630-30c1-4637-8583-86cd37513ece/cp/1531127399929-5676123d-d387-445d-8dda-d0edfc67dd20-END.bin]
[2018-07-09 09:53:39,776][INFO ][exchange-worker-#42][GridCacheDatabaseSharedManager] Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=983, fileOff=33293209, len=49097], lastCheckpointId=5676123d-d387-445d-8dda-d0edfc67dd20]
[2018-07-09 09:53:39,797][WARN ][exchange-worker-#42][FileWriteAheadLogManager] WAL segment tail is reached. [ Expected next state: {Index=983,Offset=33483315}, Actual state : {Index=3519108834,Offset=54} ]
[2018-07-09 09:54:08,067][INFO ][exchange-worker-#42][GridCacheDatabaseSharedManager] Finished applying WAL changes [updatesApplied=0, time=28298ms]
[2018-07-09 09:54:08,124][INFO ][db-checkpoint-thread-#48][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=52c34d20-a827-4a4e-99f2-e6e001aac8e3, startPtr=FileWALPointer [idx=983, fileOff=37691879, len=18925], checkpointLockWait=0ms, checkpointLockHoldTime=28ms, walCpRecordFsyncDuration=21ms, pages=1061, reason='node started']
[2018-07-09 09:54:08,214][INFO ][exchange-worker-#42][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=1], crd=false]
[2018-07-09 09:54:08,348][INFO ][tcp-disco-msg-worker-#3][GridClusterStateProcessor] Received state change finish message: true
[2018-07-09 09:54:08,355][INFO ][sys-#53][GridDhtPartitionsExchangeFuture] Received full message, will finish exchange [node=ac19a8e6-1d47-42d6-a20b-7f98acb06853, resVer=AffinityTopologyVersion [topVer=2, minorTopVer=1]]
[2018-07-09 09:54:08,395][INFO ][sys-#53][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=2, minorTopVer=1], resVer=AffinityTopologyVersion [topVer=2, minorTopVer=1], err=null]
[2018-07-09 09:54:08,475][INFO ][sys-#56][GridClusterStateProcessor] Successfully performed final activation steps [nodeId=3d21865b-81ab-4c06-8b48-edd623154ed6, client=false, topVer=AffinityTopologyVersion [topVer=2, minorTopVer=1]]
[2018-07-09 09:54:08,480][INFO ][exchange-worker-#42][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=2, minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, node=3d21865b-81ab-4c06-8b48-edd623154ed6]
[2018-07-09 09:54:08,804][INFO ][db-checkpoint-thread-#48][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=52c34d20-a827-4a4e-99f2-e6e001aac8e3, pages=1061, markPos=FileWALPointer [idx=983, fileOff=37691879, len=18925], walSegmentsCleared=0, markDuration=56ms, pagesWrite=64ms, fsync=614ms, total=734ms]
[2018-07-09 09:54:37,695][INFO ][grid-timeout-worker-#23][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=3d21865b, uptime=00:01:00.018]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=16]
    ^-- CPU [cur=0.13%, avg=3%, GC=0%]
    ^-- PageMemory [pages=112715]
    ^-- Heap [used=160MB, free=98.44%, comm=10239MB]
    ^-- Non heap [used=63MB, free=98.08%, comm=64MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=8, qSize=0]

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
akurbanov akurbanov
Reply | Threaded
Open this post in threaded view
|

Re: OOME causing caches to be removed

Hi,

How do you create USER cache?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
djax djax
Reply | Threaded
Open this post in threaded view
|

Re: OOME causing caches to be removed

I use the following SQL sentence:

CREATE TABLE user (
                user_id BIGINT NOT NULL,
                db_id INTEGER NOT NULL,
                list_id INTEGER NOT NULL,
                con_id INTEGER NOT NULL,
                status VARCHAR NOT NULL,
                procedence VARCHAR,
                sex CHAR(1),
                birthdate DATE,
                country VARCHAR,
                province INTEGER,
                email VARCHAR,
                CONSTRAINT user_id PRIMARY KEY (user_id)
) WITH "template=parallel";

The error has also occurred with the default template.

The query did finish executing when I set the JVM heap of each node to 18
GB, but that seems excessive, given that the total amount of data on disk of
the USER table is less than 7 GB. During the execution of the query Ignite
reached a peak of around 25 GB of JVM heap usage.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/