About Ignite Map Reduce OOM problem

NateQu

About Ignite Map Reduce OOM problem

Hi All,
We were trying to use Ignite Map Reduce to accelerate Hive queries on an existing HDFS. The changes we made include:

  1.  Changed core-site.xml, added
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hacluster</value>
</property>

  2.  Changed hive-site.xml, added
<property>
    <name>hive.rpc.query.plan</name>
    <value>true</value>
</property>

  3.  Changed mapred-site.xml, added
<property>
    <name>mapreduce.framework.name</name>
    <value>ignite</value>
</property>
<property>
    <name>mapreduce.jobtracker.address</name>
    <value>localhost:11211</value>
</property>

  4.  Added ignite-core, ignite-hadoop, and ignite-shmem to the Hadoop classpath.
  5.  Downloaded the In-Memory Hadoop Accelerator build of Ignite 2.6.0 from https://ignite.apache.org/download.cgi
  6.  Changed ${ignite_home}/conf/default-config.xml, added
<property name="communicationSpi">
    <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
        <property name="messageQueueLimit" value="1024"/>
    </bean>
</property>

  7.  Changed ${ignite_home}/bin/ignite.sh to enable G1GC.
  8.  Increased both the on-heap and off-heap memory sizes (see the off-heap sketch after this list).
  9.  Restarted HiveServer to let it pick up the latest config.
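
For reference, a minimal sketch of what the off-heap part of step 8 could look like in ${ignite_home}/conf/default-config.xml; the 8 GB value is purely illustrative (not what we actually used), and the on-heap size itself is set through the JVM heap options in ignite.sh:

<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <!-- Default data region; maxSize caps the off-heap memory a node may use (illustrative value). -->
        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="maxSize" value="#{8L * 1024 * 1024 * 1024}"/>
            </bean>
        </property>
    </bean>
</property>
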
Then we started beeline and executed some queries over roughly 1 billion records. On a cluster of two nodes, the result was:

  1.  node1 got
[04:14:59,978][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 9417 milliseconds.
[04:15:12,735][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 12707 milliseconds.
[04:15:26,561][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 8077 milliseconds.
[04:15:51,697][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 30785 milliseconds.
[04:16:00,683][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 8936 milliseconds.
[04:16:14,941][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 14208 milliseconds.
Failed to execute IGFS ad-hoc thread: GC overhead limit exceeded

  2.  after a while, node2 got the warning below (a sketch of the 'ackTimeout' change it suggests is shown after this list)
Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout [currentTimeout=10000, rmtAddr=host1/192.69.2.27:47500, rmtPort=47500]

  3.  eventually, the terminal running beeline got
Caused by: java.io.IOException: Did not receive any packets within ping response interval (connection is considered to be half-opened) [lastPingReceiveTime=9223372036854775807, lastPingSendTime=1549397555438, now=1549397562438, timeout=7000, addr=/192.69.2.12:11211]
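
The warning on node2 points at long GC pauses and suggests raising 'ackTimeout'. A minimal sketch of how that discovery property could be set in default-config.xml; the 30000 ms value is an assumption for illustration, not something we have verified:

<property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <!-- Allow remote nodes more time to acknowledge discovery messages
             while they are stalled by long GC pauses (illustrative value). -->
        <property name="ackTimeout" value="30000"/>
    </bean>
</property>
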

Any ideas how we could solve this problem? Thanks a lot in advance.

ilya.kasnacheev

Re: About Ignite Map Reduce OOM problem

Hello!

Have you tried profiling the heap, to see where the memory is being held?

Regards,
--
Ilya Kasnacheev



NateQu

Re: About Ignite Map Reduce OOM problem

Tried to monitor the heap with verbose mode enabled. Any suggestions on the best way to profile the heap for a cluster? Thanks



ilya.kasnacheev

Re: About Ignite Map Reduce OOM problem

Hello!

You should profile the heap of the problematic node's JVM using any preferred tool set; I personally prefer Eclipse MAT.

Regards,
--
Ilya Kasnacheev

