Ignite Nodes Go Down - org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval

vinod.jv

Hi,

We are using Apache Ignite in embedded mode to store data as key-value pairs
and to query that data.
Sometimes the Spark jobs run much longer than expected. In those cases we
have noticed that the Ignite nodes do not respond within the heartbeat
interval, which triggers data rebalancing followed by query cancellation.

When a job finishes in the expected time, we don't see any exceptions in
the log.

Here are the exceptions we get.

org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval

Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]

19/06/13 02:14:08 ERROR twostep.GridMapQueryExecutor: Failed to execute local query.
class org.apache.ignite.cache.query.QueryCancelledException: The query was cancelled while executing.
        at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest0(GridMapQueryExecutor.java:558)
        at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest(GridMapQueryExecutor.java:449)
        at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onMessage(GridMapQueryExecutor.java:203)
        at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor$2.onMessage(GridMapQueryExecutor.java:178)
        at org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:1915)
        at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1082)
        at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:710)
        at org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:102)
        at org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:673)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
19/06/13 02:14:08 ERROR twostep.GridMapQueryExecutor: Failed to execute local query.
class org.apache.ignite.cache.query.QueryCancelledException: The query was cancelled while executing.
        at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest0(GridMapQueryExecutor.java:595)
        at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest(GridMapQueryExecutor.java:449)
        at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onMessage(GridMapQueryExecutor.java:203)
        at org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor$2.onMessage(GridMapQueryExecutor.java:178)
        at org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:1915)
        at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1082)
        at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:710)
        at org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:102)
        at org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:673)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ezhuravlev

Hi,

> we have noticed that Ignite nodes are not responding in the heart beat interval time and hence the re-balancing of data is happening and followed by query cancellation.

This looks like long GC pauses to me. Can you share the full logs from all the nodes? As a quick workaround, I can suggest increasing failureDetectionTimeout, but that is definitely not a final solution.

Thanks,
Evgenii
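
A rough way to check the GC-pause theory from inside the JVM is to sleep in short intervals and see how late each wake-up is. The sketch below uses only the Java standard library; `PauseDetector`, its method names, the 50 ms interval and the 500 ms threshold are all made up for illustration and are not part of Ignite or Spark. A stall anywhere near the configured failureDetectionTimeout would be consistent with a node being dropped from the topology.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not an Ignite or Spark class): flags stalls between periodic wake-ups. */
final class PauseDetector {
    /**
     * Returns the stall lengths (ms) that exceed thresholdMs.
     * A stall is the gap between two consecutive wake-ups minus the intended sleep interval.
     */
    static List<Long> findStalls(long[] wakeUpsMs, long intervalMs, long thresholdMs) {
        List<Long> stalls = new ArrayList<>();
        for (int i = 1; i < wakeUpsMs.length; i++) {
            long stall = (wakeUpsMs[i] - wakeUpsMs[i - 1]) - intervalMs;
            if (stall > thresholdMs) {
                stalls.add(stall);
            }
        }
        return stalls;
    }

    /** Sleeps intervalMs at a time and records a monotonic timestamp (ms) at each wake-up. */
    static long[] sample(int samples, long intervalMs) throws InterruptedException {
        long[] wakeUps = new long[samples];
        for (int i = 0; i < samples; i++) {
            Thread.sleep(intervalMs);
            wakeUps[i] = System.nanoTime() / 1_000_000;
        }
        return wakeUps;
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMs = 50;   // assumed sampling interval
        long thresholdMs = 500; // assumed alert level, far below a multi-second timeout
        List<Long> stalls = findStalls(sample(40, intervalMs), intervalMs, thresholdMs);
        System.out.println("Stalls longer than " + thresholdMs + " ms: " + stalls);
    }
}
```

In practice you would run the sampling loop in a daemon thread for the lifetime of the executor and log the stalls next to the Ignite and Spark logs. JVM GC logs remain the more direct evidence; this only shows that the whole process stopped, not why.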

stephendarlington

In reply to this post by vinod.jv
The documentation recommends against using embedded mode for what’s likely to be a related reason:

> Embedded mode implies starting Ignite server nodes within Spark executors which can cause unexpected rebalancing or even data loss. Therefore this mode is currently deprecated and will be eventually discontinued. Consider starting a separate Ignite cluster and using standalone mode to avoid data consistency and performance issues.


Regards,
Stephen

Loredana Radulescu Ivanoff

Would it be correct to extrapolate this statement and say that Ignite should be started as a standalone application as opposed to being embedded inside an application server that has its own lifecycle and additional responsibilities?



stephendarlington

Correct.

That is not to say you should never do it. But the bigger and more complicated your application, the more likely you are to have problems with embedding.

Regards,
Stephen

vinod.jv

Thank you. Our application is huge and complicated. We will explore
standalone mode as well.
But why is this inconsistent? We have seen the same job take 20 minutes in
some runs and as long as 2 hours in others.
In the YARN log for a run that takes 20 minutes we don't see any exceptions,
whereas in the log for a run that takes 2 hours we see all of the exceptions
I mentioned, multiple times.
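
One way to compare a 20-minute run with a 2-hour run, in light of the GC-pause suggestion earlier in the thread, is to log how much of each executor JVM's uptime went to garbage collection. This is a minimal sketch using the standard `java.lang.management` API; the class name `GcTimeReport` and the idea of calling it at the end of a job are assumptions for illustration, not something Ignite or Spark provides.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper (not an Ignite or Spark class): reports accumulated GC time for this JVM. */
final class GcTimeReport {
    /** Sums the accumulated collection time (ms) of all collectors; undefined (-1) values are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of uptime spent in GC; returns 0.0 when uptime is not positive. */
    static double gcFraction(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long gc = totalGcMillis();
        long up = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC time: %d ms of %d ms uptime (%.1f%%)%n", gc, up, 100 * gcFraction(gc, up));
    }
}
```

If the slow runs show a much larger GC share than the fast ones, that would point toward memory pressure in the shared executor JVM. The totals do not show the length of individual pauses, so GC logs are still needed to confirm a stall long enough to miss a heartbeat.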