Why do the client and server behave like that?

F7753

Why do the client and server behave like that?

I launched 3 Ignite nodes using ${IGNITE_HOME}/bin/ignite.sh, but the console output looks like this:
-------------------------------------------------------------------------------------------------------------
[18:51:09] Topology snapshot [ver=4, servers=3, clients=1, CPUs=96, heap=53.0GB]
[18:51:16] Topology snapshot [ver=5, servers=3, clients=2, CPUs=96, heap=100.0GB]
[18:51:16] Topology snapshot [ver=6, servers=3, clients=3, CPUs=96, heap=150.0GB]
[18:51:16] Topology snapshot [ver=7, servers=3, clients=4, CPUs=96, heap=200.0GB]
-------------------------------------------------------------------------------------------------------------
What do "servers" and "clients" mean here? I have one driver and 3 workers in my Spark cluster, and I run ${IGNITE_HOME}/bin/ignite.sh on my worker nodes.
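For reference, the part of my driver code that triggers this is roughly the following (a simplified sketch; the cache name and the empty IgniteConfiguration are placeholders, only the IgniteContext / IgniteRDD.savePairs usage matches the stack trace below):
-------------------------------------------------------------------------------------------------------------
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext

val sc = new SparkContext(new SparkConf().setAppName("StreamingJoin"))

// IgniteContext is created from the SparkContext and an Ignite configuration closure
val ic = new IgniteContext[String, String](sc, () => new IgniteConfiguration())

// "someCache" is a placeholder cache name
val igniteRdd = ic.fromCache("someCache")
val pairs = sc.parallelize(1 to 1000).map(i => (i.toString, i.toString))

// this is the savePairs call that fails in the trace below
igniteRdd.savePairs(pairs)
-------------------------------------------------------------------------------------------------------------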
Then after a while, it throws:
-------------------------------------------------------------------------------------------------------------
[18:52:28,869][SEVERE][tcp-client-disco-reconnector-#8%null%][TcpDiscoverySpi] Failed to reconnect
class org.apache.ignite.IgniteCheckedException: Failed to deserialize object with given class loader: sun.misc.Launcher$AppClassLoader@26f44031
        at org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal(JdkMarshaller.java:105)
        at org.apache.ignite.spi.discovery.tcp.ClientImpl$Reconnector.body(ClientImpl.java:1213)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:152)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        at org.apache.ignite.marshaller.jdk.JdkMarshallerInputStreamWrapper.read(JdkMarshallerInputStreamWrapper.java:53)
        at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2310)
        at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2323)
        at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
        at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
        at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
        at org.apache.ignite.marshaller.jdk.JdkMarshallerObjectInputStream.<init>(JdkMarshallerObjectInputStream.java:39)
        at org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal(JdkMarshaller.java:100)
        ... 2 more
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 8, nobida144): class org.apache.ignite.IgniteClientDisconnectedException: Client node disconnected: null
        at org.apache.ignite.internal.GridKernalGatewayImpl.readLock(GridKernalGatewayImpl.java:87)
        at org.apache.ignite.internal.IgniteKernal.guard(IgniteKernal.java:3017)
        at org.apache.ignite.internal.IgniteKernal.getOrCreateCache(IgniteKernal.java:2467)
        at org.apache.ignite.spark.impl.IgniteAbstractRDD.ensureCache(IgniteAbstractRDD.scala:35)
        at org.apache.ignite.spark.IgniteRDD$$anonfun$savePairs$1.apply(IgniteRDD.scala:174)
        at org.apache.ignite.spark.IgniteRDD$$anonfun$savePairs$1.apply(IgniteRDD.scala:170)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:920)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:918)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:918)
        at org.apache.ignite.spark.IgniteRDD.savePairs(IgniteRDD.scala:170)
        at main.scala.StreamingJoin$.main(StreamingJoin.scala:241)
        at main.scala.StreamingJoin.main(StreamingJoin.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
^C[18:53:01] Ignite node stopped OK [uptime=00:01:52:668]

-------------------------------------------------------------------------------------------------------------
F7753

Re: Why do the client and server behave like that?

I found that GC was the main problem in my case; each of my nodes throws a GC exception:
----------------------------------------------------------------------------------------------------------------
Exception in thread "shmem-worker-#175%null%" java.lang.OutOfMemoryError: GC overhead limit exceeded
[05-Apr-2016 19:05:26][ERROR][shmem-worker-#176%null%][TcpCommunicationSpi] Runtime error caught during grid runnable execution: ShmemWorker [endpoint=IpcSharedMemoryClientEndpoint [inSpace=IpcSharedMemorySpace [opSize=262144, shmemPtr=139664015134784, shmemId=2883604, semId=2392067, closed=true, isReader=true, writerPid=11421, readerPid=11230, tokFileName=/opt/apache-ignite-1.5.0.final-src/work/ipc/shmem/a3bcd536-31b5-47f6-b248-80f5a43e50dc-11230/gg-shmem-space-46-11421-262144, closed=true], outSpace=IpcSharedMemorySpace [opSize=262144, shmemPtr=139663894958144, shmemId=2916373, semId=2424836, closed=true, isReader=false, writerPid=11230, readerPid=11421, tokFileName=/opt/apache-ignite-1.5.0.final-src/work/ipc/shmem/a3bcd536-31b5-47f6-b248-80f5a43e50dc-11230/gg-shmem-space-47-11421-262144, closed=true], checkIn=true, checkOut=true]]
java.lang.OutOfMemoryError: GC overhead limit exceeded
----------------------------------------------------------------------------------------------------------------
vkulichenko

Re: Why do the client and server behave like that?

Hi,

A server is a node that can store data; in your case these are the nodes you start with the ignite.sh script. The clients are started automatically by IgniteContext - one client per worker and a fourth one on the driver.
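To illustrate (just a sketch, not taken from your code): the only difference between the two kinds of nodes is the client mode flag on IgniteConfiguration. ignite.sh leaves it at the default (server), while IgniteContext sets it for the nodes it starts on the Spark side. Starting a client node manually would look like this:
-------------------------------------------------------------------------------------------------------------
import org.apache.ignite.Ignition
import org.apache.ignite.configuration.IgniteConfiguration

// clientMode is false by default, which is what ignite.sh uses, so those
// nodes join the topology as data-holding servers. Setting it to true gives
// a client node that joins the same topology but does not store cache data:
val cfg = new IgniteConfiguration()
cfg.setClientMode(true)

val client = Ignition.start(cfg)
// The "Topology snapshot" line in the log counts both kinds separately,
// e.g. servers=3, clients=4.
client.close()
-------------------------------------------------------------------------------------------------------------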

-Val
F7753

Re: Why do the client and server behave like that?

Thanks a lot for letting me know. I'll spend some time going through the Ignite docs more carefully.
I also created another topic about the GC OOM in my cluster:
http://apache-ignite-users.70518.x6.nabble.com/How-to-end-up-the-GC-overhead-problem-in-the-IgniteRDD-tc3945.html