Ignite node failure after network issue

ihalilaltun ihalilaltun
Ignite node failure after network issue

Hi Igniters,

We had a network glitch last night (we are still trying to find the cause) and one node halted itself. Both client
and node logs are attached; can someone have a look and tell me the exact
problem here?

Archive.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t2515/Archive.zip>

The Ignite config is:
ignite-config.zip

The Ignite client config is:
ignite-client-config.zip

We are on version 2.7.6. Our client application runs on spring-boot v2.0.6.

I have been searching all over the Apache Ignite online sources to find the
best practices for handling network problems on both the server
and client side. If someone has such a source, I would be happy to read it.

Regards.



-----
İbrahim Halil Altun
Senior Software Engineer @ Segmentify
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
akorensh akorensh
RE: Ignite node failure after network issue

Hi,

Your discovery section is non-standard. You don't need to repeat the entries.
Also, it is best to prepend the IP address to each entry to be on the safe side.


    <property name="addresses">
        <list>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
            <value>:47500..47509</value>
        </list>
    </property>

The same discovery section is on the client side.
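
As a rough sketch of what a non-repeating address list could look like, assuming a three-node cluster: the 10.0.0.x addresses below are hypothetical placeholders (the real IPs were not posted), and the port range is the one from the config quoted above. Each server host appears once, with its IP in front of the port range.

```xml
<property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="ipFinder">
            <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                <property name="addresses">
                    <list>
                        <!-- Hypothetical placeholder IPs: one entry per server host, no repeats -->
                        <value>10.0.0.1:47500..47509</value>
                        <value>10.0.0.2:47500..47509</value>
                        <value>10.0.0.3:47500..47509</value>
                    </list>
                </property>
            </bean>
        </property>
    </bean>
</property>
```

The same list would normally go into the client config as well, so that clients and servers look for each other at the same addresses.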



Also, the thread pool sizes in your config are on the large side:

        <property name="systemThreadPoolSize" value="128"/>
        <property name="publicThreadPoolSize" value="128"/>
        <property name="queryThreadPoolSize" value="128"/>
        <property name="serviceThreadPoolSize" value="128"/>
        <property name="stripedPoolSize" value="128"/>
        <property name="dataStreamerThreadPoolSize" value="64"/>
        <property name="rebalanceThreadPoolSize" value="8"/>

These settings will create many unnecessary threads unless you have a specific need for them.
https://apacheignite.readme.io/docs/performance-tips#section-configure-thread-pools
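
For illustration only, a trimmed-down version could drop most of the explicit sizes and let Ignite fall back to its defaults, keeping an override only where a need has actually been measured. The value 16 below is a hypothetical placeholder, not a recommendation.

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Pool sizes that are not set here fall back to Ignite's defaults -->

    <!-- Hypothetical override: keep an explicit size only where a measured need exists -->
    <property name="queryThreadPoolSize" value="16"/>
</bean>
```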


The root cause of the stoppage looks to be a missing or incorrectly configured library, which leads to a serialization issue with Joda objects:

Caused by: org.apache.ignite.IgniteException: Failed to create string representation of binary object.
        at org.apache.ignite.internal.binary.BinaryObjectExImpl.toString(BinaryObjectExImpl.java:189) ~[ignite-core-2.7.6.jar:2.7.6]
        at org.apache.ignite.internal.binary.BinaryObjectImpl.toString(BinaryObjectImpl.java:920) ~[ignite-core-2.7.6.jar:2.7.6]
        at java.lang.String.valueOf(String.java:2994) ~[?:1.8.0_201]
        at org.apache.ignite.internal.util.GridStringBuilder.a(GridStringBuilder.java:101) ~[ignite-core-2.7.6.jar:2.7.6]
        at org.apache.ignite.internal.util.tostring.SBLimitedLength.a(SBLimitedLength.java:88) ~[ignite-core-2.7.6.jar:2.7.6]
        at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:939) ~[ignite-core-2.7.6.jar:2.7.6]
        at org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1005) ~[ignite-core-2.7.6.jar:2.7.6]
        ... 26 more
Caused by: org.apache.ignite.binary.BinaryObjectException: Failed to read field: iChronology
        at org.apache.ignite.internal.binary.BinaryReaderExImpl.wrapFieldException(BinaryReaderExImpl.java:447) ~[ignite-core-2.7.6.jar:2.7.6]
        at org.apache.ignite.internal.binary.BinaryReaderExImpl.unmarshalField(BinaryReaderExImpl.java:344) ~[ignite-core-2.7.6.jar:2.7.6]
        at org.apache.ignite.internal.binary.BinaryObjectImpl.field(BinaryObjectImpl.java:626) ~[ignite-core-2.7.6.jar:2.7.6]
        at org.apache.ignite.internal.binary.BinaryObjectExImpl.toString(BinaryObjectExImpl.java:225) ~[ignite-core-2.7.6.jar:2.7.6]
        at org.apache.ignite.internal.binary.BinaryObjectExImpl.appendValue(BinaryObjectExImpl.java:280) ~[ignite-core-2.7.6.jar:2.7.6]
        at org.apache.ignite.internal.binary.BinaryObjectExImpl.toString(BinaryObjectExImpl.java:229) ~[ignite-core-2.7.6.jar:2.7.6]

Caused by: org.apache.ignite.IgniteCheckedException: Failed to find class with given class loader for unmarshalling (make sure same versions of all classes are available on all nodes or enable peer-class-loading) [clsLdr=sun.misc.Launcher$AppClassLoader@764c12b6, cls=org.joda.time.chrono.ISOChronology$Stub]
        at org.apache.ignite.internal.marshaller.optimized.OptimizedMarshaller.unmarshal0(OptimizedMarshaller.java:233) ~[ignite-core-2.7.6.jar:2.7.6]
        at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:94) ~[ignite-core-2.7.6.jar:2.7.6]
        at org.apache.ignite.internal.binary.BinaryUtils.doReadOptimized(BinaryUtils.java:1762) ~[ignite-core-2.7.6.jar:2.7.6]
        at org.apache.ignite.internal.binary.BinaryUtils.unmarshal(BinaryUtils.java:1971) ~[ignite-core-2.7.6.jar:2.7.6]


It looks like this library is missing or incorrectly configured in your setup:
Caused by: java.lang.ClassNotFoundException: org.joda.time.chrono.ISOChronology$Stub
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_201]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_201]
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) ~[?:1.8.0_201]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_201]
        at java.lang.Class.forName0(Native Method) ~[?:1.8.0_201]
        at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_201]
        at org.apache.ignite.internal.util.IgniteUtils.forName(IgniteUtils.java:8775) [ignite-core-2.7.6.jar:2.7.6]
        at org.apache.ignite.internal.MarshallerContextImpl.getClass(MarshallerContextImpl.java:349) ~[ignite-core-2.7.6.jar:2.7.6]
        at org.apache.ignite.internal.marshaller.optimized.OptimizedMarshallerUtils.classDescriptor(OptimizedMarshallerUtils.java:264) ~[ignite-core-2.7.6.jar:2.7.6]


[2019-10-25T02:10:34,352][ERROR][sys-stripe-50-#51][] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.IgniteException: Failed to create string representation of binary object.]]

There is a stub in this class that implements Serializable, causing Ignite to use a different marshaller and leading to the error.
The Joda class in question (see the bottom of the file):
https://github.com/JodaOrg/joda-time/blob/master/src/main/java/org/joda/time/chrono/ISOChronology.java

Information about binary marshalling in Ignite:
https://apacheignite.readme.io/docs/binary-marshaller

The specific marshaller used in this case:
https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/marshaller/optimized/OptimizedMarshaller.java

Make sure all nodes have this library set up, and that IGNITE_HOME is properly configured.
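
If the client application is built with Maven (an assumption; the thread does not say which build tool is used), declaring Joda-Time explicitly is one way to make sure its classes end up on the client classpath. The version property below is a hypothetical placeholder; whichever version is chosen should match the jar made available to the server nodes.

```xml
<!-- pom.xml fragment; ${joda-time.version} is a placeholder property assumed to be defined elsewhere -->
<dependency>
    <groupId>joda-time</groupId>
    <artifactId>joda-time</artifactId>
    <version>${joda-time.version}</version>
</dependency>
```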

Thanks, Alex

-----Original Message-----
From: ihalilaltun <ibrahim.altun@segmentify.com> 
Sent: Friday, October 25, 2019 10:28 AM
To: user@ignite.apache.org
Subject: Ignite node failure after network issue


ihalilaltun ihalilaltun
RE: Ignite node failure after network issue

Hi Alex,

I removed the IP addresses for security reasons; that's why
it looks non-standard.

I'll try to adjust all the thread-pool sizes. I am not sure whether we need them,
since the configuration was made by our previous software architect.

I'll look further into the serialization and marshaller problems, thanks. Will
it be enough to add the jar files under the lib directories on the server
nodes to avoid these serialization problems?

Thanks.



akorensh akorensh
RE: Ignite node failure after network issue

Yes, putting the jar files with the necessary classes into the classpath of all nodes should solve the serialization issue.

You should see the following in your logs:
[2019-10-28 14:52:29,893][INFO ][sys-stripe-7-#8%node1%][GridDeploymentLocalStore] Class locally deployed: class org.joda.time.chrono.ISOChronology$Stub
[2019-10-28 14:52:29,896][INFO ][sys-stripe-7-#8%node1%][GridDeploymentLocalStore] Class locally deployed: class org.joda.time.DateTimeZone$Stub


-----Original Message-----
From: ihalilaltun <[hidden email]>
Sent: Monday, October 28, 2019 6:03 AM
To: [hidden email]
Subject: RE: Ignite node failure after network issue


ihalilaltun ihalilaltun
RE: Ignite node failure after network issue

Hi Alex,

Thanks for the response. We've made some optimizations to the thread pool sizes and
reorganized the classpaths. I'll write again if we face the problem again.

Regards.


