changed cache configuration and restarted server nodes. Getting exception.

vinshar vinshar

Hi,

I ran into an issue today and couldn't figure out what's wrong, so I thought of asking on this forum.
I added an expiration policy to two cache configurations, stopped all cache server nodes, and then started them one by one. My client nodes had near caches for these caches, and I am not sure whether this caused the issue. The issue was that I started getting the "org.apache.ignite.cache.CachePartialUpdateException: Failed to update keys (retry update if possible)." exception in my apps that were using these caches.
I thought that maybe there were some old entries in the near caches while the server caches were empty, and that this was somehow causing the issue. I checked the cache statistics and all caches were empty.
Still, I tried to clear the caches using Visor and got the following exception.

visor> cache -clear -c=PROGRAMS
[16:43:42,883][SEVERE][mgmt-#22%null%][GridTaskWorker] Failed to reduce job results due to undeclared user exception [task=o.a.i.i.v.cache.VisorCacheClearTask@54656dd, err=class o.a.i.IgniteException: Failed to deserialize object with given class loader: WebappClassLoader
  context: /myWebService
  delegate: false
  repositories:
    /WEB-INF/classes/
----------> Parent Classloader:
java.net.URLClassLoader@2b71fc7e
]
class org.apache.ignite.IgniteException: Failed to deserialize object with given class loader: WebappClassLoader
  context: /myWebService
  delegate: false
  repositories:
    /WEB-INF/classes/
----------> Parent Classloader:
java.net.URLClassLoader@2b71fc7e

        at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:882)

..
..
..
..
..
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to deserialize object with given class loader: WebappClassLoader
..
..
..
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.ignite.marshaller.optimized.OptimizedObjectInputStream.readExternalizable(OptimizedObjectInputStream.java:523
..
..
..
Caused by: java.io.InvalidObjectException: Ignite instance with provided name doesn't exist. Did you call Ignition.start(..) to start an Ignite instance? [name=null]
        at org.apache.ignite.internal.processors.cache.GridCacheContext.readResolve(GridCacheContext.java:1999)
        ... 37 more
Caused by: class org.apache.ignite.IgniteIllegalStateException: Ignite instance with provided name doesn't exist. Did you call Ignition.start(..) to start an Ignite instance? [name=null]
        at org.apache.ignite.internal.IgnitionEx.gridx(IgnitionEx.java:1267)
        at org.apache.ignite.internal.processors.cache.GridCacheContext.readResolve(GridCacheContext.java:1989)
        ... 37 more
(wrn) <visor>: class org.apache.ignite.IgniteException: Failed to deserialize object with given class loader: WebappClassLoader


Regards,
Vinay Sharma
vkulichenko vkulichenko

Hi Vinay,

CachePartialUpdateException is thrown by an update operation (put, putAll, remove, removeAll, ...) if the update failed for one or more of the keys involved in the operation. This exception has a failedKeys() method that tells you which keys failed, so you can retry only those; there is no need to retry the successful ones.
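
As an illustration of the retry idea, here is a minimal sketch that re-sends only the keys reported as failed. To keep it compilable without Ignite on the classpath, PartialUpdateException and BatchWriter are hypothetical stand-ins: the first mimics the failedKeys() method of org.apache.ignite.cache.CachePartialUpdateException, and the second stands for a bulk write such as cache.putAll(...). The helper name and the attempt limit are assumptions as well.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for Ignite's CachePartialUpdateException. */
class PartialUpdateException extends RuntimeException {
    private final Collection<Object> failed;

    PartialUpdateException(Collection<?> failed) {
        super("Failed to update keys (retry update if possible).");
        this.failed = new ArrayList<>(failed);
    }

    /** Mirrors the shape of failedKeys() on the real exception. */
    @SuppressWarnings("unchecked")
    <K> Collection<K> failedKeys() {
        return (Collection<K>) failed;
    }
}

/** Hypothetical abstraction over a bulk write such as cache.putAll(...). */
interface BatchWriter<K, V> {
    void putAll(Map<K, V> entries);
}

final class RetryFailedKeys {
    /**
     * Writes all entries and, on a partial failure, re-sends only the failed keys.
     * Returns the keys that are still failing after maxAttempts (empty on success).
     */
    static <K, V> Collection<K> putAllWithRetry(BatchWriter<K, V> writer,
                                                Map<K, V> entries,
                                                int maxAttempts) {
        Map<K, V> pending = new HashMap<>(entries);

        for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
            try {
                writer.putAll(pending);
                pending.clear(); // the whole batch went through
            } catch (PartialUpdateException e) {
                Collection<K> failed = e.failedKeys();
                pending.keySet().retainAll(failed); // keep only the keys that failed
            }
        }
        return new ArrayList<>(pending.keySet());
    }
}
```

With the real API, the catch clause would presumably catch CachePartialUpdateException around a call on an IgniteCache instead. If the failure is persistent, as it turned out to be in this thread, retrying does not help and the helper simply returns the keys that never succeeded.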

Most likely you were getting these exceptions when there were no server nodes in the topology. Is this the case?

-Val
vinshar vinshar

Hi Val,

At the time of this issue I checked the topology through Visor, and all 3 client and 2 server nodes were there. There were no items in any cache. I could see caches created on all 5 nodes (near caches on the 3 clients and replicated caches on the 2 servers). I also tried clearing a cache through Visor with the command "cache -clear -c=PROGRAMS", which produced the exception trace I mentioned previously. I tried multiple times and got the same error. I was running Visor on one of the server node hosts.

Shouldn't a node be dropped from the topology if it's not accessible for any reason? The exception trace with class-loader-related exceptions, the update error when the cache is empty on all nodes, all nodes being visible in the Visor topology, and getting the same exception on repeated attempts to clear a cache all seem to point to a problem other than a network or node accessibility issue.

I restarted the Ignite server nodes but the problem was still there. I had to stop all Ignite nodes, including the clients, to resolve the problem.

Even more interesting, I did not face any issue in my DEV and QA environments when I made the cache changes and restarted just the server nodes. I faced this problem in pre-prod, where I had to restart all nodes.

Regards,
Vinay Sharma

Vladimir Ozerov Vladimir Ozerov

Hi Vinay,

It looks like there was a problem with the serialization of one of Ignite's internal components. Could you please provide the full stack trace of this exception?
Any additional information, such as your source code or Ignite XML configuration, could also help.

Vladimir.


vinshar vinshar
Hi Vladimir,

Please find attached the stack traces and Visor output from my multiple attempts to identify and resolve the issue. Also attached are my server-side configurations. We start all caches from the client in local mode, and all distributed caches have to be defined in the server configs. The cache in the attached file that has entries (its name ends with _ALL) is a local cache. The other caches are replicated and defined in the server configs.

The attached stack trace file contains the topology and cache statistics during these attempts. I tried clearing the caches on a node by its ID and also clearing a cache by its name, but both failed. The attached file has the stack traces for both.
I have masked some information such as IPs. Let me know if you need any more information.

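default-config.xml
<http://apache-ignite-users.70518.x6.nabble.com/file/n3085/default-config.xml>

visor_ignite_stack_trace_masked.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/n3085/visor_ignite_stack_trace_masked.txt>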

client_stack_trace.txt

Regards,
Vinay Sharma
Vladimir Ozerov Vladimir Ozerov

Hi Vinay,

It looks like we have several problems here.

First, the Visor clear task doesn't work. Most probably this is related to a serialization problem in Ignite's internal components, which is currently being fixed as part of the IGNITE-2649 ticket.

Second - the main problem:
org.apache.ignite.binary.BinaryObjectException: Cannot find metadata for object with compact footer: -135842108

It looks like something is wrong with binary serialization of your objects. Please add the following XML snippet to your XML configuration and restart nodes. It should help:

<property name="binaryConfiguration">
    <bean class="org.apache.ignite.configuration.BinaryConfiguration">
        <property name="compactFooter" value="false"/>
    </bean>
</property>

This is not a solution, though, but rather a workaround. To understand the root cause we need exact steps to reproduce the problem. Could you provide the source code of the key and value classes you put into the cache?
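
For nodes that are configured in code rather than through Spring XML, the same workaround can presumably be expressed in Java. This is only a sketch: the class name is hypothetical, and it assumes the rest of the node configuration (discovery, caches, and so on) is added elsewhere.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.BinaryConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

/** Hypothetical holder for the Java form of the XML workaround above. */
final class CompactFooterWorkaround {
    /** Builds a node configuration with compact footers switched off. */
    static IgniteConfiguration configuration() {
        BinaryConfiguration binaryCfg = new BinaryConfiguration();
        binaryCfg.setCompactFooter(false); // same as <property name="compactFooter" value="false"/>

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setBinaryConfiguration(binaryCfg);
        return cfg;
    }

    public static void main(String[] args) {
        // Starts a node with the workaround applied and stops it when the block exits.
        try (Ignite ignite = Ignition.start(configuration())) {
            System.out.println("Node started: " + ignite.cluster().localNode().id());
        }
    }
}
```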

Vladimir.


vinshar vinshar

Thanks for the workaround, Vladimir. I am using multiple caches. Each cache has a long key and POJO values. One is the Program cache of type <Long, ProgramDto>, and other similar caches exist. All of them had this problem. All these value classes are simple POJOs that implement Serializable, do not override any of the Object class methods (equals, hashCode, toString, etc.), and have attributes of type long, String, Boolean, and java.sql.Timestamp with getter and setter methods. All classes have an auto-generated "private static final long serialVersionUID".
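
To make that description concrete, a value class of this shape might look roughly like the sketch below. Only the class name ProgramDto and the attribute types come from the description; the field names and the serialVersionUID value are hypothetical.

```java
import java.io.Serializable;
import java.sql.Timestamp;

/** Sketch of a value class as described above; field names are hypothetical. */
class ProgramDto implements Serializable {
    private static final long serialVersionUID = 1L; // placeholder, the real one is auto-generated

    private long id;
    private String name;
    private Boolean active;
    private Timestamp updatedAt;

    // Plain getters and setters; equals, hashCode and toString are deliberately not overridden.
    public long getId() { return id; }
    public void setId(long id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public Boolean getActive() { return active; }
    public void setActive(Boolean active) { this.active = active; }

    public Timestamp getUpdatedAt() { return updatedAt; }
    public void setUpdatedAt(Timestamp updatedAt) { this.updatedAt = updatedAt; }
}
```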

I also thought of a workaround: an MBean which, among many other tasks, can restart the Ignite instance encapsulated in my custom cache manager. I ran into an issue there, which I will share in a separate thread.
Vladimir Ozerov Vladimir Ozerov

Hi Vinay,

Thanks for the description. But I am afraid it is too broad for us to start an investigation, because there are lots of similar cases where everything works fine. Still, the problem you faced seems pretty serious to me, and we definitely need to find the root cause.

Can we expect more assistance from your side? Any further hints (XML configuration, a simplified code sample that reproduces the issue, etc.) are appreciated.

Vladimir.


vinshar vinshar

Hi Vladimir,

Sure. I am happy to assist in any way to make Ignite better.

I shared the server configs and stack traces in my previous messages. I will try my best to reproduce this issue again. If I can find the exact steps, I will try to write a simplified test case.

It may take me some time to reproduce this issue, as it is intermittent and we faced it in only one of our 3 environments. I will keep the group updated.

Regards,
Vinay Sharma

vinshar vinshar
Hi Vladimir,

I am able to reproduce the problem, and it's not intermittent. The exception occurs every time.
I am attaching a class with a main method that reproduces the issue, along with the logs.

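IgniteProblemTest.java
<http://apache-ignite-users.70518.x6.nabble.com/file/n3307/IgniteProblemTest.java>
log.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/n3307/log.txt>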

Below is a summary of what I am doing in the main method:
1) Create server node 1 with a replicated cache EMPLOYEE. Wait 10 seconds.
2) Create server node 2 with a replicated cache EMPLOYEE. Wait 10 seconds.
3) Create client node 1 with a near cache for EMPLOYEE. Wait 10 seconds.
4) Create client node 2 with a near cache for EMPLOYEE. Wait 10 seconds.
5) Put 100 entries into both client caches. Only 10 entries remain; the others get evicted.
6) Close both servers and wait 5 seconds.
7) Start the servers again with the same configs. Wait 10 seconds after each server starts. Some exceptions are seen during server close.
8) getOrCreate the near caches again on the client nodes.
9) Try putting objects again. The exception occurs.

Regards,
Vinay
vinshar vinshar

One observation: everything works fine if I do not add any QueryIndex to the QueryEntity. It seems the problem is caused by an old QueryIndex metadata instance somehow being used by the client nodes, even though all caches on all nodes were destroyed and all server nodes were restarted.
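
For readers without the attachment, the difference being described is roughly the one sketched below: a replicated EMPLOYEE cache whose QueryEntity either does or does not carry a QueryIndex. This is not the code from IgniteProblemTest.java; the value type name, the field names, and the withIndex flag are hypothetical and only illustrate where the index is attached.

```java
import java.util.Collections;
import java.util.LinkedHashMap;

import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.QueryEntity;
import org.apache.ignite.cache.QueryIndex;
import org.apache.ignite.configuration.CacheConfiguration;

/** Hypothetical factory for the EMPLOYEE cache configuration. */
final class EmployeeCacheConfig {
    static CacheConfiguration<Long, Object> employeeCache(boolean withIndex) {
        // Queryable fields: field name -> Java type name (hypothetical fields).
        LinkedHashMap<String, String> fields = new LinkedHashMap<>();
        fields.put("id", Long.class.getName());
        fields.put("name", String.class.getName());

        QueryEntity entity = new QueryEntity();
        entity.setKeyType(Long.class.getName());
        entity.setValueType("com.example.Employee"); // hypothetical value class
        entity.setFields(fields);

        if (withIndex) {
            // According to the observation above, the failure only shows up with this part present.
            entity.setIndexes(Collections.singletonList(new QueryIndex("name")));
        }

        CacheConfiguration<Long, Object> cacheCfg = new CacheConfiguration<>("EMPLOYEE");
        cacheCfg.setCacheMode(CacheMode.REPLICATED);
        cacheCfg.setQueryEntities(Collections.singletonList(entity));
        return cacheCfg;
    }
}
```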

Vladimir Ozerov Vladimir Ozerov

Hi Vinay,

Thank you for attaching the code. I was able to get to the bottom of the issue and created a ticket - https://issues.apache.org/jira/browse/IGNITE-2779
Hope it will be fixed soon.

Vladimir.
