Ignite Cache Stopped

Anil

Ignite Cache Stopped

Hi,

We noticed that whenever long-running queries are fired, nodes go out of the topology and the entire Ignite cluster goes down.

In my case, a filter criterion can match about 5 lakh (500,000) records, and each API request fetches a page of 250 records. As the page number increases, the query execution time grows until the entire cluster goes down.

Is https://issues.apache.org/jira/browse/IGNITE-4003 related to this?

Can we set separate thread pools for query execution, compute jobs, and other services instead of the common public thread pool?

Thanks
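
For reference, a minimal sketch of how per-pool sizes can be set on IgniteConfiguration, assuming the Ignite version in use exposes these setters (the exact set of per-pool setters varies between versions):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class NodeWithSizedPools {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Public pool: compute jobs and other user tasks.
        cfg.setPublicThreadPoolSize(16);

        // System pool: internal cache and system messages.
        cfg.setSystemThreadPoolSize(16);

        // Query pool: SQL execution threads (drop this call if your
        // version does not expose a dedicated query pool setter).
        cfg.setQueryThreadPoolSize(8);

        Ignite ignite = Ignition.start(cfg);
        System.out.println("Started node: " + ignite.name());
    }
}

Whether resizing pools helps depends on what is actually saturating the node; as the replies below suggest, GC pressure is the more likely culprit here.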


agura

Re: Ignite Cache Stopped

Anil,

IGNITE-4003 isn't related to your problem.

I think that nodes are going out of topology due to long GC pauses.
You can easily check this using GC logs.

Anil

Re: Ignite Cache Stopped

Hi Andrey,

I checked the GC logs and everything looks good.

Thanks

Anil

Re: Ignite Cache Stopped

Hi Andrey,

The query execution time is very high once the paging reaches limit 10000+250.

There is 10 GB of heap memory on both the client and the servers. I have attached the GC logs of 4 servers. Could you please take a look? Thanks.
 

Attachment: gc-logs.zip (1M)
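
The high latency at limit 10000+250 is what plain LIMIT/OFFSET pagination tends to produce, since the engine still has to generate and skip all of the preceding rows for every page. A hedged sketch of the two patterns with SqlFieldsQuery; the PERSON table and its id/name columns are made-up placeholders:

import java.util.List;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class PagingSketch {
    // Deep offset: the server must materialize and discard the first 10000 rows.
    static List<List<?>> offsetPage(IgniteCache<?, ?> cache) {
        SqlFieldsQuery q = new SqlFieldsQuery(
            "select id, name from PERSON order by id limit 250 offset 10000");
        return cache.query(q).getAll();
    }

    // Keyset-style paging: remember the last id of the previous page instead of an offset.
    static List<List<?>> keysetPage(IgniteCache<?, ?> cache, long lastSeenId) {
        SqlFieldsQuery q = new SqlFieldsQuery(
            "select id, name from PERSON where id > ? order by id limit 250")
            .setArgs(lastSeenId);
        return cache.query(q).getAll();
    }
}

Keyset-style paging keeps the cost per page roughly constant, at the price of requiring an ordered, indexed key to page on.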
agura

Re: Ignite Cache Stopped

At the end of the GC log files I see Full GC pauses like this:

2017-02-17T04:29:22.118-0800: 21122.643: [Full GC (Allocation Failure)
 10226M->8526M(10G), 26.8952036 secs]
   [Eden: 0.0B(512.0M)->0.0B(536.0M) Survivors: 0.0B->0.0B Heap:
10226.0M(10.0G)->8526.8M(10.0G)], [Metaspace:
77592K->77592K(1120256K)]

Your heap is exhausted. During long GC pauses the discovery subsystem doesn't receive heartbeats, so nodes get stopped due to segmentation. Please check your nodes' logs for the NODE_SEGMENTED pattern. If that is your case, try to tune GC or reduce the load on it (see [1] for details).

[1] https://apacheignite.readme.io/docs/jvm-and-system-tuning
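
Alongside the GC tuning from [1], the discovery side can be made more tolerant of pauses so that a single long collection does not immediately segment the node. A minimal sketch; the 30-second value is only an illustration, not a recommendation:

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class TolerantNode {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Allow longer unresponsiveness (e.g. a GC pause) before the node is
        // considered failed and segmented out of the topology (default is 10 s).
        cfg.setFailureDetectionTimeout(30_000);

        Ignition.start(cfg);
    }
}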

Anil

Re: Ignite Cache Stopped

Hi Andrey,

Does GC on an Ignite client node impact the Ignite cluster topology?

Thanks

agura

Re: Ignite Cache Stopped

Anil,

No, it doesn't. Only the client would leave the topology in that case.
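
For completeness, a node joins as a client when client mode is enabled on its configuration; a long GC pause on such a node should then only cost that client its own connection, in line with the answer above. A minimal sketch:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ClientNode {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setClientMode(true); // this JVM joins as a client, not a data/server node

        try (Ignite client = Ignition.start(cfg)) {
            // Long GC pauses here affect only this client's connection,
            // not the server-side topology (per the answer above).
        }
    }
}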

Anil

Re: Ignite Cache Stopped

Thanks Andrey.

I see the node go down even when the GC log looks good. I will try to reproduce it.

May I know what the org.h2.value.ValueString objects in the attached screenshot are?

Thanks.  


Attachment: Memory.JPG (272K)
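
One way to confirm whether a node that dropped out was actually segmented, as suggested earlier, is to listen for the segmentation event on that node. A minimal sketch; note that event types are disabled by default and must be enabled in the configuration:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class SegmentationWatch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // Events are disabled by default; enable the one we want to observe.
        cfg.setIncludeEventTypes(EventType.EVT_NODE_SEGMENTED);

        Ignite ignite = Ignition.start(cfg);

        ignite.events().localListen((IgnitePredicate<Event>) evt -> {
            System.out.println("Segmentation detected: " + evt);
            return true; // keep listening
        }, EventType.EVT_NODE_SEGMENTED);
    }
}
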
agura

Re: Ignite Cache Stopped

I think it is just the H2 wrapper for string values.
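
For context, H2 wraps every SQL value in a subclass of org.h2.value.Value while the engine processes rows, so string columns show up as ValueString instances in a heap dump taken during heavy SQL activity. A small sketch against an H2 1.4.x-era API (which H2 version is bundled in your setup is an assumption):

import org.h2.value.Value;
import org.h2.value.ValueString;

public class ValueStringDemo {
    public static void main(String[] args) {
        // H2 wraps a plain Java string in a ValueString while the SQL engine works on rows.
        Value v = ValueString.get("some row value");
        System.out.println(v.getString());
    }
}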
