Nodes are restarting when i try to drop a table created with persistence enabled

shivakumar

Hi all,
I created a table over a JDBC connection, in partitioned mode with native
persistence enabled. I have 2 Ignite nodes (version 2.7.0) running in a
Kubernetes environment. I ingested 1500000 records, and when I try to drop
the table, both pods restart one after the other.
Please find the thread dump logs attached. The DROP statement itself
fails, and the table is still there afterwards:

0: jdbc:ignite:thin://ignite-service.cign.svc> !tables
+--------------------------------+--------------------------------+--------------------------------+--------------------------------+---------------------------------+
|           TABLE_CAT            |          TABLE_SCHEM           |           TABLE_NAME           |           TABLE_TYPE           |             REMARKS             |
+--------------------------------+--------------------------------+--------------------------------+--------------------------------+---------------------------------+
|                                | PUBLIC                         | DEVICE                         | TABLE                          |                                 |
|                                | PUBLIC                         | DIMENSIONS                     | TABLE                          |                                 |
|                                | PUBLIC                         | CELL                           | TABLE                          |                                 |
+--------------------------------+--------------------------------+--------------------------------+--------------------------------+---------------------------------+
0: jdbc:ignite:thin://ignite-service.cign.svc> DROP TABLE IF EXISTS PUBLIC.DEVICE;
Error: Statement is closed. (state=,code=0)
java.sql.SQLException: Statement is closed.
        at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.ensureNotClosed(JdbcThinStatement.java:862)
        at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.getWarnings(JdbcThinStatement.java:454)
        at sqlline.Commands.execute(Commands.java:849)
        at sqlline.Commands.sql(Commands.java:733)
        at sqlline.SqlLine.dispatch(SqlLine.java:795)
        at sqlline.SqlLine.begin(SqlLine.java:668)
        at sqlline.SqlLine.start(SqlLine.java:373)
        at sqlline.SqlLine.main(SqlLine.java:265)
0: jdbc:ignite:thin://ignite-service.cign.svc> !quit
Closing: org.apache.ignite.internal.jdbc.thin.JdbcThinConnection
[root@vm-10-99-26-135 bin]# ./sqlline.sh --verbose=true -u "jdbc:ignite:thin://ignite-service.cign.svc.cluster.local:10800;user=ignite;password=ignite;"
issuing: !connect jdbc:ignite:thin://ignite-service.cign.svc.cluster.local:10800;user=ignite;password=ignite; '' '' org.apache.ignite.IgniteJdbcThinDriver
Connecting to jdbc:ignite:thin://ignite-service.cign.svc.cluster.local:10800;user=ignite;password=ignite;
Connected to: Apache Ignite (version 2.7.0#19700101-sha1:00000000)
Driver: Apache Ignite Thin JDBC Driver (version 2.7.0#20181130-sha1:256ae401)
Autocommit status: true
Transaction isolation: TRANSACTION_REPEATABLE_READ
sqlline version 1.3.0
0: jdbc:ignite:thin://ignite-service.cign.svc> !tables
+--------------------------------+--------------------------------+--------------------------------+--------------------------------+---------------------------------+
|           TABLE_CAT            |          TABLE_SCHEM           |           TABLE_NAME           |           TABLE_TYPE           |             REMARKS             |
+--------------------------------+--------------------------------+--------------------------------+--------------------------------+---------------------------------+
|                                | PUBLIC                         | DEVICE                         | TABLE                          |                                 |
|                                | PUBLIC                         | DIMENSIONS                     | TABLE                          |                                 |
|                                | PUBLIC                         | CELL                           | TABLE                          |                                 |
+--------------------------------+--------------------------------+--------------------------------+--------------------------------+---------------------------------+
0: jdbc:ignite:thin://ignite-service.cign.svc> select count(*) from DEVICE;
+--------------------------------+
|            COUNT(*)            |
+--------------------------------+
| 1500000                        |
+--------------------------------+
1 row selected (5.665 seconds)
0: jdbc:ignite:thin://ignite-service.cign.svc>

ignite_thread_dump.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/t2244/ignite_thread_dump.txt>   


shiva





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
dmagda

Re: nodes are restarting when i try to drop a table created with persistence enabled

It seems that a system worker was blocked on your end for more than 30 seconds, and the watchdog shut the node down because of it:
[2019-04-12T10:52:27,451][ERROR][tcp-disco-msg-worker-#2][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=db-checkpoint-thread, blockedFor=32s]
[2019-04-12T10:52:27,451][WARN ][tcp-disco-msg-worker-#2][G] Thread [name="db-checkpoint-thread-#61", id=115, state=WAITING, blockCnt=39, waitCnt=309]
    Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@92173e8, ownerName=null, ownerId=-1]

[2019-04-12T10:52:27,451][ERROR][tcp-disco-msg-worker-#2][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=db-checkpoint-thread, igniteInstanceName=null, finished=false, heartbeatTs=1555066315438]]]

Try tuning this watchdog or disabling it. This is what the docs say:
https://apacheignite.readme.io/docs/critical-failures-handling#section-critical-workers-health-check
Ignite has an internal mechanism for verifying that critical workers are operational. Each worker is regularly checked whether it's alive and is updating its heartbeat timestamp. If either of the conditions is not observed for the configured period of time, the worker is regarded as blocked and Ignite will output that information to the log file. The period of inactivity is specified by the IgniteConfiguration.systemWorkerBlockedTimeout property (in milliseconds; the default value equals the failure detection timeout).
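
To make the quoted mechanism concrete, here is a minimal sketch of a heartbeat check in plain Java. It is not Ignite's implementation: the class and method names (`WorkerWatchdog`, `heartbeat`, `blockedWorkers`) are hypothetical, and the 30-second threshold is only assumed from the "more than 30 seconds" mentioned above. The timestamp is the `heartbeatTs` value from the log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified heartbeat watchdog for illustration; NOT Ignite's implementation. */
final class WorkerWatchdog {
    /** Last heartbeat timestamp (ms) reported by each worker, keyed by worker name. */
    private final Map<String, Long> heartbeats = new ConcurrentHashMap<>();

    /** Inactivity period (ms) after which a worker is regarded as blocked. */
    private final long blockedTimeoutMs;

    WorkerWatchdog(long blockedTimeoutMs) {
        this.blockedTimeoutMs = blockedTimeoutMs;
    }

    /** A worker calls this whenever it makes progress. */
    void heartbeat(String workerName, long nowMs) {
        heartbeats.put(workerName, nowMs);
    }

    /** Returns the names of workers whose last heartbeat is older than the timeout. */
    List<String> blockedWorkers(long nowMs) {
        List<String> blocked = new ArrayList<>();
        for (Map.Entry<String, Long> e : heartbeats.entrySet()) {
            if (nowMs - e.getValue() > blockedTimeoutMs)
                blocked.add(e.getKey());
        }
        return blocked;
    }

    public static void main(String[] args) {
        WorkerWatchdog watchdog = new WorkerWatchdog(30_000L); // assumed 30 s threshold
        long lastBeat = 1555066315438L;                        // heartbeatTs from the log above
        watchdog.heartbeat("db-checkpoint-thread", lastBeat);
        // 32 s of silence exceeds the threshold, so the worker is reported as blocked.
        System.out.println(watchdog.blockedWorkers(lastBeat + 32_000L));
    }
}
```

In this sketch, raising the timeout is the equivalent of tuning the watchdog: a worker that stays silent for 32 seconds is no longer reported once the threshold is above that. In Ignite itself the threshold is the `IgniteConfiguration.systemWorkerBlockedTimeout` property quoted above.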

This behavior will be revisited in Ignite soon: http://apache-ignite-developers.2346864.n4.nabble.com/GridDhtInvalidPartitionException-takes-the-cluster-down-td41459.html

-
Denis


shivakumar

Re: nodes are restarting when i try to drop a table created with persistence enabled

Hi Denis,

Is there a specific reason why the critical thread gets blocked, such as
the CPU or the heap being exhausted?
We are hitting this issue again and again.
Is there any other way to drop tables/caches?
This looks like a critical issue.

regards,
shiva



dmagda

Re: nodes are restarting when i try to drop a table created with persistence enabled

Hi Shiva,

That was designed to prevent global cluster performance degradation or other outages. Have you tried applying my recommendation of turning off the failure handler for these system threads?

-
Denis


shivakumar

Re: nodes are restarting when i try to drop a table created with persistence enabled

Hi dmagda,

I am trying to drop a table that has around 10 million records. I see "Out of memory in data region" error messages in the Ignite logs, and the Ignite node (an Ignite pod on Kubernetes) restarts.
I have configured 3 GB for the default data region, 7 GB for the JVM, and 15 GB in total for the Ignite container, with native persistence enabled.
Earlier I was under the impression that the restart was caused by "SYSTEM_WORKER_BLOCKED" errors, but I have now realized that "SYSTEM_WORKER_BLOCKED" is on the list of ignored failure types and that the actual cause is a "CRITICAL_ERROR" due to "Out of memory in data region".

These are the error messages in the logs:

[2019-09-17T08:25:35,054][ERROR][sys-#773][] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException: Failed to find a page for eviction [segmentCapacity=971652, loaded=381157, maxDirtyPages=285868, dirtyPages=381157, cpPages=0, pinnedInSegment=3, failedToPrepare=381155]
Out of memory in data region [name=Default_Region, initSize=500.0 MiB, maxSize=3.0 GiB, persistenceEnabled=true] Try the following:
  ^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
  ^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
  ^-- Enable eviction or expiration policies]]

Could you please help me understand why the DROP TABLE operation causes "Out of memory in data region", and how I can avoid it?

We have a use case where an application inserts records into many tables in Ignite simultaneously for some time period, and other applications query that period's data and update a dashboard. We need to delete the records inserted in the previous time period before inserting new records.

Even during a DELETE FROM table operation, I have seen:

Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.IgniteException: Checkpoint read lock acquisition has been timed out.]] class org.apache.ignite.IgniteException: Checkpoint read lock acquisition has been timed out.
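
The distinction drawn in this post, between a failure type that is on the handler's ignore list and one that is not, can be read straight off a "Critical system error detected" log line. Below is a rough sketch in plain Java that does so; it only parses text and does not call any Ignite API. The class and method names (`FailureLogCheck`, `ignoredTypes`, `failureType`, `handlerActs`) are hypothetical, and the regular expressions assume the log layout shown in this thread.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Text-only helper for "Critical system error detected" log lines (assumed layout). */
final class FailureLogCheck {
    private static final Pattern IGNORED =
        Pattern.compile("ignoredFailureTypes=\\[([^\\]]*)\\]");
    private static final Pattern TYPE =
        Pattern.compile("failureCtx=FailureContext \\[type=(\\w+)");

    /** Failure types listed in ignoredFailureTypes=[...]; empty if the field is absent. */
    static Set<String> ignoredTypes(String logLine) {
        Set<String> res = new HashSet<>();
        Matcher m = IGNORED.matcher(logLine);
        if (m.find()) {
            for (String t : m.group(1).split(",")) {
                if (!t.trim().isEmpty())
                    res.add(t.trim());
            }
        }
        return res;
    }

    /** The type=... value of the failure context, or null if none is found. */
    static String failureType(String logLine) {
        Matcher m = TYPE.matcher(logLine);
        return m.find() ? m.group(1) : null;
    }

    /** True when the reported failure type is NOT on the handler's ignore list. */
    static boolean handlerActs(String logLine) {
        String type = failureType(logLine);
        return type != null && !ignoredTypes(logLine).contains(type);
    }

    public static void main(String[] args) {
        String prefix = "hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, "
            + "super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], ";
        // Shortened copies of the two log lines quoted in this thread.
        System.out.println(handlerActs(prefix + "failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED,"));
        System.out.println(handlerActs(prefix + "failureCtx=FailureContext [type=CRITICAL_ERROR,"));
    }
}
```

For the April log the type is SYSTEM_WORKER_BLOCKED, which is on the ignore list, so this check says the handler does not act on it. For the CRITICAL_ERROR lines above it says the handler does act, which matches the restarts described here.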


dmagda

Re: nodes are restarting when i try to drop a table created with persistence enabled

Shiva,

Does this issue still exist? Ignite Dev, how do we debug this sort of thing?

-
Denis


Denis Mekhanikov

Re: nodes are restarting when i try to drop a table created with persistence enabled

I think the issue is that Ignite can't recover from
IgniteOutOfMemory, even by removing data.
Shiva, did IgniteOutOfMemory occur for the first time when you ran the
DROP TABLE, or before that?

Denis

shivakumar

Re: nodes are restarting when i try to drop a table created with persistence enabled

Hi dmagda,

When I insert many records (~10 or 20 million) into the same table and then try to drop the table or delete records from it, the nodes restart. The restarts happen in the middle of the drop or delete operation.
According to the logs, the cause of the restarts looks like OOM in the data region.

regards,
shiva

dmagda

Re: nodes are restarting when i try to drop a table created with persistence enabled

Ivan, Igor, Andrey, as SQL experts:

Does this sound like a known limitation or issue? If not, what do we need in order to reproduce the scenario: heap dumps?

-
Denis


Mahesh Renduchintala

Re: nodes are restarting when i try to drop a table created with persistence enabled

We observed the same on 2.7.6 as well. Deleting tables continuously from a thick client causes out-of-memory exceptions in other thick clients.
The fix for grid partition message exchanges that went into 2.7.6 does not seem to help.
Ivan Pavlukhin

Re: nodes are restarting when i try to drop a table created with persistence enabled

Hi,

The stack trace and exception message contain some valuable details:
org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Failed to
find a page for eviction [segmentCapacity=126515, loaded=49628,
maxDirtyPages=37221, dirtyPages=49627, cpPages=0, pinnedInSegment=1,
failedToPrepare=49628]

I see the following:
1. Not all of the data fits into data region memory.
2. The exception occurs when the underlying cache is destroyed
(IgniteCacheOffheapManagerImpl.stopCache/removeCacheData call in the stack
trace).
3. No page could be found for replacement to disk (loaded=49628,
failedToPrepare=49628). Almost all pages are dirty (dirtyPages=49627).
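
To make the counters in the exception easier to compare, here is a minimal sketch that extracts them from the message text. The class EvictionFailureStats and its parse method are hypothetical helpers written for this post, not part of Ignite. With the values above, dirtyPages exceeds maxDirtyPages by 12406, and maxDirtyPages is exactly three quarters of loaded.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an Ignite class): collects the name=number
// counters that appear in the IgniteOutOfMemoryException message.
class EvictionFailureStats {
    private static final Pattern COUNTER = Pattern.compile("(\\w+)=(\\d+)");

    static Map<String, Long> parse(String message) {
        Map<String, Long> counters = new LinkedHashMap<>();
        Matcher m = COUNTER.matcher(message);
        while (m.find()) {
            counters.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return counters;
    }

    public static void main(String[] args) {
        // Message text copied from the exception quoted in this thread.
        String message = "Failed to find a page for eviction [segmentCapacity=126515, "
            + "loaded=49628, maxDirtyPages=37221, dirtyPages=49627, cpPages=0, "
            + "pinnedInSegment=1, failedToPrepare=49628]";

        Map<String, Long> c = parse(message);
        long loaded = c.get("loaded");
        long dirty = c.get("dirtyPages");
        long maxDirty = c.get("maxDirtyPages");

        // Share of loaded pages that are dirty, and the configured ceiling.
        System.out.println("dirty / loaded    = " + (double) dirty / loaded);
        System.out.println("maxDirty / loaded = " + (double) maxDirty / loaded);
        System.out.println("dirty pages above the limit: " + (dirty - maxDirty));
    }
}
```

Running the same helper against messages from other failing nodes would show whether the dirty-page count is always above the limit at the moment the cache is destroyed.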

Answers to several questions would help:
1. Does the same occur if IgniteCache.destroy() is called instead of DROP TABLE?
2. Does the same occur if SQL is not enabled for a cache?
3. It would be nice to see the IgniteConfiguration and CacheConfiguration
that cause the problem.
4. We need to figure out why almost all pages are dirty. It might be a clue.
maheshkr76private

Re: nodes are restarting when i try to drop a table created with persistence enabled

Shivakumar's system configuration and mine could be different, but I feel we
are seeing the same issue here.

Deleting tables via a single thick client causes other thick clients to run
out of memory. This OOM issue was reported in the thread below:
http://apache-ignite-users.70518.x6.nabble.com/Ignite-2-7-0-Ignite-client-memory-leak-td28938.html
That thread has the server and client configs and a client JVM heap dump
attached. Please go through it.


Steps to reproduce the problem:
- Take Ignite 2.7.6.
- Allocate about 1 GB of heap (-Xmx1g) to each of the thick clients and
connect them to an Ignite cluster.
- Let the Ignite cluster have about 500 dummy tables and keep deleting them.
Eventually, you will see the thick clients failing with OOM.
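
For anyone trying to reproduce this, a rough sketch that generates the DDL for the dummy tables might look like the following. The class DummyTableDdl is a hypothetical helper, and the two-column table layout is an assumption made only for illustration; the PERSON prefix follows the cache name SQL_PUBLIC_PERSON1000 that appears in the exception quoted later in this post. The generated statements can be run through any SQL client connected to the cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the reproduction steps above.
// The column layout (ID, NAME) is an assumption, not the schema we actually use.
class DummyTableDdl {
    // One CREATE TABLE statement per dummy table: PERSON1 .. PERSON<count>.
    static List<String> createStatements(int count) {
        List<String> ddl = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            ddl.add("CREATE TABLE IF NOT EXISTS PERSON" + i
                + " (ID BIGINT PRIMARY KEY, NAME VARCHAR)");
        }
        return ddl;
    }

    // Matching DROP TABLE statements, in the same order.
    static List<String> dropStatements(int count) {
        List<String> ddl = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            ddl.add("DROP TABLE IF EXISTS PERSON" + i);
        }
        return ddl;
    }

    public static void main(String[] args) {
        int tables = 500; // "about 500 dummy tables" from the steps above
        createStatements(tables).forEach(s -> System.out.println(s + ";"));
        dropStatements(tables).forEach(s -> System.out.println(s + ";"));
    }
}
```

Repeating the create/drop cycle while other thick clients stay connected is what the steps above describe; how many cycles are needed before a client fails is not something this sketch can tell you.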

Now, coming to your questions:
1. Does the same occur if IgniteCache.destroy() is called instead of DROP
TABLE?
All the caches we destroy are SQL caches, so we use DROP TABLE.
IgniteCache.destroy() throws an exception:
Exception in thread "main" class org.apache.ignite.IgniteException: Only
cache created with cache API may be removed with direct call to destroyCache
[cacheName=SQL_PUBLIC_PERSON1000]


2. Does the same occur if SQL is not enabled for a cache?
We did not check this, and it is not a use case that we have. We primarily
use SQL caches.

3. It would be nice to see IgniteConfiguration and CacheConfiguration
causing problems.
They are attached in the other thread linked above.

4. Need to figure out why almost all pages are dirty. It might be a clue.
This probably applies to the scenario Shivakumar described. In my case, all
the data is in memory: we have about 100 GB of data, and the data regions
together are about 128 GB.


I don't want to confuse this thread, as Shivakumar's scenario could be
different.
I don't mind discussing this on the original thread I opened (linked
above, about memory leaks).
The bottom line is that deleting tables from one thick client causes other
thick clients to go OOM. This can be seen on 2.7.6 too.




Mahesh Renduchintala

Re: nodes are restarting when i try to drop a table created with persistence enabled

Upon reviewing IGNITE-12255, the description of that issue shows an exception occurring on the thick client side.
However, the logs that I attached show a null pointer exception on ALL the server nodes, leading to a complete cluster crash.
Isn't the issue I am reporting here different from IGNITE-12255?
maheshkr76private

Re: nodes are restarting when i try to drop a table created with persistence enabled

Hello, please ignore the comment below on this topic.

>>>
https://issues.apache.org/jira/browse/IGNITE-12255
Upon reviewing IGNITE-12255, the description of that issue shows an exception
occurring on the thick client side.
However, the logs that I attached show a null pointer exception on ALL
the server nodes, leading to a complete cluster crash.
Isn't the issue I am reporting here different from IGNITE-12255?


