Long transaction suspended

22 messages
jjimeno

Long transaction suspended

Hi all,

I'm trying to commit a very large transaction (8M keys and ~4GB of data).

After a while, I see this diagnostic message in the node log:
[08:56:31,721][WARNING][sys-#989][diagnostic] >>> Transaction
[startTime=08:55:22.095, curTime=08:56:31.712, ... *state=SUSPENDED* ...

Does anyone know why it is suspended, and how to avoid it?

Thanks in advance
José





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
jjimeno

Re: Long transaction suspended

Hi again,

For a smaller transaction that did succeed (1.2M keys, 600MB in size), I
noticed its state changed roughly as follows:

SUSPENDED -> ACTIVE -> COMMITTING

... and it takes around 3 min to finish.

Another test, with 4M keys and 2GB, is still in the SUSPENDED state after
30 min.

Is there a maximum number of keys or a maximum size for a single transaction?
Is there any documentation about transaction states?



jjimeno

Re: Long transaction suspended

Another test, with 2M keys and 1GB, also remains in the SUSPENDED state after
11 minutes...

I don't understand what the difference between this one and the successful
1.2M-key, 600MB one could be. Any ideas are welcome.



akorensh

Re: Long transaction suspended

Hi,

Make sure that your code is not suspending these transactions under high
load:
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/transactions/Transaction.html#suspend--

See this guide:
https://ignite.apache.org/docs/latest/key-value-api/transactions

Try monitoring your transactions:

SQL views:
https://ignite.apache.org/docs/latest/monitoring-metrics/system-views#transactions
JMX metrics:
https://ignite.apache.org/docs/latest/monitoring-metrics/metrics#monitoring-transactions
new metrics:
https://ignite.apache.org/docs/latest/monitoring-metrics/new-metrics#transactions
control script:
https://ignite.apache.org/docs/latest/tools/control-script#transaction-management

Check which transaction settings you have configured:
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/TransactionConfiguration.html

Try altering the topology: use a single node and/or a memory-only cluster.
Simplify your code so that the transaction itself is simpler.

If none of these steps helps and you are able to create a reproducer, post it
here along with the logs and the version numbers.

Thanks, Alex




akorensh

Re: Long transaction suspended

I would also recommend taking a thread dump to see where this suspension is
coming from.
Attach the thread dump here along with the reproducer.



ilya.kasnacheev

Re: Long transaction suspended

Hello!

I can see that the only place where our own code suspends transactions is the thin client implementation.

Do you happen to use a thin client for this operation?

Regards,
--
Ilya Kasnacheev


(Replying to akorensh's message of Mon, 8 Feb 2021, 20:32.)
jjimeno

Re: Long transaction suspended

In reply to this post by akorensh
Hi,

First off, thanks for your help.

In the test, I'm using a single-server-node cluster with the official 2.9.1
release. The client is the C++ thin client with transaction support (commit
685c1b70ca from the master branch).

The test is very simple:

      // Value type: a fixed 512-byte payload. (Storing it in a cache also
      // needs an ignite::binary::BinaryType<Blob> specialization, omitted here.)
      struct Blob
      {
         int8_t m_blob[512];
      };

      IgniteClient client = IgniteClient::Start(cfg);

      CacheClient<int32_t, Blob> cache = client.GetOrCreateCache<int32_t, Blob>("vds");

      cache.Clear();

      // 2M entries * 512B of payload, all sent in a single PutAll.
      std::map<int32_t, Blob> map;

      for (int32_t i = 0; i < 2000000; ++i)
         map.insert(std::make_pair(i, Blob()));

      ClientTransactions transactions = client.ClientTransactions();

      ClientTransaction tx = transactions.TxStart(PESSIMISTIC, READ_COMMITTED);

      cache.PutAll(map);

      tx.Commit();

As you can see, the total size of the transaction (not counting keys) is
2M * 512B ≈ 1GB. If we limit the loop to 1.9M, it works... and I've found
where the problem is:

<http://apache-ignite-users.70518.x6.nabble.com/file/t3059/bug.png>

As you can see, since "doubleCap" is an int, doubling "cap" once it is big
enough overflows to a negative value, so the capacity is not actually
doubled... which leads to a reallocation of ~1GB every time a new key-value
entry is added to the TCP message.

Using int to store the capacity in your C++ thin client implicitly limits the
maximum transaction size to 1GB. Maybe you should consider using uint64_t
instead...






jjimeno

Re: Long transaction suspended

In reply to this post by ilya.kasnacheev
Hello Ilya,

Yes, but it has nothing to do with suspending an active transaction... the
problem is that the transaction never reaches the ACTIVE state because building
the TCP message takes so long.

Please take a look at my previous post.



ilya.kasnacheev

Re: Long transaction suspended

In reply to this post by jjimeno
Hello!

Would you care to create a JIRA ticket for that issue?

Regards,
--
Ilya Kasnacheev


(Replying to jjimeno's message of Wed, 10 Feb 2021, 14:18.)
jjimeno

Re: Long transaction suspended

I wouldn't mind, but I'm afraid I'm not allowed to... at least, I couldn't
find the option on that page :)



ilya.kasnacheev

Re: Long transaction suspended

Hello!

I think you need to register first.

Btw, why do you need such large transactions? Have you considered using the data streamer instead?

Regards,
--
Ilya Kasnacheev


(Replying to jjimeno's message of Wed, 10 Feb 2021, 15:28.)
jjimeno

Re: Long transaction suspended

Hi,

Because of the kind of product we have to develop, we currently have a set
of scenarios with transactions like this, and we're evaluating several
datastores, such as RocksDB. Sadly, the timings there are much better than the
ones I've got in Ignite... :(

The data streamer is not available in C++, afaik...



ilya.kasnacheev

Re: Long transaction suspended

Hello!

RocksDB is an embedded database whereas Apache Ignite is a distributed database.

Regards,
--
Ilya Kasnacheev


(Replying to jjimeno's message of Wed, 10 Feb 2021, 16:11.)
Zhenya Stanilovsky

Re[2]: Long transaction suspended

In reply to this post by jjimeno
Hi!

I believe tx.putAll will be fixed soon :) I have a working prototype now; I just need a little more time to fix all the tests :)

ilya.kasnacheev

Re: Re[2]: Long transaction suspended

Hello!

Unfortunately, this C++ thin client / platforms issue is not so easily fixable. Apparently, our platforms interaction layer does not expect buffers larger than 2GB.

Regards,
--
Ilya Kasnacheev


(Replying to Zhenya Stanilovsky's message of Wed, 10 Feb 2021, 16:43.)
jjimeno

Re: Long transaction suspended

In reply to this post by ilya.kasnacheev
Hello!

That's exactly the reason why we would prefer to choose Ignite over RocksDB.
Otherwise, we will have to implement scalability by ourselves and, believe
me, that's not something we would like to do.

We also know they're not directly comparable. We would accept paying for
scalability with slightly worse performance but, based on our tests, the price
is too high.

For instance:
  - Single-node cluster on the same host as the application (no
communication over the wire, trying to get as close as possible to an embedded database)
  - A single user (no multiple users working either on the application or
the database)

A transactional commit with 1.8M keys and 1GB in size takes 97 seconds with
NO persistence, and this time is doubled if persistence is enabled.  RocksDB
takes around 100 seconds to perform a transaction with 4M keys and 4GB in
size, persistence included.  As you can see, there is a huge difference.

On the other hand, limitations like the ones we have found in one month of
research:
  -  PutAll performance in transactional cache
<https://issues.apache.org/jira/browse/IGNITE-14076>
  -  Non-asynchronous TCP connection
<https://issues.apache.org/jira/browse/IGNITE-13997>
  - The maximum transaction size of 1GB we are discussing in this thread

don't really help make the case for Ignite, at least for our kind of project.

But we would still like to run more tests to be 100% sure about our decision,
so I'd like to ask you:
  - Should I get better performance in a multi-node cluster? For reads,
writes, or both?
  - Should I run the tests differently?

Thanks in advance!




jjimeno

Re: Re[2]: Long transaction suspended

In reply to this post by ilya.kasnacheev
Hello!

I'm sorry to hear that.
Do you think it could be fixed to at least reach those 2GB? Currently the limit
is only 1GB in the C++ thin client.

Regards



jjimeno

Re: Re[2]: Long transaction suspended

In reply to this post by Zhenya Stanilovsky
Great!... I'm really looking forward to it :)



stephendarlington

Re: Long transaction suspended

In reply to this post by jjimeno
I’ve not been following this thread closely, so I apologise if I’ve missed something.

>  - Should I get a better performance in a multi-node cluster? Read/Write/Both?

As per the documentation:

“Ignite is designed and optimized for distributed computing scenarios. Deploy and benchmark a multi-node cluster rather than a single-node one.” (https://ignite.apache.org/docs/latest/perf-and-troubleshooting/general-perf-tips)

So yes, all else being equal, more nodes will give you better performance. Keeping the volume of data the same, doubling the number of nodes will roughly halve the number of reads/writes going to each node. But even there, things like the use of transactions and thin clients will limit your throughput to well below what Ignite is capable of “flat out.”

Without analysing your architecture it’s difficult to give specific advice, but best write performance is achieved with many nodes, fast disks, JVM tuning and thick clients using the data streamer API.

Regards,
Stephen

(Replying to jjimeno's message of 11 Feb 2021, 07:08.)


jjimeno

Re: Long transaction suspended

Hi, thanks for pointing that out.

This confirms our tests... moving from a single-node cluster to a two-node
one cut the read timings to less than half!


