Improving Get operation performance

Victor

Improving Get operation performance

I am running some comparison tests (Ignite vs. Cassandra) to see how to
improve the performance of the 'get' operation. The data is fairly
straightforward: a simple Employee object (about 10 fields), stored as a
BinaryObject in the cache declared as

IgniteCache<String, BinaryObject> empCache;

The cache is configured with write synchronization mode FULL_SYNC, atomicity
mode TRANSACTIONAL, 1 backup, and persistence enabled.

Cluster config: 3 server nodes + 1 client node, set up on 2 machines: a
server machine (Intel(R) Xeon(R) CPU X5675  @ 3.07GHz) and a client machine
(Intel(R) Xeon(R) CPU X5560  @ 2.80GHz).

The client has multiple threads (configurable) making concurrent 'get' calls.
I am using 'get' on purpose because of use case requirements.

For about 500k requests, I am getting a throughput of about 1500/sec, even
though all of the data is off-heap and the cache hit percentage is 100%.
Interestingly, with Cassandra I am getting similar performance, with the key
cache and a limited row cache.
I've tried running with 10/20/30 threads; the performance is more or less the
same.

I am leaving the defaults for most of the data region configuration. For this
test I turned persistence off. Ideally it shouldn't really matter for gets,
and the performance is indeed the same.

============================================
Data Regions Configured:
[19:35:58]   ^-- default [initSize=256.0 MiB, maxSize=14.1 GiB,
persistence=false]

Topology snapshot [ver=4, locNode=038f99b3, servers=3, clients=1,
state=ACTIVE, CPUs=40, offheap=42.0GB, heap=63.0GB]
============================================

I also ran top on both machines to check whether they are running out of
resources:
------ Server
PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
14159 root      20   0   29.7g   3.2g  15216 S  10.3  4.5   1:35.69 java
14565 root      20   0   29.4g   2.9g  15224 S   8.3  4.2   1:33.41 java
13770 root      20   0   30.0g   2.9g  15184 S   6.3  4.2   1:36.99 java

----- Client
3731 root      20   0   27.8g   1.1g  15304 S 136.5  1.5   2:39.16 java

As you can see, resource usage is well under the limits on both machines.

Frankly, I was expecting Ignite gets to be pretty fast, given that all data is
in cache, at least judging by this benchmark:
https://www.gridgain.com/resources/blog/apacher-ignitetm-and-apacher-cassandratm-benchmarks-power-in-memory-computing

Planning to run one more test tomorrow with no-persistence and setting near
cache (on heap) to see if it helps.

Let me know if you guys see any obvious configurations that should be set.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Mikael

Re: Improving Get operation performance

Hi!

The numbers sound very low. I run on hardware close to yours (3 nodes
(X5660*5) and 1 client) and I get way more than 1500/sec; I am not sure how
much, I will have to check. But as long as you do single gets there is
not much you can do: each get is one round trip over the network,
and with single gets latency can have a huge impact. I modified my code
so that most of the time I collect all gets within a 100 ms window into one
getAll, and that makes a huge difference in performance.
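
The batching idea described above could look roughly like the sketch below. It is not Mikael's code: `GetBatcher` and `bulkLoader` are hypothetical names, and a plain `Function<Set<K>, Map<K, V>>` stands in for a bulk call such as `IgniteCache.getAll` so the example runs without a cluster.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Collects single-key lookups and resolves them with one bulk call per flush. */
final class GetBatcher<K, V> {
    // Stand-in for a bulk read such as cache::getAll (assumption for this sketch).
    private final Function<Set<K>, Map<K, V>> bulkLoader;
    private final Map<K, List<CompletableFuture<V>>> pending = new HashMap<>();

    GetBatcher(Function<Set<K>, Map<K, V>> bulkLoader) {
        this.bulkLoader = bulkLoader;
    }

    /** Registers a lookup; the returned future completes on the next flush(). */
    synchronized CompletableFuture<V> get(K key) {
        CompletableFuture<V> future = new CompletableFuture<>();
        pending.computeIfAbsent(key, k -> new ArrayList<>()).add(future);
        return future;
    }

    /** Sends all pending keys in one bulk request; returns the number of distinct keys sent. */
    int flush() {
        Map<K, List<CompletableFuture<V>>> batch;
        synchronized (this) {
            if (pending.isEmpty()) {
                return 0;
            }
            batch = new HashMap<>(pending);
            pending.clear();
        }
        try {
            Map<K, V> values = bulkLoader.apply(batch.keySet()); // one round trip
            // A key missing from the result completes its futures with null.
            batch.forEach((key, futures) ->
                futures.forEach(f -> f.complete(values.get(key))));
        } catch (RuntimeException e) {
            batch.values().forEach(fs -> fs.forEach(f -> f.completeExceptionally(e)));
        }
        return batch.size();
    }
}
```

A `ScheduledExecutorService` calling `flush()` every 100 ms would give the time window mentioned above. The trade-off is that each caller may wait up to one window longer in exchange for fewer round trips.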

There is not much to change in the configuration; the number of backups
doesn't have much impact on reads (unless you use a replicated cache, of
course).

I am not sure how the traffic works, but if there is only one TCP
connection to each node, you will not have much use for more than 3
threads, I would think.

Did you read 500K unique entries, or the same entries multiple times?

Mikael

In reply to Victor's post of 2019-11-26, 21:38.

Victor

Re: Improving Get operation performance

It's 500k unique gets, spread across multiple threads. The most I tried was 30
threads.

I can't use getAll for this use case, since it is user driven and the user
will load one record at a time. In any case, I expected even single gets
to be pretty fast, given the benchmark reference:
https://www.gridgain.com/resources/blog/apacher-ignitetm-and-apacher-cassandratm-benchmarks-power-in-memory-computing

There, too, the code seems to be using single gets, but the throughput is
massive: about 120k for 32 threads. So now I am not sure whether the numbers
listed are accurate or whether the test was done in a controlled setting with
additional configuration.



dmagda

Re: Improving Get operation performance

Hello Victor,

The benchmarks you're referring to are real and list all the configuration parameters as well as the source code. No cheating.

The first noticeable difference between your benchmark and those is that you're using TRANSACTIONAL mode for Ignite. This involves a 2-phase-commit protocol, making TRANSACTIONAL gets slower than ATOMIC gets. Plus, if there is a chance your benchmark queries the same keys in parallel, some of the threads will be blocked until the locked keys are released. So, try ATOMIC caches or, to make the benchmark fair, use Cassandra's lightweight transactions.

Also, I would look into the following areas:

-
Denis


In reply to Victor's post of Tue, Nov 26, 2019, 3:00 PM.

ezhuravlev

Re: Improving Get operation performance

Hi Victor,

It looks like you're running 3 server nodes on the same physical machine, right? How do you run Cassandra benchmarks? Do you use the same 2 machines? How many Cassandra instances do you have?

Also, how much memory do you have on this machine? 

Best Regards,
Evgenii

In reply to Denis Magda's post of Wed, Nov 27, 2019, 10:29.

Victor

Re: Improving Get operation performance

In reply to this post by dmagda
Thanks Denis for confirming the benchmarks are real.

I am using the latest Ignite version, i.e. 2.6.7.

I tried with ATOMIC as well and don't see much variation, only marginal
changes.

Currently, in my test, I am using
<ignite>/examples/config/persistentstore/example-persistent-store.xml with
persistence disabled, and starting my 3 server nodes simply via
<ignite>/bin/ignite.sh example-persistent-store.xml

The client uses the same config XML.

As for threads sharing the same key: no, that is not a possibility. My daemon
thread iterates over all keys in a loop and hands every key over to a
thread pool executor, so no 2 threads would get the same key.
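
For reference, the dispatch pattern just described (one loop handing each key to a thread pool) can be sketched as follows. `GetLoadTest`, `run`, and `lookup` are hypothetical names, not the actual client; `lookup` stands in for something like `empCache::get`, so as written the sketch measures only the harness unless a real cache is plugged in.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

final class GetLoadTest {
    /** Outcome of one run: lookups that returned a value, and the observed rate. */
    static final class Result {
        final long hits;
        final double getsPerSecond;

        Result(long hits, double getsPerSecond) {
            this.hits = hits;
            this.getsPerSecond = getsPerSecond;
        }
    }

    /** Submits one task per key to a fixed pool, so no two tasks share a key. */
    static <K, V> Result run(List<K> keys, int threads, Function<K, V> lookup) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong hits = new AtomicLong();
        long start = System.nanoTime();
        for (K key : keys) {
            pool.execute(() -> {
                if (lookup.apply(key) != null) {
                    hits.incrementAndGet();
                }
            });
        }
        pool.shutdown(); // no new tasks; queued ones still run
        try {
            if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
                throw new IllegalStateException("load test did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load test interrupted", e);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return new Result(hits.get(), keys.size() / seconds);
    }
}
```

Calling `GetLoadTest.run(keys, 10, empCache::get)` and then again with 20 and 30 threads would give the per-thread-count throughput figures quoted in this thread.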

For now, I am keeping Cassandra aside, since my first goal is to at least see
comparable performance numbers for "get", which is the primary reason to
evaluate Ignite.

After my initial tests, I ran a perf test to check the network throughput
between the 2 boxes, and it was around 1GB/s.

So, as part of my next test, I am going to try moving my client to the
same box as the servers, taking network-related issues out of play, and see if
it scales. Additionally, I will try adding the applicable JVM properties you
suggested.

With this, the 2 primary factors that performance depends on, network and
disk, are out. Everything should be in memory and on the same box.

I am not allocating any heap, and since this is primarily a 'get' test as
opposed to a 'query' test, we should be OK, I suppose. But let me know if heap
allocation is needed. The benchmark test did not mention that.

Lastly, here is the basic client code I am using:

// ======== configuration
Ignition.setClientMode(true);
               
ignite = Ignition.start("/home/example-persistent-store.xml");
ignite.cluster().active(true);
               
CacheConfiguration<String, BinaryObject> cacheConfig = new CacheConfiguration<>("empCache");
cacheConfig.setAtomicityMode(CacheAtomicityMode.ATOMIC);
cacheConfig.setBackups(1);
cacheConfig.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_ASYNC);
cacheConfig.setIndexedTypes(String.class, BinaryObject.class);
cacheConfig.setSqlSchema("PUBLIC");
cacheConfig.setStatisticsEnabled(true);
               
QueryEntity queryEntity = new QueryEntity();
queryEntity.setValueType("employee");
queryEntity.setKeyType(String.class.getName());
           
LinkedHashMap<String, String> fields = new LinkedHashMap<>();
fields.put(Employee.FIELD_ID, String.class.getName());
fields.put(Employee.FIELD_NAME, String.class.getName());
fields.put(Employee.FIELD_DESIGNATION, String.class.getName());
fields.put(Employee.FIELD_EXPERIENCE, Integer.class.getName());
fields.put(Employee.FIELD_PHONE, Long.class.getName());
fields.put(Employee.FIELD_ISPERMANANT, Boolean.class.getName());
fields.put(Employee.FIELD_DEPARTMENTS, byte[].class.getName());
fields.put(Employee.FIELD_JOININGDATE, Timestamp.class.getName());
fields.put(Employee.FIELD_SALARY, Double.class.getName());

queryEntity.setFields(fields);
queryEntity.setIndexes(Arrays.asList(
        new QueryIndex(Employee.FIELD_ID),
        new QueryIndex(Employee.FIELD_NAME),
        new QueryIndex(Employee.FIELD_DESIGNATION),
        new QueryIndex(Employee.FIELD_EXPERIENCE),
        new QueryIndex(Employee.FIELD_PHONE),
        new QueryIndex(Employee.FIELD_JOININGDATE),
        new QueryIndex(Employee.FIELD_SALARY)
));
           
cacheConfig.setQueryEntities(Arrays.asList(queryEntity));
empCache = ignite.getOrCreateCache(cacheConfig).withKeepBinary();

//=====================
// Get
public Employee get(UUID id) throws Exception {
        String key = id.toString();
        BinaryObject empBinary = empCache.get(key);
        if (empBinary == null) {
                // Cache miss: report it and skip conversion instead of passing null on.
                System.out.println("Employee not found for Id[" + key + "]");
                return null;
        }
        return retrieveEmployee(empBinary);
}



Victor

Re: Improving Get operation performance

In reply to this post by ezhuravlev
Yes, I ran Cassandra on the same boxes with a similar config: 3 nodes on one
box and the client on another. Both boxes have about 75G of memory.

However, for now I am keeping Cassandra aside, since my primary goal in
evaluating Ignite is to see similar performance numbers for "get" as seen in
the benchmark.



ezhuravlev

Re: Improving Get operation performance

Victor,

Then I would recommend checking whether swapping is enabled in the OS. If you have only 75 GB on the machine and you started 3 nodes with 14 GB of off-heap and something like 16 GB of heap each, the OS has probably started swapping, which will hurt performance.
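
The arithmetic behind that concern can be made explicit. The sketch below uses the figures from this thread (14.1 GiB maximum off-heap per node, roughly 16 GB of heap per node, 75 GB of RAM). The class and method names are hypothetical, GiB and GB are treated as the same unit for simplicity, and JVM overhead and other processes are ignored, so the real footprint would be somewhat higher.

```java
final class MemoryBudget {
    /** Worst-case memory, in GB, that the server JVMs could claim together. */
    static double worstCaseGb(int nodes, double offHeapGbPerNode, double heapGbPerNode) {
        return nodes * (offHeapGbPerNode + heapGbPerNode);
    }

    /** True when the worst case exceeds physical RAM, i.e. the OS may have to swap. */
    static boolean mayForceSwapping(double worstCaseGb, double physicalGb) {
        return worstCaseGb > physicalGb;
    }
}
```

With these inputs, `worstCaseGb(3, 14.1, 16.0)` is 3 * 30.1 = 90.3 GB, which is above the 75 GB available. This is only an upper bound: the nodes reach it only if the data region and heap actually fill up.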

Additionally, there is no need to run more than one Ignite node per physical machine. You can use 3 smaller machines instead, or start one instance on this machine and give it more memory.

Evgenii

In reply to Victor's post of Wed, Nov 27, 2019, 11:33.

Victor

Re: Improving Get operation performance

Performed one more test. I moved the client to the same box and changed the
off-heap and on-heap values.

The Employee record is barely 75-100 bytes, so 500k records would take only
about 40-50 MB, plus another 40-50 MB for 1 backup: about 100 MB worth
of data.

I set the off-heap to 1GB and -Xmx to 1GB as well. Here is what the topology
looks like:

Topology snapshot [ver=4, locNode=faaf52cf, servers=3, clients=1,
state=ACTIVE, CPUs=24, offheap=4.0GB, heap=4.0GB]

With an 8GB cluster (on-heap + off-heap), swapping shouldn't really happen
anymore.

Still, the throughput is only around 2000/s. The gain over the earlier 1500/s
is, I feel, largely due to removing the network hop. But this is still
woefully slow, nowhere close to the benchmark numbers.



ilya.kasnacheev

Re: Improving Get operation performance

Hello!

I don't understand why the network hop is relevant here, if you are (supposedly) running those gets in parallel.

Regards,
--
Ilya Kasnacheev


In reply to Victor's post of Thu, Nov 28, 2019, 04:09.

Victor

Re: Improving Get operation performance

Not sure I follow. The data is on the server node(s). Whether for a single
request or multiple requests, a 'get' from a client needs to make a network
round trip when server and client are on different boxes, as opposed to both
being on the same box. So network latency becomes quite relevant.



ilya.kasnacheev

Re: Improving Get operation performance

Hello!

Not if there's enough parallelism, since nodes are not busy while requests do round trips.
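
This point can be put in numbers with Little's law: with N requests in flight and an average latency L per request, throughput is about N / L. A small sketch follows; the class and method names are hypothetical, and the latencies it yields are derived from the throughput figures reported in this thread, not measured.

```java
final class LittlesLaw {
    /** Throughput in requests/sec for inFlight concurrent requests at the given latency. */
    static double throughputPerSec(int inFlight, double latencyMillis) {
        return inFlight * 1000.0 / latencyMillis;
    }

    /** Average latency in ms implied by an observed throughput at a given concurrency. */
    static double impliedLatencyMillis(int inFlight, double throughputPerSec) {
        return inFlight * 1000.0 / throughputPerSec;
    }
}
```

At the reported 1500 gets/sec, 10 threads imply about 6.7 ms per get and 30 threads imply 20 ms per get. If the network round trip were the only cost, going from 10 to 30 threads should roughly triple throughput; a flat result suggests the time is being spent somewhere other than on the wire.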

I recommend gathering jstack stack traces from all nodes to see what the threads are up to.
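
`jstack <pid>` is run from the command line against each node's JVM. As an in-process alternative for the client, the standard `ThreadMXBean` API gives similar information. The helper below is a hypothetical sketch, not something from this thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.EnumMap;
import java.util.Map;

final class ThreadStateSummary {
    /** Counts live threads by state (RUNNABLE, WAITING, BLOCKED, ...). */
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    /** Text dump, one block per thread; ThreadInfo.toString() may truncate deep stacks. */
    static String dump() {
        StringBuilder sb = new StringBuilder();
        // true, true: also report locked monitors and ownable synchronizers.
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
            sb.append(info);
        }
        return sb.toString();
    }
}
```

Many client threads sitting in WAITING or TIMED_WAITING between gets would be a hint that the client itself, rather than the cluster, is limiting throughput.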

Regards,
--
Ilya Kasnacheev


In reply to Victor's post of Thu, Nov 28, 2019, 21:58.

ezhuravlev

Re: Improving Get operation performance

As you have 4 nodes on the same machine now, you have a lot of context
switching; the nodes are probably just competing with each other for CPU
resources.

Evgenii



Victor

Re: Improving Get operation performance

In reply to this post by ilya.kasnacheev
Update: there were 2 issues.

1. An old batch processing app ran periodically and loaded a lot of data
into memory, which I think was causing some memory contention. So I shut
that down for my tests.

2. Thread dumps showed some odd wait times between 2 get calls. I had
overly complicated my client, which did a bunch of things, so I commented
out most of it and kept the load-to-thread distribution simple.

With these changes, my get numbers looked good:

For 10 threads I got about 30k/sec for 500k requests.
For 30 threads I got about 71k/sec for 1M requests.

Thanks for all the help with troubleshooting this.




dmagda

Re: Improving Get operation performance

Good to hear that you got to the root cause, Victor!

Do you have any suggestions for extra performance or troubleshooting tips and tricks that you had to learn the hard way because the information was not documented?

-
Denis


In reply to Victor's post of Fri, Dec 6, 2019, 4:11 AM.