Re: Query performance

classic Classic list List threaded Threaded
23 messages Options
12
Sergi Vladykin Sergi Vladykin
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

So what is your question?

Sergi

2015-08-17 21:38 GMT+03:00 javadevmtl <[hidden email]>:
Running 1.3.0

I start my cache as follows...

IgniteConfiguration igniteCfg = new IgniteConfiguration();
igniteCfg.setMarshaller(new OptimizedMarshaller(true));
igniteCfg.setPeerClassLoadingEnabled(false);

CacheConfiguration<Long, Trx> myCfg = new CacheConfiguration<>("cache");
myCfg.setCacheMode(CacheMode.PARTITIONED);
myCfg.setAtomicityMode(CacheAtomicityMode.ATOMIC);
myCfg.setMemoryMode(CacheMemoryMode.OFFHEAP_TIERED);
myCfg.setOffHeapMaxMemory(64 * 1024L * 1024L * 1024L);
myCfg.setIndexedTypes(Long.class, com.xxx.model.Trx.class);
myCfg.setStartSize(50 * 1024 * 1024);
myCfg.setBackups(0);


ignite = Ignition.start(igniteCfg);
cache = ignite.getOrCreateCache(myCfg).withAsync();

Then in my web request handler...

Trx trx = new Trx();
// set the the properties...

cache.put(trx.getStrId(), trx);
cache.future().listen(putFut -> {

        String sql = "SELECT someField FROM Trx WHERE myField = ?";

        QueryCursor<List&lt;?>> cursor = cache.query(new
SqlFieldsQuery(sql).setArgs(trx.getMyField()));

        String res = "" + cursor.getAll();

        // Get back to vertx context
        context.runOnContext((v) -> {
                myHandler.reply(new JsonObject().put("result", res));
        });
});

The put give us about 9K puts per seconf from 0-10 million records with
average put latency of 4ms (this include network time)

Once I add the query it starts off at 3K request per second and slowly
decreases with average response time of 45ms (includes network time)









--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Query-performance-tp1004.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

javadevmtl javadevmtl
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

Well I would not expect SQL query to be that slow...

A simple cache.put allow up to almost 9K puts/sec  up to about 10 millions records per node. So from 0 records up to 20 Millions records (2 nodes, no backups) I can get that performance.

Now a simple query on a field...

select someField from someObject where anotherField = ? it is slow on just a few thousand records.

My logic is as follows...
1- Receive request.
2- Put to cache
3- Query
4- Return response.

So the query from 0 records up to 30K records is slow.

The query most of the time will return 1 record. So I would doubt that ignite would be that slow on the select described above. I would expect slowdown after a few million records inserted.




Sergi Vladykin Sergi Vladykin
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

How many nodes do you have in cluster? Do you have index on a field which you use in query?

Of course SQL query is a much more expensive operation than cache put because it runs on a whole cluster while put on partitioned cache touches only primary and backup nodes.

The basic pattern is: try to use cache put/get operations as much as possible (batch operations like putAll/getAll are even better), SQL queries are mainly intended for analytical purposes, but not for selecting a single value by key.

Sergi



2015-08-17 22:31 GMT+03:00 javadevmtl <[hidden email]>:
Well I would not expect SQL query to be that slow...

A simple cache.put allow up to almost 9K puts/sec  up to about 10 millions
records per node. So from 0 records up to 20 Millions records (2 nodes, no
backups) I can get that performance.

Now a simple query on a field...

select someField from someObject where anotherField = ? it is slow on just a
few thousand records.

My logic is as follows...
1- Receive request.
2- Put to cache
3- Query
4- Return response.

So the query from 0 records up to 30K records is slow.

The query most of the time will return 1 record. So I would doubt that
ignite would be that slow on the select described above. I would expect
slowdown after a few million records inserted.








--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Re-Query-performance-tp1009p1010.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

javadevmtl javadevmtl
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

Hi, thanks

2 nodes as indicated in the original thread.

I'm pretty sure I have @QuerySqlField(index = true)


But I would still expect a specific constant response time up to a certain amount of records instead of off the start line the query time degrades from 0-30K records there something going on...

I need to do count(distinct someField) where another field = ? It's doable using put/get but then I would need to keep track of individual keys for each count I want to do. So for a single entry I need 18 different keys to track unique counts for each property.

Now if we can do analytics style queries using indexes in real time that would be huge! Especially in the couple milliseconds!

javadevmtl javadevmtl
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

I would assume though that if H2 Hash index was used it would make huge difference?
Sergi Vladykin Sergi Vladykin
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

I don't think performance of the index is a problem here. BTW, Ignite does not support hash indexes currently.

If you have 2 nodes with partitioned cache and 0 backups it means, that you will need 0 or 1 network round trip
per put depending on whether your key is local to your current node by affinity function or not (=0.5 in avg).
When you are adding query you will always have 1 additional round trip per operation (=1.5 in avg),
 also queries have locking interference with puts, so your 3k vs 9k requests with increased latencies
look more or less valid to me.

Again, it does not make sense to compare performance of queries to performance of cache put operation.

But if you will run query with REPLICATED cache, you will have another picture: all the queries will run
just locally, but every put has to touch every remote node. I'd suggest to try this setup with and without queries as well.

Sergi



2015-08-18 0:33 GMT+03:00 javadevmtl <[hidden email]>:
I would assume though that if H2 Hash index was used it would make huge
difference?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Re-Query-performance-tp1009p1015.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

javadevmtl javadevmtl
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

What does lock interferance mean? That the query will block the write and the write will only "commit" once query has finished?

Also is it normal that a query takes 40ms on 30,000 records. i would assume even with and without index it would have decent speed.
dsetrakyan dsetrakyan
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

javadevmtl wrote
Also is it normal that a query takes 40ms on 30,000 records. i would assume even with and without index it would have decent speed.
Are you just running the query once or multiple times? If you only run it once, can you run it in a loop, say 10 times, and check the execution time for every iteration? The reason is that the Hotspot JVM needs to warm up before  you start timing your benchmarks.
javadevmtl javadevmtl
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

Basically I launch my "web handler" which starts a new cache and then I use Jmeter to connect to it and it make a few thousand requests to it, so I make at least 30K

So cache starts at 0 records and I go all the way up to about 50K to 60K records before I just stop it (cause there's no point). The performance just degrades from the beginning. It peeks at 3K per second and then slows down to nothing.

A simple WHERE field = ? query should be decently fast even if it's not a get and even if there is a hop. I would expect it to be constant speed up to a certain amount of records.

javadevmtl javadevmtl
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

Should I put up a github so you guys can test it?
dsetrakyan dsetrakyan
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

javadevmtl wrote
Should I put up a github so you guys can test it?
Yes, this is a good idea. This will be the fastest way for us to provide suggestions.
javadevmtl javadevmtl
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

javadevmtl javadevmtl
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

It's pretty straight forward Http.java has all the logic

There is also sample JMeter script in the test/resources folder...

All tests right now are done on single node...
Sergi Vladykin Sergi Vladykin
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

Could you please isolate your query performance test without using any external tools like JMeter?
Preferably it should be a simple class with main method.

Sergi

2015-08-18 22:22 GMT+03:00 javadevmtl <[hidden email]>:
It's pretty straight forward Http.java has all the logic

There is also sample JMeter script in the test/resources folder...

All tests right now are done on single node...



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Re-Query-performance-tp1009p1041.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

javadevmtl javadevmtl
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

Humm ok...

I supose I can do a bunch of insert to initialize the cache and then do a bunch of selects?

The point is a do a insert and then select...
javadevmtl javadevmtl
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

Ok it's a single main now, but I canot reproduce the issue. It's because this is doing all executions serially, while in a web app there is parallel write and queries going on from multiple web requests.

https://github.com/javadevmtl/ignite-for-github

With 3,000,000 records inserted the avg query speed is 112000 nanoseconds which seems very acceptable.

The code is simple it randomly generates some string to index in the model then it randomly picks a key uses that key to get a string and then executes an SQL query based on that indexed field.

1- Is it possible in the web scenario that there is multiple threads trying to write to the cache at the same time and at the same time to read there some contention of resources?

2- The other idea I have is to convert my model into a key/value type model. This can work but I required far more RAM to do the work. Since i have to store each property of my model as it's own key/value.

Any thoughts?
Sergi Vladykin Sergi Vladykin
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

First of all it makes sense to run your test in the same conditions,
I mean you have to start 2 nodes with the same config as for your web app
(for simplicity you can do that in a single JVM) and run your operations
from the same number of threads.

Another idea came to my mind is that your query result anyways will not be exact
in case of many parallel updates. Thus may be it makes sense to issue a query not
after every update, but only if there are no threads currently running the same query.
I believe this approach can improve your throughput and response time (if queries
you are running are mostly the same).

The simplistic code for Java 8 CompletableFuture can look like:

private static final ConcurrentHashMap<String, CompletableFuture<String>> map = new CHM<>():

Trx trx = new Trx();
// set the the properties...

cache.put(trx.getStrId(), trx);
cache.future().listen(putFut -> {

        String qryKey = "query for myField = " + trx.getMyField();

        CompletableFuture fut = map.putIfAbsent(qryKey, new CompletableFuture());

        String res;

        if (fut != null) { // Such a query already running.
               res = fut.get();
        }
        else {
                String sql = "SELECT someField FROM Trx WHERE myField = ?";

                 QueryCursor<List&lt;?>> cursor = cache.query(new
                 SqlFieldsQuery(sql).setArgs(trx.getMyField()));

                 res = "" + cursor.getAll();    

                 map.remove(qryKey).complete(res);
         }

        // Get back to vertx context
        context.runOnContext((v) -> {
                myHandler.reply(new JsonObject().put("result", res));
        });
});


Sergi


2015-08-27 0:16 GMT+03:00 javadevmtl <[hidden email]>:
Ok it's a single main now, but I canot reproduce the issue. It's because this
is doing all executions serially, while in a web app there is parallel write
and queries going on from multiple web requests.

https://github.com/javadevmtl/ignite-for-github

With 3,000,000 records inserted the avg query speed is 112000 nanoseconds
which seems very acceptable.

The code is simple it randomly generates some string to index in the model
then it randomly picks a key uses that key to get a string and then executes
an SQL query based on that indexed field.

1- Is it possible in the web scenario that there is multiple threads trying
to write to the cache at the same time and at the same time to read there
some contention of resources?

2- The other idea I have is to convert my model into a key/value type model.
This can work but I required far more RAM to do the work. Since i have to
store each property of my model as it's own key/value.

Any thoughts?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Re-Query-performance-tp1009p1149.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

javadevmtl javadevmtl
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

By the way I never do updates... Only put.

The idea is for each new web request put the pojo in the cache and then query to see if any of the data from the new pojo exist in the cache. It's basically a cross correlation check.

So If the pojo was a "person"

For example we could say...
Given the name "John Smith", how many unique phone numbers.
Given the phone number "555-555-5555" how many unique names

So if John Smith ended up having 10 phone numbers we know this person is doing something funny.
Or vise versa if the phone 555-555-5555 has 300 names against it, we can maybe deduce something fishy going on with that number.

So my logic is simple.
1- Receive http post request
2- Put into cache
3- Execute multiple queries
4- Return count back.

javadevmtl javadevmtl
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

So 2 different web requests will
put new objects in and then run the same queries, but each web request needs to have it's own responses back.
Sergi Vladykin Sergi Vladykin
Reply | Threaded
Open this post in threaded view
|

Re: Query performance

Ok, I understand your use case, it seems that my trick with shared result will not help here.

Then lets dive a bit deeper into your use case:
What is your main concern? Throughput, response time, fault tolerance, too large data volume for a single server, etc...?
I mean you've made some decisions in your setup (like choosing 2 nodes with partitioned cache and 0 backups) and
I want to understand if your setup matches your use case at all. May be it makes sense to switch to replicated cache or something?

Sergi



2015-08-27 21:48 GMT+03:00 javadevmtl <[hidden email]>:
So 2 different web requests will
put new objects in and then run the same queries, but each web request needs
to have it's own responses back.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Re-Query-performance-tp1009p1161.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

12