No significant change in data input time with multi-threading

woo charles

When I load data (80 tables, each with 10,000 records) into a cluster of 3 server nodes (2 GB each), using multiple threads makes only a small difference in load time
(i.e. at best a decrease from 8 s to 6.5 s when using IgniteCache).

Is this normal?

Also, I found that multi-threading does not affect the data load speed with IgniteDataStreamer.

Is that correct?

Andrew Mashenkov

Hi Woo,

IgniteDataStreamer uses per-node buffers to make bulk cache updates, which gives much better throughput than single updates.
IgniteDataStreamer also sends jobs to remote nodes, so it utilizes multiple threads on the remote nodes.

In a multi-node grid, IgniteDataStreamer usually shows better results than single updates issued from multiple threads.



--
Best regards,
Andrey V. Mashenkov

woo charles

Does that mean load performance will not be affected if I use 2 IgniteDataStreamers (2 client programs) to load data, since they use the same queue on the remote nodes?

Andrew Mashenkov

It may have an effect if you prepare data for the streamer (i.e. call addData) slowly and more resources can be used for that. Of course, the remote nodes must be able to bear the data pressure.
Performance can increase, but usually only slightly, as the network will be the bottleneck.


woo charles

When I call addData() on the streamer, the data is sent to and buffered on a server node. Is that correct?
If so, is the data buffered on a random server node or only on the one the client is directly connected to?

Andrew Mashenkov

Hi Woo,

addData() adds the entry to one of the local buffers according to key affinity; the buffer is then sent to the server node that is primary for all keys in that buffer.

dsetrakyan

In reply to this post by woo charles

addData() buffers the data on the client side. In fact, there are multiple buffers on the client side, each associated with a server node.

Ignite will never send the data to a random node. The data is always sent exactly to the node where it will be cached.

D. 
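The buffering described above can be pictured with a toy model. This is only a sketch in plain Java and not Ignite code: the class name, the modulo-based `nodeFor` affinity function, and the in-memory `sent` map are all assumptions made for illustration. It shows that each key goes into the client-side buffer of the node that owns it, and that a buffer is shipped as one batch once it is full.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of client-side, per-node buffering. Illustrative only, not Ignite code. */
class PerNodeBufferSketch {
    private final int nodeCount;
    private final int bufferSize;
    // One pending buffer per server node, held on the client side.
    private final Map<Integer, List<Integer>> buffers = new HashMap<>();
    // Batches "sent" so far, grouped by destination node (stand-in for the network).
    private final Map<Integer, List<List<Integer>>> sent = new HashMap<>();

    PerNodeBufferSketch(int nodeCount, int bufferSize) {
        this.nodeCount = nodeCount;
        this.bufferSize = bufferSize;
    }

    /** Hypothetical affinity function: maps a key to the node that owns it. */
    int nodeFor(int key) {
        return Math.floorMod(key, nodeCount);
    }

    /** Buffers the key locally; ships the buffer once it reaches bufferSize. */
    void addData(int key) {
        int node = nodeFor(key);
        List<Integer> buf = buffers.computeIfAbsent(node, n -> new ArrayList<>());
        buf.add(key);
        if (buf.size() >= bufferSize) {
            flush(node);
        }
    }

    /** Moves the pending buffer of one node into its list of sent batches. */
    private void flush(int node) {
        List<Integer> buf = buffers.remove(node);
        if (buf != null && !buf.isEmpty()) {
            sent.computeIfAbsent(node, n -> new ArrayList<>()).add(buf);
        }
    }

    /** Ships whatever is still buffered, including partially filled buffers. */
    void close() {
        for (Integer node : new ArrayList<>(buffers.keySet())) {
            flush(node);
        }
    }

    List<List<Integer>> batchesSentTo(int node) {
        return sent.getOrDefault(node, new ArrayList<>());
    }

    public static void main(String[] args) {
        PerNodeBufferSketch streamer = new PerNodeBufferSketch(3, 4);
        for (int key = 0; key < 30; key++) {
            streamer.addData(key);
        }
        streamer.close();
        for (int node = 0; node < 3; node++) {
            System.out.println("node " + node + " <- " + streamer.batchesSentTo(node));
        }
    }
}
```

In this model no batch ever contains keys for two different nodes, which matches the statement that data is never sent to a random node.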
woo charles

If the data is buffered on the client side, the bottleneck should also be on the client side.
If I use 2 programs to load the same data set, there should be a significant change in data load time.
Is that right?

Andrew Mashenkov

Hi Woo,

DataStreamer is designed to fill a cache with maximum throughput. By default, the streamer will not overwrite existing cache data unless the allowOverwrite option is set.

Why do you need to load the same data set? Why do you expect the load time to change significantly with 2 programs compared to 1 if the data set is put twice?
Or did I miss something?

Anyway, if you do not get a speedup but are sure you should, then the bottleneck has to be found first.

woo charles

By "same data set" I mean that I split the original data into 2 parts and load them from 2 separate programs.
E.g. for a data set with ids 1-100, program A loads ids 1-50 and program B loads ids 51-100.
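The split described here might look roughly like the sketch below. It is a plain-Java illustration under stated assumptions: two threads stand in for the two separate programs, a `ConcurrentHashMap` stands in for the cache, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Two loaders, each responsible for one half of an id range. Illustrative only. */
class SplitLoadSketch {
    /** Loads ids in [from, to] into the target map, which stands in for the cache. */
    static Runnable loader(Map<Integer, String> target, int from, int to) {
        return () -> {
            for (int id = from; id <= to; id++) {
                target.put(id, "record-" + id);
            }
        };
    }

    /** Splits [firstId, lastId] in the middle and loads both halves in parallel. */
    static Map<Integer, String> loadInTwoParts(int firstId, int lastId) {
        Map<Integer, String> cache = new ConcurrentHashMap<>();
        int mid = firstId + (lastId - firstId) / 2;
        Thread a = new Thread(loader(cache, firstId, mid));      // "program A"
        Thread b = new Thread(loader(cache, mid + 1, lastId));   // "program B"
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        }
        return cache;
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = loadInTwoParts(1, 100);
        System.out.println("loaded " + cache.size() + " records");
    }
}
```

Because the two halves do not overlap, neither loader ever writes a key that the other one owns.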

Andrew Mashenkov

Hi Woo,

It may be reasonable if you see that the nodes' resource utilization is too low and raising the per-node buffer size has no effect (which means you are preparing data for the nodes too slowly).
Of course, you should first check whether the network is the bottleneck.
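One rough way to check whether data preparation is the slow part is to time it on its own and compare that with the time of a full load. The sketch below is plain Java under stated assumptions: the class and method names are hypothetical, the sink in `main` is a list rather than a real streamer, and in an actual test the sink would be the call to addData. Timings from such a short run are only indicative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntFunction;

/** Compares preparation-only time with full load time. Illustrative only. */
class BottleneckSketch {
    /** Nanoseconds spent only preparing n records, without sending them anywhere. */
    static <T> long timePreparation(int n, IntFunction<T> prepare) {
        long start = System.nanoTime();
        int produced = 0;
        for (int i = 0; i < n; i++) {
            if (prepare.apply(i) != null) {
                produced++; // use the result so the loop does real work
            }
        }
        long elapsed = System.nanoTime() - start;
        return produced >= 0 ? elapsed : 0;
    }

    /** Nanoseconds spent preparing n records and handing each one to the sink. */
    static <T> long timeFullLoad(int n, IntFunction<T> prepare, Consumer<T> sink) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink.accept(prepare.apply(i));
        }
        return System.nanoTime() - start;
    }

    /** Fraction of the full load time that preparation alone accounts for (0.0 to 1.0). */
    static double preparationShare(long prepNanos, long fullNanos) {
        if (fullNanos <= 0) {
            return 0.0;
        }
        return Math.min(1.0, (double) prepNanos / fullNanos);
    }

    public static void main(String[] args) {
        IntFunction<String> prepare = i -> "record-" + i;
        List<String> sink = new ArrayList<>(); // stand-in for the streamer
        long prep = timePreparation(100_000, prepare);
        long full = timeFullLoad(100_000, prepare, sink::add);
        System.out.printf("preparation share: %.2f%n", preparationShare(prep, full));
    }
}
```

A share close to 1.0 suggests the client spends most of its time preparing data, so adding a second loader may help; a small share points towards the network or the server nodes instead.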
