Question about memory when uploading CSV using .NET DataStreamer

camer314

Question about memory when uploading CSV using .NET DataStreamer

I have a large CSV file (50 million rows) that I wish to upload to a cache. I
am using .NET and a DataStreamer from my application, which is designated as
a client-only node.

What I don't understand is that I quickly run out of memory in my C# streaming
(client) application, while my data node (an instance of Apache.Ignite.exe)
slowly increases its RAM usage, but not at the rate my client app does.

So it would seem that either (A) my client IS actually being used to cache
data, or (B) there is a memory leak where data that has been sent to the
cache is not released.

As for figures: Apache.Ignite.exe uses 165Mb when first started. After
loading 1 million records and letting everything settle down,
Apache.Ignite.exe sits at 450Mb, while my client app (the one streaming)
sits at 1.5Gb.

The total size of the input file is 5Gb, so 1 million records should really
only be about 100Mb; I don't know how my client even gets to 1.5Gb to begin
with. If I comment out the AddData() call, my client never gets past 200Mb, so
it's certainly something happening in the cache.

Is this expected behaviour? If so, I don't know how to import huge CSV
files without memory issues on the streaming machine.
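For context, the general shape of my streaming loop is roughly this (a simplified sketch, not my exact code; the cache name "csv-cache", the long key, and storing the raw line as the value are placeholders):

```csharp
using System.IO;
using Apache.Ignite.Core;

class StreamCsv
{
    static void Main()
    {
        // Start a client-only node: it should hold no cache data itself.
        var cfg = new IgniteConfiguration { ClientMode = true };

        using (var ignite = Ignition.Start(cfg))
        {
            ignite.GetOrCreateCache<long, string>("csv-cache");

            using (var streamer = ignite.GetDataStreamer<long, string>("csv-cache"))
            {
                long id = 0;

                // File.ReadLines is lazy: one line in memory at a time,
                // so the 5Gb file itself is not the problem.
                foreach (var line in File.ReadLines("data.csv"))
                {
                    streamer.AddData(id++, line);
                }
            } // Dispose() flushes any entries still buffered on the client
        }
    }
}
```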





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
ptupitsyn
Re: Question about memory when uploading CSV using .NET DataStreamer

Sounds nasty; can you share a reproducer, please?

On Thu, Nov 14, 2019 at 10:12 AM camer314 <[hidden email]> wrote:
ptupitsyn
Re: Question about memory when uploading CSV using .NET DataStreamer

Here is what I tried:

It ran for a minute or so: 200Mb used on the client, 5Gb on the server. Seems to work as expected to me.

On Thu, Nov 14, 2019 at 2:14 PM Pavel Tupitsyn <[hidden email]> wrote:
Mikael
Re: Question about memory when uploading CSV using .NET DataStreamer

In reply to this post by ptupitsyn

Hi!

If each row is stored as an entry in the cache, you can expect an overhead of around 200 bytes per entry, so about 200MB just for the 1M entries themselves, not counting your actual data (more if you have any indexes).

You can control the streamer: how much data it buffers and when it flushes. I have no idea how this works in the .NET client, though, so maybe something is off there. You could try manually calling flush on the streamer at intervals (this should not be needed, but just to see if it makes any difference). I use a lot of streamers (from Java) and have never had any problems with them, so maybe it is something on the .NET side.
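On the .NET side, manual flushing and buffer tuning would look something like this (a sketch against the Ignite.NET IDataStreamer API; the buffer size, flush interval, batch threshold, and file name are arbitrary assumptions):

```csharp
using System;
using System.IO;
using Apache.Ignite.Core.Datastream;

static class StreamerFlushDemo
{
    // Flush the streamer manually every 100k rows, just to see whether
    // explicit flushing changes the client's memory profile.
    public static void LoadWithManualFlush(IDataStreamer<long, string> streamer, string path)
    {
        streamer.PerNodeBufferSize = 1024;                     // entries buffered per node before a send
        streamer.AutoFlushFrequency = TimeSpan.FromSeconds(5); // background flush interval

        long id = 0;
        foreach (var line in File.ReadLines(path))
        {
            streamer.AddData(id++, line);

            if (id % 100_000 == 0)
                streamer.Flush(); // blocks until buffered entries reach the cluster
        }
    }
}
```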

Mikael

On 2019-11-14 at 12:14, Pavel Tupitsyn wrote:
ilya.kasnacheev
Re: Question about memory when uploading CSV using .NET DataStreamer

In reply to this post by camer314
Hello!

Since we're in 2019, we don't recommend running any Ignite node with less than -Xmx2G (that is, a 2-gigabyte heap allowance).

It is certainly possible to run Ignite with less heap, but the reasons for doing so are not very clear.

Please also note that our JDBC thin driver supports streaming, and it should be usable from .NET in some way. In that case, the memory overhead is supposed to be small.

Regards,
--
Ilya Kasnacheev


ptupitsyn
Re: Question about memory when uploading CSV using .NET DataStreamer

> Since we're in 2019, we don't recommend running any Ignite node with less than -Xmx2G (that is, a 2-gigabyte heap allowance).
Does 2019 somehow allow us to consume 2Gb for nothing?
I don't think a client node needs that much.

Let's see a reproducer.
My testing shows that streaming works out of the box on a client node; no custom JVM tuning or anything else is required.

On Thu, Nov 14, 2019 at 4:12 PM Ilya Kasnacheev <[hidden email]> wrote:
camer314
Re: Question about memory when uploading CSV using .NET DataStreamer

Here are my source file and a 1-million-row CSV file.

I am not sure what's different between my code and yours, but my version
quickly consumes memory on the client side for some reason.

Caveat: I am normally a Python programmer, so I might have missed something
obvious...

https://wtwdeeplearning.blob.core.windows.net/ignite/Program.zip?st=2019-11-15T00%3A58%3A20Z&se=2019-11-25T00%3A58%3A00Z&sp=rl&sv=2018-03-28&sr=b&sig=IkMuGbNJ4YAp5Ko%2BmcqC5PkbSLeuUfQLegMXpj3WNQ0%3D





camer314
Re: Question about memory when uploading CSV using .NET DataStreamer

In my sample code I had a bit of a bug; this should be the line to add:

var _ = ldr.AddData(id++, data);

However, it doesn't appear to make any difference. This is the state of memory
(with ignite.exe being my client executable), paused after inserting
1 million rows. Why is my client memory usage still so high?

<http://apache-ignite-users.70518.x6.nabble.com/file/t2675/Untitled.png>

If I comment out the AddData call, I get:

<http://apache-ignite-users.70518.x6.nabble.com/file/t2675/Untitled2.png>
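One possible explanation for the original growth (an assumption, since the buggy version isn't shown): AddData returns a Task per call, and accumulating those tasks in a collection keeps every batch reachable until they are awaited. A sketch of both patterns, with hypothetical method names:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Apache.Ignite.Core.Datastream;

static class AddDataPatterns
{
    // Memory-hungry pattern: holding every Task returned by AddData
    // keeps per-batch state reachable for the whole run.
    public static async Task LoadAndHoldTasks(
        IDataStreamer<long, string> ldr, IEnumerable<string> rows)
    {
        var tasks = new List<Task>();
        long id = 0;

        foreach (var row in rows)
            tasks.Add(ldr.AddData(id++, row)); // list grows with every row

        await Task.WhenAll(tasks);
    }

    // Lighter pattern: discard the per-call task and rely on the
    // streamer's own flushing (and Dispose()) for completion.
    public static void LoadAndDiscardTasks(
        IDataStreamer<long, string> ldr, IEnumerable<string> rows)
    {
        long id = 0;

        foreach (var row in rows)
        {
            var _ = ldr.AddData(id++, row);
        }
    }
}
```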



ptupitsyn
Re: Question about memory when uploading CSV using .NET DataStreamer

I've run your code under .NET and Java memory profilers.
In short, everything is working fine; nothing to worry about.


DotMemory:

.NET managed memory usage is under 1Mb; unmanaged memory is much higher, and that is what the Java part allocates.

jvisualvm:
(I clicked Perform GC; this corresponds to the last used-heap drop.)

As we can see, streamer usage caused some heap allocations, but in the end it settled down to 17Mb.
To put it simply, the JVM reserves more memory from the OS than it actually uses, so Task Manager reports high memory usage to you.




On Fri, Nov 15, 2019 at 4:45 AM camer314 <[hidden email]> wrote:
camer314
Re: Question about memory when uploading CSV using .NET DataStreamer

OK, yes, I see. It seems that with the code changes I made to produce the
example, memory consumption is much more in line with expectations, so I
guess it was a coding error on my part.

However, it seems strange that my client node, which holds no cache, still
wants to hang onto over 1Gb of heap space even though it is using less than
100Mb. Is there no way to release that back?



ptupitsyn
Re: Question about memory when uploading CSV using .NET DataStreamer

I would not recommend doing so, because it may affect Ignite performance,
but you can tweak the JVM to use less memory and return it to the OS more frequently, like this:
var cfg = new IgniteConfiguration
{
    ClientMode = true,
    JvmOptions = new[] { "-XX:MaxHeapFreeRatio=30", "-XX:MinHeapFreeRatio=10" },
    JvmInitialMemoryMb = 100,
    JvmMaxMemoryMb = 900
};



On Mon, Nov 18, 2019 at 3:42 AM camer314 <[hidden email]> wrote: