Loading cache with DataStreamer

classic Classic list List threaded Threaded
8 messages Options
pragmaticbigdata pragmaticbigdata
Reply | Threaded
Open this post in threaded view
|

Loading cache with DataStreamer

I am using apache ignite version 1.6. Assuming datastreamer is a better approach to bulk load data performance wise I am trying to execute the below code that gets hanged

        IgniteDataStreamer<String, ProductDetails> streamer = ignite.dataStreamer(cacheName);

        List<IgniteFuture> futures = Lists.newArrayList();

        for (long i=1; i<= 100000; i++) {
            ProductDetails phone = new ProductDetails("phone" + i);
            //set other properties
 
            futures.add(streamer.addData(phone.getName(), phone));

        }

        //wait for future objects
        for (IgniteFuture f : futures) {
            f.get();
        }

What am I missing in the above example?

Thanks,
Amit.
visagan visagan
Reply | Threaded
Open this post in threaded view
|

Re: Loading cache with DataStreamer

 You should probably do this:      
       IgniteDataStreamer<String, ProductDetails> streamer = ignite.dataStreamer(cacheName);
       for (long i=1; i<= 100000; i++) {
            ProductDetails phone = new ProductDetails("phone" + i);
            //set other properties
            streamer.addData(phone.getName(), phone);
        }
pragmaticbigdata pragmaticbigdata
Reply | Threaded
Open this post in threaded view
|

Re: Loading cache with DataStreamer

I tried that but instead of loading 100000 entries it loaded only 99330. Doesn't datastreamer do a parallel load through multiple threads and hence return a future object that we need to wait on?

Thanks!
visagan visagan
Reply | Threaded
Open this post in threaded view
|

Re: Loading cache with DataStreamer

The Streamer actually buffers the data. Buffer Default Size is 1024, either the buffer size is reached Or you set a Flush Frequency for the buffer, it does not deliver the data to the nodes.
And it does a parallel load with multiple threads by reading the data that has been buffered.

Try Setting this property to something like 10Seconds and see if all the data you add to the streamer are flushed. By default Auto flush frequency is disabled.  
autoFlushFrequency(10000)
alexey.goncharuk alexey.goncharuk
Reply | Threaded
Open this post in threaded view
|

Re: Loading cache with DataStreamer

Hi Amit,

You can also close() the streamer or call flush() explicitly to make sure all the added data was added to the cache.

2016-06-04 10:57 GMT-07:00 visagan <[hidden email]>:
The Streamer actually buffers the data. Buffer Default Size is 1024, either
the buffer size is reached Or you set a Flush Frequency for the buffer, it
does not deliver the data to the nodes.
And it does a parallel load with multiple threads by reading the data that
has been buffered.

Try Setting this property to something like 10Seconds and see if all the
data you add to the streamer are flushed. By default Auto flush frequency is
disabled.
autoFlushFrequency(10000)



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Loading-cache-with-DataStreamer-tp5421p5424.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

pragmaticbigdata pragmaticbigdata
Reply | Threaded
Open this post in threaded view
|

Re: Loading cache with DataStreamer

Thanks for the replies.

Calling flush() and later close() made sure all the entries were added to the cache. I think this step should be added to the examples  since without this the code cannot assume when would the cache be completely populated with all the entries.

AutoFlushFrequency and/or the buffer size seem to be performance tuning parameters.
alexey.goncharuk alexey.goncharuk
Reply | Threaded
Open this post in threaded view
|

Re: Loading cache with DataStreamer

Note that IgniteDataStreamer implements AutoCloseable, so the code in the example you are referring to is correct because data streamer is used in try-with-resources block. It is not required to call flush() before calling close() because close() will flush the data automatically.‚Äč
pragmaticbigdata pragmaticbigdata
Reply | Threaded
Open this post in threaded view
|

Re: Loading cache with DataStreamer

Got it. Thanks for detailing it out.