Force flush of IGFS to secondary file system in DUAL_ASYNC mode

Juan Rodríguez Hortalá

Hi, 

When using IGFS with a secondary file system and write-behind configured through the DUAL_ASYNC IgfsMode, is there any way to force a flush of the data from the Ignite caches into the secondary file system? A possible scenario is a temporary cluster with Ignite installed that uses IGFS in DUAL_ASYNC mode to write to HDFS, where HDFS runs on a permanent cluster and is configured as the secondary file system. To shut down the temporary cluster safely we need to know that all the data has been flushed to HDFS; otherwise we might lose data. From what I see in http://apache-ignite-users.70518.x6.nabble.com/Flush-the-cache-into-the-persistence-store-manually-td5077.html this wasn't available at the time that question was answered. The solution proposed there seems to be traversing the cache and writing each cached entry to the underlying data store. But for IGFS I understand that this is not so straightforward, because the dataCache and metadataCache used by IGFS don't store the HDFS files directly, but rather the pieces that result from splitting them.

Is there any way to flush the data from IGFS into HDFS? If not, is there any recommendation on how we could traverse the dataCache and metadataCache used by IGFS to manually write the data into HDFS? If we do that traversal, is there any way to keep the async writes of IGFS and the writes done in the traversal from interfering with each other or leading to duplicate writes?

Thanks a lot for your help!

Juan Rodriguez Hortala
ilya.kasnacheev

Re: Force flush of IGFS to secondary file system in DUAL_ASYNC mode

Hello!

After reviewing IGFS code, I think that you can do the following:

You should save all file paths that are being migrated, and then call
await(collectionWithAllFilePaths) on IgfsImpl. If it's a huge number of
files, I imagine you can do this in batches.

It will do the same synchronous wait that DUAL_SYNC would do, just from a
different entry point. After await() returns you are safe to close IgfsImpl
and shut down your cluster.
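
A minimal sketch of the batching idea above, in Java. It does not call Ignite: the `awaiter` callback is a hypothetical stand-in for the await(...) call on IgfsImpl, whose exact signature is not shown in this thread and should be checked against your Ignite version. The class name, method names, and batch size are likewise illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: splits recorded file paths into batches and hands
// each batch to a blocking "await" callback.
final class IgfsFlushSketch {
    /** Splits paths into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> batches(List<T> paths, int batchSize) {
        if (batchSize <= 0)
            throw new IllegalArgumentException("batchSize must be positive");

        List<List<T>> out = new ArrayList<>();

        for (int i = 0; i < paths.size(); i += batchSize)
            out.add(new ArrayList<>(paths.subList(i, Math.min(i + batchSize, paths.size()))));

        return out;
    }

    /**
     * Calls awaiter once per batch and returns the number of paths handed over.
     * In real code the awaiter is assumed to block until the batch is flushed.
     */
    static <T> int awaitAll(List<T> paths, int batchSize, Consumer<List<T>> awaiter) {
        int waited = 0;

        for (List<T> batch : batches(paths, batchSize)) {
            awaiter.accept(batch); // assumption: wraps the await call on IgfsImpl
            waited += batch.size();
        }

        return waited;
    }

    public static void main(String[] args) {
        // Paths the application recorded while writing through IGFS (made-up values).
        List<String> written = Arrays.asList("/out/part-0", "/out/part-1", "/out/part-2");

        int n = awaitAll(written, 2, batch -> System.out.println("awaiting " + batch));

        System.out.println("waited on " + n + " paths");
    }
}
```

With a real IGFS instance, the lambda would convert each batch to the path type that await expects and call it there; only after the last batch returns would the application close IGFS and stop the cluster.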

Note that I would like to have the same behaviour for IgfsImpl.close(cancel:
false), but it's NOT there yet. I have filed
https://issues.apache.org/jira/browse/IGNITE-7356 - do not hesitate to
comment.

Regards,



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Juan Rodríguez Hortalá

Re: Force flush of IGFS to secondary file system in DUAL_ASYNC mode

Hi Ilya,

Thanks a lot for the detailed answer. It's nice to know there is a clear path to achieve that flush.

Greetings, 

Juan 
