IGFS backed by persistence on physical filesystem

Kobe


I would like to use IGFS to store objects, but I also need the IGFS content to be persisted on a physical filesystem.

How do I accomplish this?

Regards,

/Kobe
Denis Magda

Hi,

You can use IGFS as a Hadoop Accelerator. Hadoop will persist your data.
There is a starting guide for this configuration:
https://apacheignite.readme.io/docs/file-system

Regards,
Denis  
Kobe

Thanks for your reply.

I have heard that I should not expect low latency access with HDFS. Could I back IGFS with ext4 or a clustered file system?

Kobe
Paolo Di Tommaso

Interesting question. I have the same use case.

Is it possible to persist IGFS data on ext4 or a clustered file system (without using HDFS)? 


Cheers,
Paolo


In reply to Kobe's post of Sun, Nov 15, 2015 at 7:59 AM.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/IGFS-backed-by-persistence-on-physical-filesystem-tp1882p1959.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Kobe

In reply to this post by Kobe
Kobe wrote:
> I have heard that I should not expect low latency access with HDFS. Could I back IGFS with ext4 or a clustered file system?

I ask because I suspect (and this could be my ignorance of how HDFS is used) that HDFS has overtones of Hadoop, in that it's designed for large-scale storage of data to make it amenable to MapReduce. I do not have a MapReduce use case for the data.

Could I use HDFS simply as a distributed filesystem without bringing in Hadoop? If not, I would like to back my IGFS with a regular filesystem, even ext3.

Regards,

/Kobe
Vladimir Ozerov

Kobe, Paolo,

At the moment IGFS has no adapters to interact with "regular" file systems without Hadoop. Note that Hadoop comes with the LocalFileSystem [1] class, which delegates to the OS file system. You could give it a try and check whether it meets your requirements.

Some time ago there was a discussion about implementing a native IGFS -> OS adapter [2], but no final decisions were made. As we are seeing growing interest in this feature, we should probably resume that thread.


In reply to Kobe's post of Mon, Nov 16, 2015 at 5:18 AM.

vkulichenko

Vladimir Ozerov wrote:
> Some time ago there was a discussion about implementing a native IGFS -> OS adapter
> [2], but no final decisions were made. As we are seeing growing interest
> in this feature, we should probably resume that thread.

Vladimir,

I think Kobe and Paolo are looking for the opposite of what we were going to achieve with FUSE. In my understanding, they just need an implementation of IgfsSecondaryFileSystem based on the java.io.File API instead of Hadoop. Am I missing something?

-Val
Vladimir Ozerov

Yes, I thought the same. Looks like I didn't understand how FUSE works when answering.

In reply to vkulichenko's post of Mon, Nov 16, 2015 at 10:13 PM.

vkulichenko

Created a ticket: https://issues.apache.org/jira/browse/IGNITE-1926

Kobe, Paolo, this task doesn't seem to be very complicated. How about picking it up?

-Val
Paolo Di Tommaso

This sounds interesting. Yes, the idea here is to have a secondary file system storage alternative to Hadoop.

My idea is to have an IgfsSecondaryFileSystem implementation based on java.nio.file.Path. This would make it possible to use a local file system or NFS storage via the POSIX interface (think, for example, of Amazon EFS).
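
To make the java.nio.file idea more concrete, here is a minimal sketch of the kind of operations such an adapter would delegate to the JDK. It is not an IgfsSecondaryFileSystem implementation and does not use any Ignite API: the class name NioFileStoreSketch and its method names are hypothetical, and details such as rejecting ".." segments, permissions, and append or rename semantics are left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical sketch: file operations delegated to java.nio.file under one root directory. */
class NioFileStoreSketch {
    private final Path root;

    NioFileStoreSketch(Path root) {
        this.root = root;
    }

    /** Maps an absolute IGFS-style path such as "/dir/file" onto the root directory. */
    private Path resolve(String path) {
        return root.resolve(path.startsWith("/") ? path.substring(1) : path);
    }

    boolean exists(String path) {
        return Files.exists(resolve(path));
    }

    /** Creates the file (and any missing parent directories), replacing existing content. */
    void write(String path, byte[] data) {
        try {
            Path target = resolve(path);
            Files.createDirectories(target.getParent());
            Files.write(target, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    byte[] read(String path) {
        try {
            return Files.readAllBytes(resolve(path));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if something was deleted. */
    boolean delete(String path) {
        try {
            return Files.deleteIfExists(resolve(path));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Lists the names of the direct children of a directory, sorted. */
    List<String> list(String dir) {
        try (Stream<Path> children = Files.list(resolve(dir))) {
            List<String> names = new ArrayList<>();
            children.forEach(p -> names.add(p.getFileName().toString()));
            Collections.sort(names);
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // The root could just as well be a directory on an NFS mount.
        NioFileStoreSketch store = new NioFileStoreSketch(Files.createTempDirectory("igfs-secondary-"));
        store.write("/tmp/foo", "Hello, world!".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.read("/tmp/foo"), StandardCharsets.UTF_8));
        System.out.println(store.list("/tmp"));
    }
}
```

Because everything goes through Path, the same code works unchanged whether the root is on ext4 or on a shared mount; whether that is safe with several Ignite nodes depends on all of them seeing the same directory.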

At the same time, this would allow foreign file systems to be mounted at the JVM level, thanks to JSR 203 (NIO.2), implemented in Java 7.
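
As a small illustration of what a non-default file system inside the JVM looks like, the sketch below uses the zip file system provider that ships with the JDK. It is only an example of the pluggable provider mechanism of java.nio.file, not something Ignite uses; the class name ZipFsDemo and its method are made up for this example. The point is that the Files and Path calls are the same ones you would use on a local disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Map;

/** Hypothetical demo: a zip archive accessed through the ordinary Path/Files API. */
class ZipFsDemo {
    /** Writes text into an entry of a zip archive via the zip provider and reads it back. */
    static String roundTrip(Path zipFile, String entry, String text) {
        // "jar:file:/..." URIs are handled by the JDK's bundled zip file system provider.
        URI uri = URI.create("jar:" + zipFile.toUri());
        // "create" asks the provider to create the archive if it does not exist yet.
        Map<String, String> env = Collections.singletonMap("create", "true");

        try (FileSystem zipFs = FileSystems.newFileSystem(uri, env)) {
            Path p = zipFs.getPath(entry);
            Files.write(p, text.getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The archive must not exist as an empty file, so use a fresh name.
        Path zip = Paths.get(System.getProperty("java.io.tmpdir"), "zipfs-demo-" + System.nanoTime() + ".zip");
        System.out.println(roundTrip(zip, "/hello.txt", "Hello, NIO.2"));
    }
}
```

An adapter written purely against Path would, in principle, work with any such provider, which is the attraction of the approach.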


Does this make sense to you? Do you think it could work in an Ignite cluster?


Cheers,
Paolo


In reply to vkulichenko's post of Tue, Nov 17, 2015 at 3:44 AM.

Ivan Veselovsky

Hi, guys,
as Vladimir mentioned above, we may think about org.apache.hadoop.fs.LocalFileSystem.
The below code sample demonstrates how IGFS runs over it.
Do we really need to implement the same in Ignite?

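// Imports assume the Ignite 1.x package layout (ignite-core plus ignite-hadoop on the classpath).
import java.util.Collection;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteFileSystem;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.FileSystemConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.NearCacheConfiguration;
import org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem;
import org.apache.ignite.igfs.IgfsFile;
import org.apache.ignite.igfs.IgfsGroupDataBlocksKeyMapper;
import org.apache.ignite.igfs.IgfsInputStream;
import org.apache.ignite.igfs.IgfsIpcEndpointConfiguration;
import org.apache.ignite.igfs.IgfsIpcEndpointType;
import org.apache.ignite.igfs.IgfsMode;
import org.apache.ignite.igfs.IgfsOutputStream;
import org.apache.ignite.igfs.IgfsPath;
import org.apache.ignite.igfs.secondary.IgfsSecondaryFileSystem;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

import static org.apache.ignite.cache.CacheAtomicWriteOrderMode.PRIMARY;
import static org.apache.ignite.cache.CacheAtomicityMode.TRANSACTIONAL;
import static org.apache.ignite.cache.CacheMode.PARTITIONED;
import static org.apache.ignite.cache.CacheMode.REPLICATED;
import static org.apache.ignite.cache.CacheWriteSynchronizationMode.FULL_SYNC;

public class LocalFsMain {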

    public static void main(String[] args) throws Exception {
        IgfsSecondaryFileSystem igfsSec = new IgniteHadoopIgfsSecondaryFileSystem("file:///", null, "ivan");

        IgfsIpcEndpointConfiguration igfsIpc = new IgfsIpcEndpointConfiguration();
        igfsIpc.setType(IgfsIpcEndpointType.TCP);
        igfsIpc.setPort(10500);

        FileSystemConfiguration igfsCfg = new FileSystemConfiguration();
        igfsCfg.setDataCacheName("partitioned");
        igfsCfg.setMetaCacheName("replicated");
        igfsCfg.setName("myigfs");
        igfsCfg.setIpcEndpointConfiguration(igfsIpc);
        igfsCfg.setBlockSize(512 * 1024);
        igfsCfg.setPrefetchBlocks(1);
        igfsCfg.setDefaultMode(IgfsMode.DUAL_SYNC);
        igfsCfg.setSecondaryFileSystem(igfsSec);

        CacheConfiguration dataCacheCfg = defaultCacheConfiguration();

        dataCacheCfg.setName("partitioned");
        dataCacheCfg.setCacheMode(PARTITIONED);
        dataCacheCfg.setNearConfiguration(null);
        dataCacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
        dataCacheCfg.setAffinityMapper(new IgfsGroupDataBlocksKeyMapper(128));
        dataCacheCfg.setBackups(0);
        dataCacheCfg.setAtomicityMode(TRANSACTIONAL);

        CacheConfiguration metaCacheCfg = defaultCacheConfiguration();

        metaCacheCfg.setName("replicated");
        metaCacheCfg.setCacheMode(REPLICATED);
        metaCacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
        metaCacheCfg.setAtomicityMode(TRANSACTIONAL);

        TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();

        discoSpi.setIpFinder(new TcpDiscoveryVmIpFinder(true));

        IgniteConfiguration igCfg = new IgniteConfiguration();

        igCfg.setCacheConfiguration(metaCacheCfg, dataCacheCfg);
        igCfg.setDiscoverySpi(discoSpi);
        igCfg.setFileSystemConfiguration(igfsCfg);

        try (Ignite ig = Ignition.start(igCfg)) {
            IgniteFileSystem igfs = ig.fileSystem("myigfs");

            // List files:
            Collection<IgfsFile> listing = igfs.listFiles(new IgfsPath("/"));

            for (IgfsFile f: listing)
                System.out.println(f);

            // Write file:
            IgfsPath path = new IgfsPath("/tmp/foo" + System.currentTimeMillis());

            System.out.println("File: " + path);

            try (IgfsOutputStream os = igfs.create(path, true)) {
                os.write("Hello, world!".getBytes());
            }

            // Read the file:
            byte[] b = new byte[(int)igfs.info(path).length()];

            try (IgfsInputStream is = igfs.open(path)) {
                int length = is.read(b);
                assert length == b.length;

                System.out.println("Read: [" + new String(b) + "]");
            }
        }
    }

    static CacheConfiguration defaultCacheConfiguration() {
        CacheConfiguration cfg = new CacheConfiguration();

        cfg.setStartSize(1024);
        cfg.setAtomicWriteOrderMode(PRIMARY);
        cfg.setAtomicityMode(TRANSACTIONAL);
        cfg.setNearConfiguration(new NearCacheConfiguration());
        cfg.setWriteSynchronizationMode(FULL_SYNC);
        cfg.setEvictionPolicy(null);

        return cfg;
    }
}
Paolo Di Tommaso

What if you have multiple nodes in a cluster using org.apache.hadoop.fs.LocalFileSystem as a secondary file system? Each node saves the IGFS content locally? Could it be used to save the data over an NFS mount?

Anyway, I think that LocalFileSystem would introduce a dependency on the Hadoop components, which I would like to avoid.


Cheers,
Paolo  



In reply to Ivan Veselovsky's post of Tue, Nov 17, 2015 at 4:19 PM.

Ivan Veselovsky

Paolo Di Tommaso wrote:
> What if you have multiple nodes in a cluster using org.apache.hadoop.fs.LocalFileSystem as a secondary file system? Each node saves the IGFS content locally?

In my understanding, yes: each node that was requested to do the file operation will store the file locally.

Paolo Di Tommaso wrote:
> Could it be used to save the data over an NFS mount?

LocalFileSystem runs over the OS file system with "/" being the root of the file system, so if an NFS path is mounted at, say, /mnt/nfs/, the files written under it will be shared. We may think about some kind of "chroot" there to isolate the Ignite file system from other things.
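
One way to picture the "chroot" idea is a path mapping that resolves every requested path under a fixed root and rejects anything that escapes it. The following is only a sketch with JDK classes: the class name ChrootPathSketch, the method confine, and the /mnt/nfs/igfs root are assumptions for illustration, not part of Ignite or Hadoop, and symbolic links are not taken into account.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch of a chroot-like mapping from file system paths to a fixed root. */
class ChrootPathSketch {
    /**
     * Maps a requested absolute path (e.g. "/tmp/f") to a location under the given root.
     * Throws IllegalArgumentException if ".." segments would lead outside the root.
     */
    static Path confine(Path root, String requested) {
        Path base = root.toAbsolutePath().normalize();

        // Drop leading slashes so the request is resolved relative to the root.
        String rel = requested;
        while (rel.startsWith("/"))
            rel = rel.substring(1);

        Path resolved = base.resolve(rel).normalize();

        if (!resolved.startsWith(base))
            throw new IllegalArgumentException("Path escapes root: " + requested);

        return resolved;
    }

    public static void main(String[] args) {
        Path root = Paths.get("/mnt/nfs/igfs"); // assumed mount point, for illustration only
        System.out.println(confine(root, "/tmp/f"));

        try {
            confine(root, "/../../etc/passwd");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With a mapping like this, "/" as seen through IGFS would correspond to the chosen root directory rather than to the root of the OS file system.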
vkulichenko

Folks,

Looks like we're already discussing implementation details here, so I forwarded the thread to dev list. Let's continue there.

-Val
Ivan Veselovsky

Valentin, my reply was possibly not precise enough, sorry, so let me clarify.
If we store a file with LocalFileSystem as the secondary Fs, the file will be stored locally not on *each* Ignite node in the cluster, but *only* on the node where the operation was requested.
E.g. if a cluster consists of nodes A, B, C, and we connect to node A and write file "/tmp/f", the file "/tmp/f" will be written locally only on the machine where node A runs, and nowhere else.
Yes, the data cache in IGFS is partitioned, so file blocks are distributed. But that concerns IGFS itself, which in DUAL mode only plays the role of an intermediate layer (a cache) between the secondary Fs and the client.
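
The A, B, C scenario can be mimicked without Ignite by treating each node as nothing more than its own local directory. The toy model below is purely illustrative (the class NodeLocalWriteDemo and its methods are made up, and temporary directories stand in for the nodes' disks); it only shows the consequence described above, namely that a write issued through node A ends up on A's local disk and nowhere else.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model: every "node" owns a separate local root directory, as with a local secondary Fs. */
class NodeLocalWriteDemo {
    private final Map<String, Path> nodeRoots = new LinkedHashMap<>();

    NodeLocalWriteDemo(String... nodeNames) {
        try {
            for (String name : nodeNames)
                nodeRoots.put(name, Files.createTempDirectory("node-" + name + "-"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Translates a file system path into the location on the given node's local disk. */
    private Path localPath(String node, String path) {
        Path root = nodeRoots.get(node);
        if (root == null)
            throw new IllegalArgumentException("Unknown node: " + node);
        return root.resolve(path.startsWith("/") ? path.substring(1) : path);
    }

    /** The write goes only to the local disk of the node that received the request. */
    void writeVia(String node, String path, byte[] data) {
        try {
            Path target = localPath(node, path);
            Files.createDirectories(target.getParent());
            Files.write(target, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean existsOn(String node, String path) {
        return Files.exists(localPath(node, path));
    }

    public static void main(String[] args) {
        NodeLocalWriteDemo cluster = new NodeLocalWriteDemo("A", "B", "C");
        cluster.writeVia("A", "/tmp/f", "Hello".getBytes());

        for (String node : new String[] {"A", "B", "C"})
            System.out.println("Node " + node + " has /tmp/f: " + cluster.existsOn(node, "/tmp/f"));
    }
}
```

If all three roots pointed at the same shared mount instead, the file would be visible from every node, which is the NFS case discussed earlier in the thread.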

In reply to vkulichenko's post of Wed, Nov 18, 2015 at 12:16 AM.

Ivan Veselovsky

An important point here is that with a local underlying file system, only one node in the cluster should be used to access the file system. Otherwise incorrect behavior is possible, e.g. if local files with the same path but different content exist on different cluster nodes.
We may think about how to prevent use cases where incorrect behavior is possible.
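
To illustrate the failure mode, the sketch below compares the content of the same relative path under several node-local roots by SHA-256 digest and reports whether the copies disagree. This is not a proposal for how Ignite should prevent the problem, just a way to see it: the class LocalCopyDivergenceCheck and its methods are hypothetical, and the check assumes a single process can read every root, which would not normally be true across machines.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical check: does the same path hold different content on different nodes? */
class LocalCopyDivergenceCheck {
    static String sha256(Path file) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
            return Base64.getEncoder().encodeToString(digest);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }

    /** True if the path exists under at least two roots with different content. */
    static boolean diverges(String relPath, List<Path> nodeRoots) {
        Set<String> digests = new HashSet<>();
        for (Path root : nodeRoots) {
            Path f = root.resolve(relPath);
            if (Files.isRegularFile(f))
                digests.add(sha256(f));
        }
        return digests.size() > 1;
    }

    /** Demo helper: creates a temporary "node" root containing one file. */
    static Path newRootWith(String relPath, String content) {
        try {
            Path root = Files.createTempDirectory("node-root-");
            Path f = root.resolve(relPath);
            Files.createDirectories(f.getParent());
            Files.write(f, content.getBytes(StandardCharsets.UTF_8));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path a = newRootWith("tmp/f", "version from node A");
        Path b = newRootWith("tmp/f", "version from node B");
        System.out.println("Copies diverge: " + diverges("tmp/f", Arrays.asList(a, b)));
    }
}
```

A client reading "/tmp/f" through node A and another reading it through node B would see different data in this situation, which is why access through a single node (or a genuinely shared mount) is the safer setup.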
Kobe

In reply to this post by Ivan Veselovsky
   
Hi -

I am using IGFS in the local file system mode.

[CODE]  
    IgfsSecondaryFileSystem igfsSec = new IgniteHadoopIgfsSecondaryFileSystem("file:///", null, "ivan");
[/CODE]

I have two webapps in my Tomcat container and I want the IGFS to be visible to both web applications
(in replication mode). I must use the same (on-heap) cache to back the IGFS instance for both webapp contexts.

To pull this off, I assume I must put the Ignite jars in the path of Tomcat's common classloader, that is, in ${TOMCAT}/lib (as opposed to the WEB-INF/lib dir of each webapp)?

Please advise.

thanx,

/Kobe
Denis Magda

Hi,

You can put the jars in any of the listed folders. It should work fine in both cases.

However, I suggest you take into account Ivan's thoughts and concerns about cases where a local file system is used across several remote nodes.

--
Denis
Kobe

Thank you, Denis. I have read through Ivan's comments and have a plan to address them in my implementation.

However, could you help me with my query at http://apache-ignite-users.70518.x6.nabble.com/Unable-to-find-ignite-artifacts-tc2097.html?

I am unable to find ignite artifacts - is there a dedicated ignite repository?

Thanks,
Kobe
