How to configure Hadoop accelerator for Hadoop-High Availability ?

Jesu

How to configure Hadoop accelerator for Hadoop-High Availability ?

Unable to find any documentation related to this. Is it possible?
vkulichenko

Re: How to configure Hadoop accelerator for Hadoop-High Availability ?

I believe you just need to provide correct HDFS URL with nameservice ID as IGFS secondary file system [1]. E.g., in the example from HDFS HA documentation [2] the URL will be 'hdfs://mycluster', where 'mycluster' is the nameservice ID.
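In Spring XML terms, that suggestion could be wired up roughly as follows (a sketch based on the Ignite 1.3-era configuration from [1]; the IGFS name and the 'mycluster' nameservice ID are placeholders for your setup):

```xml
<property name="fileSystemConfiguration">
    <list>
        <bean class="org.apache.ignite.configuration.FileSystemConfiguration">
            <property name="name" value="igfs"/>
            <!-- Point IGFS at the HA nameservice instead of a single NameNode host. -->
            <property name="secondaryFileSystem">
                <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
                    <constructor-arg value="hdfs://mycluster"/>
                </bean>
            </property>
        </bean>
    </list>
</property>
```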

Let me know if it works for you.

[1] https://apacheignite.readme.io/v1.3/docs/secondary-file-system
[2] https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html

-Val
Jesu

Re: How to configure Hadoop accelerator for Hadoop-High Availability ?

It doesn't work; I already tried that. It throws the following UnknownHostException:

Could not instantiate bean class [org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem]: Constructor threw exception; nested exception is java.lang.IllegalArgumentException: java.net.UnknownHostException: myhdfs
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveInnerBean(BeanDefinitionValueResolver.java:290)
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:122)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyPropertyValues(AbstractAutowireCapableBeanFactory.java:1471)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1216)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:538)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:476)
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveInnerBean(BeanDefinitionValueResolver.java:276)
        ... 24 more


In order to resolve the nameservice ID to the name node IP, the client requires the hdfs-site.xml configuration. Is there any way to provide this configuration?

Vladimir Ozerov

Re: How to configure Hadoop accelerator for Hadoop-High Availability ?

Jesu,

IgniteHadoopIgfsSecondaryFileSystem has a constructor that accepts two strings: a file system URI and a path to a Hadoop configuration XML. Please try this constructor and set the second argument to the path of your hdfs-site.xml file.
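That two-argument constructor could be expressed in Spring XML roughly as follows (a sketch; the URI and the configuration path are placeholders for your environment):

```xml
<bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
    <!-- File system URI: the HDFS HA nameservice ID. -->
    <constructor-arg index="0" value="hdfs://mycluster"/>
    <!-- Path to the Hadoop configuration file that defines the nameservice. -->
    <constructor-arg index="1" value="/path/to/hdfs-site.xml"/>
</bean>
```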

Please let me know if it helps.

Vladimir.
Jesu

Re: How to configure Hadoop accelerator for Hadoop-High Availability ?

Tried it; it still throws the same exception. On inspection, the second parameter is for the username, not for the hdfs-site.xml configuration.
Vladimir Ozerov

Re: How to configure Hadoop accelerator for Hadoop-High Availability ?

Jesu,

Which Ignite version do you use? IgniteHadoopIgfsSecondaryFileSystem has 3 constructors:

IgniteHadoopIgfsSecondaryFileSystem(String uri)
IgniteHadoopIgfsSecondaryFileSystem(@Nullable String uri, @Nullable String cfgPath)
IgniteHadoopIgfsSecondaryFileSystem(@Nullable String uri, @Nullable String cfgPath, String userName)

The username is the third parameter of another constructor.

If possible, please attach your Ignite XML configuration here and I'll check it. Also please attach your hdfs-site.xml.
Jesu

Re: How to configure Hadoop accelerator for Hadoop-High Availability ?

Vladimir,
           It works when I provide the hdfs-site.xml file's path in cfgPath. I had previously provided a wrong URI; that's why it didn't work.
 But when I try to read a file using IGFS, it throws the following exception:

 Exception in thread "main" java.lang.IllegalArgumentException: Failed to resolve secondary file system configuration path: /home/sas/hadoopAccelerator/config/hconf/hdfs-site.xml
        at org.apache.ignite.internal.processors.hadoop.SecondaryFileSystemProvider.<init>(SecondaryFileSystemProvider.java:60)
        at org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem.initialize(IgniteHadoopFileSystem.java:293)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
        at Test.main(Test.java:59)

 Test Program: Test.java
 Ignite XML: default-config.xml
 HDFS Site: hdfs-site.xml
Vladimir Ozerov

Re: How to configure Hadoop accelerator for Hadoop-High Availability ?

Jesu,

You have two processes: one is the Ignite node and the other is your test application. The Ignite node was able to resolve the configuration path, but your test application cannot.

I suspect that these applications are either running on different machines, or your test application does not have sufficient privileges to read the file "/home/sas/hadoopAccelerator/config/hconf/hdfs-site.xml". We can quickly confirm or reject this hypothesis if you manually check for file availability in your test application as follows:

System.out.println(new File("/home/sas/hadoopAccelerator/config/hconf/hdfs-site.xml").exists());

If it prints "false", then our assumption is correct and you should find the reason: either the file doesn't exist, or you have insufficient privileges to read it.
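For completeness, that check could be packaged as a small standalone program (a sketch; the class name and the default path are just examples taken from this thread, so substitute your own path):

```java
import java.io.File;

public class CheckConfigPath {
    // Returns true only if the file exists and the current process may read it.
    static boolean isReadable(String path) {
        File f = new File(path);
        return f.exists() && f.canRead();
    }

    public static void main(String[] args) {
        // Path from this thread; pass your own path as the first argument.
        String path = args.length > 0
            ? args[0]
            : "/home/sas/hadoopAccelerator/config/hconf/hdfs-site.xml";
        System.out.println(path + " readable: " + isReadable(path));
    }
}
```

Running this on the machine where the test application lives, as the same OS user, separates a missing file from a permissions problem.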

Please let me know the result.

Vladimir.


Jesu

Re: How to configure Hadoop accelerator for Hadoop-High Availability ?

Vladimir,
Yes, the processes are running on different machines. Does this mean Ignite needs to run on all the machines from which IGFS is accessed?
Vladimir Ozerov

Re: How to configure Hadoop accelerator for Hadoop-High Availability ?

Jesu,

Yes, normally we expect one Ignite node to run on every machine where the file system is accessed. The main reason is performance. For example, what if Hadoop sends a job to a node where HDFS is running, but there is no Ignite node? In that case, if you access HDFS through IGFS, you have to connect to some remote Ignite node and fetch data from there, which is less than optimal due to additional network trips. This is why the advised way to work with IGFS is to start an Ignite node on each machine running HDFS.

However, this is not a strict requirement. You can work with remote Ignite nodes through the TCP endpoint, of course. But if IGFS uses additional Hadoop configuration files like core-site.xml or hdfs-site.xml, then you must ensure they are placed in the same directories on all machines from which you access IGFS.
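For reference, a Hadoop client accessing IGFS typically points its core-site.xml at the IGFS URI, roughly like this (a sketch assuming the default IGFS name 'igfs' and the default IPC port 10500; please verify both against your Ignite version's documentation):

```xml
<configuration>
    <!-- Tell Hadoop which class implements the igfs:// scheme. -->
    <property>
        <name>fs.igfs.impl</name>
        <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
    </property>
    <!-- Default file system: igfs://<igfs-name>@<ignite-host>:<ipc-port> -->
    <property>
        <name>fs.default.name</name>
        <value>igfs://igfs@localhost:10500</value>
    </property>
</configuration>
```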

Please let me know if you have any further questions.

Vladimir.

