How to use Hadoop accelerator

jarredfox
I am using Hadoop 2.4.1 and Ignite 1.3.0 for testing.

I am running a simple read test to compare IGFS with HDFS.

I've configured the Hadoop file system cache (secondaryFileSystem),
but I got the same result (no performance improvement).

The following describes the test environment.

My test H/W info:
 - Hadoop cluster: 1 name node & 4 data nodes
 - Test client: testing in Eclipse (Windows)
 - Test file: 15GB text file (on HDFS)
 - Hadoop namenode IP: 192.168.10.104

hadoop core-site.xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.10.104:9000</value>
</property>

<property>
  <name>fs.igfs.impl</name>
  <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>

<property>
  <name>fs.AbstractFileSystem.igfs.impl</name>
  <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
</property>

Ignite default-config.xml (modified parts)
<property name="ipcEndpointConfiguration">
        <bean class="org.apache.ignite.igfs.IgfsIpcEndpointConfiguration">
                <property name="type" value="TCP" />
                <property name="host" value="192.168.10.104" />
                <property name="port" value="10500"/>
        </bean>
</property>

<property name="secondaryFileSystem">
        <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
                <constructor-arg name="uri" value="hdfs://192.168.10.104:9000"/>
                <constructor-arg name="cfgPath"><null/></constructor-arg>
                <constructor-arg name="userName" value="hadoop"/>
        </bean>
</property>
 ...


Server-side Hadoop & Ignite execution:
1. Start Hadoop with the Hadoop shell (start-all.sh)
2. Start Ignite on the NameNode with the Ignite shell (ignite.sh)
    (only one daemon is running)

Read test code (HDFS & IGFS):
HDFSTest.java
IGFSTest.java
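(The two test classes above were attachments and are not shown; a minimal sketch of what such a read test might look like follows. The URI, path, buffer size, and class name are illustrative assumptions, not the actual attached code.)

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class IGFSReadSketch {
    public static void main(String[] args) throws Exception {
        // Point at IGFS; switch to "hdfs://192.168.10.104:9000/" for the HDFS run.
        FileSystem fs = FileSystem.get(
            URI.create("igfs://igfs@192.168.10.104:10500/"),
            new Configuration(), "hadoop");

        long start = System.currentTimeMillis();
        long total = 0;
        byte[] buf = new byte[8 * 1024 * 1024];
        try (FSDataInputStream in = fs.open(new Path("/temp/test15gb.txt"))) {
            int n;
            // Drain the whole file, counting bytes, to time a full sequential read.
            while ((n = in.read(buf)) > 0)
                total += n;
        }
        System.out.println(total + " bytes in "
            + (System.currentTimeMillis() - start) + " ms");
    }
}
```

Running this once against the hdfs:// URI and once against the igfs:// URI gives the two numbers being compared.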

Is there something I am doing wrong, or anything more I should do?



vkulichenko
Re: How to use Hadoop accelerator

Hi,

Before making any other posts on this forum, please subscribe to the Apache Ignite user mailing list, otherwise your messages are not forwarded there. Refer to the instruction here: http://apache-ignite-users.70518.x6.nabble.com/mailing_list/MailingListOptions.jtp?forum=1

First of all, you should make sure that your test is reading a file that is already in memory. When you read a piece of data for the first time, it still has to go to Hadoop, but with IGFS configured it will end up in the in-memory cache. Also note that you're using a remote client, which has to transfer data over the network, so I would not expect much performance improvement in this test: essentially you're comparing TCP against file I/O.
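The shape of such a warm-up measurement, sketched here with a plain local file via java.io for illustration (in the real test, the same two-pass timing would wrap the Hadoop FileSystem reads; the file name and sizes are stand-ins):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

public class WarmReadDemo {
    // Reads the whole file and returns the number of bytes read.
    static long drain(File f) throws IOException {
        long total = 0;
        byte[] buf = new byte[64 * 1024];
        try (InputStream in = new FileInputStream(f)) {
            int n;
            while ((n = in.read(buf)) > 0) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the 15GB HDFS file: a small local temp file.
        File f = File.createTempFile("sample", ".txt");
        f.deleteOnExit();
        Files.write(f.toPath(), "abcdefg\n".getBytes());

        // Pass 1 is the "cold" read; pass 2 is served from warm caches.
        for (int pass = 1; pass <= 2; pass++) {
            long start = System.nanoTime();
            long bytes = drain(f);
            System.out.printf("Pass %d: %d bytes in %.3f ms%n",
                pass, bytes, (System.nanoTime() - start) / 1e6);
        }
    }
}
```

Only the second-pass numbers are comparable between the HDFS and IGFS runs.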

Can you describe your use case? Are you running any Hadoop MapReduce jobs? Ignite Hadoop Accelerator ships with its own job tracker implementation, which increases job performance without any code changes. I would recommend taking a look at the documentation [1] and the blog post by Konstantin Boudnik, where he demonstrates how a Hadoop job can be accelerated with the help of Ignite [2].
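Per the documentation in [1], enabling the accelerator for MapReduce amounts to pointing Hadoop's job tracker settings at the Ignite node in mapred-site.xml, roughly as follows (the host here is this thread's name node, and 11211 is Ignite's default port for this; verify both against [1]):

```xml
<property>
  <name>mapreduce.framework.name</name>
  <value>ignite</value>
</property>

<property>
  <name>mapreduce.jobtracker.address</name>
  <value>192.168.10.104:11211</value>
</property>
```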

Let us know if you have more questions.

[1] https://apacheignite.readme.io/docs/hadoop-accelerator
[2] http://drcos.boudnik.org/2015/05/30-time-faster-hadoop-mapreduce.html

-Val
jarredfox

Re: How to use Hadoop accelerator
Hi.

I tested the sample code on my Hadoop cluster server,
and the result is about 10x faster than HDFS.
Thanks!

But I have a problem in another test:
when I delete a file through IGFS, the real file (on HDFS) is deleted,
but when I check the deleted file with the exists API, the file still appears to exist.

Below is the test code:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

String igfs_uri = "igfs://<name>@<serverip>:10500/";
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(URI.create(igfs_uri), configuration, "hadoop");

Path file = new Path("testfile.txt");
if (fs.exists(file)) {
    System.out.println("delete testfile.txt ....");
    fs.delete(file, false);
    System.out.println("Exist testfile.txt ? : " + fs.exists(file));
}

The result is always "true".
(I also tested fs.delete(file) and fs.delete(file, true), but the result is the same.)

When I restart the Ignite daemon, the file no longer exists.






vkulichenko
Re: How to use Hadoop accelerator

Hi,

Please subscribe to the mailing list as described here: http://apache-ignite-users.70518.x6.nabble.com/mailing_list/MailingListOptions.jtp?forum=1

I've just created a similar test and it works for me. Is it possible that other operations on the same file are happening concurrently? Are you updating files exclusively via IGFS, or is HDFS accessed directly as well?

Can you also check what the fs.delete() method returns? It should return true if the file is actually deleted. You can also try to read the file contents and see what happens: is the file really there, or is it just fs.exists() that returns incorrect results? This will help isolate the issue.
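A hedged sketch of that diagnostic (the class and method names are illustrative, and the FileSystem/Path are assumed to be set up as earlier in the thread):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class DeleteDiagnostic {
    // Logs what delete() actually returns, then tries to read the file back.
    static void check(FileSystem fs, Path file) throws IOException {
        System.out.println("delete() returned: " + fs.delete(file, false));
        System.out.println("exists() returns: " + fs.exists(file));
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(fs.open(file)))) {
            System.out.println("first line after delete: " + r.readLine());
        } catch (IOException e) {
            System.out.println("read after delete failed: " + e.getMessage());
        }
    }
}
```

If delete() returns false but the file is still readable, the stale entry is in the metadata layer rather than the data itself.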

-Val
jarredfox

Re: How to use Hadoop accelerator
I have performed some tests to pin down the exact error conditions.

First test - delete with HDFS => delete with IGFS
step 1. start Ignite
step 2. upload sample file to HDFS
step 3. delete sample file (with Hadoop shell command)
step 4. delete sample file (with IGFS sample code)
Result: works as expected.

Second test - read file with IGFS => delete with HDFS => delete with IGFS
step 1. start Ignite
step 2. upload sample file to HDFS
step 3. read one line from the sample file with IGFS
step 4. delete sample file (with Hadoop shell command)
step 5. delete sample file (with IGFS sample code)
Result: the sample file is deleted from HDFS (checked with an HDFS shell command),
           but IGFS does not work:
           when I delete it with fs.delete(), the return value is false,
           and the sample file still appears to exist - I can even read it with IGFS
           (in reality, the sample file no longer exists).

Below is my test case and log:

# hdfs dfs -copyFromLocal ./sample.txt /temp/

# hadoop jar ./IGFS_read.jar /temp/sample.txt
/temp/sample.txt  : file Exist
Read One Line : abcdefg

# hdfs dfs -rm -f /temp/sample.txt
Deleted /temp/sample.txt

# hadoop jar ./IGFS_delete.jar /temp/sample.txt
/temp/sample.txt : file Exist
delete : false
/temp/sample.txt : file Exist
Read One Line : abcdefg

# hadoop dfs -ls  /temp/sample.txt
ls: `/temp/sample.txt': No such file or directory








Vladimir Ozerov
Re: How to use Hadoop accelerator

Jarred,

To achieve better performance, IGFS is designed to be the only point through which mutating actions, such as delete, are performed on the file system. When you remove a file from HDFS directly, without letting IGFS know about it, IGFS does not see that the deletion ever happened.

Please try doing all file system operations through IGFS.

In upcoming releases we will address the issue you faced and improve usability for cases where updates can be performed without IGFS.
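Concretely, that means removing the file through the IGFS URI rather than the HDFS shell; for example (host, port, and IGFS name are taken from the configuration earlier in this thread, so treat them as assumptions):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class IgfsDeleteSketch {
    public static void main(String[] args) throws Exception {
        // Go through igfs://, not hdfs://, so Ignite's in-memory metadata
        // is updated along with the underlying HDFS file.
        FileSystem fs = FileSystem.get(
            URI.create("igfs://igfs@192.168.10.104:10500/"),
            new Configuration(), "hadoop");
        System.out.println("deleted: "
            + fs.delete(new Path("/temp/sample.txt"), false));
    }
}
```

The shell equivalent would be `hadoop fs -rm igfs://igfs@192.168.10.104:10500/temp/sample.txt`, again assuming the URI above.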

Vladimir.

On Fri, Sep 11, 2015 at 8:37 AM, jarredfox <[hidden email]> wrote:
--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/How-use-Hadoop-accelerator-tp1314p1355.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.