How to append data to IGFS so that data gets saved to a Hive partitioned table?

csumi:

I have a Hive partitioned table created as below:

create table stocks (stock string, time timestamp, price float)
PARTITIONED BY (years bigint, months bigint, days bigint)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

Data is saved to HDFS through IGFS when I pass a comma-separated string to the fs.append method, but when I query the table in Hive, no rows come back.

If I create the table without the partition clause, I am able to fetch rows from Hive.

What could be wrong here? How can we save data to a partitioned table through IGFS?
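For reference, the write path is roughly the following. This is a minimal sketch over the Hadoop FileSystem API, not the exact code from this thread; the IGFS URI, file path, and record values are placeholders.

// Minimal sketch (placeholder URI, path, and values): append one CSV record
// to a file on IGFS through the Hadoop FileSystem API.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class IgfsAppendSketch {
    public static void main(String[] args) throws Exception {
        // core-site.xml on the classpath must map the igfs:// scheme to
        // org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("igfs://igfs@localhost:10500/"), conf);

        // Target file inside the Hive partition directory (placeholder path).
        Path file = new Path("/usr/hive/warehouse/stocks/years=2017/months=7/days=4/data.csv");
        String row = "AAPL,2017-07-04 14:48:00,120.34\n"; // stock,time,price

        // Create the file on the first write, append on later writes.
        try (FSDataOutputStream out = fs.exists(file) ? fs.append(file) : fs.create(file)) {
            out.write(row.getBytes("UTF-8"));
        }
    }
}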
mcherkasov:

Hi Csumi,

Do your Hive queries work with HDFS directly, or with IGFS via IgniteHadoopFileSystem?

Thanks,
Mikhail.
csumi:

The Ignite Hadoop accelerator is configured as described at https://apacheignite.readme.io/v1.0/docs/hadoop-accelerator#section-secondary-file-system.

The idea is to run Hive queries on IGFS, but it doesn't seem to work that way. I am not sure how to confirm whether Hive is connecting to HDFS or to IGFS. Also, I suspect that writing comma-separated strings through fs.append does not create the partition correctly, which is why the Hive query returns no results.
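(One quick way to check, assuming the Hive CLI is in use: run "set fs.defaultFS;" at the hive> prompt and see whether it prints an hdfs:// or an igfs:// URI.)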

Below are some of the configurations I have. Please let me know if I need to share any more details.

core-site.xml has these properties:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>
<property>
    <name>fs.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>
<property>
    <name>fs.AbstractFileSystem.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
</property>
<property>
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>NEVER</value>
</property>

Ignite's default-config has the configuration below:

<property name="secondaryFileSystem">
    <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
        <property name="fileSystemFactory">
            <bean class="org.apache.ignite.hadoop.fs.CachingHadoopFileSystemFactory">
                <property name="uri" value="hdfs://localhost:9000/"/>
                <property name="configPaths">
                    <list>
                        <value>D:/hadoop/etc/hadoop/core-site.xml</value>
                    </list>
                </property>
            </bean>
        </property>
    </bean>
</property>
mcherkasov:

Hi again,

Try to do this without Ignite; it looks like there's a problem on the Hive side.

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>

This means that you use Hadoop directly. To make Hive work with IGFS you need to change hdfs to igfs:

<property>
    <name>fs.defaultFS</name>
    <value>igfs://localhost:9000</value>
</property>
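One caveat worth verifying against the Ignite docs: the igfs:// URI normally names the IGFS instance and its IPC endpoint rather than the HDFS NameNode port, e.g. igfs://igfs@localhost:10500/ with the default file system name and port, so the exact host:port value above may need adjusting.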

Thanks,
Mikhail.

csumi:

Hi Mikhail,

Thanks for your response.

I changed hdfs://localhost:9000 to igfs://localhost:9000 in the core-site.xml of the Hive home directory, while the core-site.xml of the Hadoop home directory still points to hdfs://localhost:9000. The issue persists.

If I create the partition using an INSERT command in Hive, the SELECT query returns rows. Even after stopping all Ignite nodes I still get the data from the SELECT query. Does this mean that Hive is still pointing to HDFS?

What else can we check here?

Thanks!
mcherkasov:

Hi again,

So that is correct: Hadoop's core-site.xml should have "hdfs://localhost:9000".

Hadoop works with the disk. You write to IGFS, and IGFS writes to the secondary file system, which is Hadoop. That means Hadoop itself shouldn't know anything about IGFS; otherwise it would write to IGFS, which would write to the secondary FS (Hadoop), which would write to IGFS again, and so on.

However, your code and tools like Hive need to work with IGFS, so they should use a core-site.xml with igfs://localhost:9000.

As you said, it looks like Hive still works with Hadoop directly.

You need to feed Hive a core-site.xml with igfs://localhost:9000 and:
    <property>
        <name>fs.igfs.impl</name>
        <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
    </property>
    <property>
        <name>fs.AbstractFileSystem.igfs.impl</name>
        <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
    </property>
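Note also that for the igfs:// scheme to resolve inside Hive, the Ignite classes have to be on Hive's classpath; typically that means making the ignite-core and ignite-hadoop jars visible to Hive, for example through HADOOP_CLASSPATH. The exact jar list is setup-dependent.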

So for now I think your very first question should be asked of the Hive and Hadoop folks, because it does not work even when you work directly with Hadoop.

Thanks,
Mikhail.
csumi:

Thank you for the response.

If I insert data from Hive directly using the query below, the SELECT query works fine.

insert into table stocks PARTITION (years=2004,months=12,days=3) values('AAPL',1501236980,120.34);

I think the issue is that when we insert data using the IGFS API (append method), SELECT fails to return the results, but if we use an INSERT query, partitions are created and SELECT works fine. How can this be resolved? Are partitions supported through IGFS?
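One detail that may explain this: Hive only returns rows from partitions that are registered in its metastore, so a file written straight into a partition directory through the file-system API stays invisible to SELECT until the partition is added. A sketch of the statements that would register it (table and partition values are illustrative):

        ALTER TABLE stocks ADD PARTITION (years=2017, months=7, days=4);
        -- or, to discover all unregistered partition directories under the table:
        MSCK REPAIR TABLE stocks;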
Jörn Franke:

I think it is still not clear what you are doing. What do you mean by using the fs.append function? Can you please provide each query that you execute? From where is the data inserted? Did you check all the log files of Hive and of YARN?

Also, single inserts are highly inefficient. Try to use CREATE TABLE AS SELECT or INSERT OVERWRITE into a new partition.
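A sketch of that bulk-load pattern (staging_stocks here is a hypothetical unpartitioned staging table, not something from this thread):

        INSERT OVERWRITE TABLE stocks PARTITION (years=2017, months=7, days=4)
        SELECT stock, time, price FROM staging_stocks;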

As others already said: if it does not work even without Ignite, then it is a Hive or HDFS problem.

csumi:

Let me try to clarify with the sequence of steps performed.
- Created a partitioned table through Hive using the query below. It creates a directory in HDFS.
        create table stocks3 (stock string, time timestamp, price float) PARTITIONED BY (years bigint, months bigint, days bigint) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
- Then I receive streaming data, and it gets saved to the Ignite-backed Hadoop using IgniteFileSystem's append/create methods.
- Ran the SELECT query below. No result returned.
        select * from stocks3;
- Stopped Ignite and ran the SELECT again in Hive. Still no result, with the logs below:

hive> select * from stocks3;
17/08/04 14:59:08 INFO conf.HiveConf: Using the default value passed in for log id: b5e3e924-e46a-481c-8aef-30d48605a2da
17/08/04 14:59:08 INFO session.SessionState: Updating thread name to b5e3e924-e46a-481c-8aef-30d48605a2da main
17/08/04 14:59:08 WARN operation.Operation: Unable to create operation log file: D:\tmp\hive\<user>\operation_logs\b5e3e924-e46a-481c-8aef-30d48605a2da\137adad6-ea23-462c-a414-6ce260e5bd49
java.io.IOException: The system cannot find the path specified
        at java.io.WinNTFileSystem.createFileExclusively(Native Method)
        at java.io.File.createNewFile(File.java:1012)
        at org.apache.hive.service.cli.operation.Operation.createOperationLog(Operation.java:237)
        at org.apache.hive.service.cli.operation.Operation.beforeRun(Operation.java:279)
        at org.apache.hive.service.cli.operation.Operation.run(Operation.java:314)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:499)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:486)
        at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:295)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:506)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:491)
        at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1412)
        at com.sun.proxy.$Proxy21.ExecuteStatement(Unknown Source)
        at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:308)
        at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:250)
        at org.apache.hive.beeline.Commands.executeInternal(Commands.java:988)
        at org.apache.hive.beeline.Commands.execute(Commands.java:1160)
        at org.apache.hive.beeline.Commands.sql(Commands.java:1074)
        at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1148)
        at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:976)
        at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:886)
        at org.apache.hive.beeline.cli.HiveCli.runWithArgs(HiveCli.java:35)
        at org.apache.hive.beeline.cli.HiveCli.main(HiveCli.java:29)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:491)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
17/08/04 14:59:08 INFO ql.Driver: Compiling command(queryId=<user>_20170804145908_b270c978-ab00-4160-a2a6-c19b42eab676): select * from stocks3
17/08/04 14:59:08 INFO parse.CalcitePlanner: Starting Semantic Analysis
17/08/04 14:59:08 INFO parse.CalcitePlanner: Completed phase 1 of Semantic Analysis
17/08/04 14:59:08 INFO parse.CalcitePlanner: Get metadata for source tables
17/08/04 14:59:08 INFO metastore.HiveMetaStore: 0: get_table : db=yt tbl=stocks3
17/08/04 14:59:08 INFO HiveMetaStore.audit: ugi=<user>  ip=unknown-ip-addr      cmd=get_table : db=yt tbl=stocks3
17/08/04 14:59:08 INFO parse.CalcitePlanner: Get metadata for subqueries
17/08/04 14:59:08 INFO parse.CalcitePlanner: Get metadata for destination tables
17/08/04 14:59:09 INFO ql.Context: New scratch dir is hdfs://localhost:9000/tmp/hive/<user>/b5e3e924-e46a-481c-8aef-30d48605a2da/hive_2017-08-04_14-59-08_935_8316159022041430928-1
17/08/04 14:59:09 INFO parse.CalcitePlanner: Completed getting MetaData in Semantic Analysis
17/08/04 14:59:09 INFO parse.CalcitePlanner: Get metadata for source tables
17/08/04 14:59:09 INFO metastore.HiveMetaStore: 0: get_table : db=yt tbl=stocks3
17/08/04 14:59:09 INFO HiveMetaStore.audit: ugi=<user>  ip=unknown-ip-addr      cmd=get_table : db=yt tbl=stocks3
17/08/04 14:59:09 INFO parse.CalcitePlanner: Get metadata for subqueries
17/08/04 14:59:09 INFO parse.CalcitePlanner: Get metadata for destination tables
17/08/04 14:59:09 INFO ql.Context: New scratch dir is hdfs://localhost:9000/tmp/hive/<user>/b5e3e924-e46a-481c-8aef-30d48605a2da/hive_2017-08-04_14-59-08_935_8316159022041430928-1
17/08/04 14:59:09 INFO common.FileUtils: Creating directory if it doesn't exist: hdfs://localhost:9000/tmp/hive/<user>/b5e3e924-e46a-481c-8aef-30d48605a2da/hive_2017-08-04_14-59-08_935_831615902204143
0928-1/-mr-10001/.hive-staging_hive_2017-08-04_14-59-08_935_8316159022041430928-1
17/08/04 14:59:09 INFO parse.CalcitePlanner: CBO Succeeded; optimized logical plan.
17/08/04 14:59:09 INFO ppd.OpProcFactory: Processing for FS(2)
17/08/04 14:59:09 INFO ppd.OpProcFactory: Processing for SEL(1)
17/08/04 14:59:09 INFO ppd.OpProcFactory: Processing for TS(0)
17/08/04 14:59:09 INFO metastore.HiveMetaStore: 0: get_partitions : db=yt tbl=stocks3
17/08/04 14:59:09 INFO HiveMetaStore.audit: ugi=<user>  ip=unknown-ip-addr      cmd=get_partitions : db=yt tbl=stocks3
17/08/04 14:59:09 INFO parse.CalcitePlanner: Completed plan generation
17/08/04 14:59:09 INFO ql.Driver: Semantic Analysis Completed
17/08/04 14:59:09 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:stocks3.stock, type:string, comment:null), FieldSchema(name:stocks3.time, type:timestamp, comment:null),
FieldSchema(name:stocks3.price, type:float, comment:null), FieldSchema(name:stocks3.years, type:bigint, comment:null), FieldSchema(name:stocks3.months, type:bigint, comment:null), FieldSchema(name:sto
cks3.days, type:bigint, comment:null)], properties:null)
17/08/04 14:59:09 INFO exec.TableScanOperator: Initializing operator TS[0]
17/08/04 14:59:09 INFO exec.SelectOperator: Initializing operator SEL[1]
17/08/04 14:59:09 INFO exec.SelectOperator: SELECT struct<stock:string,time:timestamp,price:float,years:bigint,months:bigint,days:bigint>
17/08/04 14:59:09 INFO exec.ListSinkOperator: Initializing operator LIST_SINK[3]
17/08/04 14:59:09 INFO ql.Driver: EXPLAIN output for queryid <user>_20170804145908_b270c978-ab00-4160-a2a6-c19b42eab676 : STAGE DEPENDENCIES:
  Stage-0 is a root stage [FETCH]

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: stocks3
          GatherStats: false
          Select Operator
            expressions: stock (type: string), time (type: timestamp), price (type: float), years (type: bigint), months (type: bigint), days (type: bigint)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
            ListSink


17/08/04 14:59:09 INFO ql.Driver: Completed compiling command(queryId=<user>_20170804145908_b270c978-ab00-4160-a2a6-c19b42eab676); Time taken: 0.586 seconds
17/08/04 14:59:09 INFO conf.HiveConf: Using the default value passed in for log id: b5e3e924-e46a-481c-8aef-30d48605a2da
17/08/04 14:59:09 INFO session.SessionState: Resetting thread name to  main
17/08/04 14:59:09 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
17/08/04 14:59:09 INFO ql.Driver: Executing command(queryId=<user>_20170804145908_b270c978-ab00-4160-a2a6-c19b42eab676): select * from stocks3
17/08/04 14:59:09 INFO ql.Driver: Completed executing command(queryId=<user>_20170804145908_b270c978-ab00-4160-a2a6-c19b42eab676); Time taken: 0.002 seconds
OK
17/08/04 14:59:09 INFO ql.Driver: OK
17/08/04 14:59:09 INFO conf.HiveConf: Using the default value passed in for log id: b5e3e924-e46a-481c-8aef-30d48605a2da
17/08/04 14:59:09 INFO session.SessionState: Updating thread name to b5e3e924-e46a-481c-8aef-30d48605a2da main
17/08/04 14:59:09 INFO conf.HiveConf: Using the default value passed in for log id: b5e3e924-e46a-481c-8aef-30d48605a2da
17/08/04 14:59:09 INFO session.SessionState: Resetting thread name to  main
17/08/04 14:59:09 INFO conf.HiveConf: Using the default value passed in for log id: b5e3e924-e46a-481c-8aef-30d48605a2da
17/08/04 14:59:09 INFO session.SessionState: Updating thread name to b5e3e924-e46a-481c-8aef-30d48605a2da Thread-52
17/08/04 14:59:09 INFO conf.HiveConf: Using the default value passed in for log id: b5e3e924-e46a-481c-8aef-30d48605a2da
17/08/04 14:59:09 INFO session.SessionState: Resetting thread name to  Thread-52
17/08/04 14:59:09 WARN thrift.ThriftCLIService: Error fetching results:
org.apache.hive.service.cli.HiveSQLException: Couldn't find log associated with operation handle: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=137adad6-ea23-462c-a414-6ce260e5bd49]

        at org.apache.hive.service.cli.operation.OperationManager.getOperationLogRowSet(OperationManager.java:324)
        at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:849)
        at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:505)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:698)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:491)
        at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1412)
        at com.sun.proxy.$Proxy21.FetchResults(Unknown Source)
        at org.apache.hive.jdbc.HiveStatement.getQueryLog(HiveStatement.java:871)
        at org.apache.hive.jdbc.HiveStatement.getQueryLog(HiveStatement.java:842)
        at org.apache.hive.beeline.Commands.showRemainingLogsIfAny(Commands.java:1211)
        at org.apache.hive.beeline.Commands.access$200(Commands.java:68)
        at org.apache.hive.beeline.Commands$2.run(Commands.java:1187)
        at java.lang.Thread.run(Thread.java:724)
17/08/04 14:59:09 INFO conf.HiveConf: Using the default value passed in for log id: b5e3e924-e46a-481c-8aef-30d48605a2da
17/08/04 14:59:09 INFO session.SessionState: Updating thread name to b5e3e924-e46a-481c-8aef-30d48605a2da main
17/08/04 14:59:09 INFO conf.HiveConf: Using the default value passed in for log id: b5e3e924-e46a-481c-8aef-30d48605a2da
17/08/04 14:59:09 INFO session.SessionState: Resetting thread name to  main
No rows selected (0.612 seconds)
17/08/04 14:59:09 INFO conf.HiveConf: Using the default value passed in for log id: b5e3e924-e46a-481c-8aef-30d48605a2da

- The data created in HDFS (http://localhost:50070/explorer.html#/usr/hive/warehouse/yt.db/stocks3/years=2017/months=7/days=4) is as follows (the columns are the NameNode browser's permission, owner, group, size, date, replication, block size, and file name):
-rw-r--r-- <user>        supergroup 44 B Aug 04 14:48 3 128 MB 1

- Started Ignite.
- Ran the INSERT query below:
        insert into table stocks3 PARTITION (years=2004,months=12,days=3) values('AAPL',1501236980,120.34);
- A new partition was created: http://localhost:50070/explorer.html#/usr/hive/warehouse/yt.db/stocks3/years=2004/months=12/days=3
        -rwxr-xr-x <user>        supergroup 15 B Aug 04 15:16 1 128 MB 000000_0
- Ran the SELECT query below, which returns the row inserted by the INSERT above:
        select * from stocks3;
- Then inserted a new row into the partition that was created through code earlier:
        insert into table stocks3 PARTITION (years=2017,months=7,days=4) values('AAPL',1501236980,120.34);
- Ran the SELECT query again. Now it returns 3 rows: two inserted via the INSERT command, and the one written through code that did not appear in the SELECT results earlier.
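That last observation fits the partition-registration explanation: the INSERT into PARTITION (years=2017,months=7,days=4) registers the partition in the Hive metastore, and once it is registered, the file appended into the same directory through IGFS earlier becomes visible to SELECT as well.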
Mikhail:

Hi,

Could you please clarify: if you run all the actions using IGFS, but instead of fs.append you use Hive, like:
insert into table stocks PARTITION (years=2004,months=12,days=3) values('AAPL',1501236980,120.34);

Does select work this time? 

Thanks,
Mikhail.

csumi:

Yes, if I create the partition using Hive, SELECT works fine. Please see the last two bullets of my previous comment; copying them here for quick reference:

- Then inserted a new row into the partition that was created through code earlier:
        insert into table stocks3 PARTITION (years=2017,months=7,days=4) values('AAPL',1501236980,120.34);
- Ran the SELECT query again. Now it returns 3 rows: two inserted via the INSERT command, and the one written through code that did not appear in the SELECT results earlier.
Mikhail:

OK, I'll try to reproduce this issue locally and will respond tomorrow.

csumi:

Hi Mikhail,

Any luck with reproducing the issue?
Mikhail Getmanov:

csumi:

Hi Mikhail,

Your last reply is showing up blank. Any luck with the reproduction?

Thanks!