Running Spark SQL on Spark Thrift Server with Ignite

ravi

Hi,
  We want to run Spark SQL (version 2.2.x) by connecting to the Spark Thrift Server via JDBC. Will Spark SQL run on top of a combined Spark and Ignite setup? Or do we need to switch from JDBC to the Scala/Java RDD-based API to run Spark SQL?
  Can you share some examples on this topic?

Regards
Ravi.P
vkulichenko

Hi Ravi,

I don't think it currently will, because this would require integration with data frames. We have it in our plans, but it is not implemented yet. I think you should use IgniteRDD or the Ignite APIs directly.

Can you describe the business use case you're trying to implement?

-Val
ravi

Hi Val,
  Thanks for the reply. Below is our use case:
1) We have aggregated data stored in Hive (HiveServer2, HDFS, ORC file format with Snappy compression).
2) For online reporting, we use Spark SQL (Spark Thrift Server with the JDBC driver). With this setup, certain queries do not respond in under a minute.

So with Ignite, the idea is to cache the underlying Hive table in the Ignite File System (IGFS) and run the Spark SQL queries over JDBC as before (no Scala/Java API with Spark RDD, IgniteRDD, data frames, etc.), as described in slide 12, option b, at the link below.

https://www.slideshare.net/imcsummit/accelerating-the-hadoop-data-stack-with-apache-ignite-spark-and-bigtop

Will the above approach work? Is there anything we need to specify when starting the Spark Thrift Server so that Spark reads from IGFS instead of HDFS?

Please reply.

Regards
Ravi.P
vkulichenko

Ravi,

If you need to speed up SQL, you should make sure Ignite uses indexes to execute queries. I think you can do the following:
- Create a Hive RDD and map it to an RDD of key-value pairs.
- Create a new IgniteRDD on top of a cache and use the IgniteRDD#savePairs method to load the data from Hive into Ignite.
- Use the IgniteRDD#sql method to execute queries.
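
A minimal Scala sketch of these three steps, not a tested implementation. Everything specific in it is an assumption made for illustration: the Hive table `reports.sales` with columns `id`, `region` and `amount`, the `Sale` value class, the cache name `salesCache`, and the Spring configuration file `config/ignite-sales.xml`, which is assumed to define that cache with `Sale` registered as a queryable type. It also assumes the `ignite-spark` module for your Ignite 2.x version is on the Spark classpath.

```scala
import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical value type mirroring the assumed Hive table reports.sales(id, region, amount).
case class Sale(id: Long, region: String, amount: Double)

object HiveToIgnite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-ignite")
      .enableHiveSupport() // lets spark.table resolve Hive metastore tables
      .getOrCreate()

    // Step 1: read the Hive table and map every row to a (key, value) pair.
    val pairs: RDD[(Long, Sale)] = spark.table("reports.sales").rdd.map { row =>
      val sale = Sale(
        row.getAs[Long]("id"),
        row.getAs[String]("region"),
        row.getAs[Double]("amount"))
      (sale.id, sale)
    }

    // Step 2: create an IgniteRDD on top of an existing cache and load the pairs into it.
    // The Spring XML path and the cache name are placeholders.
    val ic = new IgniteContext(spark.sparkContext, "config/ignite-sales.xml")
    val cache: IgniteRDD[Long, Sale] = ic.fromCache[Long, Sale]("salesCache")
    cache.savePairs(pairs)

    // Step 3: run SQL inside Ignite. The table name is the simple name of the value type;
    // "?" placeholders are filled from the trailing arguments.
    val result = cache.sql(
      "select region, sum(amount) from Sale where region = ? group by region", "EMEA")
    result.show()

    // Stop the Ignite client instances started by this context, then Spark itself.
    ic.close(true)
    spark.stop()
  }
}
```

The query in step 3 only benefits from an index if the cache configuration declares one on `region`; without SQL configuration on the cache, the `Sale` table will not be visible to IgniteRDD#sql at all.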

Note that SQL needs to be configured in Ignite (i.e. you need to specify queryable fields, indexes, etc.). More information here: https://apacheignite.readme.io/docs/sql-queries
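
As a rough illustration of what that configuration can look like when done programmatically, here is a sketch using QueryEntity. The cache name `salesCache`, the value class name `com.example.Sale` and its three fields are hypothetical; substitute your own types. The same settings can also be expressed in the Spring XML configuration.

```scala
import java.util.{Collections, LinkedHashMap}

import org.apache.ignite.cache.{QueryEntity, QueryIndex}
import org.apache.ignite.configuration.{CacheConfiguration, IgniteConfiguration}

object SalesCacheConfig {
  def igniteConfig(): IgniteConfiguration = {
    // Queryable fields: field name -> Java type name. Only these are visible to SQL.
    val fields = new LinkedHashMap[String, String]()
    fields.put("id", "java.lang.Long")
    fields.put("region", "java.lang.String")
    fields.put("amount", "java.lang.Double")

    // Key and value type names; "com.example.Sale" is a placeholder class name.
    val entity = new QueryEntity("java.lang.Long", "com.example.Sale")
    entity.setFields(fields)
    // Index the column used in WHERE clauses so queries avoid full scans.
    entity.setIndexes(Collections.singletonList(new QueryIndex("region")))

    val cacheCfg = new CacheConfiguration[java.lang.Long, AnyRef]("salesCache")
    cacheCfg.setQueryEntities(Collections.singletonList(entity))

    new IgniteConfiguration().setCacheConfiguration(cacheCfg)
  }
}
```

With a function like this, an IgniteContext could be created from `() => SalesCacheConfig.igniteConfig()` instead of a Spring XML path, provided the server nodes use a matching cache configuration.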

-Val
ravi

Hi Val,
  Thanks for the reply. Can you share an example or a sample code snippet for the steps you explained? The link you shared doesn't explain how to map a Hive/Spark RDD to an IgniteRDD.

Regards
Ravi.P