Getting data[parquet,json,...] from S3 buckets to apache ignite

viktor viktor

Getting data[parquet,json,...] from S3 buckets to apache ignite

Hi,

I'm currently working on an R&D project where we would like to retrieve data
files [Parquet, JSON, ...] from S3 buckets and load the data into Apache
Ignite for machine learning purposes with TensorFlow.

With the removal of IGFS in the next release, I'm having trouble finding a
solution.
What would be an optimal way to make this data available to Apache Ignite?

I'm currently looking into using the 3rd party store feature of Ignite to
integrate with Apache Drill, as it is able to query these S3 bucket data
files.
However, at a glance it doesn't look like a great solution, since every
table structure has to be defined manually in the Ignite configuration or
semi-automatically with the agent.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
dmagda dmagda

Re: Getting data[parquet,json,...] from S3 buckets to apache ignite

Hi Viktor,

Could you please clarify a bit: do you just need to load the data once or periodically, or do you also want Ignite to write back to S3 on updates? If loading is all you need, then create a custom Java app/class that pulls the data from S3 and streams it into Ignite with IgniteDataStreamer (the fastest loading technique) [1].
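As a rough illustration of the loader shape described above, here is a minimal sketch. It assumes the S3 object has already been opened as a character stream with one record per line (for example newline-delimited JSON). The class name `S3ToIgniteLoader`, the running-counter key, and the `BiConsumer` sink are hypothetical and chosen only for illustration. The S3 client and Ignite itself are deliberately left out so the block runs on a plain JDK; in a real app the sink would presumably be `addData(key, value)` on an `IgniteDataStreamer`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical loader: reads one record per line and hands (key, record) pairs to a sink. */
class S3ToIgniteLoader {

    /**
     * Pushes every non-blank line from the source into the sink, keyed by a running counter,
     * and returns the number of records pushed.
     *
     * Assumption: in a real deployment the Reader wraps the S3 object's input stream, and the
     * sink is streamer::addData on an IgniteDataStreamer<Long, String>.
     */
    static long load(Reader source, BiConsumer<Long, String> sink) {
        long count = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String record = line.trim();
                if (record.isEmpty()) {
                    continue; // skip blank lines between records
                }
                sink.accept(count, record);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // An in-memory map stands in for the Ignite cache in this sketch.
        String sample = "{\"id\":1}\n\n{\"id\":2}\n";
        Map<Long, String> cache = new LinkedHashMap<>();
        long loaded = load(new StringReader(sample), cache::put);
        System.out.println(loaded + " records loaded: " + cache);
    }
}
```

Keeping the sink behind a plain functional interface means the parsing logic can be tried locally against a map before it is pointed at a cluster. Parquet files would need a dedicated reader in place of the line-based loop.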

If write-back is needed, then the 3rd party store (CacheStore) is the best way to go. GridGain Web Console (a free tool) [2] comes with a model import feature [3] that should be able to read the Drill schema via JDBC and produce an Ignite configuration.


-
Denis


On Thu, Oct 17, 2019 at 6:45 AM viktor <[hidden email]> wrote:
> Hi,
> [...]
viktor viktor

Re: Getting data[parquet,json,...] from S3 buckets to apache ignite

dmagda wrote

> [1] https://apacheignite.readme.io/docs/data-loading#ignitedatastreamer
> [2] https://www.gridgain.com/docs/web-console/latest/web-console-getting-started
> [3] https://apacheignite-tools.readme.io/docs/automatic-rdbms-integration

Thanks for the reply. Our use case goes something like this: data for new ML
projects is placed in S3 buckets, and BI & ML engineers use Ignite to
query this data with optimal performance.
Ideally, there shouldn't be much hassle in keeping Ignite in sync with Drill
whenever the structure or data changes.

I'll play around with the cache store today and see how far that gets me. Thanks
for the advice.
More suggestions are always welcome.


