Ignite Spark Example Question

sri hari kali charan Tummala
Ignite Spark Example Question

Hi All,

I am new to the Apache Ignite community and am testing out Ignite for learning purposes. In the example below, the code reads a JSON file and writes it to an Ignite in-memory table. Is it overwriting, and can I use append mode? I did try Spark's append mode, .mode(org.apache.spark.sql.SaveMode.Append), without stopping the Ignite application (no ignite.stop(), so the cache stays alive), and tried to insert data into the cache twice, but I am still getting 4 records when I was expecting 8. What would be the reason?

aealexsandrov
Re: Ignite Spark Example Question

Hi,

Spark has several SaveModes that determine what happens when the table you are writing to already exists:

* Overwrite - re-creates the existing table (or creates a new one) and loads the data into it using the IgniteDataStreamer implementation

* Append - does not re-create the table; it simply loads the new data into the existing table

* ErrorIfExists - throws an exception if the target table already exists

* Ignore - does nothing if the target table already exists; the save operation does not write the DataFrame's contents and leaves the existing data unchanged

Regarding your question:

You should use the Append SaveMode in your Spark integration when you want to store new data in the cache while keeping the previously stored data.
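For reference, a minimal Append-mode write through the Ignite DataFrame API might look like the following sketch; the config path, input file name, and table/column names here are assumptions for illustration:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.ignite.spark.IgniteDataFrameSettings._

val spark = SparkSession.builder()
  .appName("ignite-append-example")
  .master("local")
  .getOrCreate()

// Hypothetical input: one JSON object per line with id, name, department fields.
val df = spark.read.json("persons.json")

df.write
  .format(FORMAT_IGNITE)
  .option(OPTION_CONFIG_FILE, "config/example-ignite.xml") // your Ignite config
  .option(OPTION_TABLE, "person")
  .option(OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS, "id")
  .mode(SaveMode.Append) // keep previously stored rows instead of re-creating the table
  .save()
```

Running this twice with distinct id values should leave both batches in the table; rows that share a primary key are upserted instead.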

Note that if you store data under the same primary keys, the data will be overwritten in the Ignite table. For example:

1) Add person {id=1, name=Vlad, age=19}, where id is the primary key
2) Add person {id=1, name=Nikita, age=26}, where id is the primary key

In Ignite you will see only {id=1, name=Nikita, age=26}.
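Per primary key, the table behaves like a key-value upsert; the effect can be sketched with a plain Scala map:

```scala
case class Person(name: String, age: Int)

// Stands in for the Ignite table keyed by the primary key `id`.
val table = scala.collection.mutable.Map[Int, Person]()

table.put(1, Person("Vlad", 19))   // first write with id=1
table.put(1, Person("Nikita", 26)) // same primary key: the row is replaced

// table now holds the single entry 1 -> Person("Nikita", 26)
```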

You can also find a code sample and more information about SaveModes here:

https://apacheignite-fs.readme.io/docs/ignite-data-frame#section-saving-dataframes

BR,
Andrei

On 2019/08/08 17:33:39, sri hari kali charan Tummala [hidden email] wrote:
sri hari kali charan Tummala
Re: Ignite Spark Example Question

Thank you, I get it now: I have to change the id values to see the same data appear as extra rows (this is just for testing). Amazing.

val df = spark.sql("SELECT monotonically_increasing_id() AS id, name, department FROM json_person")

df.write.mode(SaveMode.Append)... to Ignite

Thanks
Sri


On Fri, Aug 9, 2019 at 6:08 AM Andrei Aleksandrov <[hidden email]> wrote:


--
Thanks & Regards
Sri Tummala

sri hari kali charan Tummala
Re: Ignite Spark Example Question

One last question: is there an S3 connector for Ignite that can load S3 objects into the Ignite cache in real time and push data updates directly back to S3? I can use Spark as one alternative, but is there another way of doing this?

Let's say I want to build an in-memory, near-real-time data lake: files that land in S3 are automatically loaded into Ignite. (I can use Spark Structured Streaming jobs, but is there a direct approach?)

On Fri, Aug 9, 2019 at 4:34 PM sri hari kali charan Tummala <[hidden email]> wrote:




stephendarlington
Re: Ignite Spark Example Question

I don’t think there’s anything “out of the box,” but you could write a custom CacheStore to do that.

See here for more details: https://apacheignite.readme.io/docs/3rd-party-store#section-custom-cachestore
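For a flavor of what that might involve, here is a rough sketch of a read/write-through store built on CacheStoreAdapter; the S3 calls use the AWS SDK v2, and the bucket name and key scheme are assumptions, not a tested integration:

```scala
import javax.cache.Cache
import org.apache.ignite.cache.store.CacheStoreAdapter
import software.amazon.awssdk.core.sync.RequestBody
import software.amazon.awssdk.services.s3.S3Client
import software.amazon.awssdk.services.s3.model.{GetObjectRequest, PutObjectRequest}

// Maps each cache key to an S3 object key in a single (hypothetical) bucket.
class S3CacheStore extends CacheStoreAdapter[String, String] {
  private val s3     = S3Client.create()
  private val bucket = "my-data-lake-bucket"

  // Read-through: invoked on a cache miss.
  override def load(key: String): String =
    new String(s3.getObject(
      GetObjectRequest.builder().bucket(bucket).key(key).build()).readAllBytes())

  // Write-through: invoked when a cache entry is created or updated.
  override def write(entry: Cache.Entry[_ <: String, _ <: String]): Unit =
    s3.putObject(
      PutObjectRequest.builder().bucket(bucket).key(entry.getKey).build(),
      RequestBody.fromString(entry.getValue))

  // Deletion left as a no-op in this sketch.
  override def delete(key: Any): Unit = ()
}
```

Detecting new S3 objects as they arrive (the "real time" part) would still be up to the application, for example by polling a bucket listing or consuming S3 event notifications.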

Regards,
Stephen

On 9 Aug 2019, at 21:50, sri hari kali charan Tummala <[hidden email]> wrote:




sri hari kali charan Tummala
Re: Ignite Spark Example Question

Thanks Stephen. One last question: do I have to keep polling S3 to find new data files and write them to the cache in real time myself, or is that already built in?

On Mon, Aug 12, 2019 at 5:43 AM Stephen Darlington <[hidden email]> wrote:






stephendarlington
Re: Ignite Spark Example Question

As I say, there’s nothing “out of the box” — you’d have to write it yourself. Exactly how you architect it would depend on what you’re trying to do.

Regards,
Stephen

On 12 Aug 2019, at 19:59, sri hari kali charan Tummala <[hidden email]> wrote:




sri hari kali charan Tummala
Re: Ignite Spark Example Question

Can I run Ignite and Spark in cluster mode? In the GitHub example, all I see is local mode. If I use a cloud-hosted Ignite cluster, how would I install Spark in distributed mode? Does it come with the Ignite cluster?


On Tue, Aug 13, 2019 at 6:53 AM Stephen Darlington <[hidden email]> wrote:




