How does Ignite Lucene based text indexing & querying work if a field has comma separated values

mlekshma

Folks,

If a field annotated with @QueryTextField contains comma-separated values, will it be tokenized before being indexed by Lucene? How does it work?

Regards,
Muthu

Andrew Mashenkov

Hi Muthu,

Yes, the field value will be tokenized with Lucene's StandardAnalyzer [1].


--
Best regards,
Andrey V. Mashenkov

mlekshma

Great, thanks for the info. How about a list of strings (List<String>)? Will it also be handled (an array value in the key-value pair)?

Regards,
Muthu

Andrew Mashenkov

No, only Strings and object String fields are supported.

Regards,
Andrew.

mlekshma

Okay. By the way, what is an object String?

Regards,
Muthu

Andrew Mashenkov

Hi,
I mean fields of an object that are of type String.

Regards,
Andrew.

mlekshma

Okay, all right. Thanks, Andrey.

Regards,
Muthu

mlekshma

The objects in my Ignite cache have a List<String> as a member, so I have to change it to a comma-separated String if I want to be able to perform text searches, correct?

Regards,
Muthu

Andrew Mashenkov

Not quite. A comma-separated String will be tokenized by Lucene's StandardTokenizer according to the Unicode standard [1].

I'd recommend using ", " (a comma followed by a space character) as the separator.

--
Best regards,
Andrey V. Mashenkov
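
A minimal sketch of the ", " suggestion, assuming the List<String> is flattened into a single String before the object is put into the cache. The class and method names (TagsField, joinForTextIndex) are hypothetical, and @QueryTextField is only mentioned in a comment so that the snippet compiles without Ignite on the classpath:

```java
import java.util.Arrays;
import java.util.List;

public class TagsField {
    // Hypothetical helper: flattens a List<String> into one String that could be
    // stored in a String field annotated with @QueryTextField.
    // The separator is a comma followed by a space, as recommended in the thread.
    static String joinForTextIndex(List<String> values) {
        return String.join(", ", values);
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("red apple", "green pear");
        System.out.println(joinForTextIndex(tags));
    }
}
```

The original list can be kept as a separate, non-indexed member if the application still needs it; only the joined String would be visible to the text index.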

mlekshma

Okay, thanks for the input. Just to understand better: is the comma with a space needed to make sure the entire words in between are tokenized as tokens?

Regards,
Muthu

Manu

Hi,

If you need advanced Lucene search, you could modify GridLuceneIndex to parse KeyCacheObject and CacheObject in the store method and create additional IndexableFields, applying a transformation to non-String values.

We have just integrated the cassandra-lucene-index concept from the Stratio implementation (https://github.com/Stratio/cassandra-lucene-index, documentation here https://github.com/Stratio/cassandra-lucene-index/blob/branch-3.0.13/doc/documentation.rst) into GridLuceneIndex to support advanced Lucene searches such as spatial, bitemporal, maps, lists and so on. It is based on mappers: we modified @QueryTextField (which now allows adding a mapper definition, i.e. how you want to index fields in Lucene) and the annotation processor on CacheConfiguration. This allows advanced Lucene search in standard Ignite SqlQueries, not only in TextQuery, which has very limited functionality. GridLuceneIndex is now a GridH2Index, so we can make complex joins with other entities using complex Lucene filters. The functionality and performance results are awesome!

We have also made some improvements to the indexing module, such as auto-registering new SQL fields and automatically rebuilding and creating new indexes if entity definitions change.

When we have some free time, we will share the code with the community!

Bye!

Andrew Mashenkov

Hi Muthu,

Using a comma as the separator is a bad idea in the general case. As you can see from the Unicode standard, 3,456.789 would not be broken into two words.
It would be better to use a space character, or a comma (or any other separator you want) followed by a space.

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/How-does-Ignite-Lucene-based-text-indexing-querying-work-if-a-field-has-comma-separated-values-tp13830p14064.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.



--
Best regards,
Andrey V. Mashenkov
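
To illustrate why a bare comma is risky, here is a rough approximation of the rule that keeps 3,456.789 together: a comma or period directly between two digits does not split the token. This is a hand-written simplification for illustration only, not Lucene's StandardTokenizer (which applies the full Unicode word-break rules), and the class and method names are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;

public class NumericCommaSketch {
    // Simplified approximation: letters and digits form tokens, everything else
    // separates them, except a ',' or '.' that sits directly between two digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean betweenDigits = (c == ',' || c == '.')
                    && i > 0 && i + 1 < text.length()
                    && Character.isDigit(text.charAt(i - 1))
                    && Character.isDigit(text.charAt(i + 1));
            if (Character.isLetterOrDigit(c) || betweenDigits) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("100,200,300"));   // stays a single token
        System.out.println(tokenize("100, 200, 300")); // three separate tokens
        System.out.println(tokenize("apple,banana"));  // two separate tokens
    }
}
```

Under this approximation, purely alphabetic values still split on a bare comma, while values that begin and end with digits get glued together, which is why a separator that includes a space is the safer choice.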

mlekshma

Thanks, Andrey. I got that after looking through your earlier reply. What I was curious about is the reason for using a comma with a space instead of just a space character. The reason, as I understand it, is to tokenize the entire words in between as tokens.

@Manu, 

Thanks for the additional info. Let me look at it.

Regards,
Muthu
