Preprocessing of data to use in Naive-Bayes

Priya Yadav

Preprocessing of data to use in Naive-Bayes

Hi,

 

Problem statement: I have feedback sentences whose words are separated by spaces, as in normal English sentences. I need to classify these sentences into categories based on certain keywords. How should I preprocess my data in order to use it with Naive Bayes?

Any leads would be helpful.

Thanks in advance.

 

This email and any files transmitted with it are confidential, proprietary and intended solely for the individual or entity to whom they are addressed. If you have received this email in error please delete it immediately.
ibelyakov

Re: Preprocessing of data to use in Naive-Bayes

Alexey,

Do you have any thoughts regarding that?

Igor

On Fri, Sep 4, 2020 at 10:03 AM Priya Yadav <[hidden email]> wrote:

zaleslaw

Re: Preprocessing of data to use in Naive-Bayes

Very interesting case!

We have 3 different implementations of the Naive Bayes algorithm.

Before training can start, the data must be prepared as Vectors in an Ignite cache.

Dear Priya Yadav, could you please provide code or pseudocode showing how you populate your Ignite cache with the sentence data? A few sample sentences would be useful too.
It would also help to see how you would solve this task in scikit-learn; I'll try to help with the preprocessing code for this case.
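As a rough illustration of what "prepared as Vectors" could look like for text data, the sketch below turns already-counted sentences into numeric rows keyed by an integer id. The plain dict is only a stand-in for an Ignite cache, and the label-first row layout, the helper names `encode_labels` and `to_rows`, and the sample counts are all assumptions made for illustration; nothing in this thread confirms the exact layout Ignite expects.

```python
def encode_labels(labels):
    """Assign each distinct label a float code, in order of first appearance."""
    codes = {}
    for label in labels:
        codes.setdefault(label, float(len(codes)))
    return codes


def to_rows(count_vectors, labels, codes):
    """Build {integer key: [label code, feature 1, feature 2, ...]}."""
    return {
        key: [codes[label]] + [float(x) for x in vector]
        for key, (vector, label) in enumerate(zip(count_vectors, labels))
    }


# Hypothetical token counts for two sentences over a three-word vocabulary.
count_vectors = [[1, 0, 2], [0, 1, 0]]
labels = ["Good Experience", "Bad Experience"]

codes = encode_labels(labels)
rows = to_rows(count_vectors, labels, codes)  # stand-in for cache.put(key, vector)
```

Each value in `rows` is a fixed-length list of floats, so every sentence ends up with the same number of coordinates regardless of its length.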

Sincerely yours, 
       Alexey

On Fri, Sep 4, 2020 at 19:40, Igor Belyakov <[hidden email]> wrote:
Priya Yadav

RE: Preprocessing of data to use in Naive-Bayes

Hi Alexey,

 

I am stuck on the preprocessing step itself, as I cannot find any API that takes a sentence, reads its tokens and calculates their counts, whereas scikit-learn provides such APIs out of the box.
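The step described here (take a sentence, read its tokens, count them) can be sketched in plain Python with no library support. This is only a minimal sketch: the function names `tokenize`, `build_vocabulary` and `to_count_vector` are made up for illustration, and the tokenization rule (lowercase, words of two or more characters) is chosen to roughly mirror the default behaviour of scikit-learn's `CountVectorizer`.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and return its word tokens (2+ word characters)."""
    return re.findall(r"\b\w\w+\b", sentence.lower())


def build_vocabulary(sentences):
    """Map every distinct token to a column index, in alphabetical order."""
    tokens = sorted({tok for s in sentences for tok in tokenize(s)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def to_count_vector(sentence, vocabulary):
    """Return a dense list of token counts; out-of-vocabulary tokens are ignored."""
    counts = Counter(tokenize(sentence))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


sentences = ["pizza was soft, very nice", "good ambience and excellent service"]
vocabulary = build_vocabulary(sentences)
vectors = [to_count_vector(s, vocabulary) for s in sentences]
```

The vocabulary has to be built once from the training sentences and then reused for every new sentence, so that a given column always refers to the same word.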

 

I am attaching the sample data that I need to categorize on the basis of user experience. Please find the Python code snippet below:

 

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Training comments and their labels
comments = [
    "pizza was soft, very nice",
    "good ambience and excellent service",
    "took a long time, service needs improvement",
    "toppings were very less, but bread was excellent",
]
targets = ["Good Experience", "Good Experience", "Bad Experience", "Good Experience"]

# Tokenize the comments and count how often each token occurs
vect = CountVectorizer()
counts = vect.fit_transform(comments)

classifier = MultinomialNB()
classifier.fit(counts, targets)

# Vectorize new comments with the same vocabulary, then predict
predictComments = ["soft bread, nice toppings"]
predictData = vect.transform(predictComments)
predictions = classifier.predict(predictData)
print(predictions)
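For readers who want to see what the classifier in the snippet above computes, here is a hedged, library-free sketch of multinomial Naive Bayes with Laplace smoothing on the same four comments. The functions `tokenize`, `train` and `predict` are illustrative names, not part of scikit-learn or Ignite, and the typo "tool" in the third comment is assumed to mean "took". The smoothing value `alpha=1.0` matches the scikit-learn default.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase the sentence and return its word tokens (2+ word characters)."""
    return re.findall(r"\b\w\w+\b", sentence.lower())


def train(sentences, labels, alpha=1.0):
    """Collect per-class token counts and class frequencies."""
    token_counts = defaultdict(Counter)  # label -> token -> count
    class_counts = Counter(labels)       # label -> number of sentences
    for sentence, label in zip(sentences, labels):
        token_counts[label].update(tokenize(sentence))
    vocabulary = {tok for counts in token_counts.values() for tok in counts}
    return token_counts, class_counts, vocabulary, alpha


def predict(model, sentence):
    """Return the label with the highest log posterior for the sentence."""
    token_counts, class_counts, vocabulary, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_tokens = sum(token_counts[label].values())
        for tok in tokenize(sentence):
            if tok not in vocabulary:
                continue  # tokens never seen in training are skipped
            # Laplace-smoothed log likelihood of the token given the class
            score += math.log((token_counts[label][tok] + alpha)
                              / (total_tokens + alpha * len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


sentences = [
    "pizza was soft, very nice",
    "good ambience and excellent service",
    "took a long time, service needs improvement",
    "toppings were very less, but bread was excellent",
]
labels = ["Good Experience", "Good Experience", "Bad Experience", "Good Experience"]

model = train(sentences, labels)
prediction = predict(model, "soft bread, nice toppings")
print(prediction)  # prints "Good Experience"
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a word that never appeared in one class from forcing that class's probability to zero.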

 

 

Thanks,

Priya

 

 

From: Alexey Zinoviev <[hidden email]>
Sent: Sunday, September 6, 2020 6:41 PM
To: Igor Belyakov <[hidden email]>
Cc: user <[hidden email]>
Subject: Re: Preprocessing of data to use in Naive-Bayes

 


Attachment: FeedbackData (950 bytes)
Priya Yadav

RE: Preprocessing of data to use in Naive-Bayes

Hi,

 

If there is any update, please let me know.

 

Thanks

 

From: Priya Yadav
Sent: Sunday, September 6, 2020 8:14 PM
To: Alexey Zinoviev <[hidden email]>
Cc: [hidden email]
Subject: RE: Preprocessing of data to use in Naive-Bayes

 

zaleslaw

Re: Preprocessing of data to use in Naive-Bayes

Sorry, no update; I'm on vacation until October. I will return to your question later, as the preprocessing stage is not trivial.



On Tue, Sep 15, 2020 at 9:21, Priya Yadav <[hidden email]> wrote:
