Sentiment-Analysis-IMDB-Reviews Dataset

This project uses a Naive Bayes classifier, based on Bayes' theorem, to classify the sentiment of IMDB movie reviews as positive or negative, and compares the results with those obtained using the scikit-learn library. The Gradio library is used to build a web app that serves the model's predictions.
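To make the Bayes' theorem part concrete, here is a minimal from-scratch multinomial Naive Bayes in pure Python. It is an illustrative sketch, not the project's code: the class name `MultinomialNaiveBayes`, whitespace tokenization, add-one smoothing, and the toy reviews are all assumptions.

```python
from __future__ import annotations

import math
from collections import Counter


class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: list[list[str]], labels: list[int]) -> "MultinomialNaiveBayes":
        self.classes = sorted(set(labels))
        n_docs = len(labels)
        # Prior P(c): fraction of training documents in each class.
        self.log_prior = {c: math.log(labels.count(c) / n_docs) for c in self.classes}
        # Token counts per class, plus the vocabulary shared by all classes.
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.word_counts[c].update(tokens)
        self.vocab = set().union(*self.word_counts.values())
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, word: str, c: int) -> float:
        # Smoothed P(w | c) = (count(w, c) + alpha) / (total(c) + alpha * |V|)
        num = self.word_counts[c][word] + self.alpha
        den = self.total[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, tokens: list[str]) -> int:
        # Bayes' theorem in log space: argmax_c log P(c) + sum_w log P(w | c).
        # The evidence P(x) is identical for every class, so it is dropped.
        # Words never seen in training are ignored.
        scores = {
            c: self.log_prior[c]
            + sum(self._log_likelihood(w, c) for w in tokens if w in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy usage (1 = positive, 0 = negative).
train = [
    ("a great and moving film", 1),
    ("great acting loved it", 1),
    ("boring and awful plot", 0),
    ("awful acting terrible film", 0),
]
model = MultinomialNaiveBayes().fit([text.split() for text, _ in train], [y for _, y in train])
print(model.predict("great film".split()))  # → 1
print(model.predict("awful plot".split()))  # → 0
```

Working in log space avoids floating-point underflow when a review has hundreds of words, which is also why scikit-learn stores log probabilities internally.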

General Information

Algorithm used

MultinomialNB and BernoulliNB from scikit-learn
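For reference, the two variants apply Bayes' theorem with different likelihood models. In the standard formulation used by `MultinomialNB` and `BernoulliNB` (smoothing parameter α, 1.0 by default), x_i is the count of vocabulary word w_i in a review, b_i ∈ {0, 1} indicates whether it occurs at all, N_{c,i} is the total count of w_i in class-c training reviews, N_c = Σ_i N_{c,i}, D_{c,i} is the number of class-c reviews containing w_i, and D_c is the number of class-c reviews:

```latex
\begin{aligned}
\text{Multinomial:}\quad \hat{y} &= \arg\max_{c}\Big[\log P(c) + \sum_{i=1}^{|V|} x_i \log \theta_{c,i}\Big],
  & \theta_{c,i} &= \frac{N_{c,i} + \alpha}{N_c + \alpha\,|V|} \\
\text{Bernoulli:}\quad \hat{y} &= \arg\max_{c}\Big[\log P(c) + \sum_{i=1}^{|V|} \big(b_i \log p_{c,i} + (1 - b_i)\log(1 - p_{c,i})\big)\Big],
  & p_{c,i} &= \frac{D_{c,i} + \alpha}{D_c + 2\alpha}
\end{aligned}
```

The practical difference: the multinomial model uses how often each word occurs and ignores absent words, while the Bernoulli model only uses presence and explicitly penalizes the absence of words that are typical for a class.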

Dataset Information

The IMDB dataset contains 50K movie reviews for natural language processing and text analytics. It is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets: it provides a set of 25,000 highly polar movie reviews for training and 25,000 for testing. The task is to predict whether each review is positive or negative using either classification or deep learning algorithms. For more information about the dataset, see http://ai.stanford.edu/~amaas/data/sentiment/
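A minimal loading sketch with pandas, assuming the 50K reviews are stored as a CSV with a `review` text column and a `sentiment` column holding `positive`/`negative` (a common export of this dataset). The column names, the `load_reviews` helper, and the in-memory sample are assumptions for illustration; with a real file you would pass its path instead.

```python
import io

import pandas as pd

# Hypothetical sample in the assumed layout. Real IMDB reviews often contain
# HTML line breaks such as "<br />", which are cleaned up during wrangling.
SAMPLE_CSV = io.StringIO(
    "review,sentiment\n"
    '"One of the best films I have seen.<br /><br />Loved it.",positive\n'
    '"A dull, predictable plot.",negative\n'
    '"Great acting and a moving story.",positive\n'
)


def load_reviews(source) -> pd.DataFrame:
    """Load reviews and add a numeric label column (1 = positive, 0 = negative)."""
    df = pd.read_csv(source)
    df["label"] = df["sentiment"].map({"positive": 1, "negative": 0})
    return df


df = load_reviews(SAMPLE_CSV)
print(df.shape)  # → (3, 3)
# Class balance check: on the full dataset, compare the positive and negative counts.
print(df["sentiment"].value_counts())
```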

Steps involved

Data Load and Analysis
Data Wrangling
Exploratory Data Analysis
Splitting the dataset
Count Vectorizer transformation
Modelling
Model Evaluation and Comparison
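The steps above can be sketched end to end with scikit-learn. This is a minimal sketch, not the project's notebook: the ten toy reviews, the `clean` helper, the 60/40 stratified split, and `random_state=42` are assumptions chosen only to keep the example small and reproducible.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical toy reviews standing in for the IMDB data (1 = positive, 0 = negative).
reviews = [
    "Great film, loved it", "Great acting<br />loved the story",
    "A great and fun watch", "Great cast, great script",
    "Simply great, loved every minute",
    "Awful film, hated it", "Awful acting<br />boring story",
    "An awful and dull watch", "Awful cast, awful script",
    "Simply awful, boring every minute",
]
labels = [1] * 5 + [0] * 5


def clean(text: str) -> str:
    # Data wrangling: replace HTML tags such as "<br />" with a space so that
    # "br" does not end up as a token, then lowercase.
    return re.sub(r"<[^>]+>", " ", text).lower()


# Splitting: stratify so both classes appear in the training and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    [clean(r) for r in reviews], labels,
    test_size=0.4, stratify=labels, random_state=42,
)

# Count Vectorizer: the vocabulary is learned from the training split only.
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

# Modelling: BernoulliNB binarizes the counts (binarize=0.0 by default),
# so it only sees whether each word is present.
models = {"MultinomialNB": MultinomialNB(), "BernoulliNB": BernoulliNB()}

# Model evaluation and comparison on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train_counts, y_train)
    y_pred = model.predict(X_test_counts)
    y_score = model.predict_proba(X_test_counts)[:, 1]  # P(positive)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, y_score),
    }
    print(name, results[name])
```

For the Precision-Recall and ROC curves, `sklearn.metrics.precision_recall_curve(y_test, y_score)` and `sklearn.metrics.roc_curve(y_test, y_score)` return the points to plot with Matplotlib.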

Result

We achieved the following results:

Multinomial Naive Bayes Classifier
- Test Accuracy : 0.8563
- Precision : 0.8697
- Recall : 0.8393
- F1 Score : 0.8543
Precision-Recall Curve

ROC Curve

Bernoulli Naive Bayes Classifier
- Test Accuracy : 0.8474
- Precision : 0.8724
- Recall : 0.8152
- F1 Score : 0.8428
Precision-Recall Curve

ROC Curve
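As a quick consistency check on the reported numbers, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The snippet below recomputes it from the reported precision and recall; the helper name is hypothetical.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results above.
reported = {
    "Multinomial NB": (0.8697, 0.8393, 0.8543),
    "Bernoulli NB": (0.8724, 0.8152, 0.8428),
}
for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_from_precision_recall(p, r):.4f}, reported = {f1}")
```

Both recomputed values agree with the reported F1 scores to within the rounding of the four-decimal inputs.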

Web App
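A minimal sketch of how a Gradio front end can wrap the model. The tiny pipeline trained on four toy reviews is a hypothetical stand-in for the trained classifier, and `predict_sentiment`/`build_app` are illustrative names; Gradio is imported inside `build_app` so the prediction code runs even where Gradio is not installed.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for the trained model: vectorizer + classifier in one pipeline.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(
    ["great film, loved it", "wonderful acting", "awful plot, hated it", "boring and dull"],
    [1, 1, 0, 0],
)


def predict_sentiment(review: str) -> dict:
    """Return {label: confidence}, the format a Gradio "label" output displays."""
    proba = model.predict_proba([review])[0]
    # model.classes_ is [0, 1], so proba[1] is the probability of "positive".
    return {"negative": float(proba[0]), "positive": float(proba[1])}


def build_app():
    import gradio as gr  # lazy import: only needed when serving the app

    return gr.Interface(fn=predict_sentiment, inputs="text", outputs="label",
                        title="IMDB Review Sentiment")
```

Calling `build_app().launch()` starts a local server with a text box for the review and a label showing both class probabilities.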

Conclusion

As can be observed, the Multinomial Naive Bayes classifier outperformed the Bernoulli Naive Bayes classifier on test accuracy (0.8563 vs 0.8474), recall (0.8393 vs 0.8152), and F1 score (0.8543 vs 0.8428), although the Bernoulli classifier has slightly higher precision (0.8724 vs 0.8697). So, we can conclude that the Multinomial Naive Bayes classifier is the better of the two models for this dataset.

Technologies Used

Pandas - version 1.3.4
NumPy - version 1.20.3
Matplotlib - version 3.4.3
Seaborn - version 0.11.2
scikit-learn - version 0.24.2
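To reproduce the environment, a small check like the following compares the installed versions against the ones listed above. It is a sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `compare_versions` is hypothetical, and Gradio is left out because no version is given for it.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Technologies Used", keyed by PyPI distribution name.
TESTED_VERSIONS = {
    "pandas": "1.3.4",
    "numpy": "1.20.3",
    "matplotlib": "3.4.3",
    "seaborn": "0.11.2",
    "scikit-learn": "0.24.2",
}


def compare_versions(expected):
    """Map each package to (tested version, installed version or None if missing)."""
    report = {}
    for package, tested in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        report[package] = (tested, installed)
    return report


for package, (tested, installed) in compare_versions(TESTED_VERSIONS).items():
    status = "match" if installed == tested else f"installed: {installed}"
    print(f"{package:<13} tested {tested:<8} {status}")
```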

Acknowledgements

Thanks to Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts for making the dataset available to the community.

Contact

Created by [@sukhijapiyush] - feel free to contact me!

License

This project is open source and available without restrictions.
