This project implements a Naive Bayes classifier, based on Bayes' theorem, to classify the sentiment of IMDB reviews as positive or negative, and compares its results with those obtained using the scikit-learn library. The Gradio library is used to create a web app for model prediction.
MultinomialNB and BernoulliNB from scikit-learn
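The project's own implementation is not reproduced here. As a rough illustration of how a Bayes-theorem-based text classifier can work, the following is a minimal sketch of multinomial Naive Bayes with additive (Laplace) smoothing, using only the standard library. The function names `train_multinomial_nb` and `predict`, the `alpha` parameter, and the toy reviews are assumptions made for this example and are not taken from the project code.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenised documents.

    docs   : list of token lists
    labels : list of class labels, one per document
    alpha  : additive (Laplace) smoothing constant (assumed default of 1.0)
    """
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(docs)
    # log P(class), estimated from class frequencies
    log_prior = {c: math.log(n / n_docs) for c, n in class_doc_counts.items()}

    # log P(word | class), smoothed so unseen class/word pairs are not zero
    log_likelihood = {}
    for c in class_doc_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def predict(tokens, log_prior, log_likelihood, vocab):
    """Return the class with the highest posterior log-score."""
    scores = {}
    for c in log_prior:
        # Words never seen during training are skipped.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokens if w in vocab
        )
    return max(scores, key=scores.get)


# Hypothetical toy data standing in for real reviews.
train_docs = [
    "a great and moving film".split(),
    "great acting great story".split(),
    "a dull and boring film".split(),
    "boring plot terrible acting".split(),
]
train_labels = ["positive", "positive", "negative", "negative"]

model = train_multinomial_nb(train_docs, train_labels)
print(predict("great story".split(), *model))
print(predict("terrible and boring".split(), *model))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together, which matters for long reviews.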
The IMDB dataset contains 50K movie reviews for natural language processing or text analytics. It is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of 25,000 highly polar movie reviews for training and 25,000 for testing. The task is to predict whether each review is positive or negative, using either classification or deep learning algorithms. For more information about the dataset, see http://ai.stanford.edu/~amaas/data/sentiment/
- Data Load and Analysis
- Data Wrangling
- Exploratory Data Analysis
- Splitting the dataset
- Count Vectorizer transformation
- Modelling
- Model Evaluation and Comparison
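The splitting, vectorization, modelling, and evaluation steps above can be sketched with scikit-learn roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual notebook: the toy `reviews` and `labels`, the 25% test split, the `random_state`, and the default hyperparameters are all placeholders chosen for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical toy stand-in for the IMDB reviews (1 = positive, 0 = negative).
reviews = [
    "a great and moving film",
    "great acting and a great story",
    "wonderful cast and a lovely script",
    "I loved every minute of it",
    "a dull and boring film",
    "boring plot and terrible acting",
    "awful script and a weak cast",
    "I hated every minute of it",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Split before vectorizing so the vocabulary is learned from training data only.
X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.25, random_state=42, stratify=labels
)

# Bag-of-words counts: fit on the training split, reuse on the test split.
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

results = {}
for name, clf in [("MultinomialNB", MultinomialNB()), ("BernoulliNB", BernoulliNB())]:
    clf.fit(X_train_counts, y_train)
    y_pred = clf.predict(X_test_counts)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }

for name, metrics in results.items():
    print(name, metrics)
```

`BernoulliNB` binarizes its input by default, so it can consume the same count matrix as `MultinomialNB`; it then models only whether a word is present, not how often it occurs. The scores printed for this toy corpus are not meaningful and should not be compared with the results reported for the full dataset.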
We achieved the following results:
- Multinomial Naive Bayes Classifier
  - Test Accuracy: 0.8563
  - Precision: 0.8697
  - Recall: 0.8393
  - F1 Score: 0.8543
  - Precision-Recall Curve
  - ROC Curve
- Bernoulli Naive Bayes Classifier
  - Test Accuracy: 0.8474
  - Precision: 0.8724
  - Recall: 0.8152
  - F1 Score: 0.8428
  - Precision-Recall Curve
  - ROC Curve
As can be observed, the Multinomial Naive Bayes classifier outperformed the Bernoulli Naive Bayes classifier on test accuracy (0.8563 vs. 0.8474), recall (0.8393 vs. 0.8152), and F1 score (0.8543 vs. 0.8428), while the Bernoulli classifier has slightly higher precision (0.8724 vs. 0.8697). We therefore conclude that the Multinomial Naive Bayes classifier is the better of the two models for this dataset.
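As a quick consistency check on the reported figures, the F1 score can be recomputed from precision and recall as their harmonic mean, F1 = 2PR / (P + R). The sketch below uses only the numbers listed in the results; the helper name `f1_from` is made up for this example.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Metrics as reported in the results section (rounded to four decimals).
reported = {
    "MultinomialNB": {"precision": 0.8697, "recall": 0.8393, "f1": 0.8543},
    "BernoulliNB": {"precision": 0.8724, "recall": 0.8152, "f1": 0.8428},
}

recomputed = {
    name: f1_from(m["precision"], m["recall"]) for name, m in reported.items()
}

for name, f1 in recomputed.items():
    print(f"{name}: recomputed F1 = {f1:.4f}, reported F1 = {reported[name]['f1']:.4f}")
```

Because the inputs are already rounded, the recomputed values can differ from the reported ones in the last decimal place; agreement to about three decimals is what this check can show.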
- Pandas - version 1.3.4
- NumPy - version 1.20.3
- Matplotlib - version 3.4.3
- Seaborn - version 0.11.2
- scikit-learn - version 0.24.2
Thanks to Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts for making the dataset available to the community.
Created by [@sukhijapiyush] - feel free to contact me!
This project is open source and available without restrictions.