Using the Twitter API for opinion mining and sentiment analysis to predict the result of the Brexit referendum.
Language version: Python 3.6
Operating systems: Ubuntu 16.04, Windows 10 and macOS High Sierra.
Required libraries:
- pyspark: Apache Spark library for Python (used for the machine learning algorithms) http://spark.apache.org/docs/2.2.0/api/python/index.html
- pandas: open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools https://pandas.pydata.org/pandas-docs/stable/api.html
- numpy: efficient multi-dimensional container of generic data https://docs.scipy.org/doc/numpy-1.13.0/reference/
- polyglot: natural language processing pipeline https://github.com/aboSamoor/polyglot
Installation:
With pip:
sudo pip install --upgrade <library>
With Homebrew:
brew install <library>
With Anaconda:
conda install <library>
On Ubuntu, from the shell (system dependencies required by polyglot):
sudo apt-get install python-numpy libicu-dev
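Before running anything, it can help to confirm that the libraries listed above are visible to the Python interpreter you plan to use. The sketch below is a hypothetical helper (`check_modules` is not part of the project) that relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def check_modules(names):
    """Return a dict mapping each top-level module name to True if it is installed."""
    # find_spec returns None when a top-level module cannot be found;
    # it locates the module without executing it.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Module names of the libraries listed in this README.
    required = ["pyspark", "pandas", "numpy", "polyglot"]
    for name, found in check_modules(required).items():
        print("{:<10} {}".format(name, "ok" if found else "MISSING"))
```

`find_spec` only checks that a module is present, not that it imports cleanly: polyglot, for example, can be found yet still fail at import time if its ICU dependencies are missing.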
Tool used to collect the data:
https://github.com/Jefferson-Henrique/GetOldTweets-python
Download the dataset:
- directly from the website:
https://www.kaggle.com/natmonkey/brexit-data-project-bdd/data
- from the shell, with the Kaggle command-line tool:
kaggle datasets download -d natmonkey/brexit-data-project-bdd
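If the download arrives as a zip archive, the CSV files can be extracted with the standard library. This is a minimal sketch: `extract_csv_files` is a hypothetical helper, and the archive name `brexit-data-project-bdd.zip` in the example comment is an assumption about what the Kaggle CLI saves, so check the actual file name on disk.

```python
import os
import zipfile


def extract_csv_files(archive_path, dest_dir):
    """Extract every .csv member of archive_path into dest_dir.

    Returns the sorted list of extracted file paths.
    """
    os.makedirs(dest_dir, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith(".csv"):
                # extract() recreates any sub-directories stored in the archive
                # and returns the path of the extracted file.
                extracted.append(archive.extract(member, dest_dir))
    return sorted(extracted)


# Example (archive name assumed, see above):
# extract_csv_files("brexit-data-project-bdd.zip", "data")
```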
Data cleansing (CSV files), removing duplicate lines:
cat <file> | sort | uniq
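`sort | uniq` works line by line and reorders the file: the CSV header ends up wherever it sorts, and a tweet whose quoted text field contains a line break is split across several lines. The sketch below is an alternative, not a script from this project: a hypothetical `deduplicate_csv` helper that uses only the standard `csv` module to drop exact duplicate rows while keeping the header first and the original row order.

```python
import csv


def deduplicate_csv(src_path, dst_path):
    """Copy src_path to dst_path, dropping exact duplicate rows.

    The first row (header) is written first and the remaining rows keep
    their original order. Returns the number of data rows written.
    """
    seen = set()
    written = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)  # handles quoted fields containing newlines
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)
        for row in reader:
            key = tuple(row)  # lists are unhashable, tuples can go in a set
            if key not in seen:
                seen.add(key)
                writer.writerow(row)
                written += 1
    return written
```

Keeping the first occurrence of each row, rather than sorting, means the cleaned file stays in the order the tweets were collected.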
The machine learning methods implemented are:
- Naive Bayes
- Logistic Regression
- Decision Tree
- Multi-Layer Perceptron
- Support Vector Machine
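To illustrate the first method in the list, here is a self-contained multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, written in plain Python. It is not the project's implementation, which relies on pyspark; the whitespace tokenizer, the `leave`/`remain` labels and the toy tweets are invented for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately naive: lower-case and split on whitespace.
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Fit a multinomial Naive Bayes model on parallel lists of texts and labels."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior for text."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        total = sum(word_counts[label].values())
        # log prior + sum of Laplace-smoothed log likelihoods
        score = math.log(count / n_docs)
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][token] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (invented for illustration).
tweets = [
    "vote leave take back control",
    "leave the eu now",
    "stronger in europe vote remain",
    "remain in the eu for jobs",
]
labels = ["leave", "leave", "remain", "remain"]
model = train_naive_bayes(tweets, labels)
```

For example, `predict(model, "take back control")` returns `'leave'`. Working in log space avoids the numerical underflow that multiplying many small probabilities would cause on longer tweets.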
Run the desired method from the /learning directory.
Authors: Mira Ait Saada, Alexandra Benamar, Cristel Dos Santos Catarino, Joel Oscar Dossa