My attempt at the Kaggle Automated Essay Scoring (AES) project (https://www.kaggle.com/c/asap-aes). The linear regression model uses a Word2Vec model and custom-generated heuristic features to obtain a mean quadratic weighted kappa score of 0.9359.
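For readers unfamiliar with the metric, here is a minimal pure-Python sketch of quadratic weighted kappa for a single essay set. It is not the code used in the notebooks; the function name `quadratic_weighted_kappa` and its parameters are illustrative assumptions, and the averaging of per-set kappas into one mean score is not shown.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_rating=None, max_rating=None):
    """Agreement between two lists of integer scores (hypothetical helper).

    Returns 1.0 for perfect agreement, about 0.0 for chance-level agreement,
    and negative values for agreement worse than chance.
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same number of essays")
    if not rater_a:
        raise ValueError("at least one pair of scores is required")

    # Assumption: if no range is given, use the range seen in the data.
    if min_rating is None:
        min_rating = min(min(rater_a), min(rater_b))
    if max_rating is None:
        max_rating = max(max(rater_a), max(rater_b))

    n_ratings = max_rating - min_rating + 1
    n_items = len(rater_a)
    if n_ratings == 1:
        return 1.0  # only one possible score, so the raters always agree

    observed = Counter(zip(rater_a, rater_b))  # counts of (score_a, score_b) pairs
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B

    numerator = 0.0
    denominator = 0.0
    for i in range(min_rating, max_rating + 1):
        for j in range(min_rating, max_rating + 1):
            # Quadratic penalty: grows with the squared distance between scores.
            weight = (i - j) ** 2 / (n_ratings - 1) ** 2
            # Count expected in this cell if the raters were independent.
            expected = hist_a[i] * hist_b[j] / n_items
            numerator += weight * observed[(i, j)]
            denominator += weight * expected

    if denominator == 0:
        return 1.0
    return 1.0 - numerator / denominator
```

Calling `quadratic_weighted_kappa(human_scores, predicted_scores)` with two equal-length lists of integer scores gives the agreement for that set; predictions from a regression model would need to be rounded to integers first.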
- CustomFeatureGeneration.ipynb - Generating custom features for the data set.
- Data_Exploration.ipynb - Exploring the data set and free form visualization.
- Linear Regression Model.ipynb - The model building and learning takes place here.
./utils/helperfunctions.py
./utils/requirements.py
- Scikit-learn 0.18.1: pip install --user --upgrade scikit-learn
- Gensim 2.1.0: pip install --user --upgrade gensim
- Textmining 1.0: pip install --user --upgrade textmining
- Grammar Check 1.3.1: first pip install --upgrade 3to2, then pip install --user --upgrade grammar-check
- Matplotlib 2.0.0: pip install --user --upgrade matplotlib
- NLTK 3.2.2: pip install --user --upgrade nltk
NOTE: You need to download the NLTK data before you can use its packages. To do so, run the following commands in Python (reference: http://www.nltk.org/data.html):
import nltk
nltk.download()
- You also need Java installed on your machine to run NLTK. Java installation steps for Ubuntu 16.04: http://www.wikihow.com/Install-Oracle-Java-on-Ubuntu-Linux
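
To confirm that the installed packages match the versions listed above, a small check along these lines may help. It is only a sketch: the `PINNED` mapping and the `compare_versions` helper are hypothetical names, the distribution names are assumed to match the pip commands above, and `importlib.metadata` requires Python 3.8 or newer, which may not be the interpreter the notebooks were written for.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the requirements list; distribution names are assumptions.
PINNED = {
    "scikit-learn": "0.18.1",
    "gensim": "2.1.0",
    "textmining": "1.0",
    "grammar-check": "1.3.1",
    "matplotlib": "2.0.0",
    "nltk": "3.2.2",
}


def compare_versions(pinned):
    """Map each package name to (wanted, installed, matches).

    `installed` is None when the package is not installed.
    """
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in compare_versions(PINNED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {installed} [{status}]")
```

A mismatch does not necessarily mean the notebooks will fail; it only flags where the environment differs from the one listed here.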
- **Dataset:** domain123.csv
- **Images and saved models:** ./model_and_visualization/
- **References:** ./References/
- **Essay set description:** ./Essay_Set_Descriptions