Fixing readme for github
chmullig committed Dec 11, 2012
1 parent c576b93 commit c9b2f6d
Showing 1 changed file with 8 additions and 4 deletions.
12 changes: 8 additions & 4 deletions README → README.md
@@ -1,7 +1,11 @@
# chmullig's Kaggle Essay Code

For http://inclass.kaggle.com/c/columbia-university-introduction-to-data-science-fall-2012,
as part of the class http://columbiadatascience.wordpress.com.

-Implements a few models using R and python. Requirements:
+Implements a few models using R and python.
+
+## Requirements:
* Python (only tested with 2.7)
* nltk
* scikit-learn
@@ -15,7 +19,7 @@ Implements a few models using R and python. Requirements:
* ggplot2 (soft requirement)
* reshape (soft requirement)

-#Features Created/Used
+## Features Created/Used
* number of characters
* number of sentences
* number of words
@@ -39,14 +43,14 @@ Implements a few models using R and python. Requirements:
* counts of the NER words (eg number of times they used @MONEY)
* TF-IDF word and bigram frequencies that were then PCA'd down to 50 cells.
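The simpler count features in this list can be sketched in a few lines of plain Python. This is only an illustration, not the repository's `basic_tags.py`: the function name `basic_features`, the regex-based word and sentence splitting (the README lists nltk as a requirement, which is not used here), and the assumption that anonymized entities look like `@MONEY1` or `@PERSON2` are all hypothetical.

```python
import re

# Assumption: anonymized entities appear as "@" + uppercase tag + optional digits,
# e.g. @MONEY1 or @PERSON2. The capture group keeps only the tag name.
NER_TAG = re.compile(r"@([A-Z]+)\d*")


def basic_features(essay):
    """Return simple count features for a single essay string."""
    # Crude tokenization: runs of letters, digits, apostrophes and "@".
    words = re.findall(r"[A-Za-z0-9@']+", essay)
    # Crude sentence split: any run of ., ! or ? ends a sentence.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]

    features = {
        "n_chars": len(essay),
        "n_words": len(words),
        "n_sentences": len(sentences),
    }
    # One counter per NER tag, e.g. ner_money = number of @MONEY mentions.
    for tag in NER_TAG.findall(essay):
        key = "ner_" + tag.lower()
        features[key] = features.get(key, 0) + 1
    return features


sample = "I paid @MONEY1 for it. Then @PERSON1 paid @MONEY2!"
features = basic_features(sample)
print(features)
```

Each essay becomes one dictionary of counts, which could then be written out as a row of a feature table for the modeling step.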

-#Models Used
+## Models Used
* First model was OLS linear regression using a subset of the variables. I trained 5 models, one per essay set, with identical formulas. Shockingly good.
* Second model was Random Forest regression, again 5 models. Using more variables.
* Third model was GBM, same formula as random forest, using 5 models.

Also tried random forest and GBM as a single model with the essay set as a predictor, but it didn't seem to perform as well.
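The "one model per essay set, identical formula" idea can be sketched as follows. This is a simplified stand-in, not the author's code: the README suggests the actual models use several variables, while this sketch uses a single predictor so that ordinary least squares fits in plain Python. The names `fit_ols`, `fit_per_set`, `predict` and the toy rows are all made up for illustration.

```python
from collections import defaultdict


def fit_ols(xs, ys):
    """Closed-form OLS for y = intercept + slope * x (single predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_per_set(rows):
    """Fit the same formula separately for each essay set.

    rows: iterable of (essay_set, predictor, score) tuples.
    Returns {essay_set: (intercept, slope)}.
    """
    grouped = defaultdict(lambda: ([], []))
    for essay_set, x, y in rows:
        grouped[essay_set][0].append(x)
        grouped[essay_set][1].append(y)
    return {s: fit_ols(xs, ys) for s, (xs, ys) in grouped.items()}


def predict(models, essay_set, x):
    """Score a new essay with the model belonging to its own set."""
    intercept, slope = models[essay_set]
    return intercept + slope * x


# Toy data only: (essay_set, number of words, score).
rows = [
    (1, 100, 2.0), (1, 200, 4.0), (1, 300, 6.0),
    (2, 100, 1.0), (2, 200, 1.5), (2, 300, 2.0),
]
models = fit_per_set(rows)
print(models)
print(predict(models, 2, 400))
```

Fitting separately lets each set have its own intercept and slope, which is one plausible reason the per-set models did better than a single model with set as a predictor.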

-#Basic workflow in buildModel.sh.
+## Basic workflow in buildModel.sh.

1. Run basic_tags.py on test.tsv and train.tsv. This creates almost all the
features/tags/variables we need to use