From c9b2f6d0d992a0265a7a6a0eaa850aae5137682c Mon Sep 17 00:00:00 2001
From: Chris Mulligan
Date: Tue, 11 Dec 2012 00:13:24 -0500
Subject: [PATCH] Fixing readme for github

---
 README => README.md | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)
 rename README => README.md (91%)

diff --git a/README b/README.md
similarity index 91%
rename from README
rename to README.md
index 47ef307..b9439a5 100644
--- a/README
+++ b/README.md
@@ -1,7 +1,11 @@
+# chmullig's Kaggle Essay Code
+
 For http://inclass.kaggle.com/c/columbia-university-introduction-to-data-science-fall-2012,
 as part of the class http://columbiadatascience.wordpress.com.
 
-Implements a few models using R and python. Requirements:
+Implements a few models using R and python.
+
+## Requirements:
 * Python (only tested with 2.7)
 * nltk
 * scikit-learn
@@ -15,7 +19,7 @@ Implements a few models using R and python. Requirements:
 * ggplot2 (soft requirement)
 * reshape (soft requirement)
 
-#Features Created/Used
+## Features Created/Used
 * number of characters
 * number of sentences
 * number of words
@@ -39,14 +43,14 @@ Implements a few models using R and python. Requirements:
 * counts of the NER words (e.g. number of times they used @MONEY)
 * TF-IDF word and bigram frequencies that were then PCA'd down to 50 cells.
 
-#Models Used
+## Models Used
 * First model was OLS linear regression using a subset of the variables. I
   trained 5 models, one per essay set, with identical formulas. Shockingly good.
 * Second model was Random Forest regression, again 5 models, using more variables.
 * Third model was GBM, same formula as the random forest, again 5 models. Also
   tried doing RF and GBM as a single model using set as a predictor, but it
   didn't seem to perform as well.
 
-#Basic workflow in buildModel.sh.
+## Basic workflow in buildModel.sh
 1. Run basic_tags.py on test.tsv and train.tsv. This creates almost all the
    features/tags/variables we need to use
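
The README's feature list starts with simple surface counts (characters, sentences, words) produced by basic_tags.py. As an illustrative stand-in — not the actual script, which also uses nltk — the naive version of those three counts looks like this:

```python
# Illustrative stand-in for a few of the surface features basic_tags.py
# produces. The real script uses nltk; this sketch needs only the
# standard library and uses a deliberately naive sentence split.
def count_features(essay):
    """Return character, word, and (naive) sentence counts."""
    words = essay.split()
    # treat '.', '!', and '?' as sentence terminators
    sentences = [s for s in essay.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    return {
        "n_chars": len(essay),
        "n_words": len(words),
        "n_sentences": len(sentences),
    }

print(count_features("Hi there. How are you? Fine."))
```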
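
The last feature listed — TF-IDF word and bigram frequencies "PCA'd down to 50 cells" — can be sketched with scikit-learn (already a requirement). Using `TruncatedSVD` rather than dense PCA is an assumption about the original code; the toy corpus and variable names are made up for illustration:

```python
# Sketch of the TF-IDF -> 50-component reduction step, assuming
# scikit-learn's TruncatedSVD (dense PCA would also work after
# .toarray()). The essays here are toy stand-ins.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

essays = [
    "The quick brown fox jumps over the lazy dog.",
    "A slow green turtle walks under the busy bridge.",
    "Essays about foxes and turtles differ in vocabulary.",
]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))  # words and bigrams
tfidf = vectorizer.fit_transform(essays)

# The real pipeline keeps 50 components; capped here because the toy
# corpus is tiny (TruncatedSVD needs n_components < min matrix dim).
n_components = min(50, min(tfidf.shape) - 1)
svd = TruncatedSVD(n_components=n_components, random_state=0)
reduced = svd.fit_transform(tfidf)  # one dense row per essay
```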
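
All three models in the Models Used section follow the same "one model per essay set" pattern. The README's models are fit in R; a minimal sketch of the same pattern in Python with scikit-learn (toy data, made-up column layout) is:

```python
# Sketch of fitting one regression per essay set, as in the Models Used
# section (the original uses R formulas; this is an illustrative
# scikit-learn translation with synthetic data).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# toy columns: essay_set (1 or 2), n_chars, n_words, score
essay_set = rng.integers(1, 3, size=40)
n_chars = rng.integers(200, 2000, size=40).astype(float)
n_words = n_chars / 5 + rng.normal(0, 10, size=40)
score = 0.01 * n_chars + rng.normal(0, 1, size=40)

models = {}
for s in (1, 2):
    mask = essay_set == s
    X = np.column_stack([n_chars[mask], n_words[mask]])
    models[s] = LinearRegression().fit(X, score[mask])

# at prediction time, pick the model matching each essay's set
pred = models[1].predict(np.array([[1000.0, 210.0]]))
```

The alternative the README mentions — a single RF/GBM with `set` as a predictor — would instead fit one model on all rows with `essay_set` included as a feature column.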