Dataset (from the Tanzanian Ministry of Water) and problem description: https://www.drivendata.org/competitions/7/pump-it-up-data-mining-the-water-table/page/23/
The accompanying paper is DMA_Project-1.pdf
- Started as a paired data science project at university. My project partner, @pointonjoel, implemented a logistic regression model on the same dataset, whilst I implemented a random forest model.
The code for the random forest classifier is in rfCLASSIFIER.ipynb
- The highest score achieved is 80.21%, measured by classification rate: the percentage of rows where the predicted class ŷ in the submission matches the actual class y in the test set.
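
The classification rate described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the competition's scoring code: the function name `classification_rate` and the sample labels are assumptions made for the example.

```python
from typing import Sequence


def classification_rate(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Return the fraction of rows where the predicted label equals the actual label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty set of labels")
    # Count the rows where prediction and ground truth agree.
    matches = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return matches / len(y_true)


# Illustrative labels only (hypothetical rows, not taken from the dataset).
actual = ["functional", "non functional", "functional", "functional needs repair"]
predicted = ["functional", "non functional", "non functional", "functional needs repair"]

print(f"{classification_rate(actual, predicted):.2%}")  # 3 of 4 rows match -> 75.00%
```

Under this definition, a score of 80.21% means that roughly four out of every five test rows were assigned the correct class.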