- This project contains two notebooks: one for web scraping customer reviews and one for developing clustering and classification models to predict missing values in courses.
- Note that data.csv is the overall raw dataset, and pred_data is the visualization dataset that includes the ML predictions.
- The first challenge is scraping reviews that are initially displayed in picture format inside pop-up windows, which requires strong coding skills and a good understanding of HTML.
- Another challenge is cleaning the messy course names and using KMeans to cluster 180 course types into 9 distinct segments.
- The last challenge is using the review title and review text as features and developing a suitable ML classification model.
- Based on an evaluation of curriculum, instructor, job assistance, and overall ratings, the top 2 institutes are Le Wagon and WeCloudData. Le Wagon mainly focuses on Full Stack and Web Development, while WeCloudData focuses on Data Science.
- As a next step, with the predicted dataset, it would be useful to understand the pros and cons of the courses at each institute by analyzing word frequency in reviewers' comments.
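
As a rough illustration of the course clustering challenge, the sketch below groups a handful of course names using TF-IDF features and scikit-learn's KMeans. The sample course names, the `cluster_courses` helper, and the choice of TF-IDF features are assumptions made for illustration. The notebook's actual cleaning steps and features may differ, and the project clusters about 180 course types into 9 segments rather than the 2 shown here.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_courses(course_names, n_clusters, seed=0):
    """Hypothetical helper: return one cluster label per course name."""
    # Minimal cleaning (assumed): trim whitespace and lowercase the names.
    cleaned = [name.strip().lower() for name in course_names]
    # Turn each course name into a TF-IDF vector over its words.
    features = TfidfVectorizer().fit_transform(cleaned)
    # A fixed seed keeps the grouping reproducible between runs.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(features).tolist()


# Made-up course names, not taken from data.csv.
courses = [
    "Data Science",
    " data science online ",
    "Data Science Program",
    "Web Development",
    "web development course",
    "Full Stack Web Development",
]

labels = cluster_courses(courses, n_clusters=2)
for name, label in zip(courses, labels):
    print(label, name.strip())
```

The numeric labels themselves are arbitrary; only the grouping matters, so each segment would still need a human-readable name after inspection.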
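
The word frequency analysis proposed as a next step could start from something like the sketch below, which counts the most common non-stop-words in each institute's reviews using only the Python standard library. The `top_words` helper, the small stop-word list, the placeholder institute names, and the sample reviews are all made up for illustration and do not come from the dataset.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not a complete list).
STOP_WORDS = {
    "the", "a", "an", "and", "but", "is", "was", "were",
    "to", "of", "in", "it", "i", "very", "too",
}


def top_words(reviews, n=3):
    """Hypothetical helper: return the n most frequent non-stop-words."""
    counts = Counter()
    for text in reviews:
        # Lowercase the text and keep alphabetic tokens (with apostrophes).
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


# Placeholder institutes and made-up reviews.
reviews_by_institute = {
    "Institute A": [
        "The instructors were great and the curriculum was great.",
        "Great instructors, but the pace was too fast.",
    ],
    "Institute B": [
        "Job assistance was helpful and the projects were helpful.",
    ],
}

for institute, reviews in reviews_by_institute.items():
    print(institute, top_words(reviews))
```

Raw counts like these only hint at pros and cons; splitting reviews by rating before counting would be one way to separate positive from negative themes.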