This project provides an example notebook showing how to apply BeautifulSoup and Selenium to scrape data online. Using the obtained dataset, it also presents data analysis and visualizations of the product distribution, the average price of each category, the most popular products, and the relationship between price and review rating.
The Lululemon site is a semi-dynamic page, which is the main difficulty in grabbing its data. Products load automatically only up to a certain point, after which loading stops and a click is required to load more. Another challenge is that the tags for product info and reviews are not located close together: the reviews only become available once they are scrolled onto the screen, so the page has to be scrolled down before Selenium can grab them.
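A minimal sketch of the two workarounds, assuming a Selenium 4 WebDriver is already open on the product listing page. The function names, CSS selector arguments, wait times and scroll step are hypothetical placeholders, not the notebook's actual code or Lululemon's real markup. The helpers only use the WebDriver methods `find_elements` and `execute_script` plus the element method `click`, and they pass the string `"css selector"`, which is the value of Selenium's `By.CSS_SELECTOR`.

```python
import time

# Same value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"
SCROLL_INTO_VIEW = "arguments[0].scrollIntoView({block: 'center'});"


def click_load_more(driver, button_selector, max_clicks=50, pause=2.0):
    """Keep clicking the 'load more' button until it disappears.

    Returns the number of clicks made; max_clicks guards against an endless loop.
    """
    clicks = 0
    while clicks < max_clicks:
        buttons = driver.find_elements(CSS_SELECTOR, button_selector)
        if not buttons:
            break  # no button left, so there are no more products to load
        # Bring the button on screen first; clicking an off-screen element can fail.
        driver.execute_script(SCROLL_INTO_VIEW, buttons[0])
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # crude wait for the next batch of products to render
    return clicks


def scroll_page_gradually(driver, step=800, pause=0.5, max_steps=200):
    """Scroll down in fixed steps so content rendered on scroll (e.g. reviews) appears.

    The page height is re-read after every step because it can grow while scrolling.
    Returns the number of scroll steps made.
    """
    height = driver.execute_script("return document.body.scrollHeight")
    position = 0
    steps = 0
    while position < height and steps < max_steps:
        position += step
        driver.execute_script("window.scrollTo(0, arguments[0]);", position)
        steps += 1
        time.sleep(pause)
        height = driver.execute_script("return document.body.scrollHeight")
    return steps
```

In a notebook these might be called as `click_load_more(driver, "button.load-more")` followed by `scroll_page_gradually(driver)`, after which `driver.page_source` can be handed to BeautifulSoup for parsing; the selector in that call is also hypothetical. Fixed `time.sleep` pauses keep the sketch simple, while Selenium's explicit waits would be a more robust choice.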
Based on the visualizations, 62.2% of the products are women's, and shirts and leggings make up the largest share of women's products at 51.4%. Most of the revenue presumably comes from pants, since pants have the highest average price in both the women's and men's categories. The average rating of the most popular products is nearly 4.0/5.0, even for products in the higher price range.
To try this project, run and modify the code in the notebook using the CSV file of data scraped from Lululemon.