Find your outfit size for a better shopping experience
Link to the web app (It's under maintenance for the time being)
This project aims to reduce sizing mistakes during online shopping, especially for women. It is a mini project for the class 'Intelligent Service Development with Big Data', delivered by Samsung Multicampus in Seoul, Korea - an education affiliate of Samsung - and operated by the Korean Ministry of Employment and Labor as a K-digital training scheme.
This README covers only the data analysis behind the FITCHA app. To see how the web pages were set up, please explore the service yourself by following the link here.
The charts above show that clothing is the most frequently returned category in online shopping, and that the main reason for returns is size and fit problems. I therefore wanted to build a service that helps customers find the right size, which reduces the return rate for retailers and improves the shopping experience for customers.
The aim of this project is to build a classification model that predicts a customer's size in order to deliver a bespoke service. Beyond suggesting a size, it also directs customers to shopping pages so that the recommendation can be put to use right away.
The data used was from Kaggle. You can find it here.
The web application is built with Django. The server side handles the machine learning analysis and the database processing. The client side is built with Bootstrap for a responsive layout.
An overview of the libraries, programmes and programming languages used for this project.
The data follow a bell-shaped distribution. The same holds for weight, although weight occasionally has some low-frequency values.
These charts were drawn to see which features have a strong impact on size prediction. The taller and heavier a person is, the bigger the size. However, it is not obvious from the second chart that size is related to age. Whether including the age column improves prediction needs to be investigated.
The data comprises the columns weight, age, height and size. Firstly, all rows with NaN values were dropped (581 in total), the age dtype was changed from float to int, and height was rounded to one decimal place.
The second step was to remove outliers, as the data contains many unusual values, e.g. age 0 or implausibly low weight.
The age range was first narrowed to 20 to 60, considering the prospective users' age. The outliers were then removed in two passes: first against the whole data distribution (chart above), and then against the distribution of each size.
Now we get more refined data to go further.
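The cleaning steps described above can be sketched with pandas as follows. This is a minimal sketch, not the project's actual code: the README does not say how outliers were detected, so the 1.5 x IQR rule, the helper names `within_iqr` and `clean_sizes`, and the tiny `raw` frame are all assumptions made for illustration. Only the column names (weight, age, height, size) and the 20 to 60 age range come from the text.

```python
import numpy as np
import pandas as pd


def within_iqr(values: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies inside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    spread = q3 - q1
    return values.between(q1 - k * spread, q3 + k * spread)


def clean_sizes(raw: pd.DataFrame) -> pd.DataFrame:
    # Drop incomplete rows, then fix the age dtype and the height precision.
    df = raw.dropna().copy()
    df["age"] = df["age"].astype(int)
    df["height"] = df["height"].round(1)

    # Keep only the prospective users' age range (inclusive).
    df = df[df["age"].between(20, 60)]

    # Pass 1: outliers against the whole distribution.
    df = df[within_iqr(df["weight"]) & within_iqr(df["height"])]

    # Pass 2: outliers against each size group's own distribution.
    kept = [
        group[within_iqr(group["weight"]) & within_iqr(group["height"])]
        for _, group in df.groupby("size")
    ]
    return pd.concat(kept).reset_index(drop=True)


# Made-up rows, including a NaN, an age of 0, an age of 70 and a 200 kg outlier.
raw = pd.DataFrame({
    "weight": [58, 59, 60, 61, 62, 70, 71, 72, 73, 74, 200, np.nan, 3, 65],
    "age": [25.0, 30, 35, 40, 45, 26, 31, 36, 41, 46, 30, 33, 0, 70],
    "height": [160.12, 161, 162, 163, 164, 168, 169, 170, 171, 172, 165, 166, 50, 167],
    "size": ["M"] * 5 + ["L"] * 5 + ["L", "M", "M", "L"],
})
cleaned = clean_sizes(raw)
print(cleaned)
```

Running the two passes in this order means extreme values are removed before the per-size quartiles are computed, so a single implausible row does not stretch the bounds of its size group.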
I admit the data is not very reliable, as its source is unknown. If you came to this page after using the service and finding the prediction unsatisfactory, this part addresses that point.
The reason I stuck with the Kaggle data is that real customer data is hard to obtain because of privacy issues. Moreover, once reliable data is available, the whole system can be reused as it is. For this reason, I also included a section in the web application to collect the actual size of customers who want to provide it.
As you can see above, the sizes are not evenly represented - XXL accounts for too small a share. Furthermore, there is no XS data at all, which makes it impossible to predict that size. I admit the service will work badly for customers whose actual size is XS or XXL.
Another problem is that the data does not state whether each person's size is the preferred one or the exact one. If it is the preferred size, it distorts prediction, since people prefer different fits. Exploring the data shows that people with similar physiques chose different sizes, so the column may well hold preferred sizes, which makes a large difference in outcome when the service has to suggest an exact size.
Despite these recognized issues, I analysed the data further to build the whole framework of a web application.
As expected from the data, the classification result is not very impressive. Prediction was better when age was included, and the best model was Random Forest. The service is therefore designed to predict size from age, height and weight with a Random Forest model.
The chart below shows the difference between predicted and actual sizes clearly. It was drawn using Linear Discriminant Analysis, which reduced the features to 2 dimensions.
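The comparison of feature sets described above could look roughly like this with scikit-learn. It is only a sketch under stated assumptions: the data is synthetic (not the Kaggle values), the rule that generates sizes is invented, and the helper `mean_cv_accuracy` and the hyperparameters are illustrative choices rather than the project's settings. Because age plays no role in the synthetic rule, the scores printed here say nothing about the project's finding that age helps.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-in for the cleaned data (NOT the Kaggle values).
age = rng.integers(20, 61, n)
height = rng.normal(165, 7, n)
weight = rng.normal(62, 10, n)

# Hypothetical rule: size grows with weight and height, plus noise.
score = 0.6 * (weight - 62) / 10 + 0.4 * (height - 165) / 7 + rng.normal(0, 0.3, n)
size = np.array(["S", "M", "L", "XL"])[np.digitize(score, [-0.6, 0.0, 0.6])]


def mean_cv_accuracy(features: np.ndarray) -> float:
    """5-fold cross-validated accuracy of a Random Forest on the given features."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    return float(cross_val_score(model, features, size, cv=5).mean())


with_age = mean_cv_accuracy(np.column_stack([age, height, weight]))
without_age = mean_cv_accuracy(np.column_stack([height, weight]))
print(f"with age: {with_age:.3f}, without age: {without_age:.3f}")

# Fit on all rows and predict for one hypothetical customer (age, height, weight).
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(np.column_stack([age, height, weight]), size)
suggested = model.predict([[30, 168.0, 70.0]])[0]
print(suggested)
```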
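A projection of this kind can be sketched with scikit-learn's `LinearDiscriminantAnalysis`. The three clusters below are synthetic, and the predicted labels come from the LDA classifier itself as a stand-in; the Random Forest predictions could be substituted when plotting predicted against actual sizes.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)

# Synthetic (age, height, weight) clusters for three sizes; not the real data.
centers = {"S": (30, 158, 52), "M": (35, 165, 62), "L": (40, 171, 73)}
X = np.vstack([rng.normal(c, (8, 4, 4), size=(100, 3)) for c in centers.values()])
actual = np.repeat(list(centers), 100)

# With 3 classes and 3 features, LDA can keep at most 2 components.
lda = LinearDiscriminantAnalysis(n_components=2)
points = lda.fit_transform(X, actual)  # 2-D coordinates, one row per person
predicted = lda.predict(X)

# Rows where the prediction disagrees with the actual size.
mismatches = int((predicted != actual).sum())
print(points.shape, mismatches)
```

Plotting `points` twice, once coloured by `actual` and once by `predicted`, gives a side-by-side view of where the two disagree.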
Databases were used 1) to build a sign-in system and 2) to store shopping web page links. The sign-in system covers sign-up, sign-in, sign-out, my info page, update info, delete account and the corresponding success/error pages. The structure of the tables is as below.
The sign-in DB includes the Actual_size column, which is intended to collect data from customers who are willing to provide it, in the hope of improving the data. If a customer provides the actual size, all services are based on it - for example, the shopping pages are opened with the actual size, not the recommended one.
All data entered at sign-up is stored in this table and retrieved for authentication.
The web links DB holds the recommended size and the web page links to which the service should direct the customer after size recommendation. MF, YOOX, MT and NET each indicate a specific website where items can be searched by size.
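The Actual_size rule described above amounts to a simple fallback. The sketch below uses a plain dataclass rather than the project's Django model; the `Member` class, its fields and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    """Hypothetical stand-in for a row of the sign-in table."""
    username: str
    actual_size: Optional[str] = None  # mirrors the Actual_size column


def size_for_shopping(member: Member, recommended_size: str) -> str:
    # Prefer the size the customer reported; otherwise use the model's suggestion.
    return member.actual_size or recommended_size


print(size_for_shopping(Member("guest"), "M"))
print(size_for_shopping(Member("regular", actual_size="L"), "M"))
```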
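The lookup from a size to the shop links might be sketched as a nested mapping. The URLs below are placeholders on example.com, the list of sizes is an assumption, and `WEB_LINKS` and `links_for` are hypothetical names; only the four shop columns (MF, YOOX, MT, NET) come from the text.

```python
from typing import Dict

# Hypothetical rows mirroring the web links table: one row per size,
# one column per shop. The URLs are placeholders, not the real links.
WEB_LINKS: Dict[str, Dict[str, str]] = {
    size: {
        shop: f"https://example.com/{shop.lower()}?size={size}"
        for shop in ("MF", "YOOX", "MT", "NET")
    }
    for size in ("S", "M", "L", "XL", "XXL")
}


def links_for(size: str) -> Dict[str, str]:
    """Return the shop links for a size, or an empty dict if the size is unknown."""
    return WEB_LINKS.get(size.upper(), {})


print(links_for("M"))
```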
After much trial and error, the application was finally set up. What I wanted to try in this project was using Tableau for visualization and connecting Gmail to the contact page. It's a shame that this data is quite simple and leaves little room for diverse visualizations or a decent dashboard.
I hope to try this web app with better data. The FITCHA app is well built, so it will serve its purpose once it is given the right data.