FITCHA project

Find your outfit size for a better shopping experience

Link to the web app (It's under maintenance for the time being)


This project aims to reduce sizing mistakes in online shopping, especially for women. It is a mini project for the class 'Intelligent Service Development with Big Data', delivered by Samsung Multicampus (Samsung's education affiliate) in Seoul, Korea, and operated under the Korean Ministry of Employment and Labor as a K-digital training scheme.

This README covers only the data analysis behind the FITCHA app. To see how the webpages were set up, please explore the service yourself by following the link here.


image-20210921235715555

image-20210921235723897

The charts above show that clothing is the most frequently returned category in online shopping, and that the main reason for returns is size and fit problems. I therefore wanted to build a service that helps customers find the right size, reducing the return rate for retailers and improving the shopping experience for customers.

The aim of this project is to build a classification model that predicts a customer's size in order to deliver a bespoke service. Beyond suggesting a size, the service also directs customers to shopping pages so that the recommendation can be put to use right away.

The data used was from Kaggle. You can find it here.



1. Configuration of Web Application by Django

image-20210921224439494


This web application is built with the Django framework. The server side handles the machine learning analysis and the database processing, while the client side uses Bootstrap to render a responsive webpage.



2. Libraries and Programmes

image-20210921225844869

An overview of the libraries, programmes and programming languages used in this project.



3. Explore the Data

image-20210921232009105

The data follow a bell-shaped distribution. The same holds for weight, although weight occasionally shows some low-frequency values.


Dashboard 2 (interactive Tableau view)

This dashboard was built to see which features have the most impact on size prediction. The taller and heavier the person, the bigger the size. However, the second chart does not show an obvious relationship between size and age, so whether including the age column improves prediction needs to be investigated.
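One rough way to put a number on the visual impression above is a rank correlation between each feature and the size treated as an ordered label. This is only a sketch: the rows are made up, and `SIZE_ORDER` and `rank_correlation_with_size` are hypothetical names that do not come from the project code.

```python
import pandas as pd

SIZE_ORDER = ["S", "M", "L", "XL"]  # assumed ordering of the labels

# Made-up rows for illustration only, not the Kaggle data.
sample = pd.DataFrame({
    "age":    [45, 22, 30, 51, 27, 38, 24, 41],
    "height": [155, 158, 162, 163, 168, 170, 174, 176],
    "weight": [50, 52, 58, 60, 66, 68, 75, 78],
    "size":   ["S", "S", "M", "M", "L", "L", "XL", "XL"],
})

def rank_correlation_with_size(df: pd.DataFrame) -> pd.Series:
    """Spearman correlation of each feature with the ordinal size."""
    ordinal = df["size"].map({s: i for i, s in enumerate(SIZE_ORDER)})
    features = df[["age", "height", "weight"]]
    # Spearman correlation equals the Pearson correlation of the ranks.
    return features.rank().corrwith(ordinal.rank())

print(rank_correlation_with_size(sample))
```

A value close to zero for a feature would be consistent with that feature carrying little monotonic signal about size, but it does not rule out a non-monotonic relationship.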



4. Preprocessing

1) Simple preprocessing

The data comprises four columns: weight, age, height and size. First, all rows with NaN values were dropped (581 in total), the age dtype was changed from float to int, and height was rounded to one decimal place.
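The three steps above could look roughly like this in pandas. The sample rows are invented, and `simple_preprocess` is a hypothetical helper name, so treat this as a sketch rather than the project's actual code.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample with the same four columns as the dataset.
raw = pd.DataFrame({
    "weight": [62.0, 59.0, np.nan, 71.0],
    "age":    [28.0, 36.0, 34.0, np.nan],
    "height": [172.72, 167.64, 165.10, 175.26],
    "size":   ["XL", "L", "M", "L"],
})

def simple_preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with NaN, cast age to int, round height to one decimal."""
    out = df.dropna().copy()
    out["age"] = out["age"].astype(int)      # safe only after NaNs are gone
    out["height"] = out["height"].round(1)
    return out.reset_index(drop=True)

clean = simple_preprocess(raw)
print(clean)
```

Dropping NaN rows before the cast matters, because a float column that still contains NaN cannot be converted to a plain int dtype.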

image-20210921230905763 image-20210921230928384 image-20210921230941018


2) Remove outliers

The second step is to remove outliers, as the data contains many unusual values, e.g. an age of 0 or implausibly low weights.

image-20210921232830922

The age range was first narrowed to 20 to 60, considering the age of prospective users. The outliers were then removed in two steps: against the distribution of the whole data (chart above), and then against the distribution within each size.
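The text does not say which outlier rule was applied, so the sketch below assumes the common 1.5 x IQR fences purely for illustration. `iqr_bounds` and `remove_outliers` are hypothetical names; only the age range of 20 to 60 and the two-pass order (whole data, then per size) come from the description above.

```python
import pandas as pd

def iqr_bounds(q1, q3, k=1.5):
    """Lower and upper fences of the k * IQR rule (scalars or Series)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(df: pd.DataFrame, cols=("weight", "height")) -> pd.DataFrame:
    # Keep the target age range of prospective users.
    out = df[df["age"].between(20, 60)]
    # Step 1: fences computed on the whole data (assumed IQR rule).
    for col in cols:
        low, high = iqr_bounds(out[col].quantile(0.25), out[col].quantile(0.75))
        out = out[out[col].between(low, high)]
    # Step 2: fences computed separately within each size.
    for col in cols:
        grouped = out.groupby("size")[col]
        q1 = grouped.transform(lambda s: s.quantile(0.25))
        q3 = grouped.transform(lambda s: s.quantile(0.75))
        low, high = iqr_bounds(q1, q3)
        out = out[out[col].between(low, high)]
    return out.reset_index(drop=True)
```

The second pass catches rows that look ordinary overall but are unusual for their own size, such as a weight that is plausible in general yet far from the other rows labelled M.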

Dashboard 1 (interactive Tableau view)

This leaves more refined data for the next steps.



5. Limitation of the Data

I admit the data is not that reliable, as its source is not known. If you came to this page after using the service and finding the prediction disappointing, this section addresses that point.

The reason I stuck to the Kaggle data is that customers' data is hard to obtain because of privacy issues. In addition, once reliable data becomes available, the whole system can be reused as it is. For this reason, the web application also includes a section that collects the actual size of customers who are willing to provide it.

image-20210921234725804

As you can see above, the sizes are not evenly represented: XXL accounts for too small a share of the data. Furthermore, there is no XS data at all, which makes it impossible to predict that size. I admit the service will work poorly for customers whose actual size is XS or XXL.
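A check like the one described above can be sketched with the standard library alone. `EXPECTED_SIZES` is an assumed label set chosen for illustration, and the function names are hypothetical.

```python
from collections import Counter

EXPECTED_SIZES = ["XS", "S", "M", "L", "XL", "XXL"]  # assumed label set

def size_shares(labels):
    """Share of each expected size among the given labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {size: 0.0 for size in EXPECTED_SIZES}
    # Labels outside EXPECTED_SIZES still count towards the total.
    return {size: counts.get(size, 0) / total for size in EXPECTED_SIZES}

def missing_sizes(labels):
    """Expected sizes that never occur, so a model cannot learn them."""
    present = set(labels)
    return [size for size in EXPECTED_SIZES if size not in present]
```

A size that shows up in `missing_sizes` can never be predicted by a classifier trained on those labels, which is the XS situation described above.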


Another problem is that the data does not say whether each person's size is their preferred size or their exact size. If it is the preferred size, it disturbs prediction because everyone prefers a different fit. Exploring the data shows that people with similar physiques chose different sizes, so it may well be a preferred size, which makes a large difference to the outcome when the service has to suggest an exact size.


Despite these recognized issues, I went on to analyse the data in order to build the complete framework of a web application.



6. Results of Classification Model

1) Exclude Age

image-20210922010202628

2) Include Age as a Feature

image-20210922010245089

As expected from the data, the classification results are not that impressive. Prediction was better when age was included, and the best model was Random Forest. The service is therefore designed to predict size from age, height and weight with a Random Forest model.
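The comparison between the two feature sets can be sketched with scikit-learn as below. The data here is synthetic and generated by a made-up rule, so the resulting numbers say nothing about FITCHA's actual results; the hyperparameters and the `cv_accuracy` helper are assumptions as well.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300
height = rng.normal(165, 7, n)
weight = rng.normal(60, 9, n)
age = rng.integers(20, 61, n)

# Made-up rule: the label depends on weight and height plus noise.
score = 0.6 * (weight - 60) / 9 + 0.3 * (height - 165) / 7 + rng.normal(0, 0.5, n)
size = np.digitize(score, [-0.8, 0.0, 0.8])  # classes 0..3 stand in for sizes

def cv_accuracy(X, y):
    """Mean 5-fold cross-validated accuracy of a Random Forest."""
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

acc_without_age = cv_accuracy(np.column_stack([height, weight]), size)
acc_with_age = cv_accuracy(np.column_stack([age, height, weight]), size)
print(f"without age: {acc_without_age:.3f}, with age: {acc_with_age:.3f}")
```

Running the same function on both feature sets keeps the comparison fair, since the model settings and the cross-validation scheme stay identical.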


The chart below illustrates the difference between the predictions and the actual data. It was drawn using Linear Discriminant Analysis, which reduced the features to two dimensions.
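The two-dimensional projection mentioned above can be sketched with scikit-learn's `LinearDiscriminantAnalysis`. The three synthetic classes and their means are invented for illustration and are not the project data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)

# Three synthetic classes with shifted means in (age, height, weight).
means = np.array([[30, 158, 52], [35, 165, 62], [40, 172, 74]])
X = np.vstack([rng.normal(m, [8, 4, 5], size=(60, 3)) for m in means])
y = np.repeat(["S", "M", "L"], 60)

# n_components can be at most min(n_classes - 1, n_features), here 2.
lda = LinearDiscriminantAnalysis(n_components=2)
X_2d = lda.fit_transform(X, y)

print(X_2d.shape)
```

Plotting `X_2d` once coloured by the actual labels and once by a classifier's predicted labels gives two scatter plots on the same axes that can be compared side by side.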

Linear Discriminant Analysis (interactive Tableau view)



7. Database

Databases were used 1) to build the sign-in system and 2) to store shopping webpage links. The sign-in system covers sign-up, sign-in, sign-out, a my-info page, info update, account deletion, and the corresponding success/error pages. The table structure is shown below.

image-20210922014348696

The sign-in DB includes an Actual_size column, which is intended to gather data from customers who are willing to provide it, in the hope of improving the data. If a customer provides their actual size, all services are based on it, e.g. the shopping pages are opened with the actual size rather than the recommended one.

All the data entered at sign-up are stored in this table and retrieved for authentication.
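The rule that a stored actual size takes priority over the recommendation can be written as a small helper. `size_to_use` is a hypothetical function, and the normalisation (trimming and upper-casing) is an assumption, not something taken from the project code.

```python
from typing import Optional

def size_to_use(predicted_size: str, actual_size: Optional[str]) -> str:
    """Prefer the size the customer reported; fall back to the prediction."""
    # None, an empty string or whitespace all mean "not provided".
    cleaned = (actual_size or "").strip().upper()
    return cleaned or predicted_size
```

For example, `size_to_use("M", None)` falls back to the recommended size, while `size_to_use("M", "l")` uses the size reported by the customer.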

image-20210922014410787

The web links DB includes the recommended size and the webpage links to which the service page directs the customer after a size recommendation. MF, YOOX, MT and NET each indicate a specific website where items can be searched by size.
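The lookup from a size to its shopping links can be pictured as a table with one row per size and one column per site. In the sketch below only the column names MF, YOOX, MT and NET come from the description above; the sizes, the placeholder `example.com` URLs and the `shopping_links` helper are hypothetical.

```python
SITES = ("MF", "YOOX", "MT", "NET")

# Hypothetical rows: one per size, one placeholder URL per site column.
WEB_LINKS = {
    size: {site: f"https://example.com/{site.lower()}?size={size}" for site in SITES}
    for size in ("S", "M", "L", "XL")
}

def shopping_links(size: str) -> dict:
    """Return {site: url} for a size, or an empty dict if none is stored."""
    return dict(WEB_LINKS.get(size, {}))
```

Returning an empty dict for an unknown size lets the calling page decide how to handle a size that has no stored links.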



8. Closing Remarks

After much trial and error, the application was finally set up. What I wanted to try in this project was using Tableau for visualization and connecting Gmail to the contact page. It's a shame that the data is quite simple and leaves little room for diverse visualizations or a decent dashboard.

I hope to try this web app with better data. The FITCHA app is well built, so it will serve its purpose once it is given the right data.