This project explores the relationship between the physicochemical properties of wines and their quality ratings. It aims to predict wine quality and identify the key factors that influence it using machine learning models such as Decision Trees. Through exploratory data analysis (EDA), we examine patterns, distributions, and correlations, and address challenges such as class imbalance in the quality ratings. The Decision Tree model is evaluated using accuracy, precision, and recall, and its feature importances are used to identify significant predictors such as density, alcohol, and volatile_acidity. The primary goal is to build an interpretable machine learning pipeline that provides actionable insights for winemakers optimizing their production processes and for consumers making informed choices. The project also lays the foundation for future work, including incorporating sensory attributes, addressing dataset imbalance, and using more advanced ensemble methods to improve predictions.
- Chukwunonso Ebele-Muolokwu
- Samuel Adetsi
- Shashank Hosahalli Shivamurthy
- Ci Xu
This project ensures a reproducible computational environment using Conda. Follow the steps below to set up the environment for this project.
- Install Miniconda or Anaconda.
- Clone this repository:
git clone https://github.com/UBC-MDS/522-wine-quality-32.git
cd 522-wine-quality-32
The recommended way to set up the environment is from environment.yml:
- Create the Conda environment:
conda env create -f environment.yml
- Activate the environment:
conda activate 522_milestone_env
- Verify the environment setup:
python -c "import pandas as pd; print('Environment set up successfully!')"
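The one-line check above only confirms that pandas imports. A slightly broader check could report every package the pipeline relies on. This is a minimal sketch; the package names in the usage block (pandas, sklearn, matplotlib) are assumptions, and the authoritative list is whatever environment.yml declares.

```python
import importlib


def check_packages(names):
    """Return a mapping of module name -> installed version, or None if missing.

    The names passed in are assumptions; the real dependency list lives in
    environment.yml.
    """
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            status[name] = None  # not importable in the active environment
        else:
            # Not every module exposes __version__, so fall back to "unknown".
            status[name] = getattr(module, "__version__", "unknown")
    return status


if __name__ == "__main__":
    # Hypothetical package list for this project; adjust to match environment.yml.
    for pkg, version in check_packages(["pandas", "sklearn", "matplotlib"]).items():
        print(f"{pkg}: {version if version else 'MISSING'}")
```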
For exact reproducibility across operating systems, you can instead create the environment from the conda-lock lock file, which pins every package version for each platform it was generated for.
- Install conda-lock:
pip install conda-lock
- Create the environment from the lock file (the same conda-lock.yml is used on Linux, macOS, and Windows):
conda-lock install --name 522_milestone_env conda-lock.yml
- Activate the environment:
conda activate 522_milestone_env
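A conda-lock file records a separate set of packages for each platform, identified by conda platform names such as linux-64 or osx-arm64. The sketch below, an illustrative helper not shipped with this project, reports the platform name for the current machine so you can compare it with the platforms the lock file was generated for. It covers only the common (system, machine) combinations listed in the table.

```python
import platform

# Common (platform.system(), platform.machine()) pairs -> conda platform names.
# This table is a partial assumption, not an exhaustive list.
_SUBDIRS = {
    ("Linux", "x86_64"): "linux-64",
    ("Linux", "aarch64"): "linux-aarch64",
    ("Darwin", "x86_64"): "osx-64",
    ("Darwin", "arm64"): "osx-arm64",
    ("Windows", "AMD64"): "win-64",
}


def conda_subdir(system=None, machine=None):
    """Return the conda platform name for this machine, or None if unrecognized."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return _SUBDIRS.get((system, machine))


if __name__ == "__main__":
    print(conda_subdir())
```

If this prints None, or a platform that the lock file does not list, fall back to the environment.yml method.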
- To run the analysis in Docker, navigate to the root of this project on your computer using the command line and enter the following command:
docker compose up
- In the terminal, look for a URL that starts with http://127.0.0.1:8888/lab?token= (for an example, see the highlighted text in the terminal below). Copy and paste that URL into your browser.
- To run the analysis, open analysis.ipynb in the Jupyter Lab instance you just launched and, under the "Kernel" menu, click "Restart Kernel and Run All Cells...".
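When the container logs are long, finding the Jupyter Lab URL by eye is tedious. The sketch below pulls the first URL starting with http://127.0.0.1:8888/lab?token= out of a block of log text. The helper name is hypothetical, and the assumption that the token consists only of letters and digits matches Jupyter's default generated tokens but may not hold for a custom token.

```python
import re

# Matches the Jupyter Lab URL described above; assumes an alphanumeric token.
LAB_URL = re.compile(r"http://127\.0\.0\.1:8888/lab\?token=[A-Za-z0-9]+")


def find_lab_url(log_text):
    """Return the first Jupyter Lab URL with a token in log_text, or None."""
    match = LAB_URL.search(log_text)
    return match.group(0) if match else None
```

For example, you could save the output of `docker compose logs` to a file and pass its contents to find_lab_url.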
Each pipeline step is defined in the Makefile. Below are the individual targets and how to use them:
Download the raw wine quality dataset:
make data
Output: data/raw/wine_data.csv
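The download step amounts to fetching a file and writing it to data/raw/wine_data.csv. The sketch below shows that in plain Python. The source URL is not given in this README, so `url` is a placeholder you would take from the Makefile; the function name is hypothetical, and the project's actual download script may differ.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, dest="data/raw/wine_data.csv"):
    """Download url to dest, creating parent directories as needed."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk instead of loading it all into memory.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```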
Process the raw data and generate the processed training and testing datasets, along with a validation report:
make process
Input: data/raw/wine_data.csv
Outputs:
- data/processed/wine_train.csv
- data/processed/wine_test.csv
- report/validation_report.html
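The core of the processing step is splitting the raw data into training and test sets. A minimal sketch with pandas and scikit-learn follows. It assumes the target column is named quality and uses a stratified split so that the imbalanced quality classes keep the same proportions in both files. The 80/20 ratio, the seed, and the function name are illustrative assumptions, and the validation report is not reproduced here.

```python
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split


def split_wine_data(raw_csv, out_dir, target="quality", test_size=0.2, seed=522):
    """Split the raw data into train/test CSVs, stratified on the target column."""
    df = pd.read_csv(raw_csv)
    # Stratifying keeps each quality class in the same proportion in both splits.
    train_df, test_df = train_test_split(
        df, test_size=test_size, stratify=df[target], random_state=seed
    )
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    train_df.to_csv(out_dir / "wine_train.csv", index=False)
    test_df.to_csv(out_dir / "wine_test.csv", index=False)
    return train_df, test_df
```

Stratification requires at least two rows per quality class, so a very rare score would need to be merged or dropped first.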
Train a Decision Tree model on the processed data:
make train
Inputs:
- data/processed/wine_train.csv
- data/processed/wine_test.csv
Output: data/model/wine_model.pkl
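A sketch of the training step: fit a scikit-learn DecisionTreeClassifier on the training split, pickle it, and score it on the test split with the accuracy, precision, and recall metrics mentioned in the project summary. The target column name, the weighted averaging, the hyperparameters, and the function name are assumptions for illustration, not the project's confirmed settings.

```python
import pickle
from pathlib import Path

from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.tree import DecisionTreeClassifier


def train_and_save(train_df, test_df, model_path, target="quality", max_depth=None, seed=522):
    """Fit a decision tree, pickle it to model_path, and return test-set metrics."""
    X_train, y_train = train_df.drop(columns=[target]), train_df[target]
    X_test, y_test = test_df.drop(columns=[target]), test_df[target]

    model = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    model.fit(X_train, y_train)

    model_path = Path(model_path)
    model_path.parent.mkdir(parents=True, exist_ok=True)
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    y_pred = model.predict(X_test)
    # Weighted averaging accounts for unequal class sizes; zero_division=0
    # suppresses warnings for classes the model never predicts.
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average="weighted", zero_division=0),
        "recall": recall_score(y_test, y_pred, average="weighted", zero_division=0),
    }
```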
Create visualizations for feature importance and wine quality distribution:
make plot
Inputs:
- data/model/wine_model.pkl
- data/processed/wine_train.csv
- data/processed/wine_test.csv
Outputs:
- data/img/feature_importance.png
- data/img/quality_distribution.png
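The two figures are built from simple tables: the fitted tree's feature importances, and the count of wines at each quality score in each split. The sketch below computes those tables with pandas and leaves out the plotting call. Both function names are hypothetical. feature_importances_ is scikit-learn's impurity-based importance for tree models.

```python
import pandas as pd


def feature_importance_table(model, feature_names):
    """Return the model's impurity-based feature importances, largest first."""
    return pd.Series(model.feature_importances_, index=feature_names).sort_values(
        ascending=False
    )


def quality_distribution(train_df, test_df, target="quality"):
    """Count rows per quality score in each split (the data behind a bar chart)."""
    counts = pd.DataFrame(
        {
            "train": train_df[target].value_counts(),
            "test": test_df[target].value_counts(),
        }
    )
    # Scores missing from one split show up as NaN after alignment; treat as 0.
    return counts.fillna(0).astype(int).sort_index()
```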
Render the analysis report using Quarto:
make report
Inputs:
- data/img/feature_importance.png
- data/img/quality_distribution.png
- report/wine_quality_eda.qmd
Output: report/wine_quality_eda.html
Run all steps in the pipeline:
make all
This command ensures that all intermediate files are created and up to date.
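Make decides whether a step is stale by comparing file modification times. A target is rebuilt if it does not exist or if any of its prerequisites is newer than it, which is why make all reruns only the steps whose inputs changed. The sketch below expresses that rule in Python for illustration. It is not how make is implemented, and unlike make it does not try to build a missing prerequisite first (stat raises FileNotFoundError instead).

```python
from pathlib import Path


def needs_rebuild(target, prerequisites):
    """Mimic make's rule: rebuild if target is missing or older than any prerequisite."""
    target = Path(target)
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(Path(p).stat().st_mtime > target_mtime for p in prerequisites)
```

For example, editing data/processed/wine_train.csv makes data/model/wine_model.pkl stale, so the next make all reruns the training step and every step downstream of it.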
Remove all generated files to reset the pipeline:
make clean
Clean the pipeline and rerun all steps:
make retrain
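Conceptually, make clean deletes the generated files, and make retrain is make clean followed by make all. The sketch below removes the output files listed in the pipeline steps above. The file list is taken from this README, but the real clean target in the Makefile may remove a different set (for example whole directories), so treat it as illustrative only. The function name is hypothetical.

```python
from pathlib import Path

# Outputs listed in the pipeline steps of this README (assumption: the actual
# Makefile clean target may differ).
GENERATED = (
    "data/raw/wine_data.csv",
    "data/processed/wine_train.csv",
    "data/processed/wine_test.csv",
    "report/validation_report.html",
    "data/model/wine_model.pkl",
    "data/img/feature_importance.png",
    "data/img/quality_distribution.png",
    "report/wine_quality_eda.html",
)


def clean(paths=GENERATED, root="."):
    """Delete each generated file under root if present; return those removed."""
    removed = []
    for rel in paths:
        path = Path(root) / rel
        if path.exists():
            path.unlink()
            removed.append(rel)
    return removed
```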
If you add new dependencies:
- Add the new packages to environment.yml.
- Update the existing environment to match the file (--prune removes packages that are no longer listed):
conda env update -f environment.yml --prune
- For Docker, rebuild the container:
docker compose build
- To uninstall, remove the Conda environment:
conda env remove -n 522_milestone_env
- Stop and remove the project's Docker containers and networks, including orphaned containers:
docker compose down --remove-orphans