UBC-MDS/wine-quality-regressor-group-2

Analysis of Wine Quality and Prediction Using Logistic Regression

Authors

Alix Zhou, Paramveer Singh, Susannah Sun, Zoe Ren

About

This analysis investigates the relationship between physicochemical properties and wine quality using the Wine Quality dataset from the UCI Machine Learning Repository, which contains data for both red and white wine. Through exploratory data analysis, we examined 11 physicochemical features and their correlations with wine quality scores. Our analysis revealed that higher quality wines typically have higher alcohol content and lower volatile acidity, with white wines generally receiving higher quality scores than red wines. Most features showed right-skewed distributions with notable outliers, particularly in sulfur dioxide and residual sugar measurements. The quality scores themselves were roughly bell-shaped, with most wines scoring 5 or 6.

We implemented a logistic regression model with standardized features and one-hot encoded categorical variables, using randomized search cross-validation to optimize the regularization parameter. The final model achieved an accuracy of 54% on the test set. While this performance suggests room for improvement, the analysis provides valuable insights for future research directions.
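The modelling setup described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the data is synthetic, and the column names, the search range for `C`, the number of iterations, and the random seeds are all hypothetical choices made for the example.

```python
import numpy as np
import pandas as pd
from scipy.stats import loguniform
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (NOT the real Wine Quality data): three numeric
# features, a wine colour column, and an integer quality label.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.2, n),
    "volatile_acidity": rng.normal(0.34, 0.16, n),
    "residual_sugar": rng.lognormal(1.0, 0.8, n),
    "color": rng.choice(["red", "white"], n),
})
y = rng.choice([5, 6, 7], n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123
)

# Standardize the numeric columns and one-hot encode the categorical column.
preprocessor = make_column_transformer(
    (StandardScaler(), ["alcohol", "volatile_acidity", "residual_sugar"]),
    (OneHotEncoder(drop="if_binary"), ["color"]),
)
pipe = make_pipeline(preprocessor, LogisticRegression(max_iter=1000))

# Randomized search over the regularization strength C
# (the range and n_iter are assumptions, not the project's settings).
search = RandomizedSearchCV(
    pipe,
    param_distributions={"logisticregression__C": loguniform(1e-3, 1e3)},
    n_iter=10,
    cv=5,
    random_state=123,
)
search.fit(X_train, y_train)

best_C = search.best_params_["logisticregression__C"]
test_accuracy = search.score(X_test, y_test)
print(f"best C: {best_C:.4g}, test accuracy: {test_accuracy:.2f}")
```

Because the labels here are random, the printed accuracy says nothing about the real model; the point is only the shape of the pipeline, where preprocessing sits inside the cross-validated search so that scaling statistics are learned from the training folds alone.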

The dataset used in this project is the Wine Quality dataset from the UCI Machine Learning Repository (Cortez et al. 2009) and can be found here. These datasets are related to red and white variants of the Portuguese “Vinho Verde” wine. They contain physicochemical properties (e.g., acidity, sugar content, and alcohol) of different wine samples, alongside a sensory score representing the quality of the wine, rated by experts on a scale from 0 to 10 (the scores observed in the data range from 3 to 9). Each row in the dataset represents a wine sample, with the columns detailing 11 physicochemical attributes and the quality score. The classes are ordered and not balanced (e.g., there are many more normal wines than excellent or poor ones).

Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

Report

The final report can be found here

Dependencies

  • conda (version 24.9.1 or higher)
  • conda-lock (version 2.5.7 or higher)
  • Python package ucimlrepo (version 0.0.7)
  • jupyterlab (version 4.2.0 or higher)
  • nb_conda_kernels (version 2.5.1 or higher)
  • Python and packages listed in environment.yml

Usage

Setup

If you are using Windows or Mac, please ensure that Docker Desktop is running. You can check whether Docker is installed by running the following command in a bash terminal: docker --version.

  1. Clone this GitHub repository.
  2. Make sure docker-compose.yml references the image tag you wish to run. If you do not need a specific image tag, no changes are necessary.

Running the analysis

  1. Run the following command in a terminal in the root of the local repository to use the Docker image to run the analysis:

    docker compose up

    This command will automatically start up a Jupyter Lab session using the image listed in the docker-compose.yml file and mount the current project in the Docker container.

  2. In the terminal, look for the Jupyter Lab link which starts with http://127.0.0.1:8888/. Copy and paste the URL into the browser to open up Jupyter Lab.

  3. Navigate to the root of this project on your computer using the command line and enter the following command to reset the project to a clean state (i.e., remove all files generated by previous runs of the analysis):

    make clean

  4. To run the analysis in its entirety, enter the following command in the terminal in the project root:

    make all

Clean up

Hit Ctrl + C in the terminal to end the Jupyter Lab session. Run the following command after the session ends to free up the resources used by Docker: docker compose rm.

Developer notes

Feedback and contribution instructions can be found here

License

The license can be found here

References

Jain, K., Kaushik, K., Gupta, S. K., & others. 2023. "Machine learning-based predictive modelling for the enhancement of wine quality." *Scientific Reports*, 13:17042.
Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. 2009. "Wine Quality [Dataset]." *UCI Machine Learning Repository*.
Kniazieva, Y. 2023, October 12. "A digital sommelier: Machine learning for wine quality prediction." *Label Your Data*.
Aich, S., Al-Absi, A. A., Hui, K. L., Lee, J. T., & Sain, M. 2018. "A classification approach with different feature sets to predict the quality of different types of wine using machine learning techniques." In *International Conference on Advanced Communication Technology (ICACT)*, pp. 139–143.