UBC-MDS/DSCI-522-2425-team35-Heart_disease_diagnostic_machine

Heart Disease Predictor

Contributors: Sarah Eshafi, Hui Tang, Long Nguyen, Marek Boulerice

About

This repository contains a machine learning analysis whose goal is to predict angiographic coronary disease in patients. The data comes from patients undergoing angiography at the Cleveland Clinic in Ohio. The analysis consists of exploratory data analysis, testing of various machine learning models on a training data set, model optimization via hyperparameter tuning, and a final model performance analysis. The final LogisticRegression model performs quite well on the testing data, with an F1 score of around 0.75. The recall of this model is also promising, indicating that the model predicts the occurrence of heart disease consistently, with very few false negatives (missed cases).

Nevertheless, a few limitations apply to our model. Although the recall is quite high, missing even a few cases of heart disease could have devastating effects on patients' lives. In addition, the model requires fairly extensive medical information from each patient to deliver an accurate prediction, which may make it difficult to roll out in commercial applications. With these factors in mind, we suggest further tuning of the model, though the initial results are promising.
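
For readers less familiar with the metrics mentioned above, here is a minimal sketch of how recall and F1 relate to false negatives. The counts are hypothetical values chosen for easy arithmetic; they are not this project's confusion matrix, and the function names are illustrative rather than taken from the repository.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of actual positives the model catches (few misses means high recall)."""
    return tp / (tp + fn)


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (NOT results from this analysis):
    # 8 true positives, 4 false positives, 2 false negatives.
    tp, fp, fn = 8, 4, 2
    print(f"precision = {precision(tp, fp):.3f}")
    print(f"recall    = {recall(tp, fn):.3f}")
    print(f"F1        = {f1(tp, fp, fn):.3f}")
```

Each false negative is a patient with heart disease whom the model did not flag, which is why recall matters so much in this setting.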

Running the Report

To run the analysis:

1. Download Dependencies

Ensure you have the following downloaded:

2. Using Docker

Note: the instructions in this section also depend on being run in a Unix shell (e.g., Terminal or Git Bash).

Clone this GitHub repository by clicking the green Code button near the top of the repository, copying the HTTPS or SSH link and running git clone [LINK].

Once you have the repository cloned, navigate to the root directory of this project and run the following command at the command line/terminal:

docker compose up

Copy the link from the output, paste it into your browser, and change the port number from 8888 to 9999 to launch JupyterLab.
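
If you prefer not to edit the link by hand, the port change can be sketched in a few lines of Python. The URL and token below are made-up placeholders standing in for whatever your terminal prints, and `swap_port` is a hypothetical helper, not part of this repository.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_port(url: str, new_port: int = 9999) -> str:
    """Return the same URL with its port replaced by new_port."""
    parts = urlsplit(url)
    # Rebuild the host:port portion; path, query, and token stay untouched.
    netloc = f"{parts.hostname}:{new_port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    # Placeholder link; copy the real one from the `docker compose up` output.
    printed = "http://127.0.0.1:8888/lab?token=abc123"
    print(swap_port(printed))
```

Paste the printed link into your browser as described above.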

3. Reset Project to Clean Slate

Navigate to the root of this project on your computer using the command line and enter the following command to reset the project to a clean state (i.e., remove all files generated by previous runs of the analysis):

make clean

4. Running the Analysis

To run the analysis in its entirety, navigate to the root of this project using the command line and run the following command:

make all

5. Clean Up

To shut down the container and clean up the resources, press Ctrl + C in the terminal where you launched the container, and then run docker compose rm.

Developer Notes

Developer Dependencies

  • conda (version 24.7.1 or higher)
  • conda-lock (version 2.5.7 or higher)
  • jupyterlab (version 4.2.5 or higher)
  • nb_conda_kernels (version 2.5.1 or higher)
  • Python and packages listed in environment.yml

Running the Test Suite

Use the same docker compose up command described in the Running the Report section above to launch JupyterLab. Tests are run with the pytest command from the root of the project. More details about the test suite can be found in the tests directory.

Licenses

The Heart Diagnostic Analysis file contained within this repository is licensed under the Creative Commons 4.0 license. The software code contained within this repository is licensed under the MIT license. See the license file for more information.