Contributors: Sarah Eshafi, Hui Tang, Long Nguyen, Marek Boulerice
This repository contains a machine learning analysis that aims to predict angiographic coronary disease in patients. The data come from patients undergoing angiography at the Cleveland Clinic in Ohio. The analysis consists of exploratory data analysis, testing of various machine learning models on a training data set, model optimization via hyperparameter tuning, and a final model performance analysis. The final LogisticRegression model performs well on the test data, with an F1 score of around 0.75. The model's recall is also promising, indicating that it predicts the occurrence of heart disease consistently with few false negatives (misses). Nevertheless, some limitations apply. Although the recall is high, missing even a few cases of heart disease could have devastating effects on patients' lives. In addition, the model requires extensive medical information from patients to deliver an accurate prediction, which may make it difficult to roll out in commercial applications. With these factors in mind, we suggest further tuning of the model, though the initial results are promising.
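To make the relationship between recall, false negatives, and the F1 score concrete, here is a minimal sketch that computes the metrics from confusion-matrix counts. The counts are invented for illustration only and are not results from this analysis; the function names are likewise hypothetical and not part of this repository.

```python
def recall(tp: int, fn: int) -> float:
    """Share of actual disease cases that the model catches."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of predicted disease cases that are real."""
    return tp / (tp + fp)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    # Hypothetical counts (NOT from this analysis):
    # 30 true positives, 15 false positives, 5 false negatives (misses).
    tp, fp, fn = 30, 15, 5
    print(f"recall:    {recall(tp, fn):.2f}")
    print(f"precision: {precision(tp, fp):.2f}")
    print(f"F1:        {f1_score(tp, fp, fn):.2f}")
```

In a screening setting, each false negative is a patient with heart disease who is told they are healthy, which is why recall is weighed so heavily here even when the F1 score looks acceptable.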
To run the analysis:
Ensure you have the following downloaded:
Note: the instructions in this section also depend on running the commands in a Unix shell (e.g., Terminal or Git Bash).
Clone this GitHub repository by clicking the green Code button near the top of the repository, copying the HTTPS or SSH link, and running git clone [LINK].
Once you have the repository cloned, navigate to the root directory of this project and run the following command at the command line/terminal:
docker compose up
Copy the link from the output, paste it into your browser, and change the port number from 8888 to 9999 to launch Jupyter Notebook.
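If you would rather not edit the link by hand, the port change can be sketched in a few lines of Python. The URL below is a made-up placeholder, since the exact link printed by the container is an assumption here, and with_port is a hypothetical helper that is not part of this repository. It is a simplification that assumes a plain host (no IPv6 address or user credentials in the link).

```python
from urllib.parse import urlsplit, urlunsplit


def with_port(url: str, new_port: int) -> str:
    """Return the same URL with its port replaced by new_port."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Rebuild the network location with the new port, keeping path and query.
    return urlunsplit(parts._replace(netloc=f"{host}:{new_port}"))


if __name__ == "__main__":
    # Placeholder link; the real one comes from the docker compose output.
    printed_link = "http://127.0.0.1:8888/lab?token=abc123"
    print(with_port(printed_link, 9999))
```

The path and the token in the query string are left untouched, so the rewritten link should authenticate the same way as the original.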
Navigate to the root of this project on your computer using the command line and enter the following command to reset the project to a clean state (i.e., remove all files generated by previous runs of the analysis):
make clean
To run the analysis in its entirety, navigate to the root of this project using the command line and run the following command:
make all
To shut down the container and clean up the resources, press Ctrl + C in the terminal where you launched the container, and then run docker compose rm.
- conda (version 24.7.1 or higher)
- conda-lock (version 2.5.7 or higher)
- jupyterlab (version 4.2.5 or higher)
- nb_conda_kernels (version 2.5.1 or higher)
- Python and packages listed in environment.yml
Use the same docker compose up command as described in the Running the Report section above to launch JupyterLab. Tests are run using the pytest command in the root of the project. More details about the test suite can be found in the tests directory.
The Heart Diagnostic Analysis file contained within this repository is licensed under the Creative Commons 4.0 license. The software code contained within this repository is licensed under the MIT license. See the license file for more information.