Merge pull request #125 from UBC-MDS/develop
Final Repo
kegao1995 authored Dec 17, 2024
2 parents e5938f2 + 9b4ede4 commit 03905a2
Showing 44 changed files with 906 additions and 1,996 deletions.
Empty file removed .bash_history
Empty file.
1 change: 1 addition & 0 deletions .github/workflows/docker-publish.yml
Original file line number Diff line number Diff line change
@@ -11,6 +11,7 @@ on:
paths:
- 'Dockerfile'
- 'conda-linux-64.lock'
- 'requirements.txt'

jobs:
push_to_registry:
17 changes: 0 additions & 17 deletions .local/share/jupyter/runtime/jpserver-7-open.html

This file was deleted.

13 changes: 0 additions & 13 deletions .local/share/jupyter/runtime/jpserver-7.json

This file was deleted.

1 change: 0 additions & 1 deletion .local/share/jupyter/runtime/jupyter_cookie_secret

This file was deleted.

71 changes: 71 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,71 @@
---
editor_options:
markdown:
wrap: 72
---

Revisions:

Who: Merari Santana

What was addressed:

- Scripts on README file were not running. Description of Revision: I
  revised the instructions for running the Makefile, which now runs
  all the scripts correctly. Evidence:
  <https://github.com/UBC-MDS/heart-failure-analysis/commit/3f23b4e431508388575169556cc8aa3a8e0a0646>

- Improved accessibility to our report. Description of Revision: I
  deployed GitHub Pages so that our README file has a direct link to
  our HTML report. Evidence:
  <https://github.com/UBC-MDS/heart-failure-analysis/commit/7e22dd6dc250c11948aa87be384a8f9c15fec87a>

- Changed acronyms in final report and deleted bullet points.
  Description of Revision: I changed the acronyms in our qmd file and
  deleted bullet points. These changes were rendered to our PDF and
  HTML files. Evidence:
  <https://github.com/UBC-MDS/heart-failure-analysis/commit/b91ca5a3874067d447d9646090028011784b85ba>
  <https://github.com/UBC-MDS/heart-failure-analysis/commit/7a12b5c145fc4dc222c043461186f4d0b4b43c99>

Who: Gurmehak Kaur

What was addressed:

- Improved the project folder structure. Description of Revision: I cleaned up and improved the project’s folder structure by organizing files into dedicated folders that were previously missing from our repo: `reports/` for generated summaries, `results/` with subfolders for tables and figures, `scripts/` for executable workflows, and `src/` for abstract functions. This streamlined structure improves clarity and project maintainability.
Evidence:
<https://github.com/UBC-MDS/heart-failure-analysis/commit/87eadd9b89b44e0c49dea8433a9b300577dab760>
<https://github.com/UBC-MDS/heart-failure-analysis/commit/5517cf4a60afb6bf6afef3c43c2f820a9909862c>
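The reorganized layout described above can be sketched as follows (directory names are taken from this entry and from the Makefile targets elsewhere in this commit; the exact file listing is an assumption):

```
heart-failure-analysis/
├── data/
│   ├── raw/            # downloaded source data
│   └── processed/      # train/test splits
├── reports/            # rendered heart-failure-analysis.html / .pdf
├── results/
│   ├── figures/        # heatmap, training plots
│   ├── models/         # pipeline.pickle
│   └── tables/         # confusion matrix, test scores
├── scripts/            # download_and_convert.py, modelling.py, ...
└── src/                # abstract functions
```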

Who: Ke Gao

What was addressed:

- Improve Automatic Numbering of Figures in the Report. Description of
  Revision: I improved automatic numbering of figures in the report.
  Evidence:
  <https://github.com/UBC-MDS/heart-failure-analysis/pull/106>

- Improve Automatic Numbering of Tables in the Report. Description of
  Revision: I improved automatic numbering of tables in the report.
  Evidence:
  <https://github.com/UBC-MDS/heart-failure-analysis/pull/106>

Who: Yuhan Fan

What was addressed:

- Updated README.md with the following:

  - the 'About' section of README.md with the most recent results
    metrics from our final report, and fixed any grammar errors.

- Deleted bullet point and capitalized "contributors" in
README.md.

- Added GitHub repository link under 'Usage' - 'Setup' section.

- Added example screenshot image to 'Running the analysis'
section.

- Evidence:
<https://github.com/UBC-MDS/heart-failure-analysis/pull/120>
5 changes: 3 additions & 2 deletions Dockerfile
@@ -9,6 +9,8 @@ USER root
RUN sudo apt update \
&& sudo apt install -y lmodern

RUN apt-get update && apt-get install -y build-essential make

USER $NB_UID

RUN mamba update --quiet --file /tmp/conda-linux-64.lock
@@ -17,8 +19,7 @@ RUN mamba clean --all -y -f
RUN pip install --no-cache-dir -r /tmp/requirements.txt
RUN pip cache purge


RUN fix-permissions "${CONDA_DIR}"
RUN fix-permissions "/home/${NB_USER}"

RUN pip install deepchecks==0.18.1

80 changes: 48 additions & 32 deletions Makefile
@@ -1,57 +1,73 @@
.PHONY: all clean

all: report/heart_failure_analysis.html report/heart_failure_analysis.pdf
all: data/raw/heart_failure_clinical_records.data \
data/processed/heart_failure_train.csv \
results/figures/correlation_heatmap.png \
results/models/pipeline.pickle results/figures/training_plots \
results/tables/confusion_matrix.csv \
results/tables/test_scores.csv \
reports/heart-failure-analysis.html \
reports/heart-failure-analysis.pdf

# Download and convert data
data/raw/heart_failure_clinical_records.data : scripts/download_and_convert.py
data/raw/heart_failure_clinical_records.data: scripts/download_and_convert.py
python scripts/download_and_convert.py \
--url="https://archive.ics.uci.edu/static/public/519/heart+failure+clinical+records.zip" \
--write_to=data/raw

# Process and analyze data
data/processed/heart_failure_train.csv data/processed/heart_failure_test.csv : scripts/process_and_analyze.py data/raw/heart_failure_clinical_records.data
python scripts/process_and_analyze.py \
--file_path=data/raw/heart_failure_clinical_records.data \
--data-to=data/processed
--file_path="data/raw/heart_failure_clinical_records_dataset_converted.csv" \
--output_dir=data/processed

# Perform correlation analysis
results/figures/correlation_heatmap.png : scripts/correlation_analysis.py data/processed/heart_failure_train.csv data/processed/heart_failure_test.csv
python scripts/correlation_analysis.py \
--train_file=data/processed/heart_failure_train.csv \
--test_file=data/processed/heart_failure_test.csv \
--output_file=results/figures/correlation_heatmap.png
--output_file="./results/figures/heatmap.png"

# Train and evaluate the model
results/models/pipeline.pickle results/figures/training_plots : scripts/modelling.py data/processed/heart_failure_train.csv
python scripts/modelling.py \
--training-data=data/processed/heart_failure_train.csv \
--pipeline-to=results/models \
--plot-to=results/figures \
--seed=123

results/tables/test_evaluation.csv : scripts/model_evaluation.py data/processed/heart_failure_test.csv results/models/pipeline.pickle
results/models/pipeline.pickle results/figures/training_plots: data/processed/heart_failure_train.csv
python scripts/modelling.py \
--training-data "./data/processed/heart_failure_train.csv" \
--pipeline-to "results/models" \
--plot-to "results/figures" \
--table-to "results/tables" \
--seed 123

results/tables/confusion_matrix.csv results/tables/test_scores.csv: scripts/model_evaluation.py data/processed/heart_failure_test.csv results/models/pipeline.pickle
python scripts/model_evaluation.py \
--scaled-test-data=data/processed/heart_failure_test.csv \
--pipeline-from=results/models/pipeline.pickle \
--results-to=results/tables
--scaled-test-data "data/processed/heart_failure_test.csv" \
--pipeline-from "results/models/pipeline.pickle" \
--results-to "results/tables"

# Build HTML and PDF reports
report/heart_failure_analysis.html report/heart_failure_analysis.pdf : report/heart_failure_analysis.qmd \
results/models/pipeline.pickle \
results/figures/heatmap.html \
results/figures/training_plots \
results/tables/test_evaluation.csv
quarto render report/heart_failure_analysis.qmd --to html
quarto render report/heart_failure_analysis.qmd --to pdf
# Rule to generate HTML
reports/heart-failure-analysis.html:
quarto render reports/heart-failure-analysis.qmd --to html --embed-resources --standalone

# Rule to generate PDF
reports/heart-failure-analysis.pdf:
quarto render reports/heart-failure-analysis.qmd --to pdf


# Clean up analysis
clean:
rm -rf data/raw/*
rm -f results/data/processed/heart_failure_train.csv \
results/data/processed/heart_failure_test.csv \
results/models/pipeline.pickle \
results/figures/heatmap.html \
results/figures/training_plots \
results/tables/test_evaluation.csv \
report/heart_failure_analysis.html \
report/heart_failure_analysis.pdf
rm -rf \
data/processed/* \
results/figures/* \
results/img/* \
results/models/* \
results/pipeline/* \

rm -f \
results/tables/test_scores.csv \
results/tables/confusion_matrix.csv \
results/tables/confusion_matrix.csv \
results/tables/logistic_regression_coefficients.csv \
reports/heart-failure-analysis.html \
reports/heart-failure-analysis.pdf


58 changes: 20 additions & 38 deletions README.md
@@ -1,13 +1,17 @@
# Heart Failure Analysis

- contributors: Yuhan Fan, Gurmehak Kaur, Ke Gao, Merari Santana
Contributors: Yuhan Fan, Gurmehak Kaur, Ke Gao, Merari Santana

## About

In this project, we attempt to build a classification model using logistic regression algorithm to predict patient mortality risk after surviving a heart attack using their medical records. Using patient test results, the final classifier achieves an accuracy of 81.6%. The model’s precision of 70.0% suggests it is moderately conservative in predicting the positive class (death), minimizing false alarms.More importantly, the recall of 73.68% ensures the model identifies the majority of high-risk patients, reducing the likelihood of missing true positive cases, however, there is still room for a lot of improvement, particularly in aiming to maximise recall by minimising False Negatives. The F1-score of 0.71 reflects a good balance between precision and recall, highlighting the model’s robustness in survival prediction. While promising, further refinements are essential for more reliable predictions and effectively early intervention.
In this project, we attempt to build a classification model using a logistic regression algorithm to predict patient mortality risk after surviving a heart attack, using their medical records. Using patient test results, the final classifier achieves an accuracy of 0.82. The model’s precision of 0.70 suggests it is moderately conservative in predicting the positive class (death), minimizing false alarms. More importantly, the recall of 0.74 ensures the model identifies the majority of high-risk patients, reducing the likelihood of missing true positive cases; however, there is still considerable room for improvement, particularly in aiming to maximise recall by minimising false negatives. The F1-score of 0.72 reflects a good balance between precision and recall, highlighting the model’s robustness in survival prediction. While promising, further refinements are essential for more reliable predictions and effective early intervention.
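The accuracy, precision, recall, and F1 figures quoted above are all derived from the test-set confusion matrix. The sketch below uses hypothetical counts chosen for illustration, not this project's actual test results:

```python
# Hypothetical confusion-matrix counts for the positive class ("death").
# Illustrative only -- not the counts from this project's test set.
tp, fp, fn, tn = 14, 6, 5, 50

accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted deaths that are real deaths
recall = tp / (tp + fn)                     # share of real deaths the model catches
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

With these made-up counts, precision (0.70), recall (0.74), and F1 (0.72) happen to line up with the values quoted above; accuracy additionally depends on the test set's class balance, which cannot be recovered from those three metrics alone.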

The data set used in this project was created by D. Chicco and Giuseppe Jurman in 2020. It was sourced from the UCI Machine Learning Repository and can be found [here](https://archive.ics.uci.edu/dataset/519/heart+failure+clinical+records). It contains the medical records of 299 patients who had heart failure, collected during their follow-up period; each patient profile has 13 clinical features (age, anaemia, diabetes, platelets, etc.).

## Report

The final report can be found [here](https://ubc-mds.github.io/heart-failure-analysis/reports/heart-failure-analysis.html).

## Dependencies

- Docker
@@ -20,21 +24,30 @@ The data set used in this project was created by D. Chicco, Giuseppe Jurman in 2

> If you are using Windows or Mac, make sure Docker Desktop is running.
1. Clone this GitHub repository.
1. Clone this [GitHub repository](https://github.com/UBC-MDS/heart-failure-analysis/tree/main).

### Running the analysis

1. Navigate to the root of this project on your computer using the command line and enter the following command:
2. Navigate to the root of this project on your computer using your local terminal and then enter the following command:

```
docker compose up
```

2. In the terminal, look for a URL that starts with [`http://127.0.0.1:8888/lab?token=`](http://127.0.0.1:8888/lab?token=) (for an example, see the highlighted text in the terminal below). Copy and paste that URL into your browser.
3. In the terminal output, look for a URL that starts with `http://127.0.0.1:8888/lab?token=` (for an example, see the highlighted text in the terminal screenshot below). Copy and paste that URL into your browser.


4. Navigate to the root of this project on your computer using the command line and enter the following command to reset the project to a clean state (i.e., remove all files generated by previous runs of the analysis):

```
make clean
```

<img src="img/jupyter-container-web-app-launch-url.png" width="400"/>
5. To run the analysis in its entirety, enter the following command in the terminal in the project root:

3. To run the analysis, open `heart-failure-analysis.ipynb` in Jupyter Lab you just launched and under the "Kernel" menu click "Restart Kernel and Run All Cells...".
```
make all
```

### Clean up

@@ -61,37 +74,6 @@ docker compose up

6. Send a pull request to merge the changes into the `main` branch.

### Calling scripts

To run the analysis, open a terminal and run the following commands and their respective arguments:

```
python scripts/download_and_convert.py \
--url "https://archive.ics.uci.edu/static/public/519/heart+failure+clinical+records.zip"
python scripts/process_and_analyze.py \
--file_path "../data/heart_failure_clinical_records_dataset_converted.csv"
python scripts/correlation_analysis.py \
--train_file "./data/processed/heart_failure_train.csv" \
--test_file "./data/processed/heart_failure_test.csv" \
--output_file "./results/figures/heatmap.html"
python scripts/modelling.py \
--training-data "./data/processed/heart_failure_train.csv" \
--pipeline-to "results/pipeline" \
--plot-to "results/figures" \
--seed 123
python scripts/model_evaluation.py \
--scaled-test-data=data/processed/heart_failure_test.csv \
--pipeline-from=results/pipeline/heart_failure_model.pickle \
--results-to=results/figures
quarto render heart-failure-analysis.qmd --to html
quarto render heart-failure-analysis.qmd --to pdf
```

## License

This dataset is licensed under a [Creative Commons Attribution 4.0 International (CC BY 4.0) license](https://creativecommons.org/licenses/by/4.0/legalcode).
Empty file removed data/.gitkeep
Empty file.
File renamed without changes.
3 changes: 2 additions & 1 deletion docker-compose.yml
@@ -1,6 +1,7 @@
services:
jupyter-notebook:
image: gur5/heart-failure-prediction:7de3b28
image: gur5/heart-failure-prediction:fe61672

ports:
- "8888:8888"
volumes:
12 changes: 6 additions & 6 deletions environment.yml
@@ -13,9 +13,9 @@ dependencies:
- joblib=1.3.1
- pip=24.0
- pytest=8.3.4
- pip:
- altair-ally==0.1.1
- vega-datasets==0.9.0
- vegafusion==1.6.9
- deepchecks==0.18.1
- pandera==0.20.4
# - pip:
# - altair-ally==0.1.1
# - vega-datasets==0.9.0
# - vegafusion==1.6.9
# - deepchecks==0.18.1
# - pandera==0.20.4
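The pip packages commented out above appear to have moved into the `requirements.txt` that this commit's Dockerfile installs (`pip install --no-cache-dir -r /tmp/requirements.txt`) and that `docker-publish.yml` now watches as a trigger path. A plausible sketch of that file, assuming it simply mirrors the versions previously pinned here:

```
altair-ally==0.1.1
vega-datasets==0.9.0
vegafusion==1.6.9
deepchecks==0.18.1
pandera==0.20.4
```

Note that `deepchecks==0.18.1` is also installed by a separate `RUN pip install` layer in the Dockerfile, so it may or may not be duplicated in `requirements.txt` itself.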
253 changes: 134 additions & 119 deletions reports/heart-failure-analysis.html

Large diffs are not rendered by default.

