Diabetes Prediction

Problem Description

The goal of this project is to predict whether a patient is at risk of diabetes, using different health parameters.

About the Dataset

The Diabetes prediction dataset is a collection of medical and demographic data from patients, along with their diabetes status (positive or negative). The data includes features such as age, gender, body mass index (BMI), hypertension, heart disease, smoking history, HbA1c level, and blood glucose level. This dataset can be used to build machine learning models to predict diabetes in patients based on their medical history and demographic information. This can be useful for healthcare professionals in identifying patients who may be at risk of developing diabetes and in developing personalized treatment plans. Additionally, the dataset can be used by researchers to explore the relationships between various medical and demographic factors and the likelihood of developing diabetes.

This dataset provides a comprehensive array of features relevant to heart health and lifestyle choices, encompassing patient-specific details such as age, gender, cholesterol levels, blood pressure, heart rate, and indicators like diabetes, family history, smoking habits, obesity, and alcohol consumption. Additionally, lifestyle factors like exercise hours, dietary habits, stress levels, and sedentary hours are included. Medical aspects comprising previous heart problems, medication usage, and triglyceride levels are considered. Socioeconomic aspects such as income and geographical attributes like country, continent, and hemisphere are incorporated. The dataset, consisting of around 7000 records from patients around the globe, culminates in a crucial binary classification feature denoting the presence or absence of a heart attack risk, providing a comprehensive resource for predictive analysis and research in cardiovascular health.

How to Use

Prerequisites

python 3.10
docker

Cloning the repo

First, clone the repo to your local machine:

git clone https://github.com/PriyaVellanki/diabetest-risk-score.git

Acquiring Data

The data used for training this model is stored in /data/diabetes_prediction_dataset.csv in the repo.
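
For a quick look at the data before training, the label distribution can be counted with the standard library alone. The sketch below is illustrative and not part of the repo. It assumes the CSV has a header row and that the target column is named `diabetes`; that column name is an assumption, so check the file's header first. The sample rows are made up for the demonstration.

```python
import csv
from collections import Counter
from typing import Iterable


def label_counts(lines: Iterable[str], label_column: str = "diabetes") -> Counter:
    """Count how often each label value occurs in CSV text with a header row.

    `label_column` defaults to "diabetes", which is an assumed column name.
    """
    reader = csv.DictReader(lines)
    return Counter(row[label_column] for row in reader)


# Made-up rows for illustration only; they are not taken from the dataset.
sample = [
    "gender,age,diabetes\n",
    "Male,50,0\n",
    "Female,61,1\n",
    "Female,34,0\n",
]
counts = label_counts(sample)
print(counts)
```

To run it on the real file, pass an open file handle instead of the sample list, for example `label_counts(open("data/diabetes_prediction_dataset.csv", newline=""))` from the repo root.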

Using Docker Image

Build Docker Image

docker build -t {build-tag} .

Run the docker image

docker run -it --rm -p 9696:9696 {build-tag}

{build-tag}: any user-defined tag for the Docker image, e.g. diabetes-risk-score:latest

Making predictions

By default, the test scripts use the following patient parameters:

patient = {
    "gender": "male",
    "age": 50,
    "hypertension": 0,
    "heart_disease": 0,
    "smoking_history": "current",
    "bmi": 25.31,
    "hba1c_level": 7.0,
    "blood_glucose_level": 220
}

To test the model with a specific input and check the predicted probability:

python predict_test_using_model.py

To test the model through the API endpoint, either after starting gunicorn locally or after the Docker deployment:

python test_predict_api.py
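
The contents of test_predict_api.py are not shown here, so the following is only a minimal sketch of what such a request could look like, using the standard library. Port 9696 comes from the Docker command above; the `/predict` path, the helper names `build_request` and `predict`, and the JSON reply format are assumptions and may differ from the actual service.

```python
import json
import urllib.request

# Port 9696 is from the README; the "/predict" path is an assumption.
DEFAULT_URL = "http://localhost:9696/predict"

# Default patient record from the README.
patient = {
    "gender": "male",
    "age": 50,
    "hypertension": 0,
    "heart_disease": 0,
    "smoking_history": "current",
    "bmi": 25.31,
    "hba1c_level": 7.0,
    "blood_glucose_level": 220,
}


def build_request(patient: dict, url: str = DEFAULT_URL) -> urllib.request.Request:
    """Wrap the patient record in a JSON POST request (nothing is sent yet)."""
    body = json.dumps(patient).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(patient: dict, url: str = DEFAULT_URL, timeout: float = 5.0) -> dict:
    """Send the request and decode a JSON reply; requires the service to be running."""
    with urllib.request.urlopen(build_request(patient, url), timeout=timeout) as resp:
        return json.load(resp)


# Building the request does not touch the network, so this runs without a server.
request = build_request(patient)
print(request.get_method(), request.full_url)
```

With the service running, `predict(patient)` would send the request. The `url` argument can be pointed at a different host or port if the service is exposed elsewhere.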

Sample Output

Locally, you should get output similar to the one shown below after running all steps successfully.

Cloud Deployment

Prerequisites

CPU: 2 or more
Container or virtual machine manager, such as Docker, VirtualBox, etc.

Installation

Install instructions for various platforms are located here. The steps below are for Mac.

brew install minikube

brew install kubectl

Start your Cluster

minikube start

To use the Docker daemon inside minikube

eval $(minikube docker-env)

Build docker images inside minikube

minikube cache add python:3.10-slim
docker build -t diabetest-risk-score .

Deploy Application

Create the deployment and expose it on port 9696.

kubectl create -f deployment.yaml
kubectl expose deployment flaskapi-deployment --type=NodePort --port=9696

I took a sample deployment.yaml for a Flask API and updated it with my image. Alternatively, you can create the deployment directly from the image:

kubectl create deployment flask-api --image=diabetest-risk-score:latest
kubectl expose deployment flask-api --type=NodePort --port=9696

Access the Service Endpoint

The easiest way to access this service is to let minikube launch a web browser for you:

minikube service flask-api

Alternatively, use kubectl to forward the port:

kubectl port-forward service/flask-api 7080:9696

Test using minikube endpoint

Manage your Cluster

Pause Kubernetes without impacting deployed applications:

minikube pause

Unpause a paused instance:

minikube unpause

Halt the cluster:

minikube stop

Acknowledgement

This project was created as part of ML Zoomcamp, with the help of the collaborative DataTalks Slack community and especially Alexey.

Notes

I trained Logistic Regression, Decision Tree, Random Forest and XGBoost models. XGBoost scored very slightly higher than Random Forest, but it did not give the right predictions when I tested it locally with different test data. The data I chose has a lot of class imbalance. I added class_weight to correct the balance, but I still need to explore why XGBoost does not always predict correctly even with a high AUC score.
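
To make the class_weight idea concrete, here is a small sketch of the "balanced" heuristic that scikit-learn applies when `class_weight="balanced"` is set: each class gets the weight n_samples / (n_classes * class_count). Whether train.py uses exactly this setting is not stated above, so treat the function name and the made-up labels as illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> dict:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Made-up labels for illustration: 9 negatives for every positive.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

The rare class receives the larger weight, so its errors count more during training. XGBoost's classifier has no class_weight parameter; its documentation suggests `scale_pos_weight`, typically set to the ratio of negative to positive samples, which may be worth checking when comparing it with the other models.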
