Iman-hamdan/Diabetes-Stats-Model

Diabetes-Stats-Model

Diabetes Prediction using Logistic Regression

This project is a part of the AAI-500 course in the Applied Artificial Intelligence Program at the University of San Diego (USD).

-- Project Status: Completed

Installation

For independent review, download AAI_500_Final_Team_Project_Group6.ipynb and run the file from a Python code editor, preferably Microsoft VS Code.

To contribute to this project, clone or fork this repo.

Project Intro/Objective

The purpose of this project is to analyze health indicators to predict the likelihood of an individual having diabetes. Diabetes is a chronic condition that affects millions of people worldwide, and early prediction can help individuals take preventive measures to manage their health.

This analysis utilizes a dataset containing various health-related metrics such as glucose levels, BMI, age, and more. By applying statistical methods and machine learning models, the goal is to understand the relationships between these health indicators and diabetes outcomes.

The primary objectives of this project are:

  • Perform exploratory data analysis (EDA) to identify patterns and trends.
  • Conduct hypothesis testing to validate the significance of key health features.
  • Train and evaluate machine learning models to predict diabetes status.
  • Provide actionable insights and recommendations based on the findings.
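As an illustration of the hypothesis-testing objective, the sketch below runs a chi-square test of independence between one binary health indicator and diabetes status. It is a minimal example, not the notebook's actual code: it assumes SciPy is installed, and the counts in the table are invented for illustration and do not come from the dataset.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts, invented for illustration only (NOT from the dataset).
# Rows: indicator absent / indicator present.
# Columns: no diabetes / diabetes.
table = [
    [700, 100],
    [400, 200],
]

# Null hypothesis: the indicator and diabetes status are independent.
chi2, p_value, dof, expected = chi2_contingency(table)

print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3g}")
if p_value < 0.05:
    print("Reject independence at the 5% level.")
else:
    print("No evidence against independence at the 5% level.")
```

With real data, the table would be built by cross-tabulating an indicator column against the diabetes column; a small p-value suggests the two are associated but says nothing about the strength or direction of the effect.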

This project is a collaborative effort, following best practices in data science, including version control with GitHub, and adhering to coding standards like PEP 8.

File Reference:

  • diabetes_012_health_indicators_BRFSS2015.csv - Data reference file, a local sample of the CDC Diabetes Health Indicators Dataset.

  • AAI_500_Final_Team_Project_Group6.ipynb - Python notebook that runs and describes the exploratory data analysis (EDA).

Partner(s)/Contributor(s)

  • Iman Hamdan
  • Andrew Blumhardt

Methods Used

A few examples are:

  • Inferential Statistics
  • Machine Learning
  • Data Visualization
  • Data Manipulation

Technologies

  • Python

Project Description

This project focuses on predicting the risk of diabetes using a range of health indicators from the CDC Diabetes Health Indicators dataset. The dataset offers valuable insights into participants’ demographic, lifestyle, and medical background, allowing for the creation of a predictive model. We utilized Exploratory Data Analysis (EDA) to uncover key trends in the data, followed by the implementation of a Logistic Regression model, well-suited for binary classification, to forecast the likelihood of diabetes.

The dataset was divided into training and testing sets so that the model’s performance could be validated on data it had not seen during training.
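The train/test workflow described above can be sketched as follows. This is a minimal example, assuming scikit-learn and NumPy are installed; it is not the notebook's code. The function name `fit_and_score`, the 80/20 split, the random seed, and the synthetic two-feature data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def fit_and_score(X, y, test_size=0.2, seed=42):
    """Split the data, fit a logistic regression, return (model, test accuracy)."""
    # Stratify so both splits keep the same class balance as the full data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))


# Synthetic stand-in for the real data (assumption): two numeric features
# and a binary label that depends on them plus some noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model, accuracy = fit_and_score(X, y)
print(f"Held-out accuracy: {accuracy:.3f}")
print("Coefficients:", model.coef_[0])
```

To apply the same function to the project data, `X` would hold the chosen health-indicator columns from the CSV and `y` a binary diabetes label. Accuracy alone can be misleading when one class is much rarer than the other, so precision and recall are worth checking as well.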

License

Academic Free License v3.0 (AFL-3.0)

Acknowledgments

Teboul, A., & Centers for Disease Control and Prevention. (2023, September 25). Diabetes health indicators dataset [Data set]. Kaggle. https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset
