This project is a part of the AAI-500 course in the Applied Artificial Intelligence Program at the University of San Diego (USD).
Project Status: Completed
For independent review, download AAI_500_Final_Team_Project_Group6.ipynb and open it in an editor that can run Jupyter notebooks, preferably Microsoft VS Code.
To contribute to this project, clone or fork this repo.
The purpose of this project is to analyze health indicators to predict the likelihood of an individual having diabetes. Diabetes is a chronic condition that affects millions of people worldwide, and early prediction can help individuals take preventive measures to manage their health.
This analysis utilizes a dataset containing various health-related metrics such as glucose levels, BMI, age, and more. By applying statistical methods and machine learning models, the goal is to understand the relationships between these health indicators and diabetes outcomes.
The primary objectives of this project are:
- Perform exploratory data analysis (EDA) to identify patterns and trends.
- Conduct hypothesis testing to validate the significance of key health features.
- Train and evaluate machine learning models to predict diabetes status.
- Provide actionable insights and recommendations based on the findings.
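As a minimal sketch of the hypothesis-testing objective, the snippet below runs a Pearson chi-square test of independence on a 2x2 table using only the standard library. The function name `chi_square_2x2` and the counts are illustrative assumptions, not values or code taken from the notebook; the notebook may use a different test or library.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom,
    without a continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Made-up counts for illustration only (not from the dataset):
# rows = high blood pressure yes/no, columns = diabetes yes/no.
stat, p = chi_square_2x2(60, 40, 30, 70)
print(f"chi2 = {stat:.2f}, p = {p:.4g}")
```

A small p-value would suggest that the two binary variables are not independent, which is the kind of evidence used to judge whether a health feature is worth keeping.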
This project is a collaborative effort, following best practices in data science, including version control with GitHub, and adhering to coding standards like PEP 8.
- diabetes_012_health_indicators_BRFSS2015.csv - Data reference file, a local sample of the CDC Diabetes Health Indicators Dataset.
- AAI_500_Final_Team_Project_Group6.ipynb - Python notebook that runs and describes the exploratory data analysis (EDA).
- Iman Hamdan
- Andrew Blumhardt
A few examples are:
- Inferential Statistics
- Machine Learning
- Data Visualization
- Data Manipulation
- Python
This project focuses on predicting the risk of diabetes using a range of health indicators from the CDC Diabetes Health Indicators dataset. The dataset offers valuable insights into participants’ demographic, lifestyle, and medical background, allowing for the creation of a predictive model. We utilized Exploratory Data Analysis (EDA) to uncover key trends in the data, followed by the implementation of a Logistic Regression model, well-suited for binary classification, to forecast the likelihood of diabetes.
After using EDA to identify key patterns in the dataset, we selected Logistic Regression as our primary model because it is well suited to binary classification. The dataset was divided into training and testing sets so that the model's performance could be validated on data it had not seen during training.
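The train/test workflow described above can be sketched as follows, assuming scikit-learn and NumPy are installed. To keep the example self-contained it uses synthetic data with two made-up features rather than the project CSV, and the 80/20 split, random seeds, and `max_iter` value are assumptions, not the settings used in the notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: two made-up numeric features and a binary label
# drawn from a logistic model. This is NOT the CDC dataset.
n_samples = 1000
X = rng.normal(size=(n_samples, 2))
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(n_samples) < 1 / (1 + np.exp(-logits))).astype(int)

# Hold out 20% of the rows for testing; stratify to keep class balance similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit on the training split only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-out split.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {test_accuracy:.3f}")
```

With the real data, `X` and `y` would instead come from the feature columns and the diabetes label in the CSV file; accuracy alone may also be too coarse a metric if the classes are imbalanced.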
Academic Free License v3.0 (AFL-3.0)
Teboul, A., & Centers for Disease Control and Prevention. (2023, September 25). Diabetes health indicators dataset [Data set]. Kaggle. https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset