Haberman's Survival Dataset contains cases from a study conducted at the University of Chicago's Billings Hospital between 1958 and 1970. It records information on patients who had undergone surgery for breast cancer.
The data is in CSV format and contains 4 attributes (including the class attribute) and 306 instances (rows).
- Age: Age of the patient (numerical). The values range from 30 to 83 years.
- Year: Year of the patient's operation (numerical). This column stores the year in '19yy' format. For example, a value of 64 means the year 1964.
- Nodes: Number of positive axillary nodes detected (numerical). Lymph nodes are small, kidney-shaped structures that form part of the body's immune system.
- Status: Survival status (numerical). This is the class attribute. It indicates whether the patient survived 5 years or longer after surgery: 1 means the patient survived 5 years or more, and 2 means the patient survived less than 5 years.
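The encodings described in the attribute list can be decoded with a few lines of Python. The following is a minimal sketch using only the standard library, not the code from the notebook. It assumes the file has no header row and that the columns appear in the order age, year, nodes, status. The helper names `decode_row` and `load_records` are hypothetical, and the sample rows are illustrative values, not taken from haberman.csv.

```python
import csv
import io


def decode_row(row):
    """Convert one raw CSV row (four integer strings) into a readable record."""
    age, year, nodes, status = (int(value) for value in row)
    return {
        "age": age,
        "year": 1900 + year,              # '19yy' format: 64 -> 1964
        "nodes": nodes,
        "survived_5_years": status == 1,  # 1 = 5 years or more, 2 = less than 5 years
    }


def load_records(handle):
    """Read headerless CSV rows from an open file handle, skipping blank lines."""
    return [decode_row(row) for row in csv.reader(handle) if row]


# Illustrative rows only (assumption), not taken from haberman.csv.
sample = io.StringIO("45,64,0,1\n52,59,7,2\n")
records = load_records(sample)
for record in records:
    print(record)

# With the real file (assuming it has no header row):
# with open("haberman.csv", newline="") as f:
#     records = load_records(f)
```

If the CSV does include a header row, skip the first line before passing the handle to `load_records`.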
Exploratory data analysis and predictive modelling have been performed on this dataset.
Plots:
- Pair plot
- Distribution Plot
- Cumulative Distribution Plot
- Scatter Plot
- Box Plot
- Violin Plot
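As an illustration of what a cumulative distribution plot conveys, the sketch below computes an empirical cumulative distribution in plain Python: for each distinct value, the fraction of observations less than or equal to it. This is not the notebook's plotting code, which may use a different approach. The function name `empirical_cdf` is hypothetical and the node counts are illustrative values, not taken from haberman.csv.

```python
from collections import Counter


def empirical_cdf(values):
    """Return sorted (value, fraction of observations <= value) pairs."""
    counts = Counter(values)
    total = len(values)
    running = 0
    cdf = []
    for value in sorted(counts):
        running += counts[value]
        cdf.append((value, running / total))
    return cdf


# Illustrative node counts only (assumption), not taken from haberman.csv.
nodes = [0, 0, 1, 3, 0, 7, 1, 0]
for value, fraction in empirical_cdf(nodes):
    print(value, fraction)
```

Plotting these pairs as a step curve, for example separately for each survival status, gives the cumulative distribution plot.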
Data Modelling:
- Linear Regression Model
requirements.txt lists all the Python packages that need to be installed. Install them with:
pip install -r requirements.txt
Predictive_Modeling.ipynb: Jupyter (IPython) notebook where the data analysis is performed
haberman.csv: data file
requirements.txt: Python packages that need to be installed
README.md: Readme file with introduction and details