This project explores the Titanic dataset to analyze survival trends and generate insights. It involves:
- Loading and inspecting the dataset.
- Performing exploratory data analysis (EDA).
- Visualizing survival trends by gender, class, and age.
- Cleaning and saving the dataset.
- Load Data: Used pandas to load and preview the Titanic dataset.
- Check Missing Values: Identified columns with missing data and decided on handling strategies.
- Univariate Analysis: Visualized age and survival distributions using histograms.
- Bivariate Analysis: Analyzed survival rates by gender and class using bar plots.
- Handle Missing Data: Filled missing values in the `age` column and dropped columns with excessive missing data.
- Save Cleaned Data: Saved the cleaned dataset for future use.
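
The missing-data and cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names (`summarize_missing`, `clean_titanic`, `run_pipeline`), the 50% threshold for "excessive" missing data, and the choice of the median to fill `age` are all assumptions made for the example.

```python
import pandas as pd


def summarize_missing(df: pd.DataFrame) -> pd.Series:
    """Return the fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


def clean_titanic(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Drop sparse columns and fill missing ages.

    max_missing is an assumed cutoff: columns with a larger share of
    missing values are dropped. Filling `age` with the median is also an
    assumption; the README does not say which fill value was used.
    """
    missing_share = df.isna().mean()
    keep = missing_share[missing_share <= max_missing].index
    cleaned = df.loc[:, keep].copy()
    if "age" in cleaned.columns:
        cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())
    return cleaned


def run_pipeline(src: str = "titanic3.xls",
                 dst: str = "cleaned_titanic_data.xlsx") -> pd.DataFrame:
    """Load the raw file, clean it, and save the result."""
    raw = pd.read_excel(src)            # reading .xls requires xlrd
    print(summarize_missing(raw))       # inspect before deciding what to drop
    cleaned = clean_titanic(raw)
    cleaned.to_excel(dst, index=False)  # writing .xlsx requires openpyxl
    return cleaned
```

Calling `run_pipeline()` from the project directory would read the original file and write the cleaned one. Keeping the cleaning logic in `clean_titanic` separate from the file I/O makes it easy to try on a small in-memory DataFrame first.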
- Survival rates were higher for women compared to men.
- Passengers in higher classes (e.g., first-class) had a greater chance of survival.
- The age analysis showed that younger passengers had slightly better survival rates.
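
Findings like these can be checked with a grouped mean of the survival flag. The sketch below assumes the column names `survived`, `sex`, `pclass`, and `age`. The function names and the age-band edges are hypothetical, chosen only for illustration.

```python
import pandas as pd


def survival_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Mean of the 0/1 `survived` flag within each group of `column`."""
    return df.groupby(column)["survived"].mean()


def survival_rate_by_age_band(
    df: pd.DataFrame,
    bins=(0, 12, 18, 40, 60, 120),  # assumed band edges, in years
) -> pd.Series:
    """Survival rate per age band; rows with a missing age are skipped."""
    bands = pd.cut(df["age"], bins=list(bins), right=False)
    return df.groupby(bands, observed=True)["survived"].mean()
```

For example, `survival_rate_by(df, "sex")` and `survival_rate_by(df, "pclass")` give the rates behind the first two findings, and appending `.plot.bar()` to either result draws the corresponding bar plot with matplotlib.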
- `titanic3.xls`: Original dataset.
- `EDA_Titanic.ipynb`: Jupyter Notebook documenting the EDA process.
- `cleaned_titanic_data.xlsx`: Cleaned dataset.
- `README.md`: Project overview and methodology.
To run the analysis, install the following Python libraries:
pip install pandas matplotlib seaborn xlrd openpyxl