This project explores machine learning techniques for detecting fraudulent transactions in a highly imbalanced dataset. The dataset contains anonymized transaction details, and the goal is to identify the fraudulent transactions (Class 1).
The dataset consists of the following columns:
- Time: Time elapsed since the first transaction.
- V1-V28: Anonymized features.
- Amount: Transaction amount.
- Class: Target variable indicating fraud (1) or non-fraud (0).

Class distribution:
- Fraudulent Transactions: 492 instances (0.17%)
- Non-Fraudulent Transactions: 284,315 instances (99.83%)
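
With fraud making up so small a share of the data, plain accuracy is a poor yardstick: a model that labels every transaction as non-fraud would still be right almost all of the time. The minimal sketch below works through that arithmetic using only the class counts listed above. The variable names are illustrative and are not taken from the project notebook.

```python
# Class counts as reported for the dataset.
n_fraud = 492
n_legit = 284_315
n_total = n_fraud + n_legit

# Share of each class in the full dataset.
fraud_rate = n_fraud / n_total
legit_rate = n_legit / n_total

# Number of non-fraud transactions for every fraud transaction.
imbalance_ratio = n_legit / n_fraud

# A trivial baseline that always predicts non-fraud (Class 0) gets every
# non-fraud case right and every fraud case wrong.
baseline_accuracy = n_legit / n_total
baseline_recall = 0 / n_fraud  # no fraudulent transaction is ever caught

print(f"Fraud share:        {fraud_rate:.2%}")
print(f"Non-fraud share:    {legit_rate:.2%}")
print(f"Imbalance ratio:    {imbalance_ratio:.0f} to 1")
print(f"Baseline accuracy:  {baseline_accuracy:.2%}")
print(f"Baseline recall:    {baseline_recall:.2%}")
```

A baseline like this catches no fraud at all despite its high accuracy, so metrics such as precision, recall, and ROC AUC are more informative here.
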
Model | ROC AUC | Precision | Recall | Notes |
---|---|---|---|---|
Logistic Regression | 0.9559 | 0.83 | 0.65 | Simple and interpretable. |
Decision Tree | 0.8875 | 0.75 | 0.78 | Prone to overfitting. |
Random Forest | 0.9578 | 0.94 | 0.82 | Robust and feature-rich. |
SVM | 0.9646 | 0.96 | 0.67 | Effective but complex. |
XGBoost | 0.9711 | 0.89 | 0.83 | Highest ROC AUC and recall. |
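
Precision and recall pull in different directions in this table: SVM has the highest precision but comparatively low recall, while XGBoost has the highest recall. One common way to combine the two is the F1 score, the harmonic mean of precision and recall. F1 is not reported in the source; the sketch below only derives it from the rounded values in the table, so the results should be read as approximate. The names `results`, `f1_score`, and `f1_by_model` are illustrative.

```python
# Precision and recall copied from the results table (rounded values).
results = {
    "Logistic Regression": (0.83, 0.65),
    "Decision Tree": (0.75, 0.78),
    "Random Forest": (0.94, 0.82),
    "SVM": (0.96, 0.67),
    "XGBoost": (0.89, 0.83),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Derived F1 per model (an addition for illustration, not a reported result).
f1_by_model = {name: f1_score(p, r) for name, (p, r) in results.items()}

# Print models from highest to lowest derived F1.
for name, f1 in sorted(f1_by_model.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} F1 = {f1:.3f}")
```

Which model counts as best depends on the metric that matters for the use case: ROC AUC, recall, and a combined score such as F1 can each rank the models differently.
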
- `analysis/DSC478_Final_Project.ipynb`: Jupyter notebook containing the analysis and code.
- `report/Finalreport-PML.pdf`: Comprehensive project report.
- Clone the repository:
  `git clone https://github.com/tejas-1911/Fraud-Detection-Using-Machine-Learning.git`