This repository automates the process of fetching, merging, and updating Bitcoin (BTC/USD) historical data from Bitstamp. The updated dataset is then uploaded to Kaggle every day using GitHub Actions.
- Daily Updates: The repository ensures that the btcusd_1-min_data.csv dataset is always up to date with the latest data by fetching missing days of Bitcoin trading data from the Bitstamp API.
- Merges Existing Data: It first downloads the existing dataset from Kaggle, identifies any missing data, fetches it from the Bitstamp API, and then merges it with the existing data.
- Automated Upload: Once the missing data is merged, the updated dataset is automatically uploaded back to Kaggle, ensuring that the dataset remains accurate and complete without manual intervention.
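The merge step can be illustrated with a minimal sketch. It is not the code in kaggle_update_bitcoin.py: the `merge_rows` helper, the `Timestamp` and `Close` column names, and the sample prices are all assumptions made for illustration. It shows one reasonable way to combine stored rows with newly fetched rows so that overlapping timestamps are not duplicated.

```python
def merge_rows(existing, fetched):
    """Merge two lists of rows keyed by their integer 'Timestamp' field.

    Rows from `fetched` replace rows in `existing` that share a timestamp,
    and the result is sorted chronologically.
    """
    by_timestamp = {row["Timestamp"]: row for row in existing}
    by_timestamp.update({row["Timestamp"]: row for row in fetched})
    return [by_timestamp[ts] for ts in sorted(by_timestamp)]


# Made-up sample rows (column names are assumed, not taken from the dataset).
existing = [
    {"Timestamp": 1700000000, "Close": 100.0},
    {"Timestamp": 1700000060, "Close": 101.0},
]
fetched = [
    {"Timestamp": 1700000060, "Close": 101.5},  # overlaps the last stored row
    {"Timestamp": 1700000120, "Close": 102.0},  # genuinely new row
]

merged = merge_rows(existing, fetched)
print([row["Timestamp"] for row in merged])
```

Keying on the timestamp makes the merge idempotent: running it twice with the same fetched rows leaves the dataset unchanged.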
.
├── kaggle_bitcoin
│   └── kaggle_update_bitcoin.py   # Main Python script for data fetching and merging
├── pyproject.toml                 # Poetry project and dependency configuration
├── poetry.lock                    # Poetry lock file with pinned dependency versions
└── .github
    └── workflows
        └── kaggle-automation.yml  # GitHub Actions workflow for automation
- A GitHub Actions workflow runs automatically every day at midnight (UTC).
- The dataset is downloaded from Kaggle (btcusd_1-min_data.csv).
- Missing data is identified by comparing the last available date in the dataset with today’s date.
- Data is fetched from the Bitstamp API to fill in the missing days.
- The updated dataset is uploaded back to Kaggle, keeping the dataset current.
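The gap-detection and fetch steps above can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: the helper names are hypothetical, today is treated as incomplete and therefore excluded, and the endpoint path and the `step`, `limit`, `start`, and `end` parameter names are assumptions based on Bitstamp's public OHLC endpoint that should be verified against the current API documentation. The sketch only builds request URLs and makes no network calls.

```python
from datetime import date, datetime, timedelta, timezone
from urllib.parse import urlencode

# Assumed endpoint for 1-minute BTC/USD candles; verify against Bitstamp's docs.
BASE_URL = "https://www.bitstamp.net/api/v2/ohlc/btcusd/"


def missing_days(last_day, today):
    """Return every calendar day after `last_day` and strictly before `today`."""
    gap = (today - last_day).days
    return [last_day + timedelta(days=i) for i in range(1, gap)]


def day_window(day):
    """Return the (start, end) Unix timestamps of a UTC calendar day."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


def request_params(start_ts, end_ts, step=60, limit=1000):
    """Split [start_ts, end_ts) into parameter dicts of at most `limit` candles.

    A day holds 1440 one-minute candles, so a limit of 1000 (assumed here)
    needs two requests per day. Candles on a chunk boundary may be returned
    twice, so the merge step should de-duplicate by timestamp.
    """
    chunk = step * limit
    params = []
    for chunk_start in range(start_ts, end_ts, chunk):
        chunk_end = min(chunk_start + chunk, end_ts)
        params.append(
            {"step": step, "limit": limit, "start": chunk_start, "end": chunk_end}
        )
    return params


# Example: the dataset ends on 2024-01-01 and the workflow runs on 2024-01-04.
for day in missing_days(date(2024, 1, 1), date(2024, 1, 4)):
    start_ts, end_ts = day_window(day)
    for params in request_params(start_ts, end_ts):
        print(day, BASE_URL + "?" + urlencode(params))
```

Working in UTC throughout keeps the day boundaries aligned with the workflow's midnight (UTC) schedule.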
Ensure that your Kaggle API credentials are set as GitHub Secrets:
- KAGGLE_USERNAME: Your Kaggle username.
- KAGGLE_KEY: Your Kaggle API key.
These credentials are used by the script to download and upload the dataset on Kaggle.
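A small pre-flight check can catch missing credentials before any download is attempted. The sketch below assumes the workflow exposes the two secrets to the script as environment variables of the same names, which is how the Kaggle API client can pick them up; the `missing_credentials` helper is hypothetical and is not part of the repository.

```python
import os

REQUIRED_VARS = ("KAGGLE_USERNAME", "KAGGLE_KEY")


def missing_credentials(env=None):
    """Return the names of required Kaggle variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing Kaggle credentials:", ", ".join(missing))
else:
    print("Kaggle credentials found in the environment.")
```

The same check works locally: export both variables in the shell before running the script.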
To run the script locally, follow these steps:
- Install dependencies using Poetry:
poetry install
- Run the main script:
poetry run python kaggle_bitcoin/kaggle_update_bitcoin.py
This setup keeps the Kaggle dataset up to date with the latest Bitcoin trading data, so it remains comprehensive without any manual effort.