This repository is part of the publication entitled "a dynamic ensemble model for short-term forecasting in pandemic situations" (https://journals.plos.org/globalpublichealth/article?id=10.1371/journal.pgph.0003058). In the following we will describe how this repository is organized and how to run the scripts. For questions please contact Jonas Botz ([email protected]).
.
├── environment.yml
├── README.md
└── src -> includes necessary scripts for loading data and running models
├── data -> scripts for data loading and processing
│ ├── datasets.py
│ ├── __init__.py
│ └── load_data.py
├── __init__.py
├── models -> scripts for modeling, tuning and evaluation
│ ├── arima.py -> ARIMA base-model
│ ├── __init__.py
│ ├── main_ens_model.py -> meta-model without metadata
│ ├── main_meta.py -> meta-model with metadata
│ ├── main_optuna_ens.py -> optuna tuning for meta-model without metadata
│ ├── main_optuna_LSTM.py -> optuna tuning for LSTM base-model
│ ├── main_optuna_meta.py -> optuna tuning for meta-model with metadata
│ ├── main_optuna_trees.py -> optuna tuning for XGBoost and Random Forest
│ ├── main.py -> script for running tuned basemodel and baseline ensemble methods
│ ├── networks.py -> LSTM base-model and meta-model
│ ├── parser.py -> configurations
│ ├── regression.py -> Linear Regression
│ ├── solver_ens_model.py -> for running meta-model without metadata
│ ├── solver_meta_model.py -> for running meta-model with metadata
│ ├── solver_optuna.py -> for running optuna tuning for LSTM base-model
│ ├── solver.py -> for running LSTM base-model
│ ├── test.py -> for testing LSTM and Regression base-models
│ ├── train_ens_model.py -> for training meta-model without metadata
│ ├── train_meta_model.py -> for training meta-model with metadata
│ ├── train_optuna.py -> for training LSTM base-model in optuna tuning
│ ├── train.py -> for training LSTM base-model
│ ├── tree_models.py -> XGBoost and Random Forest
│ ├── utils.py -> Definition of Metrics
│ └── visualizations.py -> Barplots, Violinplots and Boxplots
└── README.md
We used German COVID-19, Influenza and SARI Surveillance Data provided by the RKI, available here: https://github.com/robert-koch-institut We used French COVID-19 Surveillance Data provided by SPF, available here: https://www.data.gouv.fr/fr/organizations/sante-publique-france
The metadata was accessed via the Google Trends API. For access you have to apply here: https://docs.google.com/document/d/1Ybu3gHUHtcSXXzgDJ-m7PPto9tw0QG8A5oOBsFP2jao/edit?pli=1#heading=h.qye80d9e325z Then follow Danqi et al.: https://www.nature.com/articles/s41598-023-48096-3
For smoothing we applied a centered moving average over seven days (for the COVID-19 and metadata).
For running the scripts following configuration parameters need to be set (we also mention the parameters that we used):
- iterations - number of test windows (140 for COVID, 30 for Influenza, 80 for SARI)
- period - training period (70 for daily data, 52 for weekly data)
- size_SW - fitting window size (7 for daily data, 5 for weekly data)
- size_PW - prediction window size (14 for daily data, 2 for weekly data)
- exp_name - name of the experiment to be correctly stored
- num_trials - number of optuna trials
- num_epochs - number of epochs the LSTM or meta-model are trained
There are more, which are specific to the models, for example the prediction type (stacking or selection) for the meta-model. How to set these should become clear when looking at the corresponding scripts.
-
Hyperparameter Tuning Base-Models:
- main_optuna_LSTM.py
- main_optuna_trees.py
-
Base-Model and Baseline Ensemble Evaluation:
- main.py (check that the correct optuna databases are selected)
-
Hyperparameter Tuning for Meta-Model:
- main_optuna_ens.py
- main_optuna_meta.py*
-
Meta-Model Evaluation:
- main_ens_model.py
- main_meta.py*
*Only if metadata is available
The results for step 2 should be stored and are used for steps 3 and 4. Further the results for step 4 are stored.