Enhancement: Async gbasf2 submission and/or download to avoid delay of scheduling #129
Labels: enhancement, gbasf2, help wanted
The gbasf2 submission and dataset download operations take a long time. Even when remote workers run in parallel, scheduling happens serially by default (except when the `parallel_scheduling` config option is set to `true`; however, this didn't work for me, so if you have had success with it, please message me). The long gbasf2 submission and the dataset download seem to block scheduling until each operation is done. I can live with this, since usually only a few gbasf2 projects are required, but it would be nice to improve it.

The gbasf2 dataset download is currently triggered in the `get_job_status` method as a subroutine call once the gbasf2 project is completely done. Maybe we could instead initiate the download as an asynchronous subprocess and only mark the job as complete once the download has finished, at least when the `gbasf2_download_dataset` b2luigi option is set. Something similar might be done for the submission.
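A minimal sketch of the polling idea, assuming the status check is called repeatedly from the same Python process: the download is started with `subprocess.Popen`, which returns immediately, and later status calls use `Popen.poll()` to see whether it has exited. `AsyncDownloadTracker`, `job_status` and the `download_cmd` argument are hypothetical names, not part of b2luigi; the real download command and its options would have to come from the existing gbasf2 wrapper.

```python
import subprocess
import sys
from typing import Dict, List


class AsyncDownloadTracker:
    """Hypothetical helper: start dataset downloads without blocking, then poll them.

    The download command is supplied by the caller; nothing here is part of
    the b2luigi or gbasf2 API.
    """

    def __init__(self) -> None:
        # One background download process per gbasf2 project name.
        self._downloads: Dict[str, subprocess.Popen] = {}

    def job_status(self, project: str, project_done: bool, download_cmd: List[str]) -> str:
        """Return "running", "completed" or "failed" for one project."""
        if not project_done:
            # The grid jobs themselves are still running.
            return "running"
        proc = self._downloads.get(project)
        if proc is None:
            # Grid jobs finished: launch the download and return at once
            # instead of waiting for it, so scheduling is not blocked.
            self._downloads[project] = subprocess.Popen(download_cmd)
            return "running"
        returncode = proc.poll()  # None while the download is still running
        if returncode is None:
            return "running"
        # Only report success once the download itself exited cleanly.
        return "completed" if returncode == 0 else "failed"


if __name__ == "__main__":
    tracker = AsyncDownloadTracker()
    # Stand-in for the real download command: a short-lived Python process.
    cmd = [sys.executable, "-c", "import time; time.sleep(0.5)"]
    print(tracker.job_status("demo_project", project_done=True, download_cmd=cmd))  # running
```

Returning `"running"` while the download is in progress keeps the job open until the data is actually on disk, which is the behaviour proposed above. A `Popen` handle is only meaningful inside the process that started it, so the tracker would have to live wherever the job status is polled.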
This is not easy, and I don't know whether we can handle both cases. The subprocess might sometimes require user input, e.g. a CA certificate or SSH key password, and that should still work. Error handling also needs some thought. As I don't have much experience with async subprocesses, I'd be happy about help.
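For the submission side and the error-handling question, here is a sketch using `asyncio.create_subprocess_exec` from the Python standard library. Each command runs without blocking the event loop, a non-zero exit status becomes an exception carrying the captured stderr, and `asyncio.gather(..., return_exceptions=True)` collects failures instead of cancelling the other runs. `run_checked` and `run_all` are hypothetical helpers, and it is an open assumption that the wrapper could drive an event loop at all, since luigi is not built around asyncio. On user input: stdin is inherited, so a prompt can still read from the terminal, but stderr is captured here, so a tool that prints its prompt to stderr would hide it. Several concurrent processes prompting on the same terminal would also conflict, so doing the interactive step once in the foreground before starting background work may be the simpler route.

```python
import asyncio
import sys
from typing import List, Sequence


async def run_checked(cmd: Sequence[str]) -> str:
    """Run one command without blocking the event loop; raise if it fails."""
    # stdin is not redirected, so the child inherits the parent's stdin.
    # stderr IS captured, so prompts a tool writes to stderr will not be shown.
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with code {proc.returncode}: {stderr.decode().strip()}"
        )
    return stdout.decode()


async def run_all(commands: List[Sequence[str]]) -> list:
    # return_exceptions=True: one failing command does not cancel the others;
    # failures are returned as exception objects for the caller to inspect.
    return await asyncio.gather(*(run_checked(c) for c in commands), return_exceptions=True)


if __name__ == "__main__":
    # Stand-ins for real submission commands.
    ok = [sys.executable, "-c", "print('submitted')"]
    bad = [sys.executable, "-c", "import sys; sys.stderr.write('failed'); sys.exit(1)"]
    for result in asyncio.run(run_all([ok, bad])):
        print(repr(result))
```

Collecting exceptions rather than letting the first one propagate means the caller can mark exactly the failed projects as failed and keep the successful ones.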
If I'm just too stupid for `parallel_scheduling`, and these blocking operations are no problem once it is properly enabled, then this can be closed. (Though `parallel_scheduling` also only works for picklable tasks.)
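On the `parallel_scheduling` question: according to the luigi configuration documentation, this `[core]` option makes luigi evaluate the tasks' `complete()` methods in parallel using multiprocessing, which requires every task to be picklable. A quick way to check whether an object meets that requirement is simply to try pickling it; `is_picklable` is a hypothetical helper, not part of luigi or b2luigi.

```python
import pickle
import threading
from typing import Any


def is_picklable(obj: Any) -> bool:
    """Return True if pickle can serialize obj (a requirement of parallel_scheduling)."""
    try:
        pickle.dumps(obj)
    except Exception:  # PicklingError, TypeError, AttributeError, ... depending on obj
        return False
    return True


# Plain data pickles fine; lambdas and thread locks do not.
print(is_picklable({"name": "my_project"}))  # True
print(is_picklable(lambda x: x))  # False
print(is_picklable(threading.Lock()))  # False
```

The option can be set in the `[core]` section of the luigi configuration or passed as `luigi.build(tasks, workers=4, parallel_scheduling=True)`. Since it parallelizes the `complete()` checks, it may not help if the blocking happens elsewhere, for example in `get_job_status`.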