Optimize/configure experiment commits for which to push cache #6593

dberenbaum · 2021-09-10T20:01:31Z

dberenbaum
Sep 10, 2021
Collaborator

dvc exp push/pull transfer all cached outputs for all experiment commits on which the experiment is based, including the baseline commit (see #6592).

Potential issues:

Might be unexpected. I don't think it's documented.
Pushing lots of large data could make pushing/pulling experiments slow.
It's inconsistent with pushing/pulling the cache for regular Git commits, including those generated by dvc exp branch.

I can understand the default behavior because otherwise the user risks losing all of the cached data from the experiment commits if the local experiments get destroyed. However, if a user wants to reproduce or build on top of an existing experiment commit, it may be simpler to ask them to follow the typical workflow of running dvc pull first. Users may also want granular control, for example to pull the cache for only the last n commits of the branch/experiment.
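To make the "last n commits" idea concrete, here is a minimal sketch of the selection logic. It is not DVC code: the function name, the `depth` parameter, and the commit labels are all hypothetical, and it assumes the experiment's commits are available as a list ordered newest first that ends with the baseline commit.

```python
from typing import List, Optional


def select_commits_for_cache(
    commits: List[str], depth: Optional[int] = None
) -> List[str]:
    """Pick which experiment commits should have their cache transferred.

    `commits` is ordered newest first and ends with the baseline commit.
    `depth=None` mirrors the behavior described above (every commit,
    baseline included); a positive `depth` keeps only the newest `depth`.
    """
    if depth is None:
        return list(commits)
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    return commits[:depth]


# Hypothetical experiment: two experiment commits on top of a baseline.
chain = ["exp-tip", "exp-checkpoint-1", "baseline"]
print(select_commits_for_cache(chain))           # all three commits
print(select_commits_for_cache(chain, depth=1))  # only the newest commit
```

With something like `depth=1`, pushing an experiment would transfer the same amount of cache as a regular dvc push on a single Git commit, which is the consistency this discussion is asking about.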

Is this really even an experiment-specific functionality? Being able to push the cache for multiple commits in a regular Git branch would enable more consistency between pushing/pulling experiments and dvc exp branch branches and seems useful outside of experiments, too. Even for a single commit, this would resolve user complaints of having to separately do git push and dvc push.

@pmrowla @daavoo @iesahin

iesahin · 2021-09-22T09:27:34Z

I think this is a real issue. It is one of the motivations behind #6549. Having granular control over "cache-depth" is also important, though not as urgent as the experiments case.

I believe we shouldn't push any artifact to a Git repository if it's not already tracked by Git. We are trying to push 50+MB model files to Git and only notice this when GitHub refuses. These artifacts can quickly fill up the repository space.

1 reply

dberenbaum Sep 22, 2021
Collaborator Author

@iesahin This discussion was intended to focus on pushing the DVC cache when pushing experiments, not pushing Git commits. It's not intended for experiments to push to Git any artifact not already tracked by Git, nor 50+MB model files. These outputs are intended to be cached in DVC. Maybe we can discuss the workflow that requires pushing large untracked files to Git in #6549?
