Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make exp push smoother in CI #9612

Closed
dberenbaum opened this issue Jun 15, 2023 · 11 comments
Closed

Make exp push smoother in CI #9612

dberenbaum opened this issue Jun 15, 2023 · 11 comments
Assignees
Labels
p1-important Important, aka current backlog of things to do

Comments

@dberenbaum
Copy link
Collaborator

dberenbaum commented Jun 15, 2023

Related to #9595 and #8844.

We had #8844 to work on making exp run easier in CI but it was probably too early for us to think about exp push. It would be good to coordinate this with https://github.com/iterative/studio/issues/5594.

@dberenbaum dberenbaum added the p1-important Important, aka current backlog of things to do label Jun 15, 2023
@dberenbaum
Copy link
Collaborator Author

Some steps that are still required (based on https://github.com/iterative/lstm_seq2seq/blob/6d869bd219bcbf5b3d5a8c8143c727a86895f689/.github/workflows/cml.yaml):

  • Set id-token: write
  • Include REPO_TOKEN env var
  • Call cml ci
  • Configure cloud credentials to push to remote

@pmrowla
Copy link
Contributor

pmrowla commented Jun 20, 2023

Some steps that are still required

All of these seem like reasonable things to require for GHA authenticated push workflow. What exactly are we looking to streamline here?

Maybe there could be a standardized re-usable action (like iterative/setup-dvc-exp) that would supercede iterative/setup-cml and does these steps with some sane defaults, but that doesn't really have anything to do with exp push behavior. (If anything, it feels like there should just be a flag for the existing setup-cml action that does this instead)

@dberenbaum
Copy link
Collaborator Author

Yeah, no concrete ideas from me yet @pmrowla. Just trying to document the current state so at least we are clear on that.

Agreed that consolidating some of these steps into a single dvc or cml action might make sense. Open to other ideas.

@dberenbaum
Copy link
Collaborator Author

dberenbaum commented Jun 22, 2023

  • Set id-token: write

When using a cml runner, a personal access token is required, which can be reused for the git credentials. id-token: write only impacts the built-in GITHUB_TOKEN, so it isn't needed when using a personal access token.

See https://github.com/iterative/studio/pull/6438

Update: see https://iterativeai.slack.com/archives/C01900GSB4J/p1687442173552329

@dacbd
Copy link

dacbd commented Aug 3, 2023

I don't think we should depreciate setup-dvc completely, there are also some other changes that we need to do to the action that will make it more stable in the long term.

If this is a focus for you guys currently then I'll go ahead and address the other setup-cml issues and we can release a v2 of the action.

@dberenbaum
Copy link
Collaborator Author

Related: iterative/scmrepo#262.

@dacbd Not sure if I missed something, but we were discussing in the DVC sprint meeting having both setup-cml and setup-dvc do cml ci to save users a step.

@pmrowla
Copy link
Contributor

pmrowla commented Aug 3, 2023

@dacbd can you clarify why we need a separate setup-dvc action? Using DVC in CI essentially requires CML (and running cml ci), but having the two separate setup-dvc and setup-cml actions increases the chance of conflicts where setup-dvc and setup-cml try to install different versions of CML.

setup-dvc also assumes that the user wants to install DVC through the action, but in practice we see a lot of users that only use setup-cml and then install DVC themselves (whether that's because they need a pip install to use the python API or because they are using a CML image that already includes DVC). In those cases, it's not clear to the users that they still need to run cml ci in order to make DVC usable. IMO it makes more sense to just have a single action that does the configuration required (i.e. running cml ci with no additional args) and then provide an input option to install DVC if needed.

@dacbd
Copy link

dacbd commented Aug 3, 2023

@dberenbaum @pmrowla maybe I'm missing some of that context discussed by the DVC team, so at the end of the day let my know what you guys need and I'll make it happen.

That said my purposal is to keep the actions separate, and if anything include an option in setup-dvc to install cml. Mentally I'm placing the focus on the dvc name as it what we have slowing collecting around.

So my purposal for a setup-dvc@v2 is the following:

  • extract the required extra header logic to the js action itself so we don't need to package another cli tool just to use dvc properly in ci
  • provide a bypass flag so user can simply pip install the version of dvc they want
  • provide an option to also install cml for user who need more than just cml ci

Additionally what does this look like for the other tools in the long term gto/mlem perhaps we should push something more generic like my yet to advertized https://github.com/iterative/setup-tools ?

@dberenbaum
Copy link
Collaborator Author

Additionally what does this look like for the other tools in the long term gto/mlem perhaps we should push something more generic like my yet to advertized https://github.com/iterative/setup-tools ?

I would vote to just reuse setup-dvc for that purpose given our current consolidation effort, but admittedly that opinion is based on not having looked at all at the implementation.

@dberenbaum
Copy link
Collaborator Author

  • extract the required extra header logic to the js action itself so we don't need to package another cli tool just to use dvc properly in ci

If we can also do this in setup-cml and/or any other setup actions, that would be great, although @dacbd mentioned he would have to check if running this logic twice would cause any issues.

@dberenbaum
Copy link
Collaborator Author

  • Call cml ci

Thanks to the PR from @dacbd in iterative/setup-dvc#55, this shouldn't be needed anymore (cc @daavoo if you are updating the example repos). The other items listed above seem necessary unless/until we change the CI approach to rely on cloud experiments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p1-important Important, aka current backlog of things to do
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants