Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stage add: insert stages mid-dvc.yaml (via --force) #6171

Closed
BradyJ27 opened this issue Jun 14, 2021 · 8 comments
Closed

stage add: insert stages mid-dvc.yaml (via --force) #6171

BradyJ27 opened this issue Jun 14, 2021 · 8 comments
Labels
A: pipelines Related to the pipelines feature feature request Requesting a new feature p3-nice-to-have It should be done this or next sprint

Comments

@BradyJ27
Copy link
Contributor

BradyJ27 commented Jun 14, 2021

As of right now, if you want to insert a new stage between two existing stages in a DVC pipeline, you must either edit the .yaml manually or remake the pipeline completely. See Discord discussion https://discord.com/channels/485586884165107732/485596304961962003/854010209596473364
My proposed feature it something along the lines of either:

  1. dvc stage add --location where location is the stage number you want it to be. E.g. if I have stages: x, y, z and I do dvc stage add -n t --location 2 the stages would now be x, t, y, z

OR

  1. dvc stage insert where you could insert a stage into an already existing pipeline.

Any thoughts/improvements are much appreciated.

Edit:

After further discussion, this is possible. The steps are dvc stage add and redefine the following stage with dvc stage add --force. Currently the .yaml file does not get updated, and the new stage gets added to the end. The desired behavior is the .yaml file to be either in order based on dependencies and outputs, OR overwriten stages should be moved to the bottom of the yaml file.

@jorgeorpinel jorgeorpinel transferred this issue from iterative/dvc.org Jun 14, 2021
@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Jun 14, 2021

Hi @BradyJ27

As of right now, if you want to insert a new stage between two existing stages in a DVC pipeline, you must either edit the .yaml manually or remake the pipeline completely.

This is not exactly the case. You can insert stages with dvc stage add now in 2 steps:

  1. stage add the new stage in between by using the correct dependencies and outputs
  2. Re-define the stage with that should follow with stage add --force, by using the new stage's outputs as dependencies.

The order in which they are presented in dvc.yaml is not important for DVC, only the way in which they connect (the dependency graph or DAG). But for your own benefit you can easily edit dvc.yaml so it's more readable to you if you prefer.

Does that clarify the situation?

@jorgeorpinel jorgeorpinel added the awaiting response we are waiting for your reply, please respond! :) label Jun 14, 2021
@BradyJ27
Copy link
Contributor Author

BradyJ27 commented Jun 14, 2021

Thank you for your response @jorgeorpinel ! This does clarify. I'm still thinking there might be some room for a feature here, maybe even something as small as rearranging the yaml file for readability. Any thoughts on making this into something or is this unnecessary? Thanks again!

@jorgeorpinel
Copy link
Contributor

An idea would be to move overwritten stages (stage add -f) to the bottom in dvc.yaml or even to some logical intermediate place as needed, instead of keeping them in their current location, which seems to be the current behavior (I just tried it).

@jorgeorpinel jorgeorpinel added feature request Requesting a new feature and removed awaiting response we are waiting for your reply, please respond! :) labels Jun 14, 2021
@jorgeorpinel jorgeorpinel changed the title Ability to insert stages mid-pipeline stage add: insert stages mid-dvc.yaml (via --force) Jun 14, 2021
@BradyJ27
Copy link
Contributor Author

Good idea! I'm thinking either moving forced stages to the bottom (like you said). Or moving stages based on dependencies and outputs. I will update the issue description.

@pmrowla pmrowla added the p3-nice-to-have It should be done this or next sprint label Jun 15, 2021
@skshetry
Copy link
Member

dvc.yaml does not have a sense of order or even the pipelines. A part of the pipeline could be very well in a different place within the dvc.yaml file or in dvc.yaml of different directories. These days we recommend users to edit dvc.yaml directly rather than using commands, so if you need order, editing is the correct way to do it.

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Jun 19, 2021

Def. agree. But I did notice that stage add -f preserves the existing stage order in dvc.yaml so there must be some special logic to do that already. What do you think about just removing it from where it it and writing it in the bottom of the file? That may be more intuitive.

@skshetry
Copy link
Member

skshetry commented Jun 20, 2021

But I did notice that stage add -f preserves the existing stage order in dvc.yaml so there must be some special logic to do that already.

Not really. stages is a dictionary/hashmap, so we just overwrite it if it already exists and the order gets preserved automatically. If there's no stage with the name, it gets added to the bottom. So, we are just exploiting the fact that the dictionaries in Python preserve insertion order.

What do you think about just removing it from where it it and writing it in the bottom of the file? That may be more intuitive.

We used to this before. See #4003 for the motivation to preserving order. But open to discussion for the change.

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Jun 22, 2021

the order gets preserved automatically

I see so if this is dependent on the implementation and that's how structures are handled by default by Python then that's probably the best behavior to keep.

So is this request valid, or should we decide not to go this route? After all it's unclear where to put stages that will be considered ideal in all cases, and users can always reorder the YAML structure manually.

@daavoo daavoo added the A: pipelines Related to the pipelines feature label Oct 20, 2021
@mattseddon mattseddon closed this as not planned Won't fix, can't repro, duplicate, stale Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: pipelines Related to the pipelines feature feature request Requesting a new feature p3-nice-to-have It should be done this or next sprint
Projects
None yet
Development

No branches or pull requests

6 participants