docs: doc repro --glob and update targets arg info. #1983

Merged
merged 20 commits into from
Dec 19, 2020
Changes from 19 commits
54 changes: 40 additions & 14 deletions content/docs/command-reference/repro.md
@@ -10,20 +10,22 @@ analyzing dependencies and <abbr>outputs</abbr> of the target stages.
 ```usage
 usage: dvc repro [-h] [-q | -v] [-f] [-s] [-m] [--dry] [-i]
                  [-p] [-P] [-R] [--no-run-cache] [--force-downstream]
-                 [--no-commit] [--downstream] [--pull]
-                 [targets [targets ...]]
+                 [--no-commit] [--downstream] [--pull] [--glob]
+                 [targets [<target> ...]]
jorgeorpinel marked this conversation as resolved.
 
 positional arguments:
-  targets        Stage or path to dvc.yaml or .dvc file to reproduce. Using -R,
-                 directories to search for stages can also be given.
+  targets        Limit command scope to these .dvc or dvc.yaml files,
+                 or stage names.
 ```
 
+> See [`targets`](#options) for more details.
+
 ## Description
 
 `dvc repro` provides a way to regenerate data pipeline results, by restoring the
 dependency graph (a [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph))
 implicitly defined by the stages listed in `dvc.yaml`. The commands defined in
-these stages can then be executed in the correct order, reproducing pipeline
+these stages are then executed in the correct order, reproducing pipeline
 results.
 
 > Pipeline stages are defined in a `dvc.yaml` file (either manually or by using
jorgeorpinel marked this conversation as resolved.
@@ -51,8 +53,8 @@ are run one after the other in the order they are defined. The failure of any
 command will halt the remaining stage execution, and raises an error.
 
 There are a few ways to restrict what will be regenerated by this command: by
-specifying stages as `targets`, or by using `--single-item`, among other
-options.
+specifying reproduction [`targets`](#options), or by using certain
+command [options](#options), such as `--single-item` or `--all-pipelines`.
 
 > Note that stages without dependencies are considered _always changed_, so
 > `dvc repro` always executes them.
@@ -97,17 +99,40 @@ up-to-date and only execute the final stage.
 
 ## Options
 
-- `-f`, `--force` - reproduce a pipeline, regenerating its results, even if no
-  changes were found. This executes all of the stages by default, but it can be
-  limited with the `targets` argument, or the `-s`, `-p` options.
+- `targets` (optional argument) - specifies one or more `dvc.yaml` files or
+  specific stage names (`./dvc.yaml` by default). E.g.
+  `dvc repro pipes/linear/dvc.yaml`
+
+  Bare stage names must be defined in `./dvc.yaml`. E.g. `dvc repro train-vision`.
+  Stages in other `dvc.yaml` files can be given by using a colon `:`
+  followed by the stage name after the path to that file. E.g. `models/dvc.yaml:prepare`
+
+  Different things can be provided as targets depending on the flags used (more
+  details in each option), namely:
+
+  - With `-R`, you can provide directory paths to search recursively for
+    `dvc.yaml` files.
+  - With `--glob`, you can use special patterns (using wildcards) to match
+    groups of stage names.
+
+- `-R`, `--recursive` - looks for `dvc.yaml` files to reproduce in any
+  directories given as `targets`, and in their subdirectories. If there are no
+  directories among the targets, this option has no effect.
+
+- `--glob` - causes the `targets` to be interpreted as wildcard
Contributor

Added -R and --glob to targets but also moved those options up right next to targets, so I still think it seems a bit redundant.

Member

Suggestion:

Keep only

Different things can be provided as targets depending on the flags used (more
  details in each option), namely:

then a list of examples:

  • dvc reporo -R pipeline - will look recursively into ....
  • dvc reporo --glob ... - will execute stages that match pattern
  • dvc repro - will find dvc.yaml(s) and execute all stages

it's way simpler to grasp the idea, no need to read long long text (if some details are need let's try very hard to squeeze them into each item above)

Contributor

reporo? 😆 Sounds funny

OK good idea, applied! PTAL.

Note that issue #1614 is pretty evident this way, should we prioritize it?

+  [patterns](https://docs.python.org/3/library/glob.html) to match against stage
+  names. For example: `train-*` (matching stage names in `./dvc.yaml`) or
+  `models/dvc.yaml:train-*` (stages in a specific `dvc.yaml` file). Note that
+  patterns are matched only against the stage names in the specified file, not
+  against the file path.
 
 - `-s`, `--single-item` - reproduce only a single stage by turning off the
   recursive search for changed dependencies. Multiple stages are executed
   (non-recursively) if multiple stage names are given as `targets`.
 
-- `-R`, `--recursive` - determines the stages to reproduce by searching each
-  target directory and its subdirectories for stages (in `dvc.yaml`) to inspect.
-  If there are no directories among the targets, this option is ignored.
+- `-f`, `--force` - reproduce a pipeline, regenerating its results, even if no
+  changes were found. This executes all of the stages by default, but it can be
+  limited with the `targets` argument, or the `-s`, `-p` options.
 
 - `--no-commit` - do not store the outputs of this execution in the cache
   (`dvc.yaml` and `dvc.lock` are still created or updated); useful to avoid
@@ -128,7 +153,8 @@ up-to-date and only execute the final stage.
   to. Use `dvc dag <target>` to show the parent pipeline of a target.
 
 - `-P`, `--all-pipelines` - reproduce all pipelines for all `dvc.yaml` files
-  present in the DVC project.
+  present in the DVC project. Specifying `targets` has no effect with this
+  option, as all possible targets are already included.
 
 - `--no-run-cache` - execute stage commands even if they have already been run
   with the same dependencies/outputs/etc. before.
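The target forms documented in this change (a `dvc.yaml` path, a bare stage name, a `path/dvc.yaml:stage` address, a directory with `-R`, and a wildcard pattern with `--glob`) can be illustrated with a small Python sketch. It is not DVC's implementation: `resolve`, `split_target`, `DEFAULT_FILE`, and the `PROJECT` mapping are hypothetical, the project is modeled in memory instead of reading real `dvc.yaml` files, and wildcard matching is assumed to follow Python's case-sensitive `fnmatch.fnmatchcase` rules (the `glob` module linked in the docs builds on `fnmatch`).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical project: each dvc.yaml path maps to the stage names it defines.
PROJECT = {
    "dvc.yaml": ["prepare", "train-vision", "train-text", "evaluate"],
    "models/dvc.yaml": ["prepare", "train-small", "train-large"],
    "pipes/linear/dvc.yaml": ["fit", "score"],
}
DEFAULT_FILE = "dvc.yaml"


def split_target(target):
    """Split 'path/dvc.yaml:name' into (file, name); name is None for a whole file."""
    path, sep, name = target.rpartition(":")
    if sep:
        return path or DEFAULT_FILE, name
    if PurePosixPath(target).name == "dvc.yaml":
        return target, None  # a whole dvc.yaml file
    return DEFAULT_FILE, target  # a bare stage name, looked up in ./dvc.yaml


def resolve(targets, project, glob=False, recursive=False):
    """Return the (file, stage) pairs a list of targets selects (illustrative only)."""
    selected = []
    for target in targets:
        if recursive and ":" not in target:
            # Treat the target as a directory if any dvc.yaml lives below it;
            # otherwise fall through, so -R has no effect on non-directories.
            root = PurePosixPath(target)
            below = [f for f in project if root in PurePosixPath(f).parents]
            if below:
                for file in below:
                    selected += [(file, stage) for stage in project[file]]
                continue
        file, name = split_target(target)
        stages = project.get(file, [])
        if name is None:
            selected += [(file, stage) for stage in stages]
        elif glob:
            # Patterns apply to stage names only, never to the file path.
            selected += [(file, s) for s in stages if fnmatchcase(s, name)]
        elif name in stages:
            selected.append((file, name))
    return selected


print(resolve(["train-*"], PROJECT, glob=True))
# [('dvc.yaml', 'train-vision'), ('dvc.yaml', 'train-text')]
print(resolve(["train-*"], PROJECT))
# []
print(resolve(["models/dvc.yaml:train-*"], PROJECT, glob=True))
# [('models/dvc.yaml', 'train-small'), ('models/dvc.yaml', 'train-large')]
print(resolve(["pipes"], PROJECT, recursive=True))
# [('pipes/linear/dvc.yaml', 'fit'), ('pipes/linear/dvc.yaml', 'score')]
```

In this sketch, `train-*` without `glob=True` is treated as a literal stage name and selects nothing, which shows why a separate flag is needed to opt into wildcard interpretation.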