docs: doc repro --glob and update targets arg info. #1983

Merged
merged 20 commits into from
Dec 19, 2020
Changes from 12 commits
81 changes: 51 additions & 30 deletions content/docs/command-reference/repro.md
@@ -10,21 +10,26 @@ analyzing dependencies and <abbr>outputs</abbr> of the target stages.
```usage
usage: dvc repro [-h] [-q | -v] [-f] [-s] [-m] [--dry] [-i]
[-p] [-P] [-R] [--no-run-cache] [--force-downstream]
[--no-commit] [--downstream] [--pull]
[targets [targets ...]]
[--no-commit] [--downstream] [--pull] [--glob]
[targets [<target> ...]]

positional arguments:
targets Stage or path to dvc.yaml or .dvc file to reproduce. Using -R,
directories to search for stages can also be given.
targets Limit command scope to these .dvc or dvc.yaml files,
or stage names.
```

> See [`targets`](#options) for more details.

## Description

`dvc repro` provides a way to regenerate data pipeline results, by restoring the
dependency graph (a [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph))
implicitly defined by the stages listed in `dvc.yaml`. The commands defined in
these stages can then be executed in the correct order, reproducing pipeline
results.
these stages are then executed in the correct order.
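To illustrate what running stages "in the correct order" means for a DAG, here is a minimal Python sketch that orders a hypothetical four-stage pipeline with the standard library's `graphlib` (Python 3.9+). The stage names and the `execution_order` helper are invented for this example. It is not how DVC itself is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each stage maps to the set of stages it depends on.
pipeline = {
    "prepare": set(),
    "featurize": {"prepare"},
    "train": {"featurize"},
    "evaluate": {"train", "featurize"},
}


def execution_order(dag):
    """Return stage names so that every stage comes after its dependencies."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
print(order)
```

A cycle in the mapping would make `static_order()` raise `graphlib.CycleError`, which is why the graph has to be acyclic.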

For stages with multiple commands (having a list in the `cmd` field), commands
are run one after the other in the order they are defined. The failure of any
command halts the remaining stage execution and raises an error.
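The behavior described for multi-command stages can be sketched in Python as follows. This is only an illustration under assumptions: `run_stage_commands` and its `log` parameter are hypothetical names, and the sketch uses `subprocess` with a shell rather than whatever DVC does internally.

```python
import subprocess


def run_stage_commands(commands, log=None):
    """Run shell commands in order, stopping at the first one that fails."""
    for cmd in commands:
        if log is not None:
            log.append(cmd)  # record every command that was started
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the remaining commands in the list are never started.
        subprocess.run(cmd, shell=True, check=True)
```

For example, `run_stage_commands(["exit 0", "exit 3", "exit 0"])` raises `subprocess.CalledProcessError` at the second command, and the third one never runs.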

> Pipeline stages are defined in a `dvc.yaml` file (either manually or by using
> `dvc run`) while initial data dependencies can be registered with `dvc add`.
@@ -37,30 +42,21 @@ and <abbr>caches</abbr> the pipeline's <abbr>outputs</abbr> along the way.
💡 For convenience, a Git hook is available to remind you to `dvc repro` when
needed after a `git commit`. See `dvc install` for more details.

[Stage](/doc/command-reference/run) outputs are deleted from the
<abbr>workspace</abbr> before executing the stage commands that produce them.
`dvc repro` does not run `dvc fetch`, `dvc pull` or `dvc checkout` to get data
files, intermediate or final results (except if the `--pull` option is used).

By default, this command checks all [pipeline](/doc/command-reference/dag)
Member commented:

all not relevant?

Contributor (@jorgeorpinel, Dec 16, 2020) commented:

This paragraph is redundant (with the new targets text as well as the first paragraph in the Description) and slightly wrong (but that will be corrected for #2024). Removing it aims to make the Description shorter so that the whole doc doesn't grow too much as we introduced the targets text.

Contributor (@jorgeorpinel, Dec 16, 2020) commented:

But I can easily stash the change if you prefer and leave it for later. Lmk.

Member commented:

Yes, let's make PRs focused if possible.

Member commented:

I expect some changes here, but it probably should be only the paragraph about targets?

Contributor commented:

OK, I reverted this. I'm afraid that if we just keep adding paragraphs without reviewing the context we'll end up with lots of redundancy and a text that may be unnecessarily long.

stages to determine which ones have changed. Then it executes the corresponding
commands (`cmd` field of `dvc.yaml`). [Stage](/doc/command-reference/run)
outputs are deleted from the <abbr>workspace</abbr> before executing the stage
commands that produce them.

For stages with multiple commands (having a list in the `cmd` field), commands
are run one after the other in the order they are defined. The failure of any
command halts the remaining stage execution and raises an error.

There are a few ways to restrict what will be regenerated by this command: by
specifying stages as `targets`, or by using `--single-item`, among other
options.
specifying reproduction [`targets`](#options), or by using certain
command [options](#options), such as `--single-item`.

> Note that stages without dependencies are considered _always changed_, so
> `dvc repro` always executes them.
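One way to picture the "changed" check, including the rule that stages without dependencies are always treated as changed, is the Python sketch below. It is a simplified model under stated assumptions: `content_hash` and `stage_changed` are hypothetical helpers, dependencies are passed as in-memory bytes, and SHA-256 is an arbitrary choice for the example rather than a claim about the hashes DVC records.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash dependency contents (the algorithm is an arbitrary choice here)."""
    return hashlib.sha256(data).hexdigest()


def stage_changed(deps, recorded):
    """Decide whether a stage needs to run again.

    deps:     {path: current contents as bytes}
    recorded: {path: hash saved by a previous run}
    """
    if not deps:
        # No dependencies: considered always changed, so it always runs.
        return True
    # Changed if any dependency hash differs from (or is missing in) the record.
    return any(recorded.get(path) != content_hash(data)
               for path, data in deps.items())
```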

It stores all the data files, intermediate or final results in the
`dvc repro` saves all the data files, intermediate or final results into the
<abbr>cache</abbr> (unless the `--no-commit` option is used), and updates the
hash values of changed dependencies and outputs in the `dvc.lock` and `.dvc`
files.
hash values of changed outputs in the `dvc.lock` and `.dvc` files.

### Parallel stage execution

@@ -97,6 +93,31 @@ up-to-date and only execute the final stage.

## Options

- `targets` (optional argument) - one or more file or directory paths (to `.dvc`
or `dvc.yaml` files), or stage name(s) (`./dvc.yaml` by default). DVC will
reproduce them as detailed below.

- For **`dvc.yaml` files**, their [pipeline(s)](/doc/command-reference/dag)
are checked for changes, and reproduced as needed (explained in the command
[description](#description) above). E.g. `dvc repro pipes/linear/dvc.yaml`

- **Stage names** must be defined in `./dvc.yaml`. E.g.
`dvc repro train-vision`. Stages in other `dvc.yaml` files can be given
by using a colon `:` following the path to that file. E.g.
`models/dvc.yaml:prepare`

- Files and directories tracked by **`.dvc` files** given as `targets` are
Member commented:

let's skip this I think?

Contributor commented:

Yes! I would like that too but are you sure? The help output (and Synopsis above) mention .dvc files so if we don't explain it here, it'd be up to the user to assume what will happen.

Member commented:

Synopsis is a copy/paste from DVC which should probably change, so let's not complicate. The point is not to cover everything to the very small detail; the point is to focus on important stuff and convey it in the simplest possible format (e.g. people don't read long texts unless they want to dive in), thus short self-explanatory examples are better.

Contributor (@jorgeorpinel, Dec 17, 2020) commented:

Agree. Removed info about .dvc file targets from this ref.

updated (same as `dvc add`). E.g. `dvc repro data.dvc`

> Note that [frozen](/doc/command-reference/freeze) `.dvc` files are
> ignored.

- `--glob` - causes the `targets` to be interpreted as wildcard
[patterns](https://docs.python.org/3/library/glob.html) to match for stages.
For example: `train-*` (certain stage names) or `models/dvc.yaml:train-*`
(stages in a specific `dvc.yaml` file). Note that patterns are not matched
against the path, only against the stages present in the specified file.

- `-f`, `--force` - reproduce a pipeline, regenerating its results, even if no
changes were found. This executes all of the stages by default, but it can be
limited with the `targets` argument, or the `-s`, `-p` options.
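The stage-name `targets` and `--glob` matching described in this hunk can be approximated with the Python sketch below. It rests on several assumptions: `split_target`, `match_stages` and the `stages_by_file` mapping are hypothetical, Python's `fnmatch` stands in for the wildcard matching, and only stage-name targets (not `dvc.yaml` file paths) are handled.

```python
from fnmatch import fnmatchcase

DEFAULT_FILE = "dvc.yaml"  # stands in for ./dvc.yaml


def split_target(target):
    """Split 'path/dvc.yaml:name' into (file, name).

    A bare name is looked up in the default dvc.yaml.
    """
    path, sep, name = target.rpartition(":")
    if sep:
        return path or DEFAULT_FILE, name
    return DEFAULT_FILE, target


def match_stages(target, stages_by_file):
    """Return (file, stage) pairs whose stage name matches the pattern.

    The file part is used literally; only the stage part is a wildcard.
    """
    path, pattern = split_target(target)
    names = stages_by_file.get(path, [])
    return [(path, name) for name in names if fnmatchcase(name, pattern)]
```

With a mapping such as `{"dvc.yaml": ["prepare", "train-vision"]}`, `match_stages("train-*", ...)` selects only `train-vision`, while a wildcard in the file part (for example `models/*:prepare`) selects nothing because the path is not treated as a pattern.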
@@ -105,15 +126,21 @@ up-to-date and only execute the final stage.
recursive search for changed dependencies. Multiple stages are executed
(non-recursively) if multiple stage names are given as `targets`.

- `-R`, `--recursive` - determines the stages to reproduce by searching each
target directory and its subdirectories for stages (in `dvc.yaml`) to inspect.
If there are no directories among the targets, this option is ignored.
- `-R`, `--recursive` - looks for `dvc.yaml` files to reproduce in any
Member commented:

not relevant?

Contributor commented:

This is directly related to repro targets. It used to read "determines the stages ... by searching ... for stages". Isn't this a good opportunity to fix that? TBH I already have a mountain of stashed changes so anything I can't get into existing PRs will probably get lost until we happen to notice it again 🙁

directories given as `targets`, and in their subdirectories. If there are no
directories among the targets, this option has no effect.

- `--no-commit` - do not store the outputs of this execution in the cache
(`dvc.yaml` and `dvc.lock` are still created or updated); useful to avoid
caching unnecessary data when exploring different data or stages. Use
`dvc commit` to finish the operation.

- `-p`, `--pipeline` - reproduce the entire pipelines that the `targets` belong
to. Use `dvc dag` to show the parent pipeline of a target.

- `-P`, `--all-pipelines` - reproduce all pipelines for all `dvc.yaml` files
present in the DVC project.

- `-m`, `--metrics` - show metrics after reproduction. The target pipelines must
have at least one metrics file defined either with the `dvc metrics` command,
or by the `-M` or `-m` options of the `dvc run` command.
@@ -124,12 +151,6 @@ up-to-date and only execute the final stage.
- `-i`, `--interactive` - ask for confirmation before reproducing each stage.
The stage is only executed if the user types "y".

- `-p`, `--pipeline` - reproduce the entire pipelines that the `targets` belong
to. Use `dvc dag <target>` to show the parent pipeline of a target.

- `-P`, `--all-pipelines` - reproduce all pipelines for all `dvc.yaml` files
present in the DVC project.

- `--no-run-cache` - execute stage commands even if they have already been run
with the same dependencies/outputs/etc. before.
