term: remove Dvcfile from repro cmd ref. #1504

Merged
merged 8 commits into from
Jul 5, 2020
57 changes: 22 additions & 35 deletions content/docs/command-reference/repro.md
@@ -13,7 +13,7 @@ usage: dvc repro [-h] [-q | -v] [-f] [-s] [-c <path>] [-m] [--dry] [-i]
[--no-commit] [--downstream] [targets [targets ...]]

positional arguments:
targets Stage or .dvc file to reproduce. 'Dvcfile' by default.
targets Stage or .dvc file to reproduce
```

## Description
@@ -40,9 +40,6 @@ There's a few ways to restrict the stages that will be regenerated by this
command: by specifying stage file `targets`, or by using the `--single-item`,
`--cwd`, or other options.

If specific [DVC-files](/doc/user-guide/dvc-files-and-directories) (`targets`)
are omitted, `Dvcfile` will be assumed.

`dvc repro` does not run `dvc fetch`, `dvc pull` or `dvc checkout` to get data
files, intermediate or final results.

@@ -101,8 +98,7 @@ only execute the final stage.
(non-recursively) if multiple stage files are given as `targets`.

- `-c <path>`, `--cwd <path>` - directory within the project to reproduce from.
If no `targets` are given, it attempts to use `Dvcfile` in the specified
directory. Instead of using `--cwd`, one can alternately specify a target in a
Instead of using `--cwd`, one can alternately specify a target in a
subdirectory as `path/to/target.dvc`. This option can be useful for example
with subdirectories containing a separate pipeline that can either be
reproduced as part of the pipeline in the parent directory, or as an
@@ -169,7 +165,7 @@ only execute the final stage.
## Examples

For simplicity, let's build a pipeline defined below. (If you want to get your
hands-on something more real, see this shot
hands-on something more real, see this short
[pipeline tutorial](/doc/tutorials/pipelines)). It takes this `text.txt` file:

```
@@ -184,18 +180,13 @@ best
And runs a few simple transformations to filter and count numbers:

```dvc
$ dvc run -f filter.dvc -d text.txt -o numbers.txt \
$ dvc run -n filter -d text.txt -o numbers.txt \
"cat text.txt | egrep '[0-9]+' > numbers.txt"

$ dvc run -f Dvcfile -d numbers.txt -d process.py -M count.txt \
$ dvc run -n count -d numbers.txt -d process.py -M count.txt \
"python process.py numbers.txt > count.txt"
```

> Note that using `-f Dvcfile` with `dvc run` above is optional, the stage file
> name would otherwise default to `count.txt.dvc`. We use `Dvcfile` in this
> example because that's the default stage file name `dvc repro` will read
> without having to provide any `targets`.

Where `process.py` is a script that, for simplicity, just prints the number of
lines:

@@ -213,23 +204,23 @@ The result of executing these `dvc run` commands should look like this:
```dvc
$ tree
.
├── Dvcfile <---- second stage with a default DVC name
├── count.txt <---- result: "2"
├── filter.dvc <---- first stage
├── dvc.lock <---- file to record pipeline state
├── dvc.yaml <---- file containing the list of stages
├── numbers.txt <---- intermediate result of the first stage
├── process.py <---- code that implements data transformation
└── text.txt <---- text file to process
```

You may want to check the contents of `Dvcfile` and `count.txt` for later
You may want to check the contents of `dvc.lock` and `count.txt` for later
reference.

Ok, now, let's run the `dvc repro` command (remember, by default it reproduces
<abbr>outputs</abbr> tracked in `Dvcfile`, in this case `count.txt`):
Ok, now, let's run the `dvc repro` command:

```dvc
$ dvc repro
WARNING: assuming default target 'Dvcfile'.
Stage 'filter' didn't change, skipping
Stage 'count' didn't change, skipping
Data and pipelines are up to date.
```

@@ -247,17 +238,14 @@ If we now run `dvc repro`, we should see this:

```dvc
$ dvc repro
WARNING: assuming default target 'Dvcfile'.
Stage 'Dvcfile' changed.
Reproducing 'Dvcfile'
Running command:
python process.py numbers.txt > count.txt
Output 'count.txt' doesn't use cache. Skipping saving.
Saving information to 'Dvcfile'.
Stage 'filter' didn't change, skipping
Running stage 'count' with command:
python3 process.py numbers.txt > count.txt
Updating lock file 'dvc.lock'
```

You can now check that `Dvcfile` and `count.txt` have been updated with the new
information and updated dependency/output file hash values, and a new result,
You can now check that `dvc.lock` and `count.txt` have been updated with the new
information: updated dependency/output file hash values, and a new result,
respectively.

## Example: Downstream
@@ -277,14 +265,13 @@ Now, using the `--downstream` option results in the following output:

```dvc
$ dvc repro --downstream
WARNING: assuming default target 'Dvcfile'.
Data and pipelines are up to date.
```

The reason being that the `text.txt` file is a dependency in the target
[DVC-file](/doc/user-guide/dvc-files-and-directories) (`Dvcfile` by default).
This `Dvcfile` stage is dependent on `filter.dvc`, which happens first in this
pipeline (shown in the following figure):
The reason is that the `text.txt` file is a dependency in the last stage of
the pipeline (used by default by `dvc repro`). This last `count` stage is
dependent on the `filter` stage, which happens first in this pipeline (shown in
the following figure):

```dvc
$ dvc dag
@@ -296,6 +283,6 @@ $ dvc dag
*
*
.---------.
| Dvcfile |
| count |
`---------'
```
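
To make the `--downstream` behaviour above easier to follow, here is a minimal sketch of the two-stage graph. It is a toy model and not DVC's actual implementation: the `DEPENDS_ON` table and the `upstream`/`downstream` helpers are hypothetical names introduced only for illustration. It assumes the pipeline consists of just the `filter` and `count` stages from the example, with `count` depending on `filter`.

```python
# Toy model of the example pipeline; NOT how DVC is implemented.
# Each stage name maps to the stages it depends on (hypothetical table).
DEPENDS_ON = {"filter": [], "count": ["filter"]}


def upstream(stage):
    """Return the stage preceded by everything it depends on."""
    order = []
    for dep in DEPENDS_ON[stage]:
        for s in upstream(dep):
            if s not in order:
                order.append(s)
    order.append(stage)
    return order


def downstream(stage):
    """Return the stage followed by everything that depends on it."""
    order = [stage]
    for other, deps in DEPENDS_ON.items():
        if stage in deps:
            for s in downstream(other):
                if s not in order:
                    order.append(s)
    return order


print(upstream("count"))     # ['filter', 'count']
print(downstream("count"))   # ['count']
print(downstream("filter"))  # ['filter', 'count']
```

In this model, starting from the last stage and walking downstream visits only `count`. If the only change was made to an input of `filter`, the files `count` reads directly are unchanged, which is consistent with the "up to date" message shown above.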