term: remove Dvcfile from repro cmd ref. #1504

Merged 8 commits on Jul 5, 2020
56 changes: 24 additions & 32 deletions content/docs/command-reference/repro.md
@@ -13,7 +13,7 @@ usage: dvc repro [-h] [-q | -v] [-f] [-s] [-c <path>] [-m] [--dry] [-i]
[--no-commit] [--downstream] [targets [targets ...]]

positional arguments:
targets Stage or .dvc file to reproduce. 'Dvcfile' by default.
targets Stage to reproduce.
```

## Description
@@ -24,6 +24,8 @@ the dependency graph (a
by the [stage files](/doc/command-reference/run) (DVC-files with dependencies)
that are found in the <abbr>project</abbr>. The commands defined in these stages
can then be executed in the correct order, reproducing pipeline results.
`dvc repro` relies on the DAG definition that it reads from `dvc.yaml`, and uses
`dvc.lock` to determine what exactly needs to be run.
Contributor:

Thanks, but this is out of scope for this PR. The whole cmd ref. needs to be rewritten, so there's probably no use in adding a small update here. Unless it was needed to explain the Dvcfile removal, but I don't think that's the case?

Contributor Author:

OK, sure! I will add it later when updating the command reference.

Contributor:

See #1572


> Pipeline stages are typically defined using the `dvc run` command, while
> initial data dependencies can be registered by the `dvc add` command.
@@ -40,9 +42,6 @@ There's a few ways to restrict the stages that will be regenerated by this
command: by specifying stage file `targets`, or by using the `--single-item`,
`--cwd`, or other options.

If specific [DVC-files](/doc/user-guide/dvc-files-and-directories) (`targets`)
are omitted, `Dvcfile` will be assumed.

`dvc repro` does not run `dvc fetch`, `dvc pull` or `dvc checkout` to get data
files, intermediate or final results.

@@ -101,8 +100,7 @@ only execute the final stage.
(non-recursively) if multiple stage files are given as `targets`.

- `-c <path>`, `--cwd <path>` - directory within the project to reproduce from.
If no `targets` are given, it attempts to use `Dvcfile` in the specified
directory. Instead of using `--cwd`, one can alternately specify a target in a
Instead of using `--cwd`, one can alternately specify a target in a
subdirectory as `path/to/target.dvc`. This option can be useful for example
with subdirectories containing a separate pipeline that can either be
reproduced as part of the pipeline in the parent directory, or as an
@@ -169,7 +167,7 @@ only execute the final stage.
## Examples

For simplicity, let's build a pipeline defined below. (If you want get your
hands-on something more real, see this shot
hands-on something more real, see this short
[pipeline tutorial](/doc/tutorials/pipelines)). It takes this `text.txt` file:

```
@@ -184,17 +182,15 @@ best
And runs a few simple transformations to filter and count numbers:

```dvc
$ dvc run -f filter.dvc -d text.txt -o numbers.txt \
$ dvc run -n filter -d text.txt -o numbers.txt \
"cat text.txt | egrep '[0-9]+' > numbers.txt"

$ dvc run -f Dvcfile -d numbers.txt -d process.py -M count.txt \
$ dvc run -n count -d numbers.txt -d process.py -M count.txt \
"python process.py numbers.txt > count.txt"
```

> Note that using `-f Dvcfile` with `dvc run` above is optional, the stage file
> name would otherwise default to `count.txt.dvc`. We use `Dvcfile` in this
> example because that's the default stage file name `dvc repro` will read
> without having to provide any `targets`.
> Note that a stage name is required when executing `dvc run`. It can be
> specified with `-n` (`--name`) option as we did above.

Where `process.py` is a script that, for simplicity, just prints the number of
lines:
@@ -213,23 +209,23 @@ The result of executing these `dvc run` commands should look like this:
```dvc
$ tree
.
├── Dvcfile <---- second stage with a default DVC name
├── count.txt <---- result: "2"
├── filter.dvc <---- first stage
├── dvc.lock <---- file to record pipeline state
├── dvc.yaml <---- file containing list of stages.
├── numbers.txt <---- intermediate result of the first stage
├── process.py <---- code that implements data transformation
└── text.txt <---- text file to process
```

You may want to check the contents of `Dvcfile` and `count.txt` for later
You may want to check the contents of `dvc.lock` and `count.txt` for later
reference.

Ok, now, let's run the `dvc repro` command (remember, by default it reproduces
<abbr>outputs</abbr> tracked in `Dvcfile`, in this case `count.txt`):
Ok, now, let's run the `dvc repro` command:

```dvc
$ dvc repro
WARNING: assuming default target 'Dvcfile'.
Stage 'filter' didn't change, skipping
Stage 'count' didn't change, skipping
Data and pipelines are up to date.
```
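
The "didn't change, skipping" messages above follow from comparing each stage's current dependencies with what was recorded earlier. As a rough illustration only (a sketch, not DVC's actual implementation), the check can be pictured as comparing file hashes against previously saved values. The names `file_md5` and `stage_changed` and the `recorded` mapping are hypothetical; the mapping merely stands in for a lock file, and the use of MD5 here is an assumption of this sketch.

```python
import hashlib
from pathlib import Path


def file_md5(path):
    """Return the hex MD5 digest of a file's contents."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def stage_changed(deps, recorded):
    """Return True if any dependency's current hash differs from the recorded one.

    `recorded` is a hypothetical {path: md5} mapping standing in for a lock file.
    A dependency missing from `recorded` counts as changed.
    """
    return any(file_md5(dep) != recorded.get(dep) for dep in deps)
```

Under this sketch, a stage whose dependencies all match their recorded hashes would be skipped, and editing any dependency would mark the stage for re-execution.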

@@ -247,16 +243,13 @@ If we now run `dvc repro`, we should see this:

```dvc
$ dvc repro
WARNING: assuming default target 'Dvcfile'.
Stage 'Dvcfile' changed.
Reproducing 'Dvcfile'
Running command:
python process.py numbers.txt > count.txt
Output 'count.txt' doesn't use cache. Skipping saving.
Saving information to 'Dvcfile'.
Stage 'filter' didn't change, skipping
Running stage 'count' with command:
python3 process.py numbers.txt > count.txt
Updating lock file 'dvc.lock'
```

You can now check that `Dvcfile` and `count.txt` have been updated with the new
You can now check that `dvc.lock` and `count.txt` have been updated with the new
information and updated dependency/output file hash values, and a new result,
respectively.

@@ -277,14 +270,13 @@ Now, using the `--downstream` option results in the following output:

```dvc
$ dvc repro --downstream
WARNING: assuming default target 'Dvcfile'.
Data and pipelines are up to date.
```

The reason being that the `text.txt` file is a dependency in the target
[DVC-file](/doc/user-guide/dvc-files-and-directories) (`Dvcfile` by default).
This `Dvcfile` stage is dependent on `filter.dvc`, which happens first in this
pipeline (shown in the following figure):
[DVC-file](/doc/user-guide/dvc-files-and-directories). This `count` stage is
dependent on `filter` stage, which happens first in this pipeline (shown in the
following figure):

```dvc
$ dvc dag
@@ -296,6 +288,6 @@ $ dvc dag
*
*
.---------.
| Dvcfile |
| count |
`---------'
```
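
To make the `--downstream` behavior discussed above more concrete, here is a minimal conceptual sketch of selecting a target stage plus everything that depends on it in a two-stage pipeline. It is not how DVC is implemented. The `downstream` function and the `edges` mapping are hypothetical, and the sketch assumes that `count` is the stage being targeted.

```python
from collections import deque


def downstream(edges, target):
    """Return `target` followed by every stage that depends on it, in BFS order.

    `edges` is a hypothetical mapping: stage name -> stages that consume its outputs.
    """
    seen = [target]
    queue = deque([target])
    while queue:
        stage = queue.popleft()
        for child in edges.get(stage, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


# The example pipeline: `filter` feeds `count`.
edges = {"filter": ["count"], "count": []}
print(downstream(edges, "count"))
print(downstream(edges, "filter"))
```

In this sketch, starting from `count` selects only `count`, so a change to `text.txt` (a dependency of `filter`) would not trigger any work. Starting from `filter` selects both stages.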