ref: replaced dvc run references with dvc stage and dvc exp run in dvc dag #3218

Closed
12 changes: 6 additions & 6 deletions content/docs/command-reference/dag.md
@@ -21,19 +21,19 @@ show the full project DAG.
### Directed acyclic graph

A data pipeline, in general, is a series of data processing
-[stages](/doc/command-reference/run) (for example, console commands that take an
-input and produce an outcome). The connections between stages are formed by the
-<abbr>output</abbr> of one turning into the <abbr>dependency</abbr> of another.
-A pipeline may produce intermediate data, and has a final result.
+[stages](/doc/command-reference/stage) (for example, console commands that take
Comment on lines 23 to +24
Contributor

@jorgeorpinel jorgeorpinel Jan 27, 2022

OK, I see we make more `run`-related changes here than in https://github.com/iterative/dvc.org/pull/3223/files#diff-7b6ee3d8c0ebb9e2a9dcf9077d16fda46133fb069f3879a3c38b0653fe00e850, but updating these links seems like a separate task. Let's create an issue to decide where the "stage" definition should be (probably a guide?) and then update links throughout?

Contributor Author

Fine. Closed.

+an input and produce an outcome). The connections between stages are formed by
+the <abbr>output</abbr> of one turning into the <abbr>dependency</abbr> of
+another. A pipeline may produce intermediate data, and has a final result.

Data science and machine learning pipelines typically start with large raw
datasets, include intermediate featurization and training stages, and produce a
final model, as well as accuracy [metrics](/doc/command-reference/metrics).

In DVC, pipeline stages and commands, their data I/O, interdependencies, and
results (intermediate or final) are specified in `dvc.yaml`, which can be
-written manually or built using the helper command `dvc run`. This allows DVC to
-restore one or more pipelines later (see `dvc repro`).
+written manually or built using the helper command `dvc stage add`. This allows
+DVC to restore one or more pipelines later (see `dvc exp run` and `dvc repro`).
jorgeorpinel marked this conversation as resolved.

> DVC builds a dependency graph
> ([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) to do this.
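
For readers of this diff, the idea that stage connections come from one stage's output becoming another's dependency can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DVC's actual implementation: the stage names, file paths, and the `build_graph` and `execution_order` helpers are all hypothetical, and the ordering relies on the standard library's `graphlib` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical stages, shaped loosely like entries in a dvc.yaml file.
STAGES = {
    "prepare": {"deps": ["data/raw.csv"], "outs": ["data/prepared.csv"]},
    "featurize": {"deps": ["data/prepared.csv"], "outs": ["data/features.csv"]},
    "train": {"deps": ["data/features.csv"], "outs": ["model.pkl"]},
}


def build_graph(stages):
    """Map each stage to the set of stages whose outputs it depends on."""
    # Index every output path by the name of the stage that produces it.
    producer = {out: name for name, spec in stages.items() for out in spec["outs"]}
    graph = {}
    for name, spec in stages.items():
        # A dependency that no stage produces (e.g. raw data) adds no edge.
        graph[name] = {producer[dep] for dep in spec["deps"] if dep in producer}
    return graph


def execution_order(stages):
    """Return stage names in an order that respects their dependencies."""
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(build_graph(stages)).static_order())


if __name__ == "__main__":
    print(execution_order(STAGES))
```

Because the graph must be acyclic, a valid execution order always exists; in this sketch a cycle surfaces as a `graphlib.CycleError` rather than being resolved.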