
Commit

start: update links from all docs
on top of #3941
jorgeorpinel committed Sep 14, 2022
1 parent d20cae5 commit 28c5cc4
Showing 28 changed files with 81 additions and 81 deletions.
5 changes: 2 additions & 3 deletions content/docs/command-reference/diff.md
@@ -125,9 +125,8 @@ $ dvc diff

Let's checkout the
[2-track-data](https://github.com/iterative/example-get-started/releases/tag/2-track-data)
-tag, corresponding to the
-[Data Versioning](/doc/start/data-and-model-versioning) _Get Started_ chapter,
-right after we added `data.xml` file with DVC:
+tag, corresponding to the [Data Versioning](/doc/start/data-management) _Get
+Started_ chapter, right after we added `data.xml` file with DVC:

```dvc
$ git checkout 2-track-data
2 changes: 1 addition & 1 deletion content/docs/command-reference/exp/branch.md
@@ -40,7 +40,7 @@ version.
https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging
[regular commits]: /doc/user-guide/experiment-management/persisting-experiments
[checkpoint experiments]: /doc/command-reference/exp/run#checkpoints
-[stored and shared]: /doc/start/data-and-model-versioning#storing-and-sharing
+[stored and shared]: /doc/start/data-management#storing-and-sharing

## Options

5 changes: 3 additions & 2 deletions content/docs/command-reference/exp/show.md
@@ -349,7 +349,8 @@ $ dvc exp show --all-branches --pcp --sort-by roc_auc

![](/img/ref_pcp_filter.png) _Excluded avg_prec column_

-📖 See [Metrics, Parameters, and Plots](/doc/start/metrics-parameters-plots) for
-an introduction to parameters, metrics, plots.
+📖 See
+[Metrics, Parameters, and Plots](/doc/start/data-management/metrics-parameters-plots)
+for an introduction to parameters, metrics, plots.

[regex]: https://regexone.com/
2 changes: 1 addition & 1 deletion content/docs/command-reference/get.md
@@ -149,7 +149,7 @@ file or directory from. It also has the `--out` option to specify the location
to place the target data within the workspace. Combining these two options
allows us to do something we can't achieve with the regular `git checkout` +
`dvc checkout` process – see for example the
-[Get Older Data Version](/doc/start/data-and-model-versioning#switching-between-versions)
+[Get Older Data Version](/doc/start/data-management#switching-between-versions)
chapter of our _Get Started_.

Let's use the
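
For illustration, a minimal sketch of that `--rev` + `--out` combination, reusing the `example-get-started` repository and the `2-track-data` tag seen earlier in this commit (the exact paths here are assumptions, not the docs' own example):

```dvc
$ dvc get https://github.com/iterative/example-get-started \
          data/data.xml --rev 2-track-data \
          --out data/data.xml
```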
8 changes: 4 additions & 4 deletions content/docs/command-reference/import-url.md
@@ -194,8 +194,8 @@ $ git checkout 3-config-remote
## Example: Tracking a file from the web

An advanced alternate to the intro of the
-[Versioning Basics](/doc/start/data-and-model-versioning) part of the _Get
-Started_ is to use `dvc import-url`:
+[Versioning Basics](/doc/start/data-management) part of the _Get Started_ is to
+use `dvc import-url`:

```dvc
$ dvc import-url https://data.dvc.org/get-started/data.xml \
@@ -282,8 +282,8 @@ And instead of an `etag` we have an `md5` hash value. We did this so its easy to
edit the data file.

Let's now manually reproduce the
-[data processing part](/doc/start/data-pipelines) of the _Get Started_. Download
-the example source code archive and unzip it:
+[data processing part](/doc/start/data-management/data-pipelines) of the _Get
+Started_. Download the example source code archive and unzip it:

```dvc
$ wget https://code.dvc.org/get-started/code.zip
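
The `dvc import-url` command above is cut off; a full invocation might look like this (the destination path is an assumption):

```dvc
$ dvc import-url https://data.dvc.org/get-started/data.xml data/data.xml
```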
4 changes: 2 additions & 2 deletions content/docs/command-reference/import.md
@@ -66,8 +66,8 @@ data `path`, and the `outs` field contains the corresponding local path in the
<abbr>workspace</abbr>. It records enough metadata about the imported data to
enable DVC to efficiently determine whether the local copy is out of date.

-To actually [version the data](/doc/start/data-and-model-versioning), `git add`
-(and `git commit`) the import `.dvc` file.
+To actually [version the data](/doc/start/data-management), `git add` (and
+`git commit`) the import `.dvc` file.

⚠️ Relevant notes and limitation:

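
A minimal sketch of that versioning flow (the source repository and file names are assumptions):

```dvc
$ dvc import https://github.com/iterative/example-get-started data/data.xml
$ git add data.xml.dvc .gitignore   # version the import .dvc file, not the data
$ git commit -m "Track imported data artifact"
```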
2 changes: 1 addition & 1 deletion content/docs/command-reference/pull.md
@@ -33,7 +33,7 @@ to `dvc config cache.type`).
> Note that pulling data does not affect code, `dvc.yaml`, or `.dvc` files.
> Those should be downloaded with `git pull`.
-[data sharing]: /doc/start/data-and-model-versioning#storing-and-sharing
+[data sharing]: /doc/start/data-management#storing-and-sharing

It has the same effect as running `dvc fetch` and `dvc checkout`:

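
Spelled out as commands, the equivalence mentioned above:

```dvc
$ dvc fetch      # download data from the remote into the local cache
$ dvc checkout   # link cached data into the workspace
# ...has the same effect as:
$ dvc pull
```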
2 changes: 1 addition & 1 deletion content/docs/command-reference/push.md
@@ -32,7 +32,7 @@ the most common use cases for these commands.
> Those should be uploaded with `git push`. `dvc import` data is also ignored by
> this command.
-[data sharing]: /doc/start/data-and-model-versioning#storing-and-sharing
+[data sharing]: /doc/start/data-management#storing-and-sharing

The `dvc remote` used is determined in order, based on

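
The list of how the remote is determined is truncated above; note that a remote can also be selected explicitly (the remote name here is an assumption):

```dvc
$ dvc push -r myremote
```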
2 changes: 1 addition & 1 deletion content/docs/command-reference/remote/index.md
@@ -51,7 +51,7 @@ more details.
### Managing remote storage

> For an intro on DVC remote usage see
-> [Storing and sharing data](/doc/start/data-and-model-versioning#storing-and-sharing).
+> [Storing and sharing data](/doc/start/data-management#storing-and-sharing).
The [add](/doc/command-reference/remote/add),
[default](/doc/command-reference/remote/default),
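
A common setup sketch using the subcommands listed above (the remote name and bucket URL are assumptions):

```dvc
$ dvc remote add -d storage s3://mybucket/dvcstore   # add and set as default
$ dvc remote default                                 # prints: storage
```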
2 changes: 1 addition & 1 deletion content/docs/command-reference/repro.md
@@ -203,7 +203,7 @@ up-to-date and only execute the final stage.
## Examples

> To get hands-on experience with data science and machine learning pipelines,
-> see [Get Started: Data Pipelines](/doc/start/data-pipelines).
+> see [Get Started: Data Pipelines](/doc/start/data-management/data-pipelines).
Let's build and reproduce a simple pipeline. It takes this `text.txt` file:

Empty file.
10 changes: 5 additions & 5 deletions content/docs/start/data-management/data-pipelines.md
@@ -77,7 +77,7 @@ want to run (`python src/prepare.py data/data.xml`), its

DVC uses these metafiles to track the data used and produced by the stage, so
there's no need to use `dvc add` on `data/prepared`
-[manually](/doc/start/data-and-model-versioning).
+[manually](/doc/start/data-management).

<details id="stage-expand-to-see-what-happens-under-the-hood">

@@ -90,9 +90,9 @@ The command options used above mean the following:

- `-p prepare.seed,prepare.split` defines special types of dependencies —
  [parameters](/doc/command-reference/params). We'll get to them later in the
-  [Metrics, Parameters, and Plots](/doc/start/metrics-parameters-plots) page,
-  but the idea is that the stage can depend on field values from a parameters
-  file (`params.yaml` by default):
+  [Metrics, Parameters, and Plots](/doc/start/data-management/metrics-parameters-plots)
+  page, but the idea is that the stage can depend on field values from a
+  parameters file (`params.yaml` by default):

```yaml
prepare:
@@ -149,7 +149,7 @@ Once you added a stage, you can run the pipeline with `dvc repro`. Next, you can
use `dvc push` if you wish to save all the data [to remote storage] (usually
along with `git commit` to version DVC metafiles).

-[to remote storage]: /doc/start/data-and-model-versioning#storing-and-sharing
+[to remote storage]: /doc/start/data-management#storing-and-sharing

## Dependency graphs (DAG)

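
The `params.yaml` snippet above is cut off; such a file might look like this (the field values are assumptions):

```yaml
prepare:
  split: 0.20
  seed: 20170428
```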
@@ -17,7 +17,7 @@ https://youtu.be/bu3l75eQlQo

First, let's see what is the mechanism to capture values for these ML
attributes. Let's add a final evaluation stage to our
-[pipeline from before](/doc/start/data-pipelines):
+[pipeline from before](/doc/start/data-management/data-pipelines):

```dvc
$ dvc run -n evaluate \
@@ -192,9 +192,9 @@ featurize:
### ⚙️ Expand to recall how it was generated.

The `featurize` stage
-[was created](/doc/start/data-pipelines#dependency-graphs-dag) with this
-`dvc run` command. Notice the argument sent to the `-p` option (short for
-`--params`):
+[was created](/doc/start/data-management/data-pipelines#dependency-graphs-dag)
+with this `dvc run` command. Notice the argument sent to the `-p` option (short
+for `--params`):

```dvc
$ dvc run -n featurize \
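
The `dvc run` commands above are truncated; a sketch of a `featurize`-style stage with `-p` (parameter names and paths are assumptions):

```dvc
$ dvc run -n featurize \
          -p featurize.max_features,featurize.ngrams \
          -d src/featurization.py -d data/prepared \
          -o data/features \
          python src/featurization.py data/prepared data/features
```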
35 changes: 18 additions & 17 deletions content/docs/start/index.md
@@ -50,23 +50,24 @@ in two independent trails:

### Data Management

-- [**Data and model versioning**](/doc/start/data-and-model-versioning) (try
-  this next) is the base layer of DVC for large files, datasets, and machine
-  learning models. Use a regular Git workflow, but without storing large files
-  in the repo (think "Git for data"). Data is stored separately, which allows
-  for efficient sharing.
-
-- [**Data and model access**](/doc/start/data-and-model-access) shows how to use
-  data artifacts from outside of the project and how to import data artifacts
-  from another DVC project. This can help to download a specific version of an
-  ML model to a deployment server or import a model to another project.
-
-- [**Data pipelines**](/doc/start/data-pipelines) describe how models and other
-  data artifacts are built, and provide an efficient way to reproduce them.
-  Think "Makefiles for data and ML projects" done right.
-
-- [**Metrics, parameters, and plots**](/doc/start/metrics-parameters-plots) can
-  be attached to pipelines. These let you capture, navigate, and evaluate ML
+- [**Data and model versioning**](/doc/start/data-management) (try this next) is
+  the base layer of DVC for large files, datasets, and machine learning models.
+  Use a regular Git workflow, but without storing large files in the repo (think
+  "Git for data"). Data is stored separately, which allows for efficient
+  sharing.
+
+- [**Data and model access**](/doc/start/data-management/data-and-model-access)
+  shows how to use data artifacts from outside of the project and how to import
+  data artifacts from another DVC project. This can help to download a specific
+  version of an ML model to a deployment server or import a model to another
+  project.
+
+- [**Data pipelines**](/doc/start/data-management/data-pipelines) describe how
+  models and other data artifacts are built, and provide an efficient way to
+  reproduce them. Think "Makefiles for data and ML projects" done right.
+
+- [**Metrics, parameters, and plots**](/doc/start/data-management/metrics-parameters-plots)
+  can be attached to pipelines. These let you capture, navigate, and evaluate ML
  projects without leaving Git. Think "Git for machine learning".

### Experimentation
4 changes: 2 additions & 2 deletions content/docs/studio/user-guide/prepare-your-repositories.md
@@ -40,9 +40,9 @@ Datasets, metrics, and hyperparameters can be added to a project in two ways:
   repository.

[store and share your data and model files]:
-  /doc/start/data-and-model-versioning#storing-and-sharing
+  /doc/start/data-management#storing-and-sharing
[create data registries]: /doc/use-cases/data-registry
-[create data pipelines]: /doc/start/data-pipelines
+[create data pipelines]: /doc/start/data-management/data-pipelines
[ci/cd in machine learning]: /doc/use-cases/ci-cd-for-machine-learning

2. **Specify custom files with your metrics and parameters**: If you are working
12 changes: 6 additions & 6 deletions content/docs/use-cases/ci-cd-for-machine-learning.md
@@ -55,8 +55,8 @@ code. Instead, DVC stores meta-information in Git ("codifying" data and ML
models) while pushing the actual data content to
[cloud storage](/doc/command-reference/remote). DVC also provides metrics-driven
navigation in Git repositories --
-[tabulating and plotting](/doc/start/metrics-parameters-plots) model metrics
-changes across commits.
+[tabulating and plotting](/doc/start/data-management/metrics-parameters-plots)
+model metrics changes across commits.

**Low friction**: Our sister project CML provides
[lightweight machine resource orchestration](https://cml.dev/doc/self-hosted-runners)
@@ -74,10 +74,10 @@ deploy and deliver new versions several times a day -- and even before the
weekend -- without fear of bugs/regressions.

**Metrics (Model Validation)**: Whenever a change is committed, DVC can check
-that the [pipeline](/doc/start/data-pipelines) (including data, parameters,
-code, and metrics) is up to date, thereby ensuring that Git commits and model
-artifacts are in sync. DVC can also run benchmarks against previously deployed
-models before a new one is
+that the [pipeline](/doc/start/data-management/data-pipelines) (including data,
+parameters, code, and metrics) is up to date, thereby ensuring that Git commits
+and model artifacts are in sync. DVC can also run benchmarks against previously
+deployed models before a new one is
[released into production](/doc/use-cases/data-registry). CML provides useful
tools to make this process easy -- including reporting metric changes with
interactive graphs and tables in pull request comments.
8 changes: 4 additions & 4 deletions content/docs/use-cases/data-registry/index.md
@@ -2,10 +2,10 @@

One of the main uses of <abbr>DVC repositories</abbr> is the
[versioning of data and model files](/doc/use-cases/data-and-model-files-versioning).
-DVC also enables cross-project [reusability](/doc/start/data-and-model-access)
-of these <abbr>data artifacts</abbr>. This means that your projects can depend
-on data from other repositories — like a **package management system for data
-science**.
+DVC also enables cross-project
+[reusability](/doc/start/data-management/data-and-model-access) of these
+<abbr>data artifacts</abbr>. This means that your projects can depend on data
+from other repositories — like a **package management system for data science**.

![](/img/data-registry.png) _Data management middleware_

3 changes: 1 addition & 2 deletions content/docs/use-cases/fast-data-caching-hub.md
@@ -40,8 +40,7 @@ part of your infrastructure; provisioned depending on data access speed and cost
requirements. You have the flexibility to switch storage providers at any time,
without having to change the directory structures or code of your projects.

-[share data and ml models]:
-  /doc/start/data-and-model-versioning#storing-and-sharing
+[share data and ml models]: /doc/start/data-management#storing-and-sharing

### What's next?

6 changes: 3 additions & 3 deletions content/docs/use-cases/model-registry.md
@@ -24,7 +24,7 @@ ML model registries give your team key capabilities:
- For security, control who can manage models, and audit their usage trails.

[versions]: /doc/use-cases/versioning-data-and-models
-[mp]: /doc/start/metrics-parameters-plots
+[mp]: /doc/start/data-management/metrics-parameters-plots
[experiments]: /doc/user-guide/experiment-management

Many of these benefits are built into DVC: Your [modeling process] and
@@ -57,8 +57,8 @@ process into [GitOps]. This means you can manage and deliver ML models with
software engineering methods such as continuous integration (CI/CD), which can
sync with the state of the artifacts in your registry.

-[modeling process]: /doc/start/data-pipelines
+[modeling process]: /doc/start/data-management/data-pipelines
[remote storage]: /doc/command-reference/remote
-[sharing]: /doc/start/data-and-model-access
+[sharing]: /doc/start/data-management/data-and-model-access
[via cml]: https://cml.dev/doc/cml-with-dvc
[gitops]: https://www.gitops.tech/
2 changes: 1 addition & 1 deletion content/docs/use-cases/versioning-data-and-models/index.md
@@ -65,7 +65,7 @@ Benefits of our approach include:
- **Collaboration**: Easily distribute your project development and share its
  data [internally](/doc/user-guide/how-to/share-a-dvc-cache) and
  [remotely](/doc/command-reference/remote), or
-  [reuse](/doc/start/data-and-model-access) it in other places.
+  [reuse](/doc/start/data-management/data-and-model-access) it in other places.

- **Data compliance**: Review data modification attempts as Git
[pull requests](https://www.dummies.com/web-design-development/what-are-github-pull-requests/).
8 changes: 4 additions & 4 deletions content/docs/use-cases/versioning-data-and-models/tutorial.md
@@ -342,16 +342,16 @@ very convenient having to remember to do so every time the dataset changes.
Here's where the [pipelines](/doc/command-reference/dag) feature of DVC comes in
handy. We touched on it briefly when we described `dvc run` and `dvc repro`. The
next step would be splitting the script into two parts and utilizing pipelines.
-See [Get Started: Data Pipelines](/doc/start/data-pipelines) to get hands-on
-experience with pipelines, and try to apply it here. Don't hesitate to join our
-[community](/chat) and ask any questions!
+See [Get Started: Data Pipelines](/doc/start/data-management/data-pipelines) to
+get hands-on experience with pipelines, and try to apply it here. Don't hesitate
+to join our [community](/chat) and ask any questions!

Another detail we only brushed upon here is the way we captured the
`metrics.csv` metrics file with the `-M` option of `dvc run`. Marking this
<abbr>output</abbr> as a metric enables us to compare its values across Git tags
or branches (for example, representing different experiments). See
`dvc metrics`,
-[Comparing Changes](/doc/start/metrics-parameters-plots#comparing-iterations),
+[Comparing Changes](/doc/start/data-management/metrics-parameters-plots#comparing-iterations),
and
[Comparing Many Experiments](/doc/start/experiments#comparing-many-experiments)
to learn more about managing metrics with DVC.
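
As a sketch of the `-M` usage described above (the stage name, script, and file names are assumptions):

```dvc
$ dvc run -n evaluate \
          -d src/evaluate.py -d model.h5 \
          -M metrics.csv \
          python src/evaluate.py model.h5 metrics.csv
```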
10 changes: 5 additions & 5 deletions content/docs/user-guide/basic-concepts/dvc-project.md
@@ -35,11 +35,11 @@ considered part of the project (e.g.
## DVC repository

A DVC project in a Git repository can also be called a _DVC repository_ or "the
-repo". This setup enables the
-[versioning features](/doc/start/data-and-model-versioning) of DVC
-(recommended). Files tracked by Git are considered part of the DVC project when
-referenced from DVC metafiles such as `dvc.lock`; for example source code that
-is used as a <abbr>stage</abbr> command (`cmd` field in `dvc.yaml`).
+repo". This setup enables the [versioning features](/doc/start/data-management)
+of DVC (recommended). Files tracked by Git are considered part of the DVC
+project when referenced from DVC metafiles such as `dvc.lock`; for example
+source code that is used as a <abbr>stage</abbr> command (`cmd` field in
+`dvc.yaml`).

## Further Reading

8 changes: 4 additions & 4 deletions content/docs/user-guide/basic-concepts/workspace.md
@@ -16,10 +16,10 @@ Adding versioning needs and dependency management can easily turn this near
impossible.

A <abbr>DVC project</abbr> structure is simplified by encapsulating
-[data versioning](/doc/start/data-and-model-versioning) and
-[pipelining](/doc/start/data-pipelines) (e.g. machine learning workflows), among
-other features. This leaves a _workspace_ directory with a clean view of your
-working raw data, source code, data artifacts, etc. and a few
+[data versioning](/doc/start/data-management) and
+[pipelining](/doc/start/data-management/data-pipelines) (e.g. machine learning
+workflows), among other features. This leaves a _workspace_ directory with a
+clean view of your working raw data, source code, data artifacts, etc. and a few
[metafiles](/doc/user-guide/project-structure) that enable these features. A
single version of the project is visible at a time.

@@ -84,7 +84,7 @@ you have everything you need to get started with experiments and checkpoints.

DVC can version data as well as the ML model weights file in checkpoints during
the training process. To enable this, you will need to set up a
-[DVC pipeline](/doc/start/data-pipelines) to train your model.
+[DVC pipeline](/doc/start/data-management/data-pipelines) to train your model.

Now we need to add a training stage to `dvc.yaml` including `checkpoint: true`
in its <abbr>output</abbr>. This tells DVC which <abbr>cached</abbr> output(s)
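
A sketch of such a training stage in `dvc.yaml` (the stage name, command, and paths are assumptions):

```yaml
stages:
  train:
    cmd: python train.py
    deps:
      - train.py
    outs:
      - model.pt:
          checkpoint: true # cache this output as a checkpoint during training
```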
@@ -14,8 +14,9 @@ experiment(s). These files codify _pipelines_ that specify one or more
<abbr>stages</abbr> of the experiment workflow (code, <abbr>dependencies</abbr>,
<abbr>outputs</abbr>, etc.).

-> 📖 See [Get Started: Data Pipelines](/doc/start/data-pipelines) for an intro
-> to this topic.
+> 📖 See
+> [Get Started: Data Pipelines](/doc/start/data-management/data-pipelines) for
+> an intro to this topic.
### Running the pipeline(s)

@@ -37,7 +37,7 @@ remotes in the case of experiments.

[dvc experiments]: /doc/user-guide/experiment-management/experiments-overview
[created]: /doc/user-guide/experiment-management/running-experiments
[sharing regular project versions]:
-  /doc/start/data-and-model-versioning#storing-and-sharing
+  /doc/start/data-management#storing-and-sharing
[git remotes]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes

## Preparation
3 changes: 1 addition & 2 deletions content/docs/user-guide/project-structure/dvc-files.md
@@ -3,8 +3,7 @@
You can use `dvc add` to track data files or directories located in your current
<abbr>workspace</abbr>\*. Additionally, `dvc import` and `dvc import-url` let
you bring data from external locations to your project, and start tracking it
-locally. See [Data Versioning](/doc/start/data-and-model-versioning) for more
-info.
+locally. See [Data Versioning](/doc/start/data-management) for more info.

> \* Certain [external locations](/doc/user-guide/managing-external-data) are
> also supported.
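
A minimal sketch of starting to track a file (the file path is an assumption):

```dvc
$ dvc add data/data.xml                       # writes data/data.xml.dvc
$ git add data/data.xml.dvc data/.gitignore   # version the metafile, not the data
```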
2 changes: 1 addition & 1 deletion content/docs/user-guide/project-structure/dvcyaml-files.md
@@ -4,7 +4,7 @@ You can construct data science or machine learning pipelines by defining
individual [stages](/doc/command-reference/run) in one or more `dvc.yaml` files.
Stages form a pipeline when they connect with each other (forming a _dependency
graph_, see `dvc dag`). Refer to
-[Get Started: Data Pipelines](/doc/start/data-pipelines).
+[Get Started: Data Pipelines](/doc/start/data-management/data-pipelines).

<admon type="tip">

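
A sketch of two stages connecting into a pipeline in `dvc.yaml` (the names and paths are assumptions):

```yaml
stages:
  prepare:
    cmd: python prepare.py data.xml
    deps:
      - data.xml
    outs:
      - prepared/
  train:
    cmd: python train.py prepared/
    deps:
      - prepared/ # consuming `prepare`'s output links the stages into a DAG
    outs:
      - model.pt
```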
