
Commit

use exp init instead of stage add
iesahin committed Jan 26, 2022
1 parent e0a1266 commit f67f5f8
Showing 1 changed file with 21 additions and 32 deletions.
53 changes: 21 additions & 32 deletions content/docs/start/experiments.md
@@ -21,39 +21,34 @@ the [`example-dvc-experiments`][ede] project.

<details>

### ⚙️ Installing the example project
### ⚙️ Initializing a project into DVC experiments

These commands are run in the [`example-dvc-experiments`][ede] project. You can
run the commands in this document after cloning the repository, installing the
requirements, and pulling the data.
If you already have a DVC project, that's great: you can start using `dvc exp`
commands right away to run experiments in your project. (See the [user's guide]
for detailed information.) Here, we briefly discuss how to turn an ML project
into a DVC experiments project with `dvc exp init`.

#### Clone the project and create virtual environment
[user's guide]: /doc/user-guide/experiment-management/

Please clone the project and create a virtual environment.

> We strongly recommend creating a virtual environment to keep the libraries we
> use isolated from the rest of your system. This prevents version conflicts.
A typical machine learning project has data, a set of scripts that train a
model, hyperparameters that modify the model, and metrics and plots that
evaluate it. DVC makes certain assumptions about the names of these elements
when you initialize a project with:

```dvc
$ git clone https://github.com/iterative/example-dvc-experiments -b get-started
$ cd example-dvc-experiments
$ virtualenv .venv
$ . .venv/bin/activate
$ python -m pip install -r requirements.txt
$ dvc exp init python src/train.py
```

#### Get the data set
Here, `python src/train.py` is the command that runs an experiment. It can be
any other command your project uses.

The repository we cloned doesn't contain the dataset. Instead of storing the
data in the Git repository, we use DVC to retrieve it from a shared data store.
Here, `dvc pull` downloads the missing data files.
If your project uses different names for them, you can set the directories for
source code (default: `src/`), data (`data/`), models (`models/`), and plots
(`plots/`), and the files for hyperparameters (`params.yaml`) and metrics
(`metrics.json`), with options to `dvc exp init`.

```dvc
$ dvc pull
```

The repository already contains the necessary configuration to run the
experiments.
You can also set these options interactively with
`dvc exp init --interactive`.
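As a rough illustration, a project that already follows the default names could be laid out like this before running `dvc exp init`. This is a sketch under assumptions: the `lr` and `epochs` values in `params.yaml` are hypothetical, and `src/train.py` is only an empty placeholder for your own training script.

```shell
set -e
# Hypothetical project skeleton using the default names dvc exp init assumes.
mkdir -p src data models plots   # code, data, model and plot directories
: > src/train.py                 # empty placeholder for your training script
# Hypothetical hyperparameters that src/train.py would read.
cat > params.yaml <<'EOF'
lr: 0.001
epochs: 10
EOF
# metrics.json is not created here: the training command is expected to write it.
# With Git and DVC initialized, `dvc exp init python src/train.py` would follow.
```

If your names differ from these defaults, the options described above let you point `dvc exp init` at them instead of renaming anything.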

</details>

@@ -68,19 +63,13 @@ Experiment results have been applied to your workspace.
...
```

It runs the specified command (`python train.py`) in `dvc.yaml`. That command
writes the metrics values to `metrics.json`.
It runs the command we specified (`python src/train.py`) and writes the models,
plots, and metrics to their respective locations.

DVC then associates these metrics with the values in the parameters file
(`params.yaml`) and with the other dependencies of this experiment
(`data/images/`).
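To make the flow from parameters to outputs concrete, here is a stand-in for the training command written as a small shell script. It is a sketch only: it does no training, and the `epochs` parameter, the `models/model.txt` file, and the `epochs_completed` metric are hypothetical. It assumes `params.yaml` has a flat `epochs: <n>` line; a real project would do this work in `src/train.py`.

```shell
set -e
# Hypothetical stand-in for `python src/train.py`: no real training happens.
# Create a one-line params.yaml only if the project does not have one yet.
[ -f params.yaml ] || printf 'epochs: 10\n' > params.yaml
# Read the flat "epochs: <n>" entry from the parameters file.
epochs=$(sed -n 's/^epochs: *//p' params.yaml)
# Output directories (plots/ stays empty in this sketch).
mkdir -p models plots
# Write a placeholder model and a metrics file derived from the parameter.
printf 'model trained for %s epochs\n' "$epochs" > models/model.txt
printf '{"epochs_completed": %s}\n' "$epochs" > metrics.json
cat metrics.json
```

Because the metric is computed from the parameter, changing `epochs` in `params.yaml` and rerunning would produce a different `metrics.json`, which is the kind of parameter/metric pairing DVC records for each experiment.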

The purpose of the `dvc exp` family of commands is to let you run, capture, and
compare machine learning experiments as you iterate on your project. Artifacts
such as models and metrics produced by each experiment are tracked by DVC, and
the associated parameters and metrics can be committed to Git as text files.

You can review the experiment results with `dvc exp show` and see these metrics
and results in a nicely formatted table:
