ref: exp cmd index, make_checkpoint() fn #2199

Merged
merged 22 commits into from
Feb 23, 2021
Changes from 13 commits
Commits
bebbf34
ref: add exp cmd and make_checkpoint() fn
jorgeorpinel Feb 16, 2021
ecfd6b9
ref: add auto link to exp[eriments] cmds, beter api ref
jorgeorpinel Feb 16, 2021
8f5cf67
ref: 2.0 warning in exp
jorgeorpinel Feb 16, 2021
f7cb618
ref: remove examples from exp index
jorgeorpinel Feb 16, 2021
19a6593
ref: reorder exp list in index and intro note
jorgeorpinel Feb 16, 2021
c811458
ref: fix exp index title and link to guide from intro
jorgeorpinel Feb 16, 2021
22af4cf
ref: gc --all-experiments
jorgeorpinel Feb 16, 2021
597161c
ref: update make_checkpoint example
jorgeorpinel Feb 17, 2021
3bfe780
ref: check make_checkpoint example
jorgeorpinel Feb 17, 2021
d90ac09
Merge branch 'guide/experiments' into ref/experiments
jorgeorpinel Feb 17, 2021
7bbd5b1
term: review "experiment" links/tooltips
jorgeorpinel Feb 17, 2021
e7c2584
ref: exapnd make_checkpoints example, use exp tooltips
jorgeorpinel Feb 17, 2021
a5c9aa8
ref: finish initial make_checkpoint() example
jorgeorpinel Feb 17, 2021
bb82194
Add dvc experiments to linked-terms
rogermparent Feb 17, 2021
ca9d31f
ref: complete exp index Desc
jorgeorpinel Feb 17, 2021
e20490d
Merge branch 'ref/experiments' of github.com:iterative/dvc.org into r…
jorgeorpinel Feb 17, 2021
8e2e1cb
Merge branch 'master' into ref/experiments
jorgeorpinel Feb 21, 2021
3ec578d
ref: explain exp run --reset and exp apply in make_checkpoint()
jorgeorpinel Feb 21, 2021
5e81e22
ref: hyperparams -> params in exp index
jorgeorpinel Feb 21, 2021
c7b734c
Merge branch 'master' into ref/experiments
jorgeorpinel Feb 22, 2021
56bf275
Update content/docs/api-reference/make_checkpoint.md
jorgeorpinel Feb 22, 2021
bec199e
ref: mention exp run --reset in make_checkpoint ex.
jorgeorpinel Feb 21, 2021
2 changes: 2 additions & 0 deletions config/prismjs/dvc-commands.js
@@ -42,6 +42,8 @@ module.exports = [
'gc',
'freeze',
'fetch',
'exp',
'experiments',
'doctor',
'diff',
'destroy',
126 changes: 126 additions & 0 deletions content/docs/api-reference/make_checkpoint.md
@@ -0,0 +1,126 @@
# dvc.api.make_checkpoint()

Make an
[in-code checkpoint](/doc/user-guide/experiment-management#checkpoints-in-source-code).

```py
def make_checkpoint()
```

#### Usage:

```py
from dvc.api import make_checkpoint

while True:
    # ... write a stage output
    make_checkpoint()
```

## Description

To track successive steps in a longer <abbr>experiment</abbr>, you can write
your code so it registers checkpoints with DVC during runtime. This function
should be called by the code in stages executed by `dvc exp run` (see the `cmd`
field of `dvc.yaml`).

> Note that for non-Python code, the alternative is to write a
> `.dvc/tmp/DVC_CHECKPOINT` signal file.
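
The signal-file alternative can be sketched in a few lines, here in Python for
consistency with the rest of this page; the same steps would apply in any
language. This is a minimal sketch under assumptions, not DVC's implementation:
it assumes that an empty file is enough and that DVC removes the file once the
checkpoint has been recorded. The function name `signal_checkpoint` and its
`timeout`/`poll` parameters are hypothetical.

```py
import os
import time


def signal_checkpoint(repo_root, timeout=60.0, poll=0.1):
    """Write the signal file, then wait until it is removed.

    Assumption: DVC deletes the file after recording the checkpoint.
    Returns True if the file disappeared within `timeout` seconds.
    """
    signal_file = os.path.join(repo_root, ".dvc", "tmp", "DVC_CHECKPOINT")
    os.makedirs(os.path.dirname(signal_file), exist_ok=True)

    # Assumption: only the file's existence matters, so it is left empty.
    with open(signal_file, "w"):
        pass

    deadline = time.monotonic() + timeout
    while os.path.exists(signal_file):
        if time.monotonic() > deadline:
            return False  # nobody picked up the signal in time
        time.sleep(poll)
    return True
```

Waiting for the file to disappear keeps the stage from overwriting its outputs
while the checkpoint is still being recorded, if the assumed handshake holds.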

## Example: Every 100th iteration

Let's consider the following `dvc.yaml` file:

```yaml
stages:
  every100:
    cmd: python iterate.py
    outs:
      - int.txt:
          checkpoint: true
```

The code in `iterate.py` continuously increments an integer saved in `int.txt`
(starting at 0). At 0 and at every 100th iteration after that, it makes a
checkpoint for `dvc experiments`:

```py
import os

from dvc.api import make_checkpoint

while True:
    try:
        # Resume from the last saved value, if there is one.
        if os.path.exists("int.txt"):
            with open("int.txt", "r") as fd:
                try:
                    i_ = int(fd.read()) + 1
                except ValueError:
                    i_ = 0  # unreadable content: start over
        else:
            i_ = 0  # first run: no output yet

        with open("int.txt", "w") as fd:
            fd.write(f"{i_}")

        # Register a checkpoint at 0, 100, 200, ...
        if i_ % 100 == 0:
            make_checkpoint()

    except KeyboardInterrupt:
        exit()
```
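
To exercise the loop body without DVC installed, one iteration can be factored
into a function that takes the checkpoint call as a parameter. This is only a
sketch of the same logic; `advance`, `every`, and `on_checkpoint` are
hypothetical names that do not appear in the example above.

```py
import os


def advance(path, every=100, on_checkpoint=None):
    """Run one iteration: read the counter, increment it, write it back."""
    if os.path.exists(path):
        with open(path, "r") as fd:
            try:
                i_ = int(fd.read()) + 1
            except ValueError:
                i_ = 0  # unreadable content: start over
    else:
        i_ = 0  # first run: no file yet

    with open(path, "w") as fd:
        fd.write(f"{i_}")

    # In iterate.py the callback would be dvc.api.make_checkpoint.
    if i_ % every == 0 and on_checkpoint is not None:
        on_checkpoint()
    return i_
```

With this shape, the stage script would reduce to a loop that calls
`advance("int.txt", on_checkpoint=make_checkpoint)`, and a test can pass any
callable to count how many checkpoints a given number of iterations produces.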

Using `dvc repro` with a continuous process such as this may not be helpful,
because the output file keeps changing on every run. Instead, you can execute
the stage with `dvc exp run` and end the process when you decide:

```dvc
$ dvc exp run
Running stage 'every100':
> python iterate.py
Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'
Checkpoint experiment iteration 'd832784'.
Updating lock file 'dvc.lock'
Checkpoint experiment iteration '6f5009b'.
Updating lock file 'dvc.lock'
Checkpoint experiment iteration '75ff5e0'.
^C

Reproduced experiment(s): exp-8a3bd
Experiment results have been applied to your workspace.
```

In this example we kill the process (with Ctrl + C) after 3 checkpoints (at 0,
100, and 200). The <abbr>cache</abbr> will contain those 3 versions of
`int.txt`. DVC applies the last checkpoint to the <abbr>workspace</abbr>, even
if more iterations ran before the interrupt:

```dvc
$ cat int.txt
200
$ ls .dvc/cache
36 cf f8
```
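
The values involved follow from simple integer arithmetic. As an illustration
(the helper names and the interrupt point 257 are made up for this sketch), if
the counter had reached 257 when the process was interrupted, the checkpoints
taken and the value restored to the workspace would be:

```py
def checkpoints_up_to(i, every=100):
    """All counter values at which a checkpoint was made, up to and including i."""
    return list(range(0, i + 1, every))


def last_checkpoint(i, every=100):
    """Largest multiple of `every` that is <= i, i.e. the most recent checkpoint."""
    return (i // every) * every


# Hypothetical interrupt point: the counter had reached 257.
print(checkpoints_up_to(257))  # [0, 100, 200]
print(last_checkpoint(257))    # 200
```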

`dvc exp show` will display these checkpoints as an experiment branch:

```dvc
$ dvc exp show
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Experiment    ┃ Created      ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ workspace     │ -            │
│ master        │ Feb 10, 2021 │
│ │ ╓ exp-8a3bd │ 02:07 PM     │
│ │ ╟ 75ff5e0   │ 01:54 PM     │
│ │ ╟ 6f5009b   │ 01:54 PM     │
│ ├─╨ d832784   │ 01:54 PM     │
└───────────────┴──────────────┘
# Press q to exit this screen.
```

Now if we use `dvc exp run` again, the process will start from 200. To restart
from a previous point or even from scratch, you can use `dvc exp apply`.

See `dvc experiments` for more info on managing <abbr>experiments</abbr>.
48 changes: 48 additions & 0 deletions content/docs/command-reference/exp/index.md
@@ -0,0 +1,48 @@
# experiments

⚠️ This feature is only available in DVC 2.0 ⚠️

A set of commands to generate and manage <abbr>experiments</abbr>:
[run](/doc/command-reference/exp/run), [show](/doc/command-reference/exp/show),
[diff](/doc/command-reference/exp/diff),
[apply](/doc/command-reference/exp/apply),
[branch](/doc/command-reference/exp/branch),
[resume](/doc/command-reference/exp/resume),
[gc](/doc/command-reference/exp/gc), [list](/doc/command-reference/exp/list),
[push](/doc/command-reference/exp/push), and
[pull](/doc/command-reference/exp/pull).

> Aliased to `dvc exp`.

## Synopsis

```usage
usage: dvc experiments [-h] [-q | -v]
                       {show,apply,diff,run,resume,res,gc,branch,list,push,pull} ...

positional arguments:
  COMMAND
    show         Print experiments.
    apply        Apply the changes from an experiment to your workspace.
    diff         Show changes between experiments in the DVC repository.
    run          Reproduce complete or partial experiment pipelines.
    resume (res) Resume checkpoint experiments.
    gc           Garbage collect unneeded experiments.
    branch       Promote an experiment to a Git branch.
    list         List local and remote experiments.
    push         Push a local experiment to a Git remote.
    pull         Pull an experiment from a Git remote.

## Description

> Note that DVC assumes that <abbr>experiments</abbr> are deterministic (see
> **Avoiding unexpected behavior** in `dvc run`).

## Options

- `-h`, `--help` - prints the usage/help message, and exits.

- `-q`, `--quiet` - do not write anything to standard output.

- `-v`, `--verbose` - displays detailed tracing information.
7 changes: 3 additions & 4 deletions content/docs/command-reference/fetch.md
@@ -92,10 +92,9 @@ specific one is given with `--remote`.

- `-a`, `--all-branches` - fetch cache for all Git branches instead of just the
current workspace. This means DVC may download files needed to reproduce
different versions of a `.dvc` file
([experiments](/doc/tutorials/get-started/experiments)), not just the ones
currently in the workspace. Note that this can be combined with `-T` below,
for example using the `-aT` flag.
different versions of a `.dvc` file, not just the ones currently in the
workspace. Note that this can be combined with `-T` below, for example using
the `-aT` flag.

- `-T`, `--all-tags` - same as `-a` above, but applies to Git tags as well as
the workspace. Note that both options can be combined, for example using the
19 changes: 12 additions & 7 deletions content/docs/command-reference/gc.md
@@ -6,9 +6,9 @@ Remove unused files and directories from <abbr>cache</abbr> or
## Synopsis

```usage
usage: dvc gc [-h] [-q | -v]
[-w] [-a] [-T] [--all-commits] [-c] [-r <name>]
[-f] [-j <number>] [-p [<path> [<path> ...]]]
usage: dvc gc [-h] [-q | -v] [-w] [-a] [-T] [--all-commits]
[--all-experiments] [-c] [-r <name>] [-f] [-j <number>]
[-p [<path> [<path> ...]]]
```

## Description
@@ -23,9 +23,9 @@ explicitly provide the right set of options to specify what data is still needed
(so that DVC can figure out what files can be safely deleted).

One of the scope options (`--workspace`, `--all-branches`, `--all-tags`,
`--all-commits`) or a combination of them must be provided. Each of them
corresponds to keeping the data for the current workspace, and possibly for a
certain set of commits (determined by reading the <abbr>DVC files</abbr> in
`--all-commits`, `--all-experiments`) or a combination of them must be provided.
Each of them corresponds to keeping the data for the current workspace, and for
a certain set of commits (determined by reading the <abbr>DVC files</abbr> in
them). See the [Options](#options) section for more details.

> Note that `dvc gc` tries to fetch any missing
@@ -53,7 +53,7 @@ The default remote is cleaned (see `dvc config core.remote`) unless the

- `-w`, `--workspace` - keep _only_ files and directories referenced in the
workspace. Note that this behavior is implied in `--all-tags`,
`--all-branches`, and `--all-commits`.
`--all-branches`, `--all-commits`, and `--all-experiments`.

- `-a`, `--all-branches` - keep cached objects referenced in all Git branches,
and in the workspace (implying `-w`). Useful if branches are used to track
@@ -75,6 +75,11 @@ The default remote is cleaned (see `dvc config core.remote`) unless the
that is never referenced from the workspace or from any Git commit can still
be stored in the project's cache).

- `--all-experiments` - same as `-a` and `-T` above, but applies to all
`dvc experiments`. This preserves the cache for all
[experimental](/doc/user-guide/experiment-management) data (including
intermediate checkpoints).

- `-p <paths>`, `--projects <paths>` - if a single remote or a single cache is
shared among different projects (e.g. a configuration like the one described
[here](/doc/use-cases/shared-development-server)), this option can be used to
9 changes: 9 additions & 0 deletions content/docs/sidebar.json
@@ -214,6 +214,11 @@
"label": "doctor",
"slug": "doctor"
},
{
"label": "experiments",
"slug": "exp",
"source": "exp/index.md"
},
{
"label": "fetch",
"slug": "fetch",
@@ -399,6 +404,10 @@
{
"slug": "read",
"label": "read()"
},
{
"slug": "make_checkpoint",
"label": "make_checkpoint()"
}
]
},
2 changes: 1 addition & 1 deletion content/docs/use-cases/index.md
@@ -23,7 +23,7 @@ learning models, and you want to
[versions of data and ML models](/doc/use-cases/versioning-data-and-model-files)
easily;
- understand how datasets and ML artifacts were built in the first place;
- compare model metrics among [experiments](/doc/start/experiments);
- compare model metrics among <abbr>experiments</abbr>;
- adopt engineering tools and best practices in data science projects;

DVC is for you!
3 changes: 1 addition & 2 deletions content/docs/user-guide/related-technologies.md
@@ -92,8 +92,7 @@ _Luigi_, etc.
- DVC doesn't need to run any services. There's no GUI as a result, but we
expect some GUI services will be created on top of DVC.

- DVC can generate images with [experiment](/doc/start/experiments) workflow
visualizations.
- DVC can generate images with experiment workflow visualizations.

- DVC has transparent design. <abbr>DVC files</abbr> have a human-readable
format and can be easily reused by external tools.