list: document usage for data archive #1879

Closed
wants to merge 2 commits into from
imhardikj
Contributor

Fixes #1521
Partially addresses #1824

@shcheklein shcheklein temporarily deployed to dvc-landing-listcmd-vuq185nwyu October 19, 2020 19:39 Inactive
Comment on lines 126 to 130
## Example: Archive project data

An archive is a single file that contains multiple files from a project. It can
be used to backup project data. We can use `dvc list` to create an archive of
files and directories (<abbr>outputs</abbr>) tracked by DVC:
Contributor

@jorgeorpinel jorgeorpinel Oct 20, 2020

We have to think through the motivation and use cases here. Are we doing what the issue originally suggested ("create a lightweight copy of the project (for backup). It's "lightweight" because it wouldn't include any of the data tracked by DVC")? Or one of the options in #1521 (comment)? Or something else? And why?

Contributor

p.s. is this ready for review? I see the PR is in draft state.

$ dvc list . -R --dvc-only | zip -@ data.zip # if `zip` available
```

Alternative for windows (if `xargs` available):
Contributor

@jorgeorpinel jorgeorpinel Oct 20, 2020

  • Capital W
  • Missing "is"

But I'm not sure this makes sense. How is it an alternative for Windows when it uses xargs which is a GNU tool?

$ dvc list . -R --dvc-only | xargs python -m zipfile -c data.zip
```
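
As an aside to the `xargs` question raised in this thread, the same archiving step can be sketched without `zip` or `xargs` by reading the path list in Python directly. This is only an illustration, not something proposed in the PR: the function name `archive_paths` is hypothetical, and it assumes that `dvc list . -R --dvc-only` prints one file path per line (which is what the `zip -@` pipeline above already relies on) and that the listed files are present in the workspace.

```python
import sys
import zipfile
from pathlib import Path


def archive_paths(paths, archive_name):
    """Write each existing file named in `paths` into a new zip archive.

    `paths` is any iterable of strings, one path per item (for example the
    lines of standard input). Returns the member names that were written.
    """
    written = []
    with zipfile.ZipFile(archive_name, "w", zipfile.ZIP_DEFLATED) as zf:
        for raw in paths:
            name = raw.strip()
            if not name:
                continue  # skip blank lines
            path = Path(name)
            if not path.is_file():
                # Assumption: a listed path may be tracked but not present
                # locally (e.g. not pulled yet), so report it and move on.
                print(f"skipping missing file: {name}", file=sys.stderr)
                continue
            member = path.as_posix()  # forward slashes inside the archive
            zf.write(path, arcname=member)
            written.append(member)
    return written
```

Saved in a hypothetical `archive_paths.py` that ends with a call such as `archive_paths(sys.stdin, sys.argv[1])`, it could be fed by `dvc list . -R --dvc-only | python archive_paths.py data.zip`. Unlike the `xargs` form, this does not depend on a GNU tool being installed and is not affected by spaces in file names.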

Use `git archive` to create an archive of data tracked by Git:
Contributor

If it's tracked by Git it's not data.

Contributor

@jorgeorpinel jorgeorpinel left a comment

Hey @imhardikj, I'm not sure that addressing #1521 is very important TBH. It would be nice, but it's not a 0->1 update, so maybe put that in a separate PR, which we may or may not get to (I think there are higher priorities according to your project proposal, but it's up to you if you want to work on that in your extra time).

The job here should be pretty straightforward: read and review the correctness of the description, actually try dvc list throughout the doc including the examples, and update anything that is still outdated (the only change so far in this doc from 0 to 1 was to quickly grep/replace DVC-file for .dvc file and/or dvc.yaml — was that enough? I'm not sure).

Thanks

content/docs/command-reference/list.md (resolved)
content/docs/command-reference/list.md (outdated, resolved)
@shcheklein shcheklein temporarily deployed to dvc-landing-listcmd-vuq185nwyu October 22, 2020 18:22 Inactive
Comment on lines -19 to +24
- DVC, by effectively replacing data files, models, directories with `.dvc` files
- (`.dvc`), hides actual locations and names. This means that you don't see data
- files when you browse a <abbr>DVC repository</abbr> on Git hosting (e.g.
- GitHub), you just see the `dvc.yaml` and `.dvc` files. This makes it hard to
- navigate the project to find <abbr>data artifacts</abbr> for use with `dvc get`,
+ DVC replaces data files, models, directories, etc. with small
+ [metafiles](/doc/user-guide/dvc-files-and-directories#dvc-files-and-directories),
+ and hides actual locations and names. Hence data files aren't visible when you
+ browse a <abbr>DVC repository</abbr> on Git hosting (e.g. GitHub), you just see
+ the `dvc.yaml` and `.dvc` files. This makes it difficult to find <abbr>data
+ artifacts</abbr> while navigating in a project, for using with `dvc get`,
Contributor

@jorgeorpinel jorgeorpinel Nov 3, 2020

I don't see how this addresses #1521 and it has some strange language like "for using with" among other phrases (that replace perfectly OK previous text). The changes don't seem to bring any new info. anyway.

Contributor

@jorgeorpinel jorgeorpinel left a comment

This looks stale.

@jorgeorpinel
Contributor

Sorry @imhardikj I'm closing this as stale. You have enough PRs open already anyway. If you're trying this one again (when/if you get the capacity) please first post your plan of action in the original issue and mention me, so we confirm we're on the same page. Thanks

@imhardikj imhardikj deleted the listcmd branch November 6, 2020 16:45
Successfully merging this pull request may close these issues.

list: document usage for data export/archive
3 participants