-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ref: data status
#3812
ref: data status
#3812
Conversation
Link Check ReportThere were no links to check! |
Looks good! A couple high-level comments:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. @skshetry , and thank for keeping it simple first.
We'll need to do a few iterations here and I have some product questions, would love your feedback on that.
@skshetry When you are available, could you do another iteration on this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving so that it's not blocked on me. There are still some pending minor discussions - like removing index page for now.
I’ll remove index and push another version tomorrow. :) |
Looks like we do need the
|
@skshetry pls check the Contributing section: {
"label": "Contributing",
"slug": "contributing",
"source": false,
"children": [
{
"label": "DVC Core Project",
"slug": "core"
},
{
"label": "Docs and Website",
"slug": "docs"
},
{
"label": "Writing Blog Posts",
"slug": "blog"
}
]
}, |
Thanks @shcheklein. I have deleted the index. |
All of the conversations are resolved, and the index file is now removed. I am merging this, if there’s anything, let me know. I am happy to work on top. And thankyou for the all the suggestions and help. 🙏🏼 |
Hi. A couple post-merge questions (based on existing conversations).
Since there's no index for this command's base, should we just list On the other hand, not having an index.md breaks the existing pattern and is technically incomplete since you can
At least mentioning the difference with |
makes sense!
let's keep it simple for now? that page is not useful at all ... and let's keep the hierarchy in place, also for simplicity and for consistency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should be able to merge something and follow up later if needed...
if there’s anything, let me know. I am happy to work on top.
Great work so far! Some other possible follow ups:
usage: dvc data status [-h] [-q | -v] | ||
[--granular] [--unchanged] | ||
[--untracked-files [{no,all}]] | ||
[--json] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be a better visual grouping (even if it doesn't match the help output) cc @dberenbaum
usage: dvc data status [-h] [-q | -v] | |
[--granular] [--unchanged] | |
[--untracked-files [{no,all}]] | |
[--json] | |
usage: dvc data status [-h] [-q | -v] [--json] [--granular] | |
[--unchanged] [--untracked-files [{no,all}]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't look important to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably secondary, yes. It's a quality-related best practice for the cmd ref/ usage blocks we started applying recently, see #3345.
The `data status` command displays the state of the working directory and the | ||
changes with respect to the last Git commit (`HEAD`). It shows you what new | ||
changes have been committed to DVC, which haven't been committed, which files | ||
aren't being tracked by DVC and Git, and what files are missing from the | ||
<abbr>cache</abbr>. | ||
|
||
The `dvc data status` command only outputs information, it won't modify or | ||
change anything in your working directory. It's a good practice to check the | ||
state of your repository before doing `dvc commit` or `git commit` so that you | ||
don't accidentally commit something you don't mean to. | ||
|
||
An example output might look something like follows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The `data status` command displays the state of the working directory and the | |
changes with respect to the last Git commit (`HEAD`). It shows you what new | |
changes have been committed to DVC, which haven't been committed, which files | |
aren't being tracked by DVC and Git, and what files are missing from the | |
<abbr>cache</abbr>. | |
The `dvc data status` command only outputs information, it won't modify or | |
change anything in your working directory. It's a good practice to check the | |
state of your repository before doing `dvc commit` or `git commit` so that you | |
don't accidentally commit something you don't mean to. | |
An example output might look something like follows: | |
Displays the state of the <abbr>workspace</abbr> compared to the last Git commit | |
(`HEAD`). This includes committed and uncommitted additions, updates, and | |
deletions of DVC-tracked files. Checking the state of your tracked data is | |
useful to know what to `dvc add` (or `dvc commit`) and `git commit`. Example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wha -> what
it was fine before I think, don't see a reason to change this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reason to change this
Minor but not cosmetic: Reduced from 3 paragraph Desc. intro to 1. (We want explanations in refs to stay short.) I also removed some sentences that aren't needed IMO. Applied some other existing patterns (e.g. don't mention the command name at the beginning so it's not repetitive later when you need it in other paragraphs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applying existing patterns is fine, everything else feels very cosmetic still + changes the meaning / original intention (which is fine, but we don't have strong enough reason to spend time reviewing this to my mind in this case and debate one more time about intentions in the text, benefits, etc, etc). Please, unless it's super important let's not do this - it takes a lot of time to review it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I'm making a point to spend time on these for now just to compile mayor existing practices (example).
DVC-tracked directories. By default, `dvc data status` does not show | ||
individual changes for files inside the tracked directories. | ||
|
||
- `--untracked-files` - show files that are not being tracked by DVC and Git. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if it's tracked by EITHER DVC or Git it will not be included here?
@jorgeorpinel could you please distill what is actually important here and doesn't require changes to the DVC repo and make a PR to review? a lot of changes, lots of them very cosmetic, and I'm not sure if there is a reason behind them ... reviewing them takes time |
The cmd group index pages can be useful e.g. see https://dvc.org/doc/command-reference/params as a short version of the guide for people who just want to have enough context to use the feature. Not sure whether we need that for |
Started #3924 and resolved the things I included there.
Nothing I added so far, but these string changes are typically pretty easy to propagate to the core repo in my experience. |
* ref: update data status intro per #3812 (comment) * ref: update data status explanations per #3812 (comment) * ref: update data status --granular explanation per #3812 (comment) * ref: remove redundant data status example per #3812 (comment) * ref: roll back data status prefixes * Restyled by prettier (#3925) Co-authored-by: Restyled.io <[email protected]> * ref: reinstate duplicated example in data status per #3924 (comment) * ref: roll back minor style change per #3924 (review) * ref: simplify term "tracked dirs" per #3924 (review) * ref: note unknown data status for --granular rel #3924 (review) * ref: avoid term "file records" per #3924 (review) * ref: restore whitespaces Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com> Co-authored-by: Restyled.io <[email protected]>
I have tried to keep it simple.
I was not sure what to add in
data/index.md
, so I started to introduce a few concepts around index/dvc-data, which is going to be helpful in thedata/status.md
to clarify and not repeat that we are comparing to file hashes indvc.lock
and.dvc
files and just mention the termindex
. But I did not go completely and introduce that concept yet. :)Closes #3732.