Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A notion of plot #3826

Closed
Suor opened this issue May 19, 2020 · 7 comments
Closed

A notion of plot #3826

Suor opened this issue May 19, 2020 · 7 comments
Labels
discussion requires active participation to reach a conclusion p1-important Important, aka current backlog of things to do

Comments

@Suor
Copy link
Contributor

Suor commented May 19, 2020

Now we have plot data and plot templates, the plot itself is a result of some command line call and defined by its options. To have some sensible plot by default we need to hardcode some things like axes in the default plot template, which is awkward.

So I propose that we should have those concepts separated codified and documented:

  • plot data,
  • plot templates,
  • plots itself, as an application of some template to some data properly parameterized.

The issue with the first two is that they are intermixed both conceptually and codewise, I created #3818 to track that. The issue with the plot itself is that it's ephemeral - only exists as a cli command.

Providing a concept of plot, somehow saved alongside repo, will helps us show more sensible plots by default, by name, share them within a team or reuse in external tools, i.e. CI or Viewer.

This issue is open for discussion.

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label May 19, 2020
@Suor Suor added the discussion requires active participation to reach a conclusion label May 19, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label May 19, 2020
@Suor
Copy link
Contributor Author

Suor commented May 19, 2020

@dmpetrov @pared your input will be valuable.

@skshetry
Copy link
Member

Providing a concept of plot, somehow saved alongside repo, will helps us show more sensible plots by default, by name, share them within a team or reuse in external tools, i.e. CI or Viewer.

@Suor, are you talking about having plot type and templates in something like dvc.yaml? Is this PR #3807 not enough?

@pared
Copy link
Contributor

pared commented May 20, 2020

@Suor
I think creating concept of plot is a good idea, and should probably be stored as dvc plot parameters in dvc.yml.

The main reason why it's hard to save the plot alongside other elements of the repository is that we can plot unknown number of revisions. For dvc plots show it will be just current one, for dvc plots diff it can range from 2 to {number_of_commits} in repo.

We can have plot params specified in dvc.yaml and just call plot via
dvc plot show metric, or dvc plot diff metric HEAD v1 v2 v4...
But in this case, the idea of plot is still kind of ephemeral, because the user needs to provide, one way or another, the revisions to plot.

The other thing that comes to my mind is requiring user to provide revisions which should be included and stored in dvc.yaml too, with some sensible defaults, like last 3 commits/ tags. But that seems like bad user experience: needing to update dvc.yaml with each new interesting commit/tag.

[EDIT]
But, if we agree to leave revision providing to user, I don't see a problem with utilizing already existing functionality to fulfill creating plot concept alongside other elements in repo:
One can already create custom template with <DVC_METRIC_DATA,data_source.csv> anchor, and this kind of template does not require providing data to be filled. So, what we could do is to make dvc plots create this kind of tailored template under .dvc/plot/ and just write its path in dvc.yaml.

@Suor
Copy link
Contributor Author

Suor commented May 20, 2020

@skshetry in #3807 we only mark plot data files as plots, which is confusing, while being useful for plot data files collection. To actually see a plot one still needs to specify some params.

@pared specifying revisions is fine, i.e. we simply build the same plot for different revisions/version of same data.

Custom template with hardcoded params can provide this functionality, but this is a misuse, which fails to reuse templates properly. The plot is a collection of those params passed to command line including template.

@pared
Copy link
Contributor

pared commented May 20, 2020

The plot is a collection of those params passed to command line including template.

Agree, I got carried away with this one.

@Suor
So I guess the scope of this task is to extend the functionality for plot provided in #3807 and let user specify plot params in plot section of dvc.yaml, right?

@pared pared added the p1-important Important, aka current backlog of things to do label May 20, 2020
@Suor
Copy link
Contributor Author

Suor commented May 20, 2020

@pared this is possible approach. There is a usual discrepancy though - if we store plots within git then from which revision do we take its description?

@pared
Copy link
Contributor

pared commented May 20, 2020

@Suor good point. I would say that we should go with latest revision in time, with current version of plot (even if uncommited) having priority.

@efiop efiop closed this as completed May 3, 2021
@iterative iterative locked and limited conversation to collaborators May 3, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
discussion requires active participation to reach a conclusion p1-important Important, aka current backlog of things to do
Projects
None yet
Development

No branches or pull requests

4 participants