Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unassign and deprecate #219

Merged
merged 36 commits into from
Aug 19, 2022
Merged

Add unassign and deprecate #219

merged 36 commits into from
Aug 19, 2022

Conversation

aguschin
Copy link
Contributor

@aguschin aguschin commented Jul 25, 2022

This PR adds

  1. remove a stage from the artifact version
  2. mark an artifact version as deprecated
  3. mark an artifact as deprecated
  4. Make Stages work Like Envs

close #93
close #51

Since we're introducing multiple new actions here, we need to align names and terms in the new context. Maybe

  1. register and deregister an entire artifact
  2. publish (release?) and deprecate a version of the artifact (can switch 1st and 2nd to avoid extra changes)
  3. assign and unassign a stage to a version (or promote or demote if like the previous names).

WDYT?

A minor note for myself: we need to warn a user that he needs to deprecate a version if that version have active assignments (but we shouldn't do the same when deprecating an artifact: warn about active versions - which is a bit non-symmetrical).

@aguschin aguschin self-assigned this Jul 25, 2022
@aguschin aguschin mentioned this pull request Jul 25, 2022
all versions.

```console
$ gto show churn --av -1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aguschin --av - minot style comment. It's not common to have multi-letter shortcuts like this. I would not do this.

@aguschin
Copy link
Contributor Author

aguschin commented Aug 1, 2022

Hi, I added deregister, deprecate and unassign to the README. Please check - I think now we need to make sure the terms are meaningful and in the right place, since we're introducing new ones. @shcheklein @omesser I suggest to change some of them, cause I think it makes more sense. Please see and let's discuss.

cc @dmpetrov if you want to check out the README (I don't think those are breaking changes - just renamings here and in Studio).

README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated
6. Deregistering an artifact - completing the lifecycle of an artifact and
marking it as an outdated one.

### Registering an artifact
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing, coz in Studio model registry, you see models that you annotated even if you have not registered them. Should the Add a model flow in Studio also call register and create the Git tag to register the model? Is it possible to combine these commands somehow, so that the user does not have to use so many commands (register, publish, annotate)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I think it should not. We can explain that artifact is considered to be registered if there is any version (that wasn't deprecated/unpublished).

Also I recently understood that what Studio shows is the same you should get if you run GTO on a remote repo, e.g. $ gto show https://github.com/iterative/model-registry. GTO doesn't process remote repos yet, but we'll add it soon #220

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference from Studio still is that when you add a model from Studio, we don't call the GTO register api and create the model tag without versions. I think we should make this consistent with GTO and also call register. Also, IMO having to run 3 commands (register, publish, annotate) for getting your first version up is too much. Can register internally call publish to register the first version? And can it also accept model description etc and call annotate internally?

Copy link
Contributor Author

@aguschin aguschin Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the reasoning, but I'm afraid combining register and annotate in a single command is also confusing. Consider, that annotate edits artifacts.yaml and don't create a commit with this change, while register creates a git tag for an already existing commit. If a single command both changes files in the current index and creates a git tag, that can be quite confusing IMO.

Even if we make annotate to create a new commit, register can be applied to any commit, not to the last one only. So combining register and annotate is useful only if you want to annotate+register something on the top of the HEAD. Which is useful when you register your artifact for the first time, but probably not that useful once you've annotated it already.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tapadipti for pointing out this! I would expect that during the Add Model scenario Studio calls 2 commands

  • gto register -> create a tag
  • gto annotate

Not sure about that a separate command is necessary, but I would upvote for consistent behavior!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aguschin Questions:

  • In terms of displaying the model in the Studio model registry, how should we distinguish between models that are only annotated and models that are annotated and registered? Should we treat them the same and display both these types of models in the MR?
  • In terms of the Add a model flow in Studio, should we ask the users if they want to only annotate or annotate&register? I think this will introduce unnecessary confusion to the users. Can we agree on one flow?

Copy link
Contributor Author

@aguschin aguschin Aug 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how should we distinguish between models that are only annotated and models that are annotated and registered

Please check out this issue. Maybe this is a solution.

should we ask the users if they want to only annotate or annotate&register

I would ask this from @alex000kim and @mnrozhkov cause they understand user needs better than I do.

on a side note, a similar thing in GTO was proposed by @jorgeorpinel a while ago #145

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tapadipti, do we have an issue for this already or should we create one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we don't. I'm reading #226 and it looks like we should conclude that discussion and then create required tickets in Studio. Has there been any discussion on that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, nothing as far as I know.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Aug 5, 2022

Codecov Report

Merging #219 (b5e7b7c) into main (6a49b25) will decrease coverage by 1.81%.
The diff coverage is 77.99%.

@@            Coverage Diff             @@
##             main     #219      +/-   ##
==========================================
- Coverage   82.72%   80.91%   -1.82%     
==========================================
  Files          16       16              
  Lines        1534     1886     +352     
==========================================
+ Hits         1269     1526     +257     
- Misses        265      360      +95     
Impacted Files Coverage Δ
gto/registry.py 73.27% <58.33%> (-18.52%) ⬇️
gto/utils.py 70.00% <64.28%> (-3.47%) ⬇️
gto/api.py 78.26% <67.85%> (+5.87%) ⬆️
gto/exceptions.py 73.55% <69.23%> (+1.07%) ⬆️
gto/tag.py 83.67% <76.71%> (-6.27%) ⬇️
gto/cli.py 69.56% <83.87%> (+0.27%) ⬆️
gto/base.py 85.38% <85.66%> (-0.71%) ⬇️
gto/config.py 95.78% <100.00%> (+1.10%) ⬆️
gto/constants.py 100.00% <100.00%> (ø)
gto/ext.py 71.79% <100.00%> (ø)
... and 7 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Co-authored-by: tapadipti <[email protected]>
README.md Outdated
Comment on lines 50 to 59
1. Registering artifact - marking the very beginning of the artifact history in
the Git repository.
2. Publishing a version - marking an important change in the artifact.
3. Assigning a stage to a version - marking the readiness to be consumed by a
downstream system.
4. Unassigning a stage - removing an "artifact is ready for this stage" mark.
5. Deprecating a version - marking a version as the one that should not be used
any longer.
6. Deregistering an artifact - completing the lifecycle of an artifact and
marking it as an outdated one.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the minimalistic standpoint, I was thinking to only introduce (and implement) what seams really necessary: 2, 3, 4, 6. But on the other hand, having all commands makes the picture complete (even if some parts are optional). WDYT? Should we for now skip adding 1 and 5 cause they look as not essential, or should we introduce them and structure the README around the most supposedly common commands?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest having a single command for 1 and annotate. But 5 (unassign version) is good to keep.

Copy link
Contributor Author

@aguschin aguschin Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had a chat with @mnrozhkov, the suggestion is to remove the 1. Registering an artifact from GTO and use instead 2. Publishing a version. 1. doesn't solve any particular problem, and 6. Deregistering an artifact should be undone by either deleting a deregistering git tag (mymodel@deregistered) manually or by 2.:

$ gto deregister mymodel
Created git tag 'mymodel@deregistered'
# now the model is deregistered with all its versions
$ gto publish mymodel
Created git tag '[email protected]`
# now the model is registered again

WDYT @tapadipti?

Copy link

@mnrozhkov mnrozhkov Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aguschin there are few questions for clarification

  • after you publish mymodel again, the only [email protected] is active and available in MR for mymodel? (old version if any are still deprecated?)
  • can I re-publish a model version that was deprecated? (I don't see any use case for this, just curious)
  • IMO deprecated sounds better than deregister, also you use verb register to explain publish command ))) may be it's also not optimal command name? WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question about the use case! Please ask any other questions like this, if you have some :)

I guess there are two situations in which you may want to deprecate and then un-deprecate a model:

  1. You deprecated an artifact so others don't use it, but then changed your mind (in this case deleting a model deprecation git tag will do, although it don't preserve history)
  2. You want to deprecate all previous model versions so others don't use them (e.g. you found a critical bug that seems to exist in all versions prior to some version), but still want to develop and use the artifact further.

in 1 you can just delete a model deprecation git tag, but that won't preserve history of deprecation/undeprecation which could be valuable information (there should have been a reason to deprecate, the user should have been describing that reason in model deprecation git tag message). -- That's why I initially created those commands in pairs (register artifact & deregister artifact).

in 2 you may just go over all previous versions and create a version deprecation git tag for each of them. But that may be a lot of git tags and may not be desirable. Although, this is a premature optimization probably and we should ask user to do this if he wants to deprecate all versions at once, but not an artifact.

On the other topic, I agree that deprecate sounds better than deregister, but having paired command names may be easier to grasp for the user. Also, we may name this not "register & deregister an artifact", but "add & deprecate an artifact".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just delete a model deprecation git tag, but that won't preserve history

I think we shouldn't implement solutions that don't preserve history

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not though? Git allows you to overwrite history if you really want to.
If a user messes up (makes a typo, pushes a wrong version, etc) they should be able to delete/remove that tag.
I am sure users will still find a way to do that with git, so we might as well make it easier for them by supporting deletion behavior with gto.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I've implemented --delete flags for untag and deprecate commands (this is not the default behavior). Those delete all relevant Git tags. You can find examples in the README.
  2. For deprecate I'm implementing the following approach: any new tag created after the artifact was deprecated "undeprecates" it. If you're going to do that, GTO will error out by default, so you'll need to provide --force flag.

@alex000kim
Copy link

my 2c:

  • It'll be hard to memorize the differences between register, publish, deregister, assign, unassign, depricate
  • Just throwing it out there: would it be possible to use the following commands? (or just a similar pattern)
    • gto assign awesome-model --version v1.0.0 instead of gto publish awesome-model --version v1.0.0
    • gto unassign awesome-model --version v1.0.0 instead of gto deprecate awesome-model --version v0.0.1
    • gto assign awesome-model --stage prod instead of gto assign awesome-model prod
    • gto unassign awesome-model --stage prod (stays the same)
    • gto archive awesome-model instead of gto deregister awesome-model

@aguschin
Copy link
Contributor Author

It's a great idea @alex000kim!
I'm trying to implement this and it seems it should work 🎉

I'm also checking if the following is possible

gto assign awesome-model v1.0.0  # instead of `gto publish awesome-model --version v1.0.0`
gto unassign awesome-model v1.0.0  # instead of `gto deprecate awesome-model --version v0.0.1`
gto assign awesome-model prod  # instead of `gto assign awesome-model prod`
gto assign awesome-model v1.0.0 prod  # instead of `gto assign awesome-model prod --version v1.0.0`
gto unassign awesome-model prod
gto archive awesome-model  # instead of `gto deregister awesome-model`

if we could remove --version and --stage options, it could make this even more clear.

README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Outdated
$ gto assign awesome-model prod
Created git tag 'awesome-model#prod#1' that adds stage 'prod' to 'v0.0.1'
$ gto tag awesome-model --version v0.0.1 --stage prod
Created git tag 'awesome-model#prod#1'
Copy link

@alex000kim alex000kim Aug 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this message specify the version as well?
E.g. Created git tag 'awesome-model#prod#1' based on v0.0.1 of awesome-model

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, since 1) we can't assign a stage without a version and 2) gto tag awesome-model --version v0.0.1 is very different behavior than gto tag awesome-model --version v0.0.1 --stage prod

What do you think about a --to-version option when using --stage?

gto tag awesome-model --stage prod --to-version v0.0.1

Copy link
Contributor Author

@aguschin aguschin Aug 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks better - easier to read the command. Do you think this should do both version registration and promotion then?

$ gto tag x3 HEAD --version v0.0.1 --stage prod
Created git tag '[email protected]'
Created git tag 'x3#prod#1'

couple more examples to clarify how this will work:

$ gto tag x3 HEAD --stage prod
Created git tag 'x3#prod#2'

$ gto tag x3 --stage prod --to-version v0.0.2
Created git tag 'x3#prod#3'

$ gto tag x3 HEAD --to-version v0.0.3 --stage prod
Error: provide either 'ref' or 'to_version', but not both

$ gto tag x3 --version v0.0.4 --to-version v0.0.4 --stage prod
Error: provide either 'version' or 'to_version', but not both

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess whether this makes sense or not depends on whether gto should allow promotion to a stage from a git ref OR if only a particular version should be allowed to be promoted to a stage.
I kind of assumed the latter.

Copy link
Contributor Author

@aguschin aguschin Aug 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, if you try to assign a stage, but the version doesn't exist for the REF, GTO will register it for you. But if you supply --skip-registration git tag, then it won't.

I.e. let's suppose there are no versions for x4:

$ gto tag x4 HEAD --stage prod
Created git tag '[email protected]'
Created git tag 'x3#prod#1'

$ gto tag x4 HEAD^1 --stage prod --skip-registration
Created git tag 'x3#prod#1'

Looks like the difference between --version and --to-version is this: if you provide REF and --to-version, you will get an error, cause there should be only one way to specify the Git revision to reference with Git tag you'll create. While if you provide REF and --version, this will be OK: you register a version and reference REF with Git tag you create. That's why if you provide REF, --version and --stage, you can both register a version and make an assignment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think it's better to keep the commands simple and not allow doing 2 things (register version and assign stage) with one command.
  • And having 2 similar flags (--version and --to-version) looks confusing to me.
  • Also, why do we want to generalize this to tag/untag, instead of making the commands specifically about versions and stages? Do we see more kinds of tags being required in the future, and want to be able to cover those potential scenarios? Since versions and stages have special meanings for us and since we treat them separately (eg, we only display the latest stage per version but we display all versions per model), I think it's easier to understand and remember the commands if they are separate for the distinct functionalities. Simplifying the name/syntax is a good idea, but not sure if combining them into a single command is.

Copy link
Contributor Author

@aguschin aguschin Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to keep the commands simple and not allow doing 2 things (register version and assign stage) with one command.

# I guess you're suggesting we shouldn't allow this: 
$ gto tag x6 HEAD --stage prod --version 1.0.0
Created git tag '[email protected]'
Created git tag 'x5#prod#1'

# Should we allow this?
$ gto tag x5 HEAD --stage prod
Created git tag '[email protected]'
Created git tag 'x5#prod#1'

# If we don't allow the latter, the alternative is to error out
$ gto tag x5 HEAD --stage prod
Error: no version registered at HEAD. Register a version first, then do an assignment.

But the ability to assign a stage with automatic version registration (like the second command) is a useful shortcut, I believe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why do we want to generalize this to tag/untag, instead of making the commands specifically about versions and stages?

From the above comments from @mnrozhkov and @alex000kim I assume it's too hard to memorize 5 different commands and syntaxes for a new user. I agree on this direction: GTO does very simple things and should be a lightweight tool, and having a big tough-to-read README with many commands is what I would like to avoid.

On the other hand, if unifying commands brings more confusion than value, let's not do those. Currently (without --to-version and --from-version) I think it's pretty straightforward though (maybe it's needed to be clarified in docs): if you supply --stage, you do an assignment, otherwise, you do a registration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterated on this with @mnrozhkov and eventually decided that

  1. tag looks cool, but if you try to investigate how it works, it becomes confusing (different behavior with --stage and without --stage
  2. register and assign are easier cause they separate use cases well
  3. deprecate can unassign stages and deregister artifacts

I've updated the README and it looks good now: we have 3 commands only: register, assign, deprecate. Almost without breaking changes. Feel free to read and provide any more feedback re this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented this. So now:

  • in CLI you have register, assign, deprecate.
  • in API you have register, unregister, assign, unassign, deprecate

This is because Studio is the main user of API, and having separate commands for them is clearer.
For human in CLI we'll have deprecate unifying deregister and unassign, making things simpler, hopefully.

A side note: it looks like API and CLI in GTO diverge more and more over time.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@aguschin
Copy link
Contributor Author

aguschin commented Aug 19, 2022

I'm merging this with how it works to release existing changes. Let's enhance on top of what it's there already. Follow-up issues

Feel free to join any discussion if you wish.

@aguschin aguschin merged commit 691ca4b into main Aug 19, 2022
@aguschin aguschin deleted the feature/unassign branch August 19, 2022 06:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

How to deprecate a model Unclear demotion behavior
7 participants