Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--force not allowed during dvc move #7240

Open
ntjess opened this issue Jan 8, 2022 · 5 comments
Open

--force not allowed during dvc move #7240

ntjess opened this issue Jan 8, 2022 · 5 comments
Labels
A: data-management Related to dvc add/checkout/commit/move/remove bug Did we break something? ui user interface / interaction

Comments

@ntjess
Copy link

ntjess commented Jan 8, 2022

Bug Report

Description

--force is required to overwrite a file during move, but this isn't allowed as a parameter

Reproduce

  1. Create a directory with files a and b, checking both into dvc
  2. dvc move a b produces the error
    Use '--force' to overwrite.
  3. dvc move --force a b produces the error
    ERROR: unrecognized arguments: --force

Expected

force should allow overwriting

Environment information

Output of dvc doctor:

DVC version: 2.9.3 (pip)
---------------------------------
Platform: Python 3.9.7 on Windows-10-10.0.22000-SP0
Supports:
        gdrive (pydrive2 = 1.10.0),
        webhdfs (fsspec = 2021.11.1),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6)
Cache types: hardlink
Cache directory: NTFS on C:\
Caches: local
Remotes: gdrive
Workspace directory: NTFS on C:\
Repo: dvc, git
@tirkarthi
Copy link
Contributor

Sample test case. The move used to overwrite before 256cb06

def test_move_file_exists(tmp_dir, dvc):
    from dvc.exceptions import OverlappingOutputPathsError

    tmp_dir.dvc_gen({"foo": "foo", "bar": "bar"})
    dvc.move("foo", "bar")

    assert (tmp_dir / "bar").read_text() == "foo"
    assert (tmp_dir / "bar.dvc").exists()
    assert not (tmp_dir / "foo").exists()
    assert not (tmp_dir / "foo.dvc").exists()
commit 256cb06921da79dd5fcb8c35023504a98afdacdf
Author: Peter Rowlands (변기호) <[email protected]>
Date:   Wed May 5 13:09:55 2021 +0900

    move: validate new stage before move (#5953)
    
    * move: validate new stage before move
    
    * verify that nothing was moved in tests

@tirkarthi
Copy link
Contributor

tirkarthi commented Jan 9, 2022

A simpler fix would be to pass force=True during new stage creation and I can create a PR. Please let me know if I am missing some other case here.

Edit : force=True probably based on condition like if both source and destination are tracked data sources.

diff --git a/dvc/repo/move.py b/dvc/repo/move.py
index a4f1a7c1..c2dcbf4c 100644
--- a/dvc/repo/move.py
+++ b/dvc/repo/move.py
@@ -67,6 +67,7 @@ def move(self, from_path, to_path):
             wdir=new_wdir,
             outs=[to_path],
             meta=stage.meta,
+            force=True
         )
 
         os.unlink(stage.path)

@pmrowla
Copy link
Contributor

pmrowla commented Jan 9, 2022

dvc move isn't supposed to have a --force argument, the error message suggesting to use --force is the bug (the message probably comes from stage code that isn't actually applicable during dvc move). The preferred fix here would be to correct the error message.

move overwriting by default prior to that change was a bug and led to unexpected behavior, particularly when dealing with tracked directories (see the original issue that was fixed by that change #5886)

Adding support for --force is something we can consider, but determining when it is allowed is a bit more complicated than just "are both source and data tracked outputs".

@pmrowla pmrowla added bug Did we break something? ui user interface / interaction labels Jan 9, 2022
@tirkarthi
Copy link
Contributor

Thanks for the clarification @pmrowla . The hint seems to be tightly coupled to the validation function. The hint makes sense for single/pipeline stages created with run as suggested in #4003. Perhaps an additional flag being passed to check_stage_exists like show_hint which is by default is True and dvc.move passes it False through **kwargs while trying to create a stage?

@pmrowla
Copy link
Contributor

pmrowla commented Jan 9, 2022

I don't think we need an extra flag, the hint in check_stage_exists can probably just be specific to when stage is PipelineStage here:

dvc/dvc/stage/utils.py

Lines 293 to 301 in 92f714e

hint = "Use '--force' to overwrite."
if not isinstance(stage, PipelineStage):
raise StageFileAlreadyExistsError(
f"'{stage.relpath}' already exists. {hint}"
)
elif stage.name and stage.name in dvcfile.stages:
raise DuplicateStageName(
f"Stage '{stage.name}' already exists in '{stage.relpath}'. {hint}"
)

cc: @skshetry

@daavoo daavoo added A: move A: data-management Related to dvc add/checkout/commit/move/remove and removed A: move labels Feb 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: data-management Related to dvc add/checkout/commit/move/remove bug Did we break something? ui user interface / interaction
Projects
None yet
Development

No branches or pull requests

4 participants