dvc import-url: can't pull data if using --no-download #10594

DGrady opened this issue Oct 18, 2024 · 0 comments

DGrady commented Oct 18, 2024

Bug Report

Description

The documentation for import-url explains that running this command:

dvc import-url --no-download s3://mybucket/data.csv

should create a DVC metadata file with the pointer and hash information for the source data file, and that it should not download the data immediately. That works as expected.

The documentation also states that if I later run

dvc pull data.csv

then DVC will download the data and place it in my work tree. (It's not clear to me whether the data would also be added to the cache.) This doesn't work; instead, the pull fails:

> dvc pull data.csv
Collecting
Fetching
Building workspace index
Comparing indexes
Applying changes
Everything is up to date.
ERROR: failed to pull data from the cloud - Checkout failed for following targets:
data.csv
Is your cache up to date?
<https://error.dvc.org/missing-files>

Reproduce
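
A minimal reproduction sketch built from the two commands in the description. It assumes it is run inside an initialized Git and DVC repository with S3 credentials available. `s3://mybucket/data.csv` is the placeholder URL from this report, and the `SRC_URL` and `RUN` variables and the `step` and `repro` helpers are illustrative names, not part of DVC. By default the script only prints the commands; it executes them when `RUN=1` is set.

```shell
#!/bin/sh
# Reproduction sketch. SRC_URL defaults to the placeholder from this report;
# substitute an object that your credentials can actually read.
SRC_URL="${SRC_URL:-s3://mybucket/data.csv}"
TARGET="$(basename "$SRC_URL")"

# Print a command, and execute it only when RUN=1, so the script can be
# inspected safely without DVC or S3 access.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Import without downloading, then try to pull the imported file.
# The second step is the one that fails in this report.
repro() {
    step dvc import-url --no-download "$SRC_URL" &&
        step dvc pull "$TARGET"
}

repro
```

Saved under a hypothetical name such as `repro.sh`, it can be run as `RUN=1 sh repro.sh`; with the behavior described above, the `dvc pull` step is where the checkout error appears.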

Expected

Based on the documentation, my expectation is that

dvc import-url --no-download s3://mybucket/data.csv
dvc pull data.csv

should copy data.csv to my local work tree from S3, that data.csv should not be added to the cache, and that any changes to data.csv in S3 should cause local pipelines that use data.csv as a dependency to be flagged as out of date.

This expected behavior is explained in a couple of places in the documentation:

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 3.55.2 (pip)
-------------------------
Platform: Python 3.12.7 on macOS-15.0.1-x86_64-i386-64bit
Subprojects:
	dvc_data = 3.16.6
	dvc_objects = 5.1.0
	dvc_render = 1.0.2
	dvc_task = 0.40.2
	scmrepo = 3.3.8
Supports:
	http (aiohttp = 3.10.10, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.10.10, aiohttp-retry = 2.8.3),
	s3 (s3fs = 2024.9.0, boto3 = 1.35.36)
Config:
	Global: /Users/dan/Library/Application Support/dvc
	System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s1s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk1s1s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/a60ba71a9b85f3c8d2c283884465924b

Additional Information (if any):

DGrady changed the title from "dvc import-url: Non-cached imported URLs won't pull" to "dvc import-url: can't pull data if using --no-download" on Oct 18, 2024