DVC can not push file larger than 100MiB due to an upstream bug #10643

Open
CNLHC opened this issue Dec 5, 2024 · 0 comments
Labels
A: data-sync (Related to dvc get/fetch/import/pull/push), bug (Did we break something?), fs: oss (Related to the Alibaba Cloud OSS filesystem)


CNLHC commented Dec 5, 2024

Bug Report

Description

Because of a bug in the ossfs dependency, DVC cannot push files larger than 100 MiB to an OSS remote.

Reproduce

dvc init
mkfile -n 200m test.blob
dvc add test.blob
dvc remote add foo oss://<oss_bucket>
dvc push -r foo

The output reports that the file was pushed to the remote, which is false.

...python3.12/site-packages/ossfs/async_oss.py:388: RuntimeWarning: coroutine 'resumable_upload' was never awaited                                                                  
  await self._call_oss(
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Pushing
1 file pushed
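
The `mkfile` command in the reproduction steps is macOS-specific. For anyone reproducing this on another platform, the sketch below creates an equivalent test file with the Python standard library. The helper name `make_blob` is made up for illustration, and it assumes that `200m` in the steps above means 200 MiB.

```python
import os


def make_blob(path, size_bytes):
    """Create a file of exactly size_bytes without writing its contents."""
    with open(path, "wb") as f:
        # Resizes the empty file to size_bytes; on most systems the new
        # area is zero-filled and may be stored sparsely.
        f.truncate(size_bytes)
    return os.path.getsize(path)
```

Calling `make_blob("test.blob", 200 * 1024 * 1024)` before `dvc add test.blob` should give a file above the 100 MiB threshold described in this report.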

Expected

The file should be pushed to OSS, or, if something unexpected happens, the CLI should report an error.

Environment information

Output of dvc doctor:

$ dvc doctor

DVC version: 3.58.0 (pip)
-------------------------
Platform: Python 3.12.7 on macOS-15.1.1-arm64-arm-64bit
Subprojects:
	dvc_data = 3.16.7
	dvc_objects = 5.1.0
	dvc_render = 1.0.2
	dvc_task = 0.40.2
	scmrepo = 3.3.9
Supports:
	http (aiohttp = 3.9.5, aiohttp-retry = 2.9.1),
	https (aiohttp = 3.9.5, aiohttp-retry = 2.9.1),
	oss (ossfs = 2023.12.0),
	s3 (s3fs = 2024.10.0, boto3 = 1.35.36)
Config:
	Global: /Users/liuhancheng/Library/Application Support/dvc
	System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: oss
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/adbc8e36d46e0788fce6cf0882302974

Additional Information (if any):

The root cause of this issue is a bug in the ossfs package. According to the warning, this line is used to upload large files:
https://github.com/fsspec/ossfs/blob/224e98868f32018fabacdac1eb5daddb16ce419c/src/ossfs/async_oss.py#L388
That call eventually reaches this line (L159), which invokes the underlying method to perform the upload:
https://github.com/fsspec/ossfs/blob/224e98868f32018fabacdac1eb5daddb16ce419c/src/ossfs/async_oss.py#L159

When method_name is resumable_upload, method(service, *args, **kwargs) returns a coroutine that is never awaited, so the upload never runs. That is what causes this problem.
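
To illustrate the failure mode in isolation, here is a minimal sketch that does not use ossfs or aiooss2. `fake_resumable_upload` is a hypothetical stand-in for an async upload helper. Calling it without `await` only creates a coroutine object, and its body (the "upload") never executes.

```python
import asyncio
import inspect

uploaded = []  # records which uploads actually ran


async def fake_resumable_upload(key):
    """Hypothetical stand-in for an async upload helper."""
    await asyncio.sleep(0)
    uploaded.append(key)
    return "done"


async def call_without_await(key):
    out = fake_resumable_upload(key)  # creates a coroutine, runs nothing
    is_coro = inspect.iscoroutine(out)
    # Close it explicitly so this demo does not emit the
    # "coroutine ... was never awaited" RuntimeWarning itself.
    out.close()
    return is_coro


async def call_with_await(key):
    return await fake_resumable_upload(key)  # the body actually runs


def demo():
    uploaded.clear()
    was_coroutine = asyncio.run(call_without_await("a.blob"))
    after_buggy_call = list(uploaded)
    result = asyncio.run(call_with_await("b.blob"))
    return was_coroutine, after_buggy_call, result, list(uploaded)
```

In this sketch only `b.blob` ends up in `uploaded`. The un-awaited call returns without raising anything, which would be consistent with the CLI reporting "1 file pushed" even though nothing was uploaded.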

A small, quick fix is to apply this patch:

diff --git a/src/ossfs/async_oss.py b/src/ossfs/async_oss.py
index 8a07b5e..f01b501 100644
--- a/src/ossfs/async_oss.py
+++ b/src/ossfs/async_oss.py
@@ -156,7 +156,10 @@ class AioOSSFileSystem(BaseOSSFileSystem, AsyncFileSystem):
             if not method:
                 method = getattr(aiooss2, method_name)
                 logger.debug("CALL: %s - %s - %s", method.__name__, args, kwargs)
-                out = method(service, *args, **kwargs)
+                if method_name == "resumable_upload":
+                    out = await method(service, *args, **kwargs)
+                else:
+                    out = method(service, *args, **kwargs)
             else:
                 logger.debug("CALL: %s - %s - %s", method.__name__, args, kwargs)
                 out = await method(*args, **kwargs)
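
As a possible alternative to special-casing the method name, the result could be awaited whenever it turns out to be awaitable. The sketch below shows the idea outside of ossfs. `call_maybe_async`, `sync_helper`, and `async_helper` are hypothetical names, and whether this approach suits ossfs depends on how its other callers use the un-awaited return values, which is not verified here.

```python
import asyncio
import inspect


async def call_maybe_async(method, *args, **kwargs):
    """Call method and await the result only if it is awaitable."""
    out = method(*args, **kwargs)
    if inspect.isawaitable(out):
        out = await out
    return out


def sync_helper(x):
    """Hypothetical helper that returns a plain value."""
    return x + 1


async def async_helper(x):
    """Hypothetical helper that must be awaited to run."""
    await asyncio.sleep(0)
    return x * 2
```

With this wrapper, `asyncio.run(call_maybe_async(sync_helper, 1))` and `asyncio.run(call_maybe_async(async_helper, 2))` both return plain values, so a new coroutine-returning helper would not need its own branch.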

CNLHC changed the title from "DVC can not push file larger than 100MiB due to upstream bug" to "DVC can not push file larger than 100MiB due to an upstream bug" on Dec 5, 2024

shcheklein added the labels A: data-sync, fs: oss, and bug on Dec 7, 2024