
files from HuggingFace cannot be cached #746

Open
skshetry opened this issue Dec 25, 2024 · 1 comment
Labels
bug Something isn't working

Comments

skshetry (Member) commented Dec 25, 2024

Description

  File "/home/runner/work/datachain/datachain/.nox/examples-3-12/lib/python3.12/site-packages/datachain/lib/prefetcher.py", line 31, in _prefetch_input
    await obj._prefetch()
  File "/home/runner/work/datachain/datachain/.nox/examples-3-12/lib/python3.12/site-packages/datachain/lib/file.py", line 275, in _prefetch
    await client._download(self, callback=self._download_cb)
  File "/home/runner/work/datachain/datachain/.nox/examples-3-12/lib/python3.12/site-packages/datachain/client/fsspec.py", line 374, in _download
    await self._put_in_cache(file, callback=callback)
  File "/home/runner/work/datachain/datachain/.nox/examples-3-12/lib/python3.12/site-packages/datachain/client/fsspec.py", line 382, in _put_in_cache
    etag = await self.get_current_etag(file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/datachain/datachain/.nox/examples-3-12/lib/python3.12/site-packages/datachain/client/fsspec.py", line 204, in get_current_etag
    info = await self.fs._info(self.get_full_path(file.path))
                 ^^^^^^^^^^^^^
AttributeError: 'HfFileSystem' object has no attribute '_info'. Did you mean: 'info'?

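HfClient does not implement all the APIs needed for caching. Prefetching downloads files into the cache (_prefetch → _download → _put_in_cache in the traceback above), so it breaks as well.
HfFileSystem only offers a synchronous interface, which makes it incompatible with Client. Client expects an AsyncFileSystem and calls async methods such as _info(), but HfFileSystem only provides the sync info(). So the integration is fundamentally broken.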
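A minimal sketch of the mismatch follows. It assumes a sync-only filesystem that exposes info() but no _info() coroutine. The hypothetical helper info_any awaits _info() when the filesystem has it, and otherwise runs the blocking info() in a worker thread with asyncio.to_thread. SyncOnlyFS, AsyncFS and info_any are illustrative stand-ins, not datachain, fsspec or huggingface_hub APIs.

```python
import asyncio
import inspect
from typing import Any


class SyncOnlyFS:
    """Hypothetical stand-in for a sync-only filesystem (like HfFileSystem):
    it has a public info() but no async _info() coroutine."""

    def info(self, path: str) -> dict[str, Any]:
        return {"name": path, "size": 42, "type": "file"}


class AsyncFS:
    """Hypothetical stand-in for an async filesystem that exposes _info()."""

    async def _info(self, path: str) -> dict[str, Any]:
        return {"name": path, "size": 7, "type": "file"}


async def info_any(fs: Any, path: str) -> dict[str, Any]:
    """Fetch file info from either kind of filesystem (hypothetical helper).

    Calling fs._info() unconditionally, as in the traceback, raises
    AttributeError on a sync-only filesystem. This helper falls back to
    running the blocking info() in a worker thread instead.
    """
    async_info = getattr(fs, "_info", None)
    if async_info is not None and inspect.iscoroutinefunction(async_info):
        return await async_info(path)
    # Sync fallback: the blocking call runs off the event loop thread.
    return await asyncio.to_thread(fs.info, path)


if __name__ == "__main__":
    print(asyncio.run(info_any(SyncOnlyFS(), "some/path.txt")))
    print(asyncio.run(info_any(AsyncFS(), "some/path.txt")))
```

This only covers info(). A real shim would have to wrap every async method that Client relies on, not just this one.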

skshetry added the bug (Something isn't working) label Dec 25, 2024
skshetry (Member, issue author) commented:
I'll disable prefetching for Hugging Face (hf) files for now.
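One hypothetical way to express "disable prefetching" is to gate it on capability rather than on the protocol name. If the filesystem has no async _info(), the prefetch depth falls back to 0. The names supports_async_info and effective_prefetch, and the two toy filesystem classes, are assumptions for illustration, not datachain's actual change.

```python
import inspect
from typing import Any


class SyncOnlyFS:
    """Hypothetical sync-only filesystem: info() but no async _info()."""

    def info(self, path: str) -> dict[str, Any]:
        return {"name": path}


class AsyncFS:
    """Hypothetical async filesystem exposing an _info() coroutine."""

    async def _info(self, path: str) -> dict[str, Any]:
        return {"name": path}


def supports_async_info(fs: Any) -> bool:
    """True only if fs has an _info() coroutine (hypothetical check)."""
    # inspect.iscoroutinefunction(None) is False, so a missing _info is handled.
    return inspect.iscoroutinefunction(getattr(fs, "_info", None))


def effective_prefetch(fs: Any, requested: int) -> int:
    """Return the prefetch depth to actually use; 0 means disabled."""
    if requested > 0 and not supports_async_info(fs):
        # Prefetching goes through the async cache path, so turn it off.
        return 0
    return requested


if __name__ == "__main__":
    print(effective_prefetch(SyncOnlyFS(), 2))
    print(effective_prefetch(AsyncFS(), 2))
```

Checking the capability instead of hard-coding hf would also cover any other sync-only backend, at the cost of being less explicit about which protocols are affected.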

Projects
None yet
Development

No branches or pull requests

1 participant