Error handling with bucket access issues #600

Open
dmpetrov opened this issue Nov 14, 2024 · 3 comments
Labels
bug Something isn't working priority-p1

Comments

@dmpetrov
Member

Description

I got a massive stack trace where a single line would be enough, e.g.:

[email protected] does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).

DC should handle these types of errors carefully and report them in a single, readable line.

The stack trace:

Using cached virtualenv
Listing gs://mpii-human-pose: 0 objects [00:00, ? objects/s]
Processed: 1 rows [00:00, 11.23 rows/s] [00:00, ? objects/s]
Traceback (most recent call last):
  File "<string>", line 3, in <module>
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/datachain/lib/dc.py", line 474, in from_storage
    .save(list_ds_name, listing=True)
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/datachain/lib/dc.py", line 764, in save
    query=self._query.save(
          ^^^^^^^^^^^^^^^^^
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/datachain/query/dataset.py", line 1579, in save
    query = self.apply_steps()
            ^^^^^^^^^^^^^^^^^^
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/datachain/query/dataset.py", line 1128, in apply_steps
    result = step.apply(
             ^^^^^^^^^^^
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/datachain/query/dataset.py", line 572, in apply
    self.populate_udf_table(udf_table, query)
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/datachain/query/dataset.py", line 492, in populate_udf_table
    process_udf_outputs(
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/datachain/query/dataset.py", line 335, in process_udf_outputs
    for row in udf_output:
               ^^^^^^^^^^
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/datachain/lib/udf.py", line 369, in <genexpr>
    output = (dict(zip(self.signal_names, row)) for row in udf_outputs)
                                                           ^^^^^^^^^^^
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/datachain/lib/udf.py", line 368, in <genexpr>
    udf_outputs = (self._flatten_row(row) for row in result_objs)
                                                     ^^^^^^^^^^^
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/datachain/lib/listing.py", line 35, in list_func
    for entries in iter_over_async(client.scandir(path.rstrip("/")), get_loop()):
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/datachain/asyn.py", line 238, in iter_over_async
    done, obj = asyncio.run_coroutine_threadsafe(get_next(), loop).result()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/datachain/asyn.py", line 231, in get_next
    obj = await ait.__anext__()
          ^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/datachain/client/fsspec.py", line 225, in scandir
    await main_task
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/datachain/client/gcs.py", line 57, in _fetch_flat
    await self._get_pages(prefix, page_queue)
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/datachain/client/gcs.py", line 91, in _get_pages
    page = await self.fs._call(
           ^^^^^^^^^^^^^^^^^^^^
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/gcsfs/core.py", line 447, in _call
    status, headers, info, contents = await self._request(
                                      ^^^^^^^^^^^^^^^^^^^^
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/decorator.py", line 221, in fun
    return await caller(func, *(extras + args), **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/gcsfs/retry.py", line 130, in retry_request
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/gcsfs/core.py", line 440, in _request
    validate_response(status, contents, path, args)
  File "/tmp/local/datachain_venv/python3.12/default/lib/python3.12/site-packages/gcsfs/retry.py", line 111, in validate_response
    raise OSError(f"Forbidden: {path}\n{msg}")
OSError: Forbidden: b/mpii-human-pose/o
[email protected] does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).
Query script exited with error code 1
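The only part of this trace a user needs is the final `OSError`. Below is a minimal sketch of condensing it into one line, assuming the message format visible at the bottom of the trace: gcsfs raises `OSError(f"Forbidden: {path}\n{msg}")`, where `path` is the GCS JSON API listing path `b/<bucket>/o`. The helper name `summarize_gcs_error` is hypothetical and not part of DataChain or gcsfs. Parsing a message string is brittle, so this only illustrates the desired output rather than a recommended long-term fix.

```python
import re

# Hypothetical helper, not part of DataChain's or gcsfs's API.
# GCS JSON API object-listing paths look like "b/<bucket>/o".
_GCS_LIST_PATH = re.compile(r"^b/(?P<bucket>[^/]+)/o")


def summarize_gcs_error(exc: OSError) -> str:
    """Condense a gcsfs 'Forbidden' OSError into a single line naming the bucket."""
    text = str(exc)
    first, _, rest = text.partition("\n")
    if first.startswith("Forbidden: "):
        path = first[len("Forbidden: "):]
        match = _GCS_LIST_PATH.match(path)
        # Fall back to the raw path if it does not look like a listing path
        bucket = match.group("bucket") if match else path
        # Collapse the (possibly multi-line) detail into one line
        detail = " ".join(rest.split()) or "permission denied"
        return f"Access denied to bucket gs://{bucket}: {detail}"
    # Anything else: keep only the first line of the message
    return first
```

For the trace above, this would yield a single line of the form `Access denied to bucket gs://mpii-human-pose: ... does not have storage.objects.list access ...`.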

Version Info


@dmpetrov dmpetrov added bug Something isn't working priority-p1 labels Nov 14, 2024
@0x2b3bfa0
Member

Same issue as https://github.com/iterative/studio/issues/10235, but the point here is the user experience, not the specific errors (?)

@shcheklein
Member

  • First step: never show stack traces unless -v is passed or something truly unexpected happened.
  • Second step (and in general going forward): pay attention to "regular" exceptions and handle them properly. Sometimes they can be wrapped; sometimes we need to change the logic (e.g. check in advance at runtime so that we don't hit an exception at all later).
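The first step could look like the following minimal sketch, assuming a single top-level entry point. The wrapper `run_with_friendly_errors` and its `-v/--verbose` flag are hypothetical illustrations, not DataChain's actual CLI. By default it prints one line per failure; the full traceback is shown only when verbose output is requested.

```python
import argparse
import sys
import traceback
from typing import Callable, List, Optional


def run_with_friendly_errors(main: Callable[[], None], argv: Optional[List[str]] = None) -> int:
    """Run main() and return an exit code instead of letting a traceback escape.

    Hypothetical wrapper, not part of DataChain's CLI.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-v", "--verbose", action="store_true")
    # parse_known_args ignores flags meant for the real command
    args, _ = parser.parse_known_args(argv)
    try:
        main()
    except KeyboardInterrupt:
        return 130  # conventional exit code after SIGINT
    except Exception as exc:
        if args.verbose:
            traceback.print_exc()  # full stack trace only on request
        else:
            # Collapse a possibly multi-line message into one line
            msg = " ".join(str(exc).split()) or type(exc).__name__
            print(f"Error: {msg} (rerun with -v for details)", file=sys.stderr)
        return 1
    return 0
```

A script entry point would then call `sys.exit(run_with_friendly_errors(script_main))`, where `script_main` stands in for the user's code.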

@dmpetrov
Member Author

More examples.

No credentials:

  File "/Users/tester/venv/venv-py309/lib/python3.13/site-packages/aiobotocore/signers.py", line 90, in sign
    auth.add_auth(request)
    ~~~~~~~~~~~~~^^^^^^^^^
  File "/Users/tester/venv/venv-py309/lib/python3.13/site-packages/botocore/auth.py", line 423, in add_auth
    raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials

Nonexistent bucket:

  File "/Users/tester/venv/venv-py309/lib/python3.13/site-packages/datachain/client/s3.py", line 101, in _fetch_flat
    await get_pages(it, page_queue)
  File "/Users/tester/venv/venv-py309/lib/python3.13/site-packages/datachain/client/s3.py", line 56, in get_pages
    async for page in it:
        await page_queue.put(page.get(contents_key, []))
  File "/Users/tester/venv/venv-py309/lib/python3.13/site-packages/aiobotocore/paginate.py", line 30, in __anext__
    response = await self._make_request(current_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tester/venv/venv-py309/lib/python3.13/site-packages/aiobotocore/client.py", line 412, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.NoSuchBucket: An error occurred (NoSuchBucket) when calling the ListObjectVersions operation: The specified bucket does not exist

This should be a native DC error message (not one from boto or other libraries). It should also mention the bucket name; otherwise, it's hard to pin down the issue in a large script.
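A minimal sketch of a DC-native error that names the bucket, assuming known third-party failures can be recognized by exception class name (botocore's `NoSuchBucket` and `NoCredentialsError`, as in the traces above) or by gcsfs's `Forbidden` message. `StorageError`, `with_bucket_context`, and the reason strings are hypothetical, not existing DataChain names. Raising with `from exc` keeps the original exception chained, so the library trace is still available when verbose output is enabled.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class StorageError(Exception):
    """Hypothetical DC-native error; not an existing DataChain class."""

    def __init__(self, uri: str, reason: str) -> None:
        self.uri = uri
        self.reason = reason
        super().__init__(f"Cannot access {uri}: {reason}")


# Third-party exception class names mapped to short, user-facing reasons.
# Matching on names (walking the MRO) avoids importing botocore or gcsfs here.
_REASONS = {
    "NoSuchBucket": "the bucket does not exist",
    "NoCredentialsError": "no credentials were found",
    "PermissionError": "access was denied",
}


def _reason_for(exc: BaseException) -> Optional[str]:
    for cls in type(exc).__mro__:
        if cls.__name__ in _REASONS:
            return _REASONS[cls.__name__]
    # gcsfs reports HTTP 403 as a plain OSError whose message starts with "Forbidden"
    if isinstance(exc, OSError) and str(exc).startswith("Forbidden"):
        return "access was denied"
    return None


def with_bucket_context(uri: str, fn: Callable[[], T]) -> T:
    """Run fn(); re-raise recognized storage failures as StorageError naming the URI."""
    try:
        return fn()
    except Exception as exc:
        reason = _reason_for(exc)
        if reason is None:
            raise  # unknown failures keep their original type and traceback
        raise StorageError(uri, reason) from exc
```

Unrecognized exceptions are re-raised untouched, so the translation only rewrites messages it actually understands.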

@amritghimire amritghimire self-assigned this Jan 7, 2025
Projects
None yet
Development

No branches or pull requests

4 participants