GSClient auth fails with token-based application credentials.json #390

Open
joconnor-ecaa opened this issue Dec 29, 2023 · 5 comments

@joconnor-ecaa
Contributor

The following

gcloud auth application-default login
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/application_default_credentials.json
python -c "from cloudpathlib import GSPath; GSPath('gs://private_file').download_to('.')"

fails with

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/joe/cloudpathlib/cloudpathlib/cloudpath.py", line 171, in __call__
    cls.__init__(new_obj, cloud_path, *args, **kwargs)  # type: ignore[type-var]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joe/cloudpathlib/cloudpathlib/cloudpath.py", line 230, in __init__
    client = self._cloud_meta.client_class.get_default_client()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joe/cloudpathlib/cloudpathlib/client.py", line 104, in get_default_client
    cls._default_client = cls()
                          ^^^^^
  File "/Users/joe/cloudpathlib/cloudpathlib/gs/gsclient.py", line 88, in __init__
    self.client = StorageClient.from_service_account_json(application_credentials)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joe/cloudpathlib/venv/lib/python3.11/site-packages/google/cloud/client/__init__.py", line 109, in from_service_account_json
    return cls.from_service_account_info(credentials_info, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joe/cloudpathlib/venv/lib/python3.11/site-packages/google/cloud/client/__init__.py", line 76, in from_service_account_info
    credentials = service_account.Credentials.from_service_account_info(info)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joe/cloudpathlib/venv/lib/python3.11/site-packages/google/oauth2/service_account.py", line 240, in from_service_account_info
    signer = _service_account_info.from_dict(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joe/cloudpathlib/venv/lib/python3.11/site-packages/google/auth/_service_account_info.py", line 50, in from_dict
    raise exceptions.MalformedError(
google.auth.exceptions.MalformedError: Service account info was not in the expected format, missing fields token_uri, client_email.

This is because the logic in the GSClient init assumes that if a GOOGLE_APPLICATION_CREDENTIALS file exists, it is in the format of a service account JSON key (i.e. the call to from_service_account_json).

When using workload identity federation, the file that GOOGLE_APPLICATION_CREDENTIALS points to is in a different format (see here).
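As a rough illustration of the format difference: these credential files carry a top-level "type" field. To my understanding, a service account key uses "service_account", the file written by gcloud auth application-default login uses "authorized_user", and a workload identity federation config uses "external_account". The helper name credential_file_type is hypothetical and not part of cloudpathlib; the file contents in the demo are fake.

```python
import json
import os
import tempfile


def credential_file_type(path):
    """Return the top-level "type" field of a credentials JSON file, or None."""
    with open(path, encoding="utf-8") as f:
        info = json.load(f)
    return info.get("type")


# Demo with a throwaway file shaped like user credentials (all values are fake).
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"type": "authorized_user", "client_id": "fake", "refresh_token": "fake"}, tmp)

demo_type = credential_file_type(tmp.name)
os.remove(tmp.name)

# Only "service_account" files are suitable for from_service_account_json.
print(demo_type, demo_type == "service_account")
```

A check like this is only diagnostic; it shows why the service-account-only loader rejects the file, not how to load it.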

It is possible to work around this with existing functionality, e.g. explicitly creating a google.cloud.storage.Client or credentials object. However, it would be nice if GSClient and GSPath "just worked" with workload identity federation. I've been monkeypatching GSClient and GSPath to achieve this in a few projects.

The simplest workaround is probably to replace the call to from_service_account_json with a call to google.auth.load_credentials_from_file.
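A minimal sketch of what that replacement might look like, assuming google-auth is installed. The function name resolve_gcs_credentials is hypothetical and this is not the actual GSClient code. google.auth.load_credentials_from_file returns a (credentials, project_id) tuple and picks the credential class from the file's contents instead of assuming a service account key.

```python
import os


def resolve_gcs_credentials(application_credentials=None):
    """Return (credentials, project_id), or (None, None) when no file is configured."""
    path = application_credentials or os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
    if path is None:
        # Nothing configured: the caller can fall back to default auth.
        return None, None

    # Imported lazily so the sketch loads even without google-auth installed.
    import google.auth

    # Dispatches on the file's "type" rather than assuming a service account key.
    return google.auth.load_credentials_from_file(path)
```

The result could then be passed along as StorageClient(credentials=credentials, project=project_id). Note that project_id may be None for some file types, so the project might need to come from elsewhere.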

@pjbull
Member

pjbull commented Dec 29, 2023

Thanks for the detailed report. From what I read, looks like this wouldn't be a breaking change for users that rely on the way things work now, so happy to take the fix you suggested.

Do you know if there is a good way to get and test the workload identity credentials?

@joconnor-ecaa
Contributor Author

> Do you know if there is a good way to get and test the workload identity credentials?

You can set up workload identity federation by following this guide:

https://cloud.google.com/blog/products/identity-security/enabling-keyless-authentication-from-github-actions

Adding a CI workflow that authenticates using the GitHub Action shown in that blog post and then runs the live GCS tests should do the trick. I'm unsure whether there's a simpler way to test.

@beazerj

beazerj commented Dec 3, 2024

Any timeline on implementing this?

@pjbull
Member

pjbull commented Dec 3, 2024

@beazerj I don't have a timeline for this; I have a few higher-priority items I am working on.
Happy to take a PR that fixes this.

From my reading, I think the fix might be simpler. Remove this block:

if application_credentials is None:
    application_credentials = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")

I think that if that is done, when we call StorageClient() it will use the default auth, which should do the right thing in this scenario.
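To make the proposal concrete, here is a sketch of the selection order as I understand it once the environment-variable fallback is removed. The function choose_auth_path and its return labels are hypothetical; the real branches live in GSClient.__init__ and may differ. The relevant behavior is that StorageClient(), when given no credentials, defers to google.auth.default(), which reads GOOGLE_APPLICATION_CREDENTIALS itself and handles more than service account keys.

```python
def choose_auth_path(storage_client=None, credentials=None, application_credentials=None):
    """Return a label for which branch would construct the client (illustrative only)."""
    if storage_client is not None:
        return "use_given_client"
    if credentials is not None:
        return "client_from_credentials"
    if application_credentials is not None:
        # Only reached when a path is passed explicitly, not via the env var.
        return "from_service_account_json"
    # Env-var-only setups land here; StorageClient() defers to google.auth.default().
    return "default_auth"


print(choose_auth_path())
print(choose_auth_path(application_credentials="/path/to/key.json"))
```

Under this ordering, an explicitly passed path would still be treated as a service account key, so users relying on that argument should see no change.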

We just need someone to confirm this fix works and submit a PR. I think we probably won't explicitly test the live scenario since getting the config right to do so looks too complicated.

@amardeep

amardeep commented Dec 8, 2024

Just ran into this bug today. Hoping for a fix.
