
Job Launch Failed for GCP Service Account from Controller #4512

Closed
weih1121 opened this issue Dec 27, 2024 · 1 comment

weih1121 commented Dec 27, 2024

Issue Summary:

While launching a managed job with a GCP service account using the following command:

python cli.py jobs launch ~/hello-sky/hello_sky.yaml --use-spot

The job failed with an error related to invalid OAuth 2.0 credentials:

(sky-3b74-hong, pid=3644) Your "OAuth 2.0 Service Account" credentials are invalid. Please run
(sky-3b74-hong, pid=3644)   $ gcloud auth login
(sky-3b74-hong, pid=3644) OSError: No such file or directory.

The message suggests that the GCP service account credentials were not configured or activated. On further investigation, the root cause appeared to be that the GOOGLE_APPLICATION_CREDENTIALS environment variable was not set.

Error detail:

Failed to launch a cluster with error: subprocess.CalledProcessError: Command 'pushd /tmp &>/dev/null && { gcloud --help > /dev/null 2>&1 || { mkdir -p ~/.sky/logs && wget --quiet https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-424.0.0-linux-x86_64.tar.gz > ~/.sky/logs/gcloud_installation.log && tar xzf google-cloud-sdk-424.0.0-linux-x86_64.tar.gz >> ~/.sky/logs/gcloud_installation.log && rm -rf ~/google-cloud-sdk >> ~/.sky/logs/gcloud_installation.log && mv google-cloud-sdk ~/ && ~/google-cloud-sdk/install.sh -q >> ~/.sky/logs/gcloud_installation.log 2>&1 && echo "source ~/google-cloud-sdk/path.bash.inc > /dev/null 2>&1" >> ~/.bashrc && source ~/google-cloud-sdk/path.bash.inc >> ~/.sky/logs/gcloud_installation.log 2>&1; }; } && popd &>/dev/null && [[ "$(uname)" == "Darwin" ]] && skypilot_gsutil() { gsutil -m -o "GSUtil:parallel_process_count=1" "$@"; } || skypilot_gsutil() { gsutil -m "$@"; }; GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/application_default_credentials.json skypilot_gsutil ls -d gs://skypilot-workdir-hong-d0701bb4' returned non-zero exit status 1.
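To reproduce only the failing step without a full job launch, the last command in the log can be run on its own. The Python sketch below builds the same `gsutil -m ls -d <bucket>` call with GOOGLE_APPLICATION_CREDENTIALS set explicitly in the child environment. `build_gsutil_ls` and `run_check` are hypothetical helper names, not SkyPilot APIs, and the sketch assumes `gsutil` is on the PATH of the machine where the failing command ran.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional, Tuple


def build_gsutil_ls(bucket_url: str, credentials_path: str,
                    base_env: Optional[Mapping[str, str]] = None
                    ) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and env for `gsutil -m ls -d <bucket_url>` (hypothetical helper)."""
    # Copy the parent environment (or the one supplied) so it is not mutated.
    env = dict(os.environ if base_env is None else base_env)
    # Expand "~" here, as bash does for the value in `VAR=~/... cmd`.
    env["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.expanduser(credentials_path)
    argv = ["gsutil", "-m", "ls", "-d", bucket_url]
    return argv, env


def run_check(bucket_url: str, credentials_path: str) -> subprocess.CompletedProcess:
    """Run the listing; a non-zero exit is returned rather than raised."""
    argv, env = build_gsutil_ls(bucket_url, credentials_path)
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=False)
```

For example, `run_check("gs://skypilot-workdir-hong-d0701bb4", "~/.config/gcloud/application_default_credentials.json")` returns a `subprocess.CompletedProcess`; a non-zero `returncode` together with its `stderr` text shows the same failure in isolation.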

Root cause:
The GOOGLE_APPLICATION_CREDENTIALS environment variable was not set, so the environment did not point to the service account key file at ~/.config/gcloud/application_default_credentials.json.
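A minimal Python sketch for checking this condition before launching: it reports whether GOOGLE_APPLICATION_CREDENTIALS is set and whether it points at an existing file. The function name `diagnose` is hypothetical, and the file check is injectable only to keep the sketch testable.

```python
import os
from typing import Callable, Mapping

ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS"


def diagnose(env: Mapping[str, str],
             is_file: Callable[[str], bool] = os.path.isfile) -> str:
    """Describe whether GOOGLE_APPLICATION_CREDENTIALS is usable in `env`."""
    path = env.get(ENV_VAR, "").strip()
    if not path:
        return f"{ENV_VAR} is not set"
    # The value is checked literally: a quoted "~" is NOT expanded here.
    if not is_file(path):
        return f"{ENV_VAR} points to a missing file: {path}"
    return f"OK: {path}"


if __name__ == "__main__":
    # Inspect the current process environment.
    print(diagnose(os.environ))
```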

After manually setting the environment variable and activating the service account with:

export GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/application_default_credentials.json
gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS

The issue was resolved: the gsutil command succeeded and the job ran successfully.
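`gcloud auth activate-service-account --key-file` expects a service account key, whose JSON has `"type": "service_account"`. A file at the ADC path created by `gcloud auth application-default login` instead holds user credentials with `"type": "authorized_user"`. A minimal Python sketch to tell the two apart; the helper names are hypothetical, and falling back to the ADC path when the variable is unset is an assumption of this sketch.

```python
import json
import os
from typing import Any

DEFAULT_ADC_PATH = "~/.config/gcloud/application_default_credentials.json"


def credential_type(key_json: str) -> str:
    """Return the "type" field of a Google credentials JSON document."""
    data: Any = json.loads(key_json)
    if not isinstance(data, dict):
        return "unknown"
    return str(data.get("type", "unknown"))


def is_service_account_key(key_json: str) -> bool:
    """True if the JSON looks like a service account key file."""
    return credential_type(key_json) == "service_account"


if __name__ == "__main__":
    path = os.path.expanduser(
        os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", DEFAULT_ADC_PATH))
    try:
        with open(path) as f:
            print(path, "->", credential_type(f.read()))
    except (OSError, json.JSONDecodeError) as exc:
        print(f"could not inspect {path}: {exc}")
```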

Version & Commit info:

  • sky -v: PLEASE_FILL_IN
  • sky -c: PLEASE_FILL_IN

weih1121 commented Jan 2, 2025

Closing this issue: using a GCP service account requires setting remote_identity: SERVICE_ACCOUNT in ~/.sky/config.yaml first.
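The comment does not show the full config layout. Assuming (not stated in this issue) that `remote_identity` sits under a top-level `gcp:` section, a stdlib-only Python sketch can confirm the setting before launching. `gcp_remote_identity` is a hypothetical helper; it is a deliberately naive line scan for simple block-style YAML, not a real YAML parser.

```python
import os
from typing import Optional

# Assumed layout of the relevant part of ~/.sky/config.yaml:
#
#   gcp:
#     remote_identity: SERVICE_ACCOUNT


def gcp_remote_identity(config_text: str) -> Optional[str]:
    """Return gcp.remote_identity from simple block YAML, or None if unset."""
    in_gcp = False
    child_indent: Optional[int] = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing spaces
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        if indent == 0:
            # A new top-level key: we are inside gcp only if it is `gcp:`.
            in_gcp = line == "gcp:"
            child_indent = None
            continue
        if not in_gcp:
            continue
        if child_indent is None:
            child_indent = indent  # indentation of gcp's direct children
        if indent != child_indent:
            continue  # skip keys nested deeper than gcp's direct children
        key, sep, value = line.strip().partition(":")
        if sep and key == "remote_identity":
            return value.strip().strip("'\"") or None
    return None


if __name__ == "__main__":
    path = os.path.expanduser("~/.sky/config.yaml")
    try:
        with open(path) as f:
            value = gcp_remote_identity(f.read())
    except OSError:
        value = None
    print(f"gcp.remote_identity = {value!r}")
```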

weih1121 closed this as completed Jan 2, 2025