Module does not work with Ephemeral Agents/Runners #94

Open
dansiviter opened this issue May 24, 2021 · 19 comments
Labels
bug (Something isn't working), P2 (high priority issues), question (Further information is requested), triaged (Scoped and ready for work)

Comments

@dansiviter
Contributor

This module is not designed for ephemeral runners/agents, such as Terraform Cloud Remote Operations, which cannot have gcloud on the PATH and cannot use custom base images. If anything fails, you have to unwind and re-apply all the changes to force gcloud to be re-downloaded.

Example:

module.asm_install.module.gcloud_kubectl.null_resource.additional_components[0] (local-exec): Executing; ["/bin/sh" "-c" ".terraform/modules/asm_install/scripts/check_components.sh .terraform/modules/asm_isntall/cache/xxxx/google-cloud-skd/bin/gcloud kubectl,kpt,beta.kustomize"]

Error: Error running command '.terraform/modules/asm_install/scripts/check_components.sh .terraform/modules/asm_isntall/cache/xxxx/google-cloud-skd/bin/gcloud kubectl,kpt,beta.kustomize': exit status 127. Output:

In the above example, because a previous run failed, the repeat run cannot get past this step: it assumes gcloud already exists in the cache, but when the command runs it exits with code 127 (command not found).
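Exit status 127 is what a POSIX shell reports when it cannot find the command it was told to run, so the error above is consistent with the cached gcloud path from the earlier run no longer existing on the new runner. A minimal sketch reproducing that status; the path used is made up and is not the module's real cache layout:

```shell
#!/bin/sh
# Hypothetical stand-in for a cached gcloud path recorded in Terraform state
# that does not exist on a freshly provisioned runner.
missing_gcloud="/tmp/no-such-cache/google-cloud-sdk/bin/gcloud"

# Run it through /bin/sh -c, as local-exec does, and capture the exit status.
# The shell's "not found" message goes to stderr, which is discarded here.
status=0
/bin/sh -c "$missing_gcloud version" 2>/dev/null || status=$?

echo "exit status: $status"  # 127 means the command could not be found
```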

@morgante
Contributor

Are you setting the GCLOUD_TF_DOWNLOAD variable? It is there specifically to enable downloading in environments where gcloud is not present.

@morgante morgante added the question Further information is requested label May 24, 2021
@dansiviter
Contributor Author

Apologies, I should have mentioned that skip_download: false is set.

@morgante
Contributor

I see, I'm not actually sure how we can fix this though. Open to ideas.

@dansiviter
Contributor Author

I would argue we need to break the link between persisted TF state and local caching. That would mean dropping the following:

  • random_id.cache,
  • null_resource.prepare_cache,
  • null_resource.download_gcloud,
  • null_resource.download_jq.

Then perform the caching within each script, checking for gcloud (or jq) on every run. This would maintain the existing functionality when running locally, with the binaries still cached. When this runs on an ephemeral agent/runner it will be slower, as it'll download each time, but it should at least complete (barring any download issues).
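The per-script check proposed above could look roughly like the following minimal sketch. It assumes a single-file binary such as jq; gcloud ships as a tarball and would additionally need extracting, which is left out. The function name ensure_cached_binary and its arguments are hypothetical and not part of the module.

```shell
#!/bin/sh
# Sketch of per-script caching: look for the binary on every run instead of
# trusting a cache path recorded in Terraform state.
# ensure_cached_binary is a hypothetical helper, not part of the module.
ensure_cached_binary() {
  cache_dir=$1  # directory that may or may not survive between runs
  name=$2       # binary name, e.g. jq
  url=$3        # where to fetch it from if it is missing

  target="$cache_dir/$name"
  if [ ! -x "$target" ]; then
    # Cache miss (e.g. a fresh ephemeral runner): download it again.
    mkdir -p "$cache_dir" || return 1
    curl -fsSL -o "$target" "$url" || return 1
    # Only mark it executable once the download has fully succeeded.
    chmod +x "$target" || return 1
  fi
  # Print the path so callers can capture and use it directly.
  printf '%s\n' "$target"
}
```

For example, `JQ=$(ensure_cached_binary "$HOME/.cache/tf-tools" jq "https://github.com/stedolan/jq/releases/download/jq-1.7/jq-linux64")`, where the URL follows the pattern used in the install script later in this thread. Because the executable bit is set only after a successful download, an interrupted download is retried on the next run rather than trusted.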

@morgante
Contributor

Moving the logic into each script (and, therefore, into bash) would probably work, though it does increase complexity.

However, I should be clear it's not necessarily a top priority for our team. In general we've found adding more functionality for downloading gcloud to add complexity (and a source of bugs) and are focusing more on having it available in all execution environments. So we're probably unlikely to work on this soon.

If this is a priority for you, I'd be happy to review a pull request though. I do think it's reasonable to move the check into each script.

@dansiviter
Contributor Author

If this helps focus the issue, we hit this when trying to install ASM (via terraform-google-kubernetes-engine/modules/asm) so if there is a way in the works to do that without this module (via kubectl-wrapper) then it will be less of an issue for us.

@morgante
Copy link
Contributor

We'd love to move ASM install away from a bash script, but it definitely won't be happening soon.

@AndreaGiardini
Copy link

I am hitting the same problem on terraform cloud. Have you found any workaround?

@onsails

onsails commented Oct 3, 2021

I've managed to work around this by adding for_each to the module arguments:

module "gcloud" {
  # force new state on each apply
  for_each = {
    timestamp = timestamp()
  }

  ...
}

Of course, this only works if create_cmd is idempotent and delete_cmd has special handling.

@trajan0x
Copy link

This is still breaking for me on ephemeral destroys.

@Mithun2507
Copy link

In the recent upgrade of Terraform Enterprise, where we need to move to the new hashicorp/tfc-agent, we are experiencing this issue. Is there any fix for this from a container image creation perspective?
In the custom TFE agent image we install the gcloud SDK and add the Cloud SDK path to the global profile, but we still get the error below:

Error: local-exec provisioner error
with module.gcloud-module-test.null_resource.run_destroy_command[0]
on .terraform/modules/gcloud-module-test/main.tf line 258, in resource "null_resource" "run_destroy_command":
provisioner "local-exec" {
Error running command 'PATH=/root/.tfc-agent/component/terraform/runs/run-wRE74yf2dNFqyGmg/config/sandbox/.terraform/modules/gcloud-module-test/cache/380921a3/google-cloud-sdk/bin:$PATH
gcloud info
': exit status 127. Output: /bin/sh: 2: gcloud: not found
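One possible explanation, which is an assumption and not confirmed in this thread: the log shows the command running under /bin/sh, and a non-interactive, non-login sh started with -c does not read /etc/profile or other profile files, so a PATH entry added only there is never seen. If that is the cause, putting gcloud on PATH in the image's environment itself (for example with a Dockerfile ENV instruction) would be the thing to try. The sketch below simulates the difference with a temporary profile; fake-gcloud-demo is a made-up stand-in binary.

```shell
#!/bin/sh
# Build a throwaway "SDK" directory with a stand-in binary.
# fake-gcloud-demo is a hypothetical name used only for this demo.
tmp_dir=$(mktemp -d)
mkdir -p "$tmp_dir/bin"
printf '#!/bin/sh\necho ok\n' > "$tmp_dir/bin/fake-gcloud-demo"
chmod +x "$tmp_dir/bin/fake-gcloud-demo"

# A "global profile" that puts the SDK on PATH ($PATH stays literal here).
printf 'export PATH="%s/bin:$PATH"\n' "$tmp_dir" > "$tmp_dir/profile"

# local-exec style: a plain `sh -c` never reads the profile file.
without=$(/bin/sh -c 'command -v fake-gcloud-demo || echo missing')

# Sourcing the profile explicitly makes the PATH change visible.
with=$(/bin/sh -c ". '$tmp_dir/profile'; fake-gcloud-demo")

echo "without profile: $without"
echo "with profile:    $with"
rm -rf "$tmp_dir"
```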

@peridotml

peridotml commented Jun 21, 2023

I am having the same issue as @Mithun2507 with Terraform Cloud. I imagine there are settings to make it work, but it would be helpful for folks newer to Terraform Cloud. I would really appreciate it if a community member could help.

I am now trying it without skipping the download! I did not realize downloading was off by default.

@bharathkkb
Member

@Mithun2507 in the custom TFE image is gcloud in path? Could you try locally pulling the image and checking if you can execute gcloud commands?

@peridotml Enabling the download (skip_download: false) should work, but it would be more robust to create a custom image that has gcloud installed.

@Radiergummi

I just hit this issue with an otherwise innocuous TF configuration. As this has been open for more than two years now, maybe it's time to either add a failure condition to the bash script when running in an ephemeral environment, or simply state in the documentation that the module isn't compatible with Terraform Cloud and the like? That would save users a lot of time without much work.
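The failure condition suggested here could be as small as a guard that runs before the cached gcloud is used. A minimal sketch, assuming the script receives the cached gcloud path as its first argument (as check_components.sh does in the log at the top of this issue); require_cached_gcloud is a hypothetical helper, not something the module provides:

```shell
#!/bin/sh
# Hypothetical guard: stop with an explanatory message when the gcloud binary
# recorded from a previous run is missing, instead of a bare exit status 127.
require_cached_gcloud() {
  gcloud_bin=$1
  if [ ! -x "$gcloud_bin" ]; then
    echo "error: cached gcloud not found at '$gcloud_bin'." >&2
    echo "The module cache from a previous run is missing; this may be an ephemeral runner." >&2
    return 1
  fi
  return 0
}
```

A script would call it first, e.g. `require_cached_gcloud "$1" || exit 1`, so users see the cause directly rather than a generic "command not found".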

@Mithun2507

> @Mithun2507 in the custom TFE image is gcloud in path? Could you try locally pulling the image and checking if you can execute gcloud commands?

Yes, the path was specified in the image. I tested the image locally and gcloud works fine there. HashiCorp informed us that they don't support gcloud in the new tfc-agent image.

@anonimitoraf

Just encountered this too. All this time I thought I was doing something wrong 😭

@idelsink

idelsink commented Apr 30, 2024

I'm currently experimenting with setting up an IAP tunnel in Terraform Cloud and using terraform-ssh-tunnel to set up a SOCKS5 tunnel. I'm able to connect to my private Kubernetes cluster during the plan phase, but during execution it fails to connect. For now, I can at least share my crude method of installing the gcloud SDK in Terraform Cloud:

Here I use an external data source to call a bash script that makes the SDK available:

data "external" "install_gcloud" {
  program = [
    "bash",
    "${path.module}/install-gcloud.sh"
  ]
  query = {
    GOOGLE_CREDENTIALS                  = base64decode(google_service_account_key.cluster_access.private_key)
    GCLOUD_SDK_ACTIVATE_SERVICE_ACCOUNT = "true"
  }
}

And install-gcloud.sh:

#!/usr/bin/env bash
# Install gcloud CLI
set -e

JSON_QUERY=$(jq -r '.') # Read the query JSON Terraform passes on stdin (needs jq already on PATH)

# Make all of the keys in JSON_QUERY available as env vars
for key in $(echo "$JSON_QUERY" | jq -r 'keys[]'); do
  value=$(echo "$JSON_QUERY" | jq -r ".$key")
  case "${key}" in
    # Custom handling of special keys if needed
    # "foo")
    #   ...do thing
    #   ;;
    *)
      # Default option is to export like export key=value
      declare "${key}=${value}"
      export "${key?}"
      ;;
  esac
done

# Values that can changed
: ${CACHE_PATH:=$(realpath "cache")}
: ${GCLOUD_SDK_ACTIVATE_SERVICE_ACCOUNT:="false"}
: ${GCLOUD_SDK_PLATFORM:="linux"}
: ${GCLOUD_SDK_UPGRADE:="false"}
: ${GCLOUD_SDK_VERSION:="473.0.0"}
: ${GOOGLE_CREDENTIALS:=""}
: ${JQ_PLATFORM:="linux"}
: ${JQ_VERSION:="1.7"}

: ${GCLOUD_SDK_SERVICE_ACCOUNT_KEYFILE:="${CACHE_PATH}/terraform-google-credentials.json"}
: ${GCLOUD_TAR_PATH:="${CACHE_PATH}/google-cloud-sdk-${GCLOUD_SDK_VERSION}-${GCLOUD_SDK_PLATFORM}-x86_64.tar.gz"}
: ${JQ_FILE_PATH:="${CACHE_PATH}/jq-${JQ_VERSION}-${JQ_PLATFORM}-64"}
: ${GCLOUD_SDK_PATH:="${CACHE_PATH}/google-cloud-sdk-${GCLOUD_SDK_VERSION}-${GCLOUD_SDK_PLATFORM}-x86_64"}

: ${GCLOUD_BIN_PATH:="${GCLOUD_SDK_PATH}/bin"}

GCLOUD_SDK_DOWNLOAD_URI="https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-${GCLOUD_SDK_VERSION}-${GCLOUD_SDK_PLATFORM}-x86_64.tar.gz"
JQ_DOWNLOAD_URL="https://github.com/stedolan/jq/releases/download/jq-${JQ_VERSION}/jq-${JQ_PLATFORM}64"

# Prepare cache directory
mkdir -p "${CACHE_PATH}"

if [[ ! -f ${GCLOUD_TAR_PATH} ]]; then
  # Only download SDK when its not locally available
  curl -fsSL -o "${GCLOUD_TAR_PATH}" "${GCLOUD_SDK_DOWNLOAD_URI}" # -f: fail on HTTP errors instead of saving an error page as the archive
fi

if [[ ! -f ${JQ_FILE_PATH} ]]; then
  # Only download JQ when its not locally available
  curl -fsSL -o "${JQ_FILE_PATH}" "${JQ_DOWNLOAD_URL}" # -f: fail on HTTP errors instead of saving an error page as jq
  chmod +x "${JQ_FILE_PATH}"
fi

# Decompress and prepare bin directory (with JQ)
mkdir -p "${GCLOUD_SDK_PATH}"
tar -xzf "${GCLOUD_TAR_PATH}" --directory "${GCLOUD_SDK_PATH}" --strip-components 1 # Decompress and ignore top level directory in archive
cp "${JQ_FILE_PATH}" "${GCLOUD_BIN_PATH}/jq" # Name it plain "jq" so it can be found on PATH

export PATH=$PATH:${GCLOUD_BIN_PATH}

if [[ "${GCLOUD_SDK_UPGRADE}" = "true" ]]; then
  gcloud components update --quiet >&2 # Keep stdout clean: the external data source expects only JSON there
fi

if [[ ! -z "$GOOGLE_CREDENTIALS" ]]; then
  printf "%s" "$GOOGLE_CREDENTIALS" > "${GCLOUD_SDK_SERVICE_ACCOUNT_KEYFILE}"
fi

if [[ "${GCLOUD_SDK_ACTIVATE_SERVICE_ACCOUNT}" = "true" ]]; then
  gcloud --quiet auth activate-service-account --key-file "${GCLOUD_SDK_SERVICE_ACCOUNT_KEYFILE}"
fi

jq_version=$(jq --version)
gcloud_version=$(gcloud version --format="json")

OUTPUT_JSON_STRING=$(jq --null-input \
  --arg cache_path "${CACHE_PATH}" \
  --arg gcloud_path "${GCLOUD_BIN_PATH}" \
  --arg gcloud_version "$gcloud_version" \
  --arg jq_version "$jq_version" \
  '{
      cache_path: $cache_path,
      gcloud_path: $gcloud_path,
      gcloud_version: $gcloud_version,
      jq_version: $jq_version,
  }')

echo "${OUTPUT_JSON_STRING}"

Now you can use the output of the data source to call gcloud:

...
  gcloud_cmd = "${data.external.install_gcloud.result.gcloud_path}/gcloud"
...

@imranzunzani

This is still a problem. It would be helpful to skip referencing the cache by introducing a flag like skip_cache=true, which enforces skip_download=false every time.

@imranzunzani

> I've managed to work-around adding for_each to the module arguments:
>
> module "gcloud" {
>   # force new state on each apply
>   for_each = {
>     timestamp = "${timestamp()}"
>   }
>
>   ...
> }
>
> Of course, it works for idempotent create_cmd and special handling in delete_cmd

This doesn't work; it fails while destroying elements.

Projects
None yet
Development

No branches or pull requests