ServiceAccount missing role for Image Streaming #2251

Open
moshikbaruch opened this issue Jan 22, 2025 · 0 comments
Labels
bug Something isn't working

moshikbaruch commented Jan 22, 2025

TL;DR

The service account created by the module lacks the permissions needed to support the gcfs (image streaming) feature.
Specifically, it is not granted the roles/serviceusage.serviceUsageConsumer role, which image streaming requires to function correctly. As a result, pods fail to start on affected nodes.

GCP deprecated the old role, roles/container.nodeServiceAccount,
and its replacement, roles/container.defaultNodeServiceAccount, does not include the required permission.

Expected behavior

The service account created by the module should include all necessary roles and permissions,
including roles/serviceusage.serviceUsageConsumer, to support GKE features like image streaming without manual intervention.

GCP previously had a role called roles/container.nodeServiceAccount, which included the required permission (serviceusage.services.use):

includedPermissions:
- autoscaling.sites.writeMetrics
- logging.logEntries.create
- monitoring.metricDescriptors.create
- monitoring.metricDescriptors.list
- monitoring.timeSeries.create
- monitoring.timeSeries.list
- resourcemanager.projects.get
- resourcemanager.projects.list
- serviceusage.services.use
- storage.objects.get
- storage.objects.list
name: roles/container.nodeServiceAccount

GCP then deprecated it in favor of a new role, roles/container.defaultNodeServiceAccount. The new role does not include serviceusage.services.use, so errors occur when image streaming is enabled:

includedPermissions:
- autoscaling.sites.writeMetrics
- logging.logEntries.create
- monitoring.metricDescriptors.create
- monitoring.metricDescriptors.list
- monitoring.timeSeries.create
- monitoring.timeSeries.list
name: roles/container.defaultNodeServiceAccount

The expected behavior is that the module checks whether image streaming is enabled and, if so, grants roles/serviceusage.serviceUsageConsumer to the newly created service account.

Observed behavior

The service account created by the module does not include roles/serviceusage.serviceUsageConsumer. Our pods started to fail and could not start, with errors such as:

  • bus error
  • pods starting but getting stuck
  • input/output errors

We also started to see errors like this on the node:
level=error msg="AuthRefresh fails for one secret" error="rpc error: code = PermissionDenied desc = Caller does not have required permission to use project xxx. Grant the caller the roles/serviceusage.serviceUsageConsumer role, or a custom role with the serviceusage.services.use permission, by visiting https://console.developers.google.com/iam-admin/iam/project?project=xxx and then retry.

We noticed that the role assigned to the service account the module creates had changed, and that the permission is now missing.

Terraform Configuration

module "gke" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/beta-public-cluster"
  version = "v34.0.0"

  project_id         = var.project_id
  name               = local.cluster_name
  ...

  create_service_account = true
  grant_registry_access  = true
  
  service_account = "create"
  node_metadata   = "GKE_METADATA"
  ...

  enable_gcfs = var.enable_gcfs
  ...

  master_authorized_networks = flatten([
    ...
  ])

  node_pools              = var.node_pools
  node_pools_oauth_scopes = var.node_pools_oauth_scopes
  node_pools_labels       = var.node_pools_labels
  node_pools_metadata     = var.node_pools_metadata
  node_pools_tags         = var.node_pools_tags
  node_pools_taints       = var.node_pools_taints
}

Terraform Version

1.57

Additional information

This was difficult to investigate; I didn't find any GKE announcements about the change.
We opened a support ticket about it, and support said to either add the permission ourselves or use roles/container.nodeServiceAgent.

I think the module should either grant an additional role (roles/serviceusage.serviceUsageConsumer) to the service account it creates when gcfs is enabled, or change the default role to roles/container.nodeServiceAgent.
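
Until the module handles this itself, the first option can be approximated from the calling configuration. The following is only a rough sketch, not the module's own implementation: the resource name gcfs_service_usage_consumer is made up for illustration, and it assumes the module exposes the email of the service account it creates through its service_account output, and that var.enable_gcfs and var.project_id are the same variables used in the configuration above.

```hcl
# Hypothetical workaround, placed next to the module "gke" block.
# Assumption: module.gke.service_account is the email of the node
# service account created by the module.
resource "google_project_iam_member" "gcfs_service_usage_consumer" {
  # Only create the binding when image streaming (gcfs) is enabled.
  count = var.enable_gcfs ? 1 : 0

  project = var.project_id
  role    = "roles/serviceusage.serviceUsageConsumer"
  member  = "serviceAccount:${module.gke.service_account}"
}
```

This uses a non-authoritative google_project_iam_member binding, so it only adds the one role and should not interfere with the roles the module already grants.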

Thank you!

@moshikbaruch moshikbaruch added the bug Something isn't working label Jan 22, 2025
@moshikbaruch moshikbaruch changed the title from "Service Account Missing Role for Image Streaming" to "ServiceAccount missing role for Image Streaming" Jan 22, 2025