-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable standard forms of GCP auth for oci sources #815
base: main
Are you sure you want to change the base?
Enable standard forms of GCP auth for oci sources #815
Conversation
Hey @thejosephstevens can you signoff your commits so we can run the tests. See https://github.com/fluxcd/pkg/pull/815/checks?check_run_id=31453839284 |
f6308e6
to
1ce9cc4
Compare
Done! |
1ce9cc4
to
ded413e
Compare
@thejosephstevens please run |
ded413e
to
cb4a6bc
Compare
@darkowlzz can you fork this and run our GCP e2e tests please. |
@thejosephstevens could you please run |
590d595
to
d48b493
Compare
Just got them all working! |
Hi @thejosephstevens, thanks for all the work here! 🏅 One question:
Here, are you specifically referring to GCP Workload Identity Federation? |
I am! We generate cross-cloud identities for our workloads backed by k8s OIDC through Workload Identity Federation (or equivalent) in each cloud. We just happen to run our core operations primarily in GCP, which is why I jumped in here to contribute. At a super specific level, we use |
Signed-off-by: Joseph Stevens <[email protected]>
d48b493
to
d1c9d10
Compare
Hi, I just tested this using our existing integration test suite and they continue to work for both workload identity and node identity within GCP. Inspecting the effect of the code change, I don't think it'll cause any breaking change for our existing users if they followed our recommended setup from the Flux docs. Trying to understand the reasoning behind the previous implementation, it was implemented in fluxcd/image-reflector-controller#194, in 2021. It seems GCP announced workload identity federated in 2021 https://cloud.google.com/blog/products/identity-security/enable-keyless-access-to-gcp-with-workload-identity-federation. It's possible that the initial implementation was based on some old sample code from that year. The auth client code changes look good to me, except for the proxy part, but I'll take some time to do some more testing and research around this before a full review. Thanks for working on this. |
I would like to clarify about my last comment, it felt like I may have mislead in what I looked at and tested. In the dev meeting today, @matheuscscp brought the concerns regarding the proxy configuration and we had a really long discussion about how to keep the proxy support. I believe @matheuscscp will post about it separately in detail. But I'd like to clarify that initially I was more concerned about the mention of breaking change and only tested that the auth continues to work for our existing use cases. I didn't review all the code, just the token exchange flow that the google library does, hence I said that I'd like to test and research more of this before a full review. |
So, yes, like @darkowlzz mentioned in the previous comment, this PR is currently removing support for proxy configuration when talking to the internal metadata server of GCE/GKE. The On one hand, I confess that I'd normally be very tempted to accept the current solution of the PR as is. I appreciate very much the simplicity of just calling a single function from the Google library. This has many benefits, including the ease of maintenance, and I believe it's also what most projects would do. On the other hand, the function I think a good compromise would be to copy, modify and maintain the function As the first line of the documentation says, the function The function I'd like to note that some of the JSON configurations may still require talking to external HTTP endpoints. Full support for per-object proxy configuration would mean supporting those HTTP calls as well. Workload Identity Federation is probably one of those, as the OIDC ServiceAccount Token issued by Kubernetes must still be exchanged for a Google Access Token, which is done by talking to a Google HTTP API. It would be harder to support proxy for that, as it would require us to copy and adapt a lot more code from the Google library. But, as I mentioned previously, I'd be fine introducing support for new use cases without removing proxy support for the code path that we already have. And we probably need an issue to track proxy support for the remaining code paths. |
Hmmmm, that seems like a lot of code to copy and maintain that's currently just handled by Google. My initial reaction is it seems like if the requirement is to maintain per-object proxy support for requesting metadata-service tokens, it seems like it would be simplest to just split logic on the presence of a I may be missing some context here though, I know that Flux supports auth strategies defined at the object level, but I'm not actually aware of when that would be used. From my experience I would expect:
I understand global proxy requirements (something-something network complexities), but in order to need per-object proxy support I think that would require consumers to be effectively shoving their own identities into Flux for auth by way of proxies, but as it stands today the metadata URL is hardcoded in Flux, so I just don't know how that would even function to produce different auth resolutions. From the base issue here it looks like maybe this is focused on consistency across providers and the inherent per-object auth model that Flux uses, but I don't know if I'm missing use cases from the community. I'd be interested if there's context you could point me to (or raise) that I'm missing. I don't have a strong opinion on implementation details, but I think if the consensus is to head down the route of copying chunks of the Google auth providers and working them in I may ask if one of you could pick up the remainder of the PR, as I just hadn't earmarked that much more time for this work. |
This is related to #311 . We use k8s-based OIDC for all of our GCP auth, which allows for us to configure workloads in clusters in a simple, templated way to get access to cloud resources without ever having to exchange secret materials. This has some fairly straightforward security benefits, in addition to removing operational concerns of secret distribution, rotation, invalidation, etc...
The updated code piggybacks on the
DefaultTokenSource
method from the google auth package in order to support all standard methods of obtaining an auth token, in addition to the previously supported method requiring the well-known metadata endpoint to be available (only applicable in GCE and GKE clusters with Workload Identity enabled).Note - This technically constitutes a breaking change for auth behavior, but it should be very uncommon for it to negatively impact users as they would have had to set up Flux with some form of ambient auth, and leave it in place even though it's not taking effect and they're falling back to metadata endpoint auth. If, for example, someone has a flux workload with
GOOGLE_APPLICATION_CREDENTIALS
configured, with the existing code that would be ignored, but with this change in place their auth behavior would change. I'm happy to write up full details on what scenarios would be impacted by this for release notes, where would be best for me to put that?Edit:
I just wanted to note that I was able to create a build of source-controller based off of this
pkg/oci
revision and it appears to work fine. I'm not seeing any issues introduced, and I was able to pull a GCP Artifact Registry OCI repo with k8s-based OIDC from an EKS cluster.