Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
📖 Add designs/multi-cluster.md #2746
base: main
Are you sure you want to change the base?
📖 Add designs/multi-cluster.md #2746
Changes from all commits
eb207cb
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given the cluster-api example here, is the intention that controllers will be able to reconcile CRDs in clusters that they know about that may only exist in a subset of clusters (e.g. Machine objects in the management cluster but not in the workload cluster) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I think that has to be possible. Otherwise we need all resources that we watch in all clusters
(especially good point because today a controller crashes if a resource doesn't exist)
EDIT: Further down:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Question is whether one would rather group them in managers such that every manager has a uniform set of clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my updated PR #2726. You can now opt into provider and/or the default cluster per controller via options:
There is no logic yet for a controller to decide whether to engage with a provider cluster or not. Now it's with all of them. If the setup is more diverse, we might want such a functionality, e.g. some kind of pre-check: ctrl.WantsToEngage(ctx, cluster) bool`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm still understanding the changes in #2726, but i think what you are saying here makes sense to me and would solve the issue.
+1, i think we definitely need some way for the client user to specify when it should check a specific cluster for a resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I somehow think it should be the author's and managers responsibility (for now) to group them into groups which are working with the pattern. At this point, we don't know what we don't know. Once this is released, we can gather some feedback on edge cases and take it from there. I suspect the majority of use cases will be still single cluster reconcile loops.
Maybe document this edge case and mark this feature overall as experimental? This way we not committing to full production level stability, and allow to gather more feedback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why return
[]string
here rather than[]Cluster
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 Would be good for consistency with the Get func
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a misunderstanding of the interface. The getter is actually the constructor. The life-cycle of the returned clusters is owned by the manager (they are added as runnables). Hence, the
List
returns names, not clusters. We should rather renameGet
toCreate
orConnect
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to think about if and how this is doable, but ideally the "thing that comes and goes" wouldn't be typed to
cluster.Cluster
but can be anything, so this mechanism can also be used if folks have sources that are not kube watchesThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be mostly about a more generic name? (can't think of much that would work, maybe something like scope)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than changing the controller type directly and requiring all its dependencies to known how to deepcopy themselves, how about having something like a
controllerconstructor
(name tbd) in between that is filled with a[]watchConstructor{source func(Cluster) source.Source, handler func(Cluster) handler.Handler, predicate func(cluster) []predicate.Predicate}
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would require more invasive changes to our public API (the Controller interface)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you can call
Watch
on an existing controller. The idea is to not let theController
or its dependencies have any knowledge about this but instead have a thing on top of theController
that is configured with constructors that take acluster.Cluster
and return a source/predicate/handler and then uses those to callWatch
when a new cluster appears.When one disappears, it would cancel the
context
on the Source.The idea really is the opposite, I do not want the
Controller
to know how to extend itself like this, IMHO this is a higher-level abstraction.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compare #2726 after latest push. I have implemented @alvaroaleman's idea via a
MultiClusterController
wrapper implementingcluster.AwareRunnable
and just callingWatch
on the actual controller. All the deepcopy'ing is gone 🎉 Much nicer IMO. @alvaroaleman great intuition!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The eventhandlers are stateless, why do we need the deepcopy for them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the propotype. I think this is because EventHandler then would store the Cluster (it is using that info to set the ClusterName field in the request)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is gone now in #2726.
Will update the design here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the BYO request/eventhandler changes in #3019, I brought this back after remembering it was mentioned in the proposal. The previous version of my prototype had a weird second layer of event handler that was wrapping the actual event wrapper and was using the event object to communicate the cluster name in. That felt all kinds of weird.
Because we now have BYO EventHandlers, it's possible that they are not entirely stateless (as @sbueringer pointed out, some event handlers might have to store the cluster name in absence of any information on the event object itself). So I think this approach is the most clean, to be honest. It's entirely optional in #3019, existing EventHandlers don't need to be changed.