Multi Cluster Example / Pattern #2755

Closed
Smithx10 opened this issue Apr 3, 2024 · 13 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@Smithx10

Smithx10 commented Apr 3, 2024

I'm starting to write a controller that will need to span clusters, and I am not sure if what was documented 4 years ago is still the way to move forward. https://github.com/kubernetes-sigs/controller-runtime/blob/main/designs/move-cluster-specific-code-out-of-manager.md

I've read through: #745

Is there an example / pattern people should use when writing a controller that spans more than 1 cluster?

@sbueringer
Member

sbueringer commented Apr 4, 2024

This one could be interesting for you: #2746 (although not implemented yet)

@Danil-Grigorev
Member

Yes, this pattern still applies and is pretty simple to use - here is an example for a 2-cluster setup. You only need to use both clients correctly and stay conscious of which cluster each one talks to. However, at some point I saw problems with integration tests using envtest while deploying 2 clusters in parallel (slack message). This still needs to be tested on later versions; maybe it is no longer an issue.

@Smithx10
Author

Smithx10 commented Apr 6, 2024

I am trying to implement the following, but this old code doesn't match up with the current API anymore.

Not sure how I am supposed to get the cache of other clusters so I can mirror CRDs created in other clusters etc.

func NewSecretMirrorReconciler(mgr manager.Manager, mirrorCluster cluster.Cluster) error {
	return ctrl.NewControllerManagedBy(mgr).
		// Watch Secrets in the reference cluster
		For(&corev1.Secret{}).
		// Watch Secrets in the mirror cluster
		Watches(
			source.NewKindWithCache(&corev1.Secret{}, mirrorCluster.GetCache()),
			&handler.EnqueueRequestForObject{},
		).
		Complete(&secretMirrorReconciler{
			referenceClusterClient: mgr.GetClient(),
			mirrorClusterClient:    mirrorCluster.GetClient(),
		})
}

I imagine something changed that I can't find.

@Smithx10
Author

Smithx10 commented Apr 6, 2024

Looks like this behavior changed in d6a053f.

Looked at https://github.com/k8ssandra/k8ssandra-operator/blob/main/controllers/control/k8ssandratask_controller.go#L373 but it's using an older version of controller-runtime.

This PR is what removed the functionality that is being used by k8ssandra https://github.com/kubernetes-sigs/controller-runtime/pull/2120/files#diff-54e8061fb2925948c62f36b19e08785ec1fb90b349cfe48c73239f4cca8c6ef5L71

Reading through it, it's not obvious to me how to do it, possible skill issue :P

I'm not sure I see the correct way to configure watches in other clusters, a pointer / example would be much appreciated.

I guess at this point I'll just state what problem I'm trying to solve:

I'd like to keep a set of CRDs synced between many clusters. How should one of the clusters update itself if the CRD is created or updated in another cluster?

I'd like to have Cluster A watch for updates in B and C
I'd like to have Cluster B watch for updates in A and C
I'd like to have Cluster C watch for updates in A and B

Hopefully this will provide fault tolerance when querying the value of the CRD.

Does this sound like a sane approach? Any gotchas with doing this in K8S?
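
The A/B/C arrangement described above is a full mesh: every cluster watches every other cluster. A minimal sketch of that watch plan, using only the Go standard library, could look like the following. The function name `watchPlan` and the cluster names are illustrative assumptions and not part of controller-runtime.

```go
package main

// watchPlan returns, for every cluster name, the list of other clusters
// it should watch. This models the full-mesh layout only; it does not
// create any watches.
func watchPlan(clusters []string) map[string][]string {
	plan := make(map[string][]string, len(clusters))
	for _, self := range clusters {
		peers := make([]string, 0, len(clusters)-1)
		for _, other := range clusters {
			if other != self {
				peers = append(peers, other)
			}
		}
		plan[self] = peers
	}
	return plan
}
```

With N clusters this gives N*(N-1) watch relationships, so the number of remote caches grows quadratically as clusters are added.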

@Smithx10
Author

Smithx10 commented Apr 8, 2024

After taking another look, it seems you can create a source per cluster and register each one on the controller with "WatchesRawSource".

I was able to apply an object in the other cluster and see the Reconcile loop get invoked.

func (r *GuestbookReconciler) SetupWithManager(mgr ctrl.Manager, c1, c2 cluster.Cluster) error {

	c1src := source.Kind(c1.GetCache(), &webappv1.Guestbook{})
	c2src := source.Kind(c2.GetCache(), &webappv1.Guestbook{})

	return ctrl.NewControllerManagedBy(mgr).
		For(&webappv1.Guestbook{}).
		Watches(&webappv1.Guestbook{}, &handler.EnqueueRequestForObject{}).
		WatchesRawSource(c1src, &handler.EnqueueRequestForObject{}).
		WatchesRawSource(c2src, &handler.EnqueueRequestForObject{}).
		Complete(r)
}
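
One thing to keep in mind with this setup: all sources attached to one controller feed the same work queue, and, as far as I know for the controller-runtime versions discussed here, a reconcile request carries only a namespace and name. The standard-library sketch below models that fan-in with plain channels. The `request` type and `mergeSources` function are hypothetical stand-ins, not controller-runtime types.

```go
package main

import "sync"

// request is a hypothetical stand-in for a reconcile request: it has a
// namespace and a name, and nothing that identifies the source cluster.
type request struct {
	Namespace, Name string
}

// mergeSources fans events from several per-cluster channels into one
// channel and closes it once every source channel has been closed.
func mergeSources(sources ...<-chan request) <-chan request {
	out := make(chan request)
	var wg sync.WaitGroup
	for _, src := range sources {
		wg.Add(1)
		go func(src <-chan request) {
			defer wg.Done()
			for r := range src {
				out <- r
			}
		}(src)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}
```

Two events for the same namespace/name coming from different clusters are indistinguishable once merged, so a reconciler in this model has to look the object up in every cluster client rather than assume where the change happened.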

@Smithx10
Author

Smithx10 commented Apr 8, 2024

While going through the design / testing and implementation I've come across the following to solve the leader election for multi-cluster:

If we are running a controller that is going to act upon multiple clusters, we can elect a leader in one of the clusters to take actions. The default settings in the manager package of controller-runtime don't support this, but the manager does let us provide our own logic for handling it via "LeaderElectionResourceLockInterface resourcelock.Interface".

https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/manager/manager.go#L202

The following interface exposes which controller will be the leader.

type Interface interface {
      // Get returns the LeaderElectionRecord
      Get(ctx context.Context) (*LeaderElectionRecord, []byte, error)

      // Create attempts to create a LeaderElectionRecord
      Create(ctx context.Context, ler LeaderElectionRecord) error

      // Update will update and existing LeaderElectionRecord
      Update(ctx context.Context, ler LeaderElectionRecord) error

      // RecordEvent is used to record events
      RecordEvent(string)

      // Identity will return the locks Identity
      Identity() string

      // Describe is used to convert details on current resource lock
      // into a string
      Describe() string
}

@sbueringer
Member

Probably too much for what you need. But in Cluster API we create additional caches/clients per cluster that we want to communicate with. Maybe there's something useful for you in this code: https://github.com/kubernetes-sigs/cluster-api/blob/main/controllers/remote/cluster_cache_tracker.go

@Smithx10
Author

Smithx10 commented Apr 9, 2024

@sbueringer Thanks, some nice things in there. Do you by chance know if there is a simple way to not dump when one of the cluster caches times out? I saw you made an accessor which brings them in and out. Is that required, or is there anything in the manager that helps with this? My expectation is that the cluster will come up eventually; removing it would require a config change in my scenario.

Will be experimenting with this more.

I was testing with this and firewalling off the k8s api. Still resulted in failure:

Looks like when set to 0 we default to 2 minutes. Need to investigate
https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/config/v1alpha1/types.go#L103

	// CacheSyncTimeout refers to the time limit set to wait for syncing caches.
	// Defaults to 2 minutes if not set.
	// +optional
	CacheSyncTimeout *time.Duration `json:"cacheSyncTimeout,omitempty"`

2024-04-09T19:06:57-04:00       ERROR   setup   problem running manager {"error": "failed to wait for guestbook caches to sync: timed out waiting for cache to be synced for Kind *v1.Guestbook"}
main.main
        /home/smith/projects/guestbook/cmd/main.go:233
runtime.main
        /usr/lib/go/src/runtime/proc.go:271

// SetupWithManager sets up the controller with the Manager.
func (r *GuestbookReconciler) SetupWithManager(mgr ctrl.Manager, clusters []cluster.Cluster) error {
	rp := true
	cb := ctrl.NewControllerManagedBy(mgr).
		For(&webappv1.Guestbook{}).
		Watches(&webappv1.Guestbook{}, &handler.EnqueueRequestForObject{}).
		WithOptions(controller.Options{
			RecoverPanic: &rp,
		})

@sbueringer
Member

Do you by chance know if there is a simple way to not dump when one of the cluster caches time out?

No I don't know. We create a separate cache and client per cluster we communicate with. We don't use the "Cluster" struct.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Jul 9, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Aug 8, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot closed this as not planned Sep 7, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

