KIT Guest Clusters ETCD pods unable to become ready when using Karpenter v0.11.0+ #241

Open
njtran opened this issue Jun 30, 2022 · 2 comments
Labels: 0.2 alpha, bug (Something isn't working)

njtran commented Jun 30, 2022

When using Karpenter v0.12.1, a Provisioner with a small ttlSecondsAfterEmpty can result in Karpenter removing the node that an ETCD replica is scheduled to. The associated PVC for the ETCD pod then never binds to its volume.
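
For reference, a minimal sketch of the kind of Provisioner described above (the name, TTL value, and requirements here are illustrative, not taken from this report):

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default                  # hypothetical name
spec:
  # A small ttlSecondsAfterEmpty means Karpenter terminates a node soon
  # after it observes no non-daemonset pods running on it.
  ttlSecondsAfterEmpty: 30
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
  # Cloud-provider specifics (subnets, security groups, etc.) omitted.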

The initial thought is that, since Karpenter no longer pre-binds pods to nodes as of v0.11.0, this may introduce undesired behavior for the EBS volumes backing the ETCD pods.
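
For context, the kit-gp3 StorageClass backing these claims presumably uses WaitForFirstConsumer volume binding, which is what makes the scheduler record a selected node on the PVC before the volume is provisioned. A sketch of such a class (the provisioner and parameters are assumptions, not copied from the cluster):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kit-gp3
provisioner: ebs.csi.aws.com          # assumed EBS CSI driver
parameters:
  type: gp3
reclaimPolicy: Delete
# Binding is deferred until a pod using the claim is scheduled, so the
# PVC becomes tied to whichever node the scheduler picked at that time.
volumeBindingMode: WaitForFirstConsumer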

The nodes for the PVCs for ETCD 1 and 2 were never deleted. Even though Karpenter brought up a replacement node for ETCD 0, the pod and its PVC never bound:

➜  k get pvc
NAME                                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
etcd-data-kit-guest-cluster-etcd-0   Pending                                                                        kit-gp3        23m
etcd-data-kit-guest-cluster-etcd-1   Bound     pvc-f993bc10-327d-4414-a3b2-acc95d956ab0   40Gi       RWO            kit-gp3        23m
etcd-data-kit-guest-cluster-etcd-2   Bound     pvc-f781eb15-4bfb-4395-b2c1-52b53c54a63a   40Gi       RWO            kit-gp3        23m

➜  k get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                             STORAGECLASS   REASON   AGE
pvc-f781eb15-4bfb-4395-b2c1-52b53c54a63a   40Gi       RWO            Delete           Bound    tekton-tests/etcd-data-kit-guest-cluster-etcd-2   kit-gp3                 22m
pvc-f993bc10-327d-4414-a3b2-acc95d956ab0   40Gi       RWO            Delete           Bound    tekton-tests/etcd-data-kit-guest-cluster-etcd-1   kit-gp3                 22m

These are the scheduling events shown when describing the stuck ETCD pod:

  Warning  FailedScheduling  20m                 default-scheduler  running PreBind plugin "VolumeBinding": binding volumes: failed to get node "ip-192-168-86-66.us-west-2.compute.internal": node "ip-192-168-86-66.us-west-2.compute.internal" not found
  Warning  FailedScheduling  16m (x2 over 18m)   default-scheduler  (combined from similar events): 0/5 nodes are available: 1 node(s) didn't find available persistent volumes to bind, 2 node(s) didn't have free ports for the requested pod ports, 2 node(s) didn't match Pod's node affinity/selector.
  Warning  FailedScheduling  15m (x6 over 19m)   default-scheduler  0/5 nodes are available: 1 Insufficient cpu, 1 Insufficient memory, 1 Too many pods, 2 node(s) didn't have free ports for the requested pod ports, 2 node(s) didn't match Pod's node affinity/selector.
  Warning  FailedScheduling  15m (x6 over 19m)   default-scheduler  0/5 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 2 node(s) didn't have free ports for the requested pod ports, 2 node(s) didn't match Pod's node affinity/selector.
  Warning  FailedScheduling  11m (x5 over 19m)   default-scheduler  0/4 nodes are available: 2 node(s) didn't have free ports for the requested pod ports, 2 node(s) didn't match Pod's node affinity/selector.
  Warning  FailedScheduling  97s (x20 over 18m)  default-scheduler  0/5 nodes are available: 1 node(s) didn't find available persistent volumes to bind, 2 node(s) didn't have free ports for the requested pod ports, 2 node(s) didn't match Pod's node affinity/selector.

tzneal commented Jun 30, 2022

From what I've seen, this is caused by the PVC having a selected-node annotation that points to a node that has since been deleted.

running PreBind plugin "VolumeBinding": binding volumes: failed to get node "ip-192-168-86-66.us-west-2.compute.internal": node "ip-192-168-86-66.us-west-2.compute.internal" not found
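
One way to confirm this, and (if safe for the workload) to work around it, is to inspect the volume.kubernetes.io/selected-node annotation on the stuck PVC and clear it so the scheduler can pick a node that still exists. This is a sketch using the namespace and PVC name from the output above, not a tested fix:

# Show the node the PVC is pinned to (the annotation is set by the
# scheduler for WaitForFirstConsumer volumes).
kubectl -n tekton-tests get pvc etcd-data-kit-guest-cluster-etcd-0 -o yaml | grep selected-node

# Remove the stale annotation; the trailing '-' deletes it.
kubectl -n tekton-tests annotate pvc etcd-data-kit-guest-cluster-etcd-0 \
  volume.kubernetes.io/selected-node-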

prateekgogia (Member) commented

So if the initial node is not deleted by Karpenter, does it work?

prateekgogia added the 0.2 alpha label on Aug 2, 2022