
Ensure Karpenter Creates New NodeClaim Before Deleting Existing Node for Consolidation #1879

Open
EdwinPhilip opened this issue Dec 13, 2024 · 1 comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

EdwinPhilip commented Dec 13, 2024

Description

What problem are you trying to solve?
Currently, when consolidating underutilized or empty nodes, Karpenter does not create new NodeClaims before initiating the deletion or draining of existing nodes. This can lead to a temporary loss of capacity, causing disruptions to workloads, especially when there are no spare nodes available in the cluster.

This behavior poses challenges for workloads that require high availability or have strict scheduling constraints, as pods may remain in a pending state until new nodes are provisioned.

Proposed Behavior
When consolidating nodes, Karpenter should:

  • Preemptively create a new NodeClaim to ensure sufficient capacity is available.
  • Wait for the new node to reach the Ready state before initiating the deletion or draining of the existing node.
  • Provide a configurable option (e.g., waitForReadyBeforeConsolidation) to enable or disable this behavior, allowing users to choose between faster consolidation and safer capacity transitions.

Use Case

  • Workload Impact: High-availability applications or latency-sensitive workloads can experience disruptions during consolidation if nodes are deleted before replacements are ready.
  • Cluster Stability: In clusters with minimal buffer capacity, this behavior can lead to pending pods and degraded service availability.
  • Spot Instances: Spot interruptions are unpredictable; draining a node before a replacement is available leaves its pods unschedulable and can degrade the application.

Steps to Reproduce

  • Deploy a Karpenter-managed cluster with minimal buffer capacity (a minimal NodePool sketch is shown after this list).
  • Trigger a consolidation event where underutilized nodes are identified for deletion.
  • Observe that Karpenter deletes nodes before ensuring new nodes are provisioned and ready.
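
As a hedged illustration of the first step, here is a minimal NodePool sketch with consolidation enabled, assuming the Karpenter v1 API and an AWS EC2NodeClass; the names, requirements, and nodeClassRef values are placeholders, not part of this report:

```yaml
# Minimal NodePool sketch (Karpenter v1 API) with consolidation enabled.
# All names and values below are illustrative.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: example
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # consolidate both empty and underutilized nodes
    consolidateAfter: 30s                          # act shortly after nodes become consolidatable
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: example
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
```

With a NodePool like this and little spare capacity in the cluster, a consolidation decision that removes a node before its replacement is Ready leaves the displaced pods pending.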

Expected Behavior

  • New nodes should be provisioned and become Ready before existing nodes are deleted during consolidation.
  • Workloads should remain unaffected during node consolidation events.

Potential Solutions

  • Introduce a configuration option like waitForReadyBeforeConsolidation: true at the NodePool level to enable this behavior (see the sketch below).
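
A hypothetical sketch of where such a field could live in the NodePool spec; waitForReadyBeforeConsolidation does not exist in Karpenter today, and its name and placement under spec.disruption are assumptions taken from this proposal:

```yaml
# Hypothetical sketch only: waitForReadyBeforeConsolidation is the option
# proposed in this issue, not an existing Karpenter API field.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: example
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
    waitForReadyBeforeConsolidation: true  # proposed: replacement NodeClaim must be Ready before drain/delete
```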

How important is this feature to you?
This feature would enhance cluster stability and workload availability, making Karpenter a more robust solution for production environments.

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@EdwinPhilip EdwinPhilip added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 13, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Dec 13, 2024
@k8s-ci-robot (Contributor)

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
