Scheduler: LastOrdinal based on replicas instead of FreeCap #8388

Merged 1 commit into knative:main on Dec 19, 2024

Conversation

pierDipi
Member

When scaling down and compacting, basing the last ordinal on the free capacity structure leads to a lastOrdinal that is off by one, since FreeCap might contain the free capacity for unschedulable pods.

We have to continue including unschedulable pods in FreeCap because a pod might become unschedulable for external reasons (for example, a node shutdown affecting pods with lower ordinals) and need to be rescheduled; during that time period we want to consider those pods when compacting. Once all vpods that were on the pod that is gone get rescheduled, FreeCap will only include schedulable pods.
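
A minimal sketch of the off-by-one described above. It assumes FreeCap is a slice indexed by pod ordinal (the `free []int32` parameter in this PR suggests so) and that the old last ordinal was effectively the highest ordinal with an entry. The helper names and the numbers are hypothetical and are not the scheduler's actual code.

```go
package main

import "fmt"

// lastOrdinalFromFreeCap mirrors the old approach (hypothetical helper):
// the last ordinal is the highest index that has a free-capacity entry.
func lastOrdinalFromFreeCap(freeCap []int32) int32 {
	return int32(len(freeCap)) - 1
}

// lastOrdinalFromReplicas mirrors the new approach (hypothetical helper):
// ordinals are zero-based, so the last schedulable pod is replicas-1.
func lastOrdinalFromReplicas(replicas int32) int32 {
	return replicas - 1
}

func main() {
	// Assumed scenario: the StatefulSet was scaled down from 3 to 2 replicas,
	// but vreplicas are still placed on pod 2, so FreeCap keeps an entry for it.
	freeCap := []int32{4, 7, 10} // indexed by pod ordinal
	replicas := int32(2)

	fmt.Println(lastOrdinalFromFreeCap(freeCap))   // 2: counts the unschedulable pod
	fmt.Println(lastOrdinalFromReplicas(replicas)) // 1: last schedulable pod
}
```

In this scenario the FreeCap-based value points at a pod that can no longer receive placements, while the replica-based value stays within the schedulable range.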

Fixes #

Proposed Changes

  • Scheduler: LastOrdinal based on replicas instead of FreeCap

Pre-review Checklist

  • At least 80% unit test coverage
  • E2E tests for any new behavior
  • Docs PR for any user-facing impact
  • Spec PR for any new API feature
  • Conformance test for any change to the spec

Release Note


Docs

@knative-prow knative-prow bot requested review from Cali0707 and Leo6Leo December 16, 2024 14:40
@knative-prow knative-prow bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 16, 2024
@pierDipi pierDipi force-pushed the scheduler-last-ordinal-wrong branch from 1aeabdb to 3f807e9 Compare December 16, 2024 14:43
@pierDipi
Member Author

/cherry-pick release-1.15

@pierDipi
Member Author

/cherry-pick release-1.16

@knative-prow-robot
Contributor

@pierDipi: once the present PR merges, I will cherry-pick it on top of release-1.15 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.15

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@knative-prow-robot
Contributor

@pierDipi: once the present PR merges, I will cherry-pick it on top of release-1.16 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.16

@@ -219,23 +217,19 @@ func pendingFromVPod(vpod scheduler.VPod) int32 {
 	return int32(math.Max(float64(0), float64(expected-scheduled)))
 }
 
-func (s *stateBuilder) updateFreeCapacity(logger *zap.SugaredLogger, free []int32, last int32, podName string, vreplicas int32) ([]int32, int32) {
+func (s *stateBuilder) updateFreeCapacity(logger *zap.SugaredLogger, free []int32, podName string, vreplicas int32) []int32 {
Member

@matzew matzew Dec 17, 2024

Should we name the second arg freeCap?

@@ -280,12 +282,14 @@ func (a *autoscaler) compact(s *st.State) error {
return err
}

lastOrdinal := s.Replicas - 1
Member

Why the replicas - 1?

Member Author

this changed slightly now
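
For context on the `replicas - 1` question: StatefulSet pods get zero-based ordinals by default, so a set with N replicas has pods numbered 0 through N-1. The sketch below only illustrates that naming rule. The `podNames` helper and the "dispatcher" name are made up for the example, and it assumes the default start ordinal of 0.

```go
package main

import "fmt"

// podNames lists the pod names a StatefulSet with the given name and replica
// count would have, assuming the default start ordinal of 0.
func podNames(statefulSet string, replicas int32) []string {
	names := make([]string, 0, replicas)
	for ordinal := int32(0); ordinal < replicas; ordinal++ {
		names = append(names, fmt.Sprintf("%s-%d", statefulSet, ordinal))
	}
	return names
}

func main() {
	names := podNames("dispatcher", 3) // "dispatcher" is a made-up name
	fmt.Println(names)
	fmt.Println("last ordinal:", len(names)-1)
}
```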

@pierDipi pierDipi force-pushed the scheduler-last-ordinal-wrong branch from 3f807e9 to 7bad45b Compare December 17, 2024 11:26
@knative-prow knative-prow bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 17, 2024
When scaling down and compacting, basing the last ordinal on the
free capacity structure leads to a lastOrdinal that is off by one,
since `FreeCap` might contain the free capacity for unschedulable pods.

We have to continue including unschedulable pods in FreeCap because
a pod might become unschedulable for external reasons (for example,
a node shutdown affecting pods with lower ordinals) and need to be
rescheduled; during that time period we want to consider those pods
when compacting. Once all vpods that were on the pod that is gone
get rescheduled, FreeCap will only include schedulable pods.

Signed-off-by: Pierangelo Di Pilato <[email protected]>
@pierDipi pierDipi force-pushed the scheduler-last-ordinal-wrong branch from 7bad45b to 4fb34df Compare December 17, 2024 11:27

codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 87.50000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 64.19%. Comparing base (4087c3a) to head (4fb34df).
Report is 4 commits behind head on main.

Files with missing lines                  Patch %   Lines
pkg/scheduler/state/state.go              90.00%    1 Missing and 1 partial ⚠️
pkg/scheduler/statefulset/autoscaler.go   75.00%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8388      +/-   ##
==========================================
- Coverage   64.22%   64.19%   -0.03%     
==========================================
  Files         388      388              
  Lines       23324    23310      -14     
==========================================
- Hits        14979    14965      -14     
  Misses       7550     7550              
  Partials      795      795              

pierDipi added a commit to pierDipi/eventing-kafka-broker that referenced this pull request Dec 17, 2024
Signed-off-by: Pierangelo Di Pilato <[email protected]>
@pierDipi
Member Author

/test upgrade-tests

Member

@matzew matzew left a comment

/lgtm
/approve

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Dec 19, 2024

knative-prow bot commented Dec 19, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: matzew, pierDipi

@knative-prow knative-prow bot merged commit 4dbc2ba into knative:main Dec 19, 2024
34 of 36 checks passed
@knative-prow-robot
Contributor

@pierDipi: new pull request created: #8393

In response to this:

/cherry-pick release-1.15

@knative-prow-robot
Contributor

@pierDipi: new pull request created: #8394

In response to this:

/cherry-pick release-1.16

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
3 participants