Review fixes
Signed-off-by: Ukri Niemimuukko <[email protected]>
uniemimu authored and togashidm committed Jan 24, 2022
1 parent 6dede47 commit 8529a6b
Showing 3 changed files with 17 additions and 19 deletions.
2 changes: 1 addition & 1 deletion gpu-aware-scheduling/README.md
@@ -37,7 +37,7 @@ You should follow extender configuration instructions from the
use GPU Aware Scheduling configurations, which can be found in the [deploy/extender-configuration](deploy/extender-configuration) folder.

#### Deploy GAS
-GPU Aware Scheduling uses go modules. It requires Go 1.13+ with modules enabled in order to build. GAS has been tested with Kubernetes 1.15+.
+GPU Aware Scheduling uses go modules. It requires Go 1.16 with modules enabled in order to build. GAS has been tested with Kubernetes 1.22.
A yaml file for GAS is contained in the deploy folder along with its service and RBAC roles and permissions.
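A minimal sketch of the build and deploy steps described above, assuming the commands run from the gpu-aware-scheduling directory and that the manifests live directly under deploy/ (both assumptions):

```sh
# Go 1.16 enables modules by default, so a plain module build suffices.
go build ./...

# Apply the GAS deployment, service and RBAC manifests from the deploy folder.
kubectl apply -f deploy/
```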

**Note:** If run without the unsafe flag a secret called extender-secret will need to be created with the cert and key for the TLS endpoint.
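A minimal sketch of creating that secret, assuming the certificate and key files already exist (the paths below are placeholders):

```sh
# Placeholder file paths; use the certificate and key generated for the TLS endpoint,
# and add -n <namespace> so the secret lands in the namespace where GAS is deployed.
kubectl create secret tls extender-secret \
  --cert=/path/to/tls.crt \
  --key=/path/to/tls.key
```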
32 changes: 15 additions & 17 deletions gpu-aware-scheduling/docs/usage.md
@@ -71,32 +71,30 @@ GAS supports certain node labels as a means to allow telemetry based GPU selection and
descheduling of PODs using a certain GPU. You can create node labels with the
[Telemetry Aware Scheduling](../../telemetry-aware-scheduling/README.md) labeling strategy,
which puts them in its own namespace. In practice the supported labels need to be in the
-`telemetry.aware.scheduling.POLICYNAME/` namespace, where the POLICYNAME may be anything.
+`telemetry.aware.scheduling.POLICYNAME/`[^1] namespace.

-The node label `gas-deschedule-pods-GPUNAME` where the GPUNAME can be e.g. card0, card1, card2...
-which corresponds to the gpu names under /dev/dri, will result in GAS labeling the PODs which
-use the named GPU with the `gpu.aware.scheduling/deschedule-pod=gpu` label. You may then
-use with a kubernetes descheduler to pick the pods for descheduling. So TAS labels the node, and
-based on the node label GAS finds and labels the PODs. Descheduler can be configured to
-deschedule the pods based on pod labels.
+The node label `gas-deschedule-pods-GPUNAME`[^2] will result in GAS labeling the PODs which
+use the named GPU with the `gpu.aware.scheduling/deschedule-pod=gpu` label. So TAS labels the node,
+and based on the node label GAS finds and labels the PODs. You may then use a kubernetes descheduler
+to pick the pods for descheduling via their labels.
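As a rough illustration of that flow, assuming a TASPolicy named example-policy and a node named worker-1 (both hypothetical names):

```sh
# Node label that TAS would normally set from telemetry, shown here manually.
kubectl label node worker-1 telemetry.aware.scheduling.example-policy/gas-deschedule-pods-card0=true

# GAS then labels the pods using card0 on that node; a descheduler (or you) can
# select them by that pod label.
kubectl get pods --all-namespaces -l gpu.aware.scheduling/deschedule-pod=gpu
```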

-The node label `gas-disable-GPUNAME` where the GPUNAME can be e.g. card0, card1, card2... which
-corresponds to the gpu names under /dev/dri, will result in GAS stopping the use of the named
-GPU for new allocations.
+The node label `gas-disable-GPUNAME`[^2] will result in GAS stopping the use of the named GPU for new
+allocations.

-The node label `gas-prefer-gpu=GPUNAME` where the GPUNAME can be e.g. card0, card1, card2...
-which corresponds to the gpu names under /dev/dri, will result in GAS trying to use the named
+The node label `gas-prefer-gpu=GPUNAME`[^2] will result in GAS trying to use the named
GPU for new allocations before other GPUs of the same node.
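A short sketch of the disable and prefer labels above, with the same hypothetical policy and node names:

```sh
# Stop GAS from using card1 of this node for new allocations.
kubectl label node worker-1 telemetry.aware.scheduling.example-policy/gas-disable-card1=true

# Prefer card2 for new allocations on this node, ahead of its other GPUs.
kubectl label node worker-1 telemetry.aware.scheduling.example-policy/gas-prefer-gpu=card2
```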

-Note that the value of the labels starting with gas-deschedule-pods-GPUNAME and
-gas-disable-GPUNAME doesn't matter. You may use e.g. "true" as the value. The only exception to
+Note that the value of the labels starting with `gas-deschedule-pods-GPUNAME`[^2] and
+`gas-disable-GPUNAME`[^2] doesn't matter. You may use e.g. "true" as the value. The only exception to
the rule is `PCI_GROUP` which has a special meaning, explained separately. Example:
`gas-disable-card0=PCI_GROUP`.

+[^1]: POLICYNAME is defined by the name of the TASPolicy. It can vary.
+[^2]: GPUNAME can be e.g. card0, card1, card2… which corresponds to the gpu names under `/dev/dri`.

### PCI Groups

-If GAS finds a node label `gas-disable-GPUNAME=PCI_GROUP` where the GPUNAME can be e.g. card0,
-card1, card2... which corresponds to the gpu names under /dev/dri, the disabling will impact a
+If GAS finds a node label `gas-disable-GPUNAME=PCI_GROUP`[^2] the disabling will impact a
group of GPUs which is defined in the node label `gpu.intel.com/pci-groups`. The syntax of the
pci group node label is easiest to explain with an example: `gpu.intel.com/pci-groups=0.1_2.3.4`
would indicate there are two pci-groups in the node separated with an underscore, in which card0
@@ -105,7 +103,7 @@ find the node label `gas-disable-card3=PCI_GROUP` in a node with the previous example pci-groups
label, GAS would stop using card2, card3 and card4 for new allocations, as card3 belongs in that
group.
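Putting the example above into label commands, again with hypothetical policy and node names:

```sh
# Two PCI groups on the node: card0+card1 and card2+card3+card4.
kubectl label node worker-1 gpu.intel.com/pci-groups=0.1_2.3.4

# Disabling card3 with the PCI_GROUP value also stops new allocations on card2 and card4.
kubectl label node worker-1 telemetry.aware.scheduling.example-policy/gas-disable-card3=PCI_GROUP
```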

-`gas-deschedule-pods-GPUNAME` supports the PCI-GROUP value similarly, the whole group in which
+`gas-deschedule-pods-GPUNAME`[^2] supports the PCI_GROUP value similarly, the whole group in which
the named gpu belongs, will end up descheduled.

The PCI group feature allows for e.g. having a telemetry action to operate on all GPUs which
@@ -308,7 +308,7 @@ func (c *Cache) checkPodResourceAdjustment(containerRequests []resourceMap,
}

// This must be called with rwmutex locked
-// set add=true to add, false to remove resources.
+// set adj=true to add, false to remove resources.
func (c *Cache) adjustPodResources(pod *v1.Pod, adj bool, annotation, nodeName string) error {
// get slice of resource maps, one map per container
containerRequests := containerRequests(pod)
