This reposity holds instructions for configuring GPU acclerated Windows nodes in AKS clusters
-
Create an AKS cluster following https://docs.microsoft.com/en-us/azure/aks/learn/quick-windows-container-deploy-cli#create-an-aks-cluster
-
Add a Windows Server node pool following https://learn.microsoft.com/en-us/azure/aks/learn/quick-windows-container-deploy-cli#add-a-windows-node-pool
Notes:
-
kubernetes version must be 1.23 or later
-
node-vm-size should be a
NV*
series which support DirectX acceleration. [https://docs.microsoft.com/en-us/azure/virtual-machines/sizes-gpu]
-
-
Add the 'NVDIA GPU Driver Extension' to the VirtualMachineScaleSet created for the new node pool added above
https://docs.microsoft.com/en-us/azure/virtual-machines/extensions/hpccompute-gpu-windows
https://docs.microsoft.com/en-us/cli/azure/vmss/extension?view=azure-cli-latest
-
Ensure VirtualMachineScaleSet instances are using the most up-to-date model
-
Wait for VM extensions to run
-
Deploy `k8s-directx-device-plugin.yaml to your cluster
kubectl apply -f https://raw.githubusercontent.com/marosset/aks-windows-gpu-acceleration/main/k8s-directx-device-plugin/k8s-directx-device-plugin.yaml
-
run
kubectl describe node {gpu-node}
and look forCapacity: cpu: 6 ephemeral-storage: 133703676Ki memory: 58719796Ki microsoft.com/directx: 1 pods: 30 Allocatable: cpu: 5840m ephemeral-storage: 133703676Ki memory: 51379764Ki microsoft.com/directx: 1 pods: 30
-
In your workload deployment files add requests for 'microsoft.com/directx` resoruces
... spec: containers: ... resources: requests: microsoft.com/directx: "1"
-
Ensure
daemonSet
pods are runningCheck status with
kubectl get pods -A
-
Check device plugin pod logs
kubectl logs -n kube-system {pod-name}
You should see something like
kubectl logs -n kube-system directx-device-plugin-5t7dc ERROR: logging before flag.Parse: I0505 19:10:39.046785 5076 main.go:98] GPU NVIDIA Tesla M60 id: PCI\VEN_10DE&DEV_13F2&SUBSYS_115E10DE&REV_A1\6&2B78CA89&0&0 ERROR: logging before flag.Parse: W0505 19:10:39.096589 5076 main.go:90] 'Microsoft Hyper-V Video' doesn't match 'nvidia', ignoring this gpu ERROR: logging before flag.Parse: I0505 19:10:39.096589 5076 main.go:101] pluginSocksDir: /var/lib/kubelet/device-plugins/ ERROR: logging before flag.Parse: I0505 19:10:39.096589 5076 main.go:103] socketPath: /var/lib/kubelet/device-plugins/directx.sock 2022/05/05 19:10:40 Starting to serve on /var/lib/kubelet/device-plugins/directx.sock 2022/05/05 19:10:40 Deprecation file not found. Invoke registration 2022/05/05 19:10:40 ListAndWatch