GPU support for Kubernetes workloads¶
MKE provides graphics processing unit (GPU) support for Kubernetes workloads that run on Linux worker nodes. This topic describes how to configure your system to use and deploy NVIDIA GPUs.
Install the GPU drivers¶
GPU support requires that you install the GPU drivers, which you can do either before or after installing MKE. The procedure below installs the NVIDIA driver on your Linux host using a runfile.
Note
This procedure describes how to manually install the GPU drivers. However, Mirantis recommends that you use a pre-existing automation system to automate the installation and patching of the drivers, along with the kernel and other host software.
Enable the NVIDIA GPU device plugin by setting nvidia_device_plugin to true in the MKE configuration file (see the configuration sketch after the following verification command).

Verify that your system has an NVIDIA GPU:

lspci | grep -i nvidia
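The device plugin setting belongs in the MKE configuration file (TOML). The following is a minimal sketch of one way to apply it through the MKE config-toml API; the MKE_HOST and AUTHTOKEN placeholders and the placement of the option under [cluster_config] are assumptions here, so adapt the sketch to however you normally manage the MKE configuration.

# Sketch only: download the current MKE configuration, enable the NVIDIA
# device plugin, and upload the modified file again. MKE_HOST and AUTHTOKEN
# are placeholders for your MKE address and admin session token.
curl --silent --insecure -H "Authorization: Bearer ${AUTHTOKEN}" \
    "https://${MKE_HOST}/api/ucp/config-toml" -o mke-config.toml

# In mke-config.toml, under [cluster_config], set:
#   nvidia_device_plugin = true

curl --silent --insecure -X PUT -H "Authorization: Bearer ${AUTHTOKEN}" \
    --upload-file mke-config.toml "https://${MKE_HOST}/api/ucp/config-toml"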
Verify that your GPU is a supported NVIDIA GPU Product.
Install all the dependencies listed in the NVIDIA Minimum Requirements.
Verify that your system is up to date and that you are running the latest kernel version.
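A minimal sketch of this check on an Ubuntu host follows; use the equivalent yum or dnf commands on RHEL.

# Check the running kernel version and apply any pending updates.
uname -r
sudo apt-get update && sudo apt-get upgrade -y

# If the upgrade installed a new kernel, reboot so that uname -r matches
# the linux-headers package that you install in the next step.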
Install the following packages:
Ubuntu:
sudo apt-get install -y gcc make curl linux-headers-$(uname -r)
RHEL:
sudo yum install -y kernel-devel-$(uname -r) \
    kernel-headers-$(uname -r) gcc make curl elfutils-libelf-devel
Verify that the i2c_core and ipmi_msghandler kernel modules are loaded:

sudo modprobe -a i2c_core ipmi_msghandler
Persist the change across reboots:
echo -e "i2c_core\nipmi_msghandler" | sudo tee /etc/modules-load.d/nvidia.conf
Create the directory on the host in which the NVIDIA OpenGL libraries will reside, and register it with the dynamic linker:

NVIDIA_OPENGL_PREFIX=/opt/kubernetes/nvidia
sudo mkdir -p $NVIDIA_OPENGL_PREFIX/lib
echo "${NVIDIA_OPENGL_PREFIX}/lib" | sudo tee /etc/ld.so.conf.d/nvidia.conf
sudo ldconfig
Install the NVIDIA GPU driver:
NVIDIA_DRIVER_VERSION=<version-number>
curl -LSf https://us.download.nvidia.com/XFree86/Linux-x86_64/${NVIDIA_DRIVER_VERSION}/NVIDIA-Linux-x86_64-${NVIDIA_DRIVER_VERSION}.run -o nvidia.run
sudo sh nvidia.run --opengl-prefix="${NVIDIA_OPENGL_PREFIX}"

Set <version-number> to the NVIDIA driver version of your choice.

Load the NVIDIA Unified Memory kernel module and create device files for the module on startup:
sudo tee /etc/systemd/system/nvidia-modprobe.service << END
[Unit]
Description=NVIDIA modprobe

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/nvidia-modprobe -c0 -u

[Install]
WantedBy=multi-user.target
END

sudo systemctl enable nvidia-modprobe
sudo systemctl start nvidia-modprobe
Enable the NVIDIA persistence daemon to initialize GPUs and keep them initialized:
sudo tee /etc/systemd/system/nvidia-persistenced.service << END
[Unit]
Description=NVIDIA Persistence Daemon
Wants=syslog.target

[Service]
Type=forking
PIDFile=/var/run/nvidia-persistenced/nvidia-persistenced.pid
Restart=always
ExecStart=/usr/bin/nvidia-persistenced --verbose
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced

[Install]
WantedBy=multi-user.target
END

sudo systemctl enable nvidia-persistenced
sudo systemctl start nvidia-persistenced
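At this point you can optionally confirm that the driver and the persistence daemon are working. The nvidia-smi utility is installed by the driver runfile and should list your GPU:

# Query the driver for the detected GPUs and check the daemon status.
nvidia-smi
systemctl status nvidia-persistenced --no-pager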
Test the device plugin by reviewing the node description:
kubectl describe node <node-name>
Example output:
Capacity:
  cpu:                8
  ephemeral-storage:  40593612Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             62872884Ki
  nvidia.com/gpu:     1
  pods:               110
Allocatable:
  cpu:                7750m
  ephemeral-storage:  36399308Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             60775732Ki
  nvidia.com/gpu:     1
  pods:               110
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource        Requests    Limits
  --------        --------    ------
  cpu             500m (6%)   200m (2%)
  memory          150Mi (0%)  440Mi (0%)
  nvidia.com/gpu  0           0
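To summarize the GPU capacity of every node at once instead of describing each node individually, you can reuse the same jsonpath expression that appears later in this topic; a minimal sketch:

# Print each node name followed by its allocatable nvidia.com/gpu count.
kubectl get nodes -o jsonpath="{range .items[*]}{.metadata.name}{'\t'}{.status.allocatable['nvidia\.com/gpu']}{'\n'}{end}"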
Schedule GPU workloads¶
The following example describes how to deploy a simple workload that reports detected NVIDIA CUDA devices.
Create a practice Deployment that requests nvidia.com/gpu in the limits section. The Pod will be scheduled on any node in your system that has an available GPU.

kubectl apply -f- <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    run: gpu-test
  name: gpu-test
spec:
  replicas: 1
  selector:
    matchLabels:
      run: gpu-test
  template:
    metadata:
      labels:
        run: gpu-test
    spec:
      containers:
      - command:
        - sh
        - -c
        - "deviceQuery && sleep infinity"
        image: kshatrix/gpu-example:cuda-10.2
        name: gpu-test
        resources:
          limits:
            nvidia.com/gpu: "1"
EOF
Verify that it is in the Running state:

kubectl get pods | grep "gpu-test"

Example output:

NAME                        READY   STATUS    RESTARTS   AGE
gpu-test-747d746885-hpv74   1/1     Running   0          14m
Review the logs. The presence of Result = PASS indicates a successful deployment:

kubectl logs <name of the pod>
Example output:
deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla V100-SXM2-16GB"
...
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
Determine the overall GPU capacity of your cluster by inspecting its nodes:
echo $(kubectl get nodes -l com.docker.ucp.gpu.nvidia="true" \
    -o jsonpath="0{range .items[*]}+{.status.allocatable['nvidia\.com/gpu']}{end}") | bc
Set the proper replica number to acquire all available GPUs:
kubectl scale deployment/gpu-test --replicas N
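As a convenience, the two previous steps can be combined. The following sketch captures the total GPU capacity in a shell variable and scales the Deployment to that number:

# Compute the cluster-wide GPU capacity (same expression as above) and
# scale the example Deployment to consume all of it.
GPUS=$(echo $(kubectl get nodes -l com.docker.ucp.gpu.nvidia="true" \
    -o jsonpath="0{range .items[*]}+{.status.allocatable['nvidia\.com/gpu']}{end}") | bc)
kubectl scale deployment/gpu-test --replicas "${GPUS}"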
Verify that all of the replicas are scheduled:
kubectl get pods | grep "gpu-test"
Example output:
NAME                        READY   STATUS    RESTARTS   AGE
gpu-test-747d746885-hpv74   1/1     Running   0          12m
gpu-test-747d746885-swrrx   1/1     Running   0          11m
Remove the Deployment and corresponding Pods:
kubectl delete deployment gpu-test
Troubleshooting¶
If you attempt to add an additional replica to the previous example Deployment, the new Pod cannot be scheduled because all of the GPUs in the cluster are already allocated, and the attempt results in a FailedScheduling error with the Insufficient nvidia.com/gpu message.
Add an additional replica:
kubectl scale deployment/gpu-test --replicas N+1
kubectl get pods | grep "gpu-test"
Example output:
NAME                        READY   STATUS    RESTARTS   AGE
gpu-test-747d746885-hpv74   1/1     Running   0          14m
gpu-test-747d746885-swrrx   1/1     Running   0          13m
gpu-test-747d746885-zgwfh   0/1     Pending   0          3m26s
Review the status of the Pod that failed to schedule:
kubectl describe po gpu-test-747d746885-zgwfh
Example output:
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/2 nodes are available: 2 Insufficient nvidia.com/gpu.
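To clear the Pending Pod, scale the Deployment back down to the GPU capacity you determined earlier:

# Scale back to the number of GPUs actually available in the cluster.
kubectl scale deployment/gpu-test --replicas N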