GPU support for Kubernetes workloads

MKE provides GPU support for Kubernetes workloads running on Linux worker nodes. This exercise walks you through enabling GPU support on your system and deploying GPU-targeted workloads.

Installing the GPU drivers

GPU support requires that GPU drivers be installed on each worker node. The installation of these drivers can occur either before or after the installation of MKE.

This procedure installs the NVIDIA driver on your Linux host by way of a runfile. Note that it uses driver version 440.59, the latest available and verified version at the time of this writing.

Note

This procedure describes how to install the drivers manually. In production, it is recommended that you use an existing automation system to install and patch the drivers along with the kernel and other host software.

  1. Verify that your system contains an NVIDIA GPU:

    lspci | grep -i nvidia
    
  2. Confirm that your GPU model is a supported NVIDIA GPU product.

  3. Install dependencies.

  4. Verify that your system is up to date and that you are running the latest kernel.

  5. Install the following packages, depending on your operating system:

    • Ubuntu:

      sudo apt-get install -y gcc make curl linux-headers-$(uname -r)
      
    • RHEL:

      sudo yum install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc make curl elfutils-libelf-devel
      
  6. Ensure that the i2c_core and ipmi_msghandler kernel modules are loaded:

    sudo modprobe -a i2c_core ipmi_msghandler
    

    To persist the change across reboots:

    echo -e "i2c_core\nipmi_msghandler" | sudo tee /etc/modules-load.d/nvidia.conf
    

    Create the directory under which all of the NVIDIA libraries will reside on the host, and register it with the dynamic linker:

    NVIDIA_OPENGL_PREFIX=/opt/kubernetes/nvidia
    sudo mkdir -p $NVIDIA_OPENGL_PREFIX/lib
    echo "${NVIDIA_OPENGL_PREFIX}/lib" | sudo tee /etc/ld.so.conf.d/nvidia.conf
    sudo ldconfig
    
  7. Run the installation, reusing the NVIDIA_OPENGL_PREFIX variable set in the previous step:

    NVIDIA_DRIVER_VERSION=440.59
    curl -LSf https://us.download.nvidia.com/XFree86/Linux-x86_64/${NVIDIA_DRIVER_VERSION}/NVIDIA-Linux-x86_64-${NVIDIA_DRIVER_VERSION}.run -o nvidia.run
    sudo sh nvidia.run --opengl-prefix="${NVIDIA_OPENGL_PREFIX}"
    
  8. Load the NVIDIA Unified Memory kernel module and create device files for the module on startup:

    sudo tee /etc/systemd/system/nvidia-modprobe.service << END
    [Unit]
    Description=NVIDIA modprobe
    
    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/usr/bin/nvidia-modprobe -c0 -u
    
    [Install]
    WantedBy=multi-user.target
    END
    
    sudo systemctl enable nvidia-modprobe
    sudo systemctl start nvidia-modprobe
    
  9. Enable the NVIDIA persistence daemon to initialize GPUs and keep them initialized:

    sudo tee /etc/systemd/system/nvidia-persistenced.service << END
    [Unit]
    Description=NVIDIA Persistence Daemon
    Wants=syslog.target
    
    [Service]
    Type=forking
    PIDFile=/var/run/nvidia-persistenced/nvidia-persistenced.pid
    Restart=always
    ExecStart=/usr/bin/nvidia-persistenced --verbose
    ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced
    
    [Install]
    WantedBy=multi-user.target
    END
    
    sudo systemctl enable nvidia-persistenced
    sudo systemctl start nvidia-persistenced
    

See https://docs.nvidia.com/deploy/driver-persistence/index.html for more information.
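With the driver installed, you can spot-check that the kernel modules from step 6 are actually loaded by reading /proc/modules directly. This is a minimal sketch; the module names match the modules-load.d configuration above:

```shell
# Report the load state of each module listed in /etc/modules-load.d/nvidia.conf
for m in i2c_core ipmi_msghandler; do
  if grep -q "^${m} " /proc/modules 2>/dev/null; then
    echo "${m}: loaded"
  else
    echo "${m}: NOT loaded"
  fi
done
```

Each module should report as loaded before you proceed to testing the device plugin.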

Test the device plugin

MKE includes a GPU device plugin that instruments your GPUs, which is necessary for GPU support, and advertises them to Kubernetes as the nvidia.com/gpu resource. To verify that the plugin is working, describe a GPU node and confirm that nvidia.com/gpu appears under both Capacity and Allocatable:

kubectl describe node <node-name>

...
Capacity:
cpu:                8
ephemeral-storage:  40593612Ki
hugepages-1Gi:      0
hugepages-2Mi:      0
memory:             62872884Ki
nvidia.com/gpu:     1
pods:               110
Allocatable:
cpu:                7750m
ephemeral-storage:  36399308Ki
hugepages-1Gi:      0
hugepages-2Mi:      0
memory:             60775732Ki
nvidia.com/gpu:     1
pods:               110
...
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource        Requests    Limits
--------        --------    ------
cpu             500m (6%)   200m (2%)
memory          150Mi (0%)  440Mi (0%)
nvidia.com/gpu  0           0
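A nonzero nvidia.com/gpu count under both Capacity and Allocatable confirms that the device plugin has registered the GPU. As an illustration of what to look for, the following sketch extracts the allocatable count from saved describe output; the excerpt below is mocked, so in practice substitute the real output of kubectl describe node:

```shell
# Mocked excerpt of `kubectl describe node` output; on a real cluster,
# substitute the actual command output.
describe_output='Allocatable:
  cpu:             7750m
  nvidia.com/gpu:  1'

# Print the allocatable GPU count (the value on the nvidia.com/gpu line)
printf '%s\n' "$describe_output" | awk '$1 == "nvidia.com/gpu:" {print $2; exit}'
```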

Scheduling GPU workloads

To consume GPUs from your container, request the nvidia.com/gpu resource in the limits section; for extended resources such as this, Kubernetes requires requests and limits to be equal, so specifying limits alone is sufficient. The following example shows how to deploy a simple workload that reports its detected NVIDIA CUDA devices.

  1. Create the example deployment:

    kubectl apply -f- <<EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        run: gpu-test
      name: gpu-test
    spec:
      replicas: 1
      selector:
        matchLabels:
          run: gpu-test
      template:
        metadata:
          labels:
            run: gpu-test
        spec:
          containers:
          - command:
            - sh
            - -c
            - "deviceQuery && sleep infinity"
            image: kshatrix/gpu-example:cuda-10.2
            name: gpu-test
            resources:
              limits:
                nvidia.com/gpu: "1"
    EOF
    
  2. If your cluster has an available GPU, the Pod will be scheduled on a node that provides it. After a short time, the Pod should be in the Running state, which you can verify with kubectl get po:

    NAME                        READY   STATUS    RESTARTS   AGE
    gpu-test-747d746885-hpv74   1/1     Running   0          14m
    
  3. Check the logs and look for Result = PASS to verify successful completion:

    kubectl logs <pod-name>
    
    deviceQuery Starting...
    
    CUDA Device Query (Runtime API) version (CUDART static linking)
    
    Detected 1 CUDA Capable device(s)
    
    Device 0: "Tesla V100-SXM2-16GB"
    ...
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
    Result = PASS
    
  4. Determine the overall GPU capacity of your cluster by inspecting its nodes:

    echo $(kubectl get nodes -l com.docker.ucp.gpu.nvidia="true" -o jsonpath="0{range .items[*]}+{.status.allocatable['nvidia\.com/gpu']}{end}") | bc
    
  5. Scale the deployment, setting the replica count N to the total GPU capacity determined in the previous step:

    kubectl scale deployment/gpu-test --replicas N
    
  6. Verify that all of the replicas are scheduled:

    kubectl get po
    NAME                        READY   STATUS    RESTARTS   AGE
    gpu-test-747d746885-hpv74   1/1     Running   0          12m
    gpu-test-747d746885-swrrx   1/1     Running   0          11m
    
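The capacity one-liner in step 4 works by emitting a string such as 0+1+1 and handing it to bc for evaluation. The same sum can be sketched in plain shell, shown here with mocked per-node counts; on a real cluster, the counts would come from each node's .status.allocatable:

```shell
# Mocked per-node allocatable GPU counts; on a real cluster these come from
# the jsonpath query shown in step 4 above.
counts="1 1 2"

total=0
for c in $counts; do
  total=$((total + c))
done
echo "$total"
```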

If you attempt to add a replica beyond the available GPU capacity, the result is a FailedScheduling error with an Insufficient nvidia.com/gpu message:

kubectl scale deployment/gpu-test --replicas N+1

kubectl get po
NAME                        READY   STATUS    RESTARTS   AGE
gpu-test-747d746885-hpv74   1/1     Running   0          14m
gpu-test-747d746885-swrrx   1/1     Running   0          13m
gpu-test-747d746885-zgwfh   0/1     Pending   0          3m26s

Run kubectl describe po gpu-test-747d746885-zgwfh to see why the Pod remains in the Pending state:

...
Events:
Type     Reason            Age        From               Message
----     ------            ----       ----               -------
Warning  FailedScheduling  <unknown>  default-scheduler  0/2 nodes are available: 2 Insufficient nvidia.com/gpu.

Remove the deployment and corresponding pods:

kubectl delete deployment gpu-test

See also

Kubernetes