GPU support for Kubernetes workloads

MKE provides graphics processing unit (GPU) support for Kubernetes workloads that run on Linux worker nodes. This topic describes how to configure your system to use and deploy NVIDIA GPUs.

Install the GPU drivers

GPU support requires that you install GPU drivers, which you can do either before or after installing MKE. The following procedure installs the NVIDIA driver on your Linux host using a runfile.

Note

This procedure describes how to manually install the GPU drivers. However, Mirantis recommends that you use a pre-existing automation system to automate the installation and patching of the drivers, along with the kernel and other host software.

  1. Enable the NVIDIA GPU device plugin by setting nvidia_device_plugin to true in the MKE configuration file. A sketch of the relevant setting appears after this procedure.

  2. Verify that your system supports NVIDIA GPU:

    lspci | grep -i nvidia
    
  3. Verify that your GPU is a supported NVIDIA GPU Product.

  4. Install all the dependencies listed in the NVIDIA Minimum Requirements.

  5. Verify that your system is up to date and that you are running the latest kernel version.

  6. Install the following packages:

    • Ubuntu:

      sudo apt-get install -y gcc make curl linux-headers-$(uname -r)
      
    • RHEL:

      sudo yum install -y kernel-devel-$(uname -r) \
      kernel-headers-$(uname -r) gcc make curl elfutils-libelf-devel
      
  7. Ensure that the i2c_core and ipmi_msghandler kernel modules are loaded:

    sudo modprobe -a i2c_core ipmi_msghandler
    
  8. Persist the change across reboots:

    echo -e "i2c_core\nipmi_msghandler" | sudo tee /etc/modules-load.d/nvidia.conf
    
  9. Create the directory for the NVIDIA libraries on the host and add it to the dynamic linker configuration:

    NVIDIA_OPENGL_PREFIX=/opt/kubernetes/nvidia
    sudo mkdir -p $NVIDIA_OPENGL_PREFIX/lib
    echo "${NVIDIA_OPENGL_PREFIX}/lib" | sudo tee /etc/ld.so.conf.d/nvidia.conf
    sudo ldconfig
    
  10. Install the NVIDIA GPU driver:

    NVIDIA_DRIVER_VERSION=<version-number>
    curl -LSf https://us.download.nvidia.com/XFree86/Linux-x86_64/${NVIDIA_DRIVER_VERSION}/NVIDIA-Linux-x86_64-${NVIDIA_DRIVER_VERSION}.run -o nvidia.run
    sudo sh nvidia.run --opengl-prefix="${NVIDIA_OPENGL_PREFIX}"
    

    Set <version-number> to the NVIDIA driver version of your choice.

  11. Load the NVIDIA Unified Memory kernel module and create device files for the module on startup:

    sudo tee /etc/systemd/system/nvidia-modprobe.service << END
    [Unit]
    Description=NVIDIA modprobe
    
    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/usr/bin/nvidia-modprobe -c0 -u
    
    [Install]
    WantedBy=multi-user.target
    END
    
    sudo systemctl enable nvidia-modprobe
    sudo systemctl start nvidia-modprobe
    
  12. Enable the NVIDIA persistence daemon to initialize GPUs and keep them initialized:

    sudo tee /etc/systemd/system/nvidia-persistenced.service << END
    [Unit]
    Description=NVIDIA Persistence Daemon
    Wants=syslog.target
    
    [Service]
    Type=forking
    PIDFile=/var/run/nvidia-persistenced/nvidia-persistenced.pid
    Restart=always
    ExecStart=/usr/bin/nvidia-persistenced --verbose
    ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced
    
    [Install]
    WantedBy=multi-user.target
    END
    
    sudo systemctl enable nvidia-persistenced
    sudo systemctl start nvidia-persistenced
    
  13. Verify that the device plugin is working by reviewing the node description:

    kubectl describe node <node-name>
    

    Example output:

    Capacity:
      cpu:                8
      ephemeral-storage:  40593612Ki
      hugepages-1Gi:      0
      hugepages-2Mi:      0
      memory:             62872884Ki
      nvidia.com/gpu:     1
      pods:               110
    Allocatable:
      cpu:                7750m
      ephemeral-storage:  36399308Ki
      hugepages-1Gi:      0
      hugepages-2Mi:      0
      memory:             60775732Ki
      nvidia.com/gpu:     1
      pods:               110
    ...
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource        Requests    Limits
      --------        --------    ------
      cpu             500m (6%)   200m (2%)
      memory          150Mi (0%)  440Mi (0%)
      nvidia.com/gpu  0           0
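
The device plugin setting from step 1 lives in the MKE configuration file, which uses TOML format. The following is a minimal sketch of that change; it assumes the setting belongs under the cluster_config section, so confirm the placement against your own exported configuration before applying it:

    [cluster_config]
      # Assumed placement: enables the NVIDIA GPU device plugin on Linux worker nodes
      nvidia_device_plugin = true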
    

Schedule GPU workloads

The following example describes how to deploy a simple workload that reports detected NVIDIA CUDA devices.

  1. Create a practice Deployment that requests nvidia.com/gpu in the limits section. The Pod will be scheduled on any node that has an available GPU.

    kubectl apply -f- <<EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      creationTimestamp: null
      labels:
        run: gpu-test
      name: gpu-test
    spec:
      replicas: 1
      selector:
        matchLabels:
          run: gpu-test
      template:
        metadata:
          labels:
            run: gpu-test
        spec:
          containers:
          - command:
            - sh
            - -c
            - "deviceQuery && sleep infinity"
            image: kshatrix/gpu-example:cuda-10.2
            name: gpu-test
            resources:
              limits:
                nvidia.com/gpu: "1"
    EOF
    
  2. Verify that the Pod is in the Running state:

    kubectl get pods | grep "gpu-test"
    

    Example output:

    NAME                        READY   STATUS    RESTARTS   AGE
    gpu-test-747d746885-hpv74   1/1     Running   0          14m
    
  3. Review the logs. The presence of Result = PASS indicates a successful deployment:

    kubectl logs <pod-name>
    

    Example output:

    deviceQuery Starting...
    
    CUDA Device Query (Runtime API) version (CUDART static linking)
    
    Detected 1 CUDA Capable device(s)
    
    Device 0: "Tesla V100-SXM2-16GB"
    ...
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
    Result = PASS
    
  4. Determine the overall GPU capacity of your cluster by inspecting its nodes (the note after this procedure explains how the resulting expression is evaluated):

    echo $(kubectl get nodes -l com.docker.ucp.gpu.nvidia="true" \
    -o jsonpath="0{range .items[*]}+{.status.allocatable['nvidia\.com/gpu']}{end}") | bc
    
  5. Scale the Deployment to acquire all available GPUs, setting N to the capacity determined in the previous step:

    kubectl scale deployment/gpu-test --replicas N
    
  6. Verify that all of the replicas are scheduled:

    kubectl get pods | grep "gpu-test"
    

    Example output:

    NAME                        READY   STATUS    RESTARTS   AGE
    gpu-test-747d746885-hpv74   1/1     Running   0          12m
    gpu-test-747d746885-swrrx   1/1     Running   0          11m
    
  7. Remove the Deployment and corresponding Pods:

    kubectl delete deployment gpu-test
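
The GPU capacity one-liner in step 4 works by having the jsonpath template print a leading 0 followed by +<allocatable GPU count> for each node that carries the com.docker.ucp.gpu.nvidia label, and bc then evaluates the resulting arithmetic expression. The intermediate output shown below is illustrative only, assuming a cluster with two single-GPU worker nodes:

    kubectl get nodes -l com.docker.ucp.gpu.nvidia="true" \
    -o jsonpath="0{range .items[*]}+{.status.allocatable['nvidia\.com/gpu']}{end}"
    # Prints an expression such as: 0+1+1
    # Piping that expression to bc evaluates it, giving a total of 2 in this example.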
    

Troubleshooting

If you scale the example Deployment beyond the available GPU capacity of the cluster, the additional replica fails to schedule and reports a FailedScheduling error with the Insufficient nvidia.com/gpu message.

  1. Add an additional replica:

    kubectl scale deployment/gpu-test --replicas N+1
    kubectl get pods | grep "gpu-test"
    

    Example output:

    NAME                        READY   STATUS    RESTARTS   AGE
    gpu-test-747d746885-hpv74   1/1     Running   0          14m
    gpu-test-747d746885-swrrx   1/1     Running   0          13m
    gpu-test-747d746885-zgwfh   0/1     Pending   0          3m26s
    
  2. Review the status of the Pending Pod:

    kubectl describe po gpu-test-747d746885-zgwfh
    

    Example output:

    Events:
    Type     Reason            Age        From               Message
    ----     ------            ----       ----               -------
    Warning  FailedScheduling  <unknown>  default-scheduler  0/2 nodes are available: 2 Insufficient nvidia.com/gpu.
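
One way to clear the Pending replica is to scale the Deployment back down to the GPU capacity determined earlier; adding worker nodes with free GPUs also resolves the condition. For example, with N set to the cluster GPU capacity:

    kubectl scale deployment/gpu-test --replicas N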