GPU support for Kubernetes workloads

Docker Enterprise provides GPU support for Kubernetes workloads. This exercise walks you through setting up your system to use the underlying GPU support and through deploying GPU-targeted workloads.

To complete the steps, you will need a Docker Hub account and an AWS account or equivalent. The instructions use AWS instances, but you can follow them on any of the platforms supported by Docker Enterprise.

Installing a Linux-only MKE cluster

This section describes how to install an MKE cluster with one or more Linux instances. You will use this cluster in the remaining steps.

Creating the first Linux instance

  1. Create the first Linux instance for a two-node, Linux-only MKE cluster using the steps at https://aws.amazon.com/getting-started/tutorials/launch-a-virtual-machine, selecting the Ubuntu 18.04 AMI.

  2. Log into your Linux instance and install Mirantis Container Runtime.

  3. Install MKE 3.3.0 version on this first Linux instance:

    1. Download the MKE offline bundle using the following command:

      $ curl -o ucp_images.tar.gz https://packages.docker.com/caas/ucp_images_3.3.0.tar.gz
      
    2. Load the MKE image using the following command:

      $ docker load < ucp_images.tar.gz
      
    3. Run the following command to install MKE. Substitute your password for <password> and the public IP address of your VM for the <public IP> placeholder.

      $ docker container run \
      --rm \
      --interactive \
      --tty \
      --name ucp \
      --volume /var/run/docker.sock:/var/run/docker.sock \
      docker/ucp:3.3.0 \
      install \
      --admin-password <password> \
      --debug \
      --force-minimums \
      --san <public IP>
      

      On completion of the command, you will have a single-node MKE cluster with the Linux instance as its Manager node.
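
      As an optional sanity check (not part of the original procedure), you can confirm the result with standard Docker commands on the instance; the node should appear as a swarm manager, and the MKE components run as containers whose names start with ucp-:

      $ docker node ls
      $ docker container ls --filter name=ucp- --format '{{.Names}}'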

Creating additional Linux instances

  1. From your local system, use your browser to log in to the MKE web UI you installed above.

  2. Navigate to the nodes list and click on Add Node at the top right of the page.

  3. In the Add Node page, select "Linux" as the node type and "Worker" as the node role.

  4. Optionally, you may also select and set custom listen and advertise addresses.

  5. A command line will be generated that includes a join-token. It should look something like:

    docker swarm join ... --token <join-token> ...
    
  6. Copy this command line from the UI for use later.

  7. For each additional Linux instance that you need to add to the cluster, do the following:

    1. Create the instance using the steps at https://aws.amazon.com/getting-started/tutorials/launch-a-virtual-machine, using the Ubuntu 16.04 or 18.04 AMI.

    2. Log into your Linux instance and install Mirantis Container Runtime.

    3. Download the MKE offline bundle using the following command:

      $ curl -o ucp_images.tar.gz https://packages.docker.com/caas/ucp_images_3.3.0.tar.gz
      
    4. Load the MKE image using the following command:

      $ docker load < ucp_images.tar.gz
      
    5. Add your Linux instance to the MKE cluster by running the swarm join command line generated above.

Installing and setting up docker and kubectl CLIs

To access your MKE cluster, you must have both the docker CLI and the kubectl CLI on your local system.

Perform the following steps once you have finished installing the MKE cluster; an example of both steps is sketched after the list.

  1. Download the MKE CLI client and client certificates.
  2. Install and Set Up kubectl.
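
The exact commands vary with your MKE version and workstation OS. The following is a minimal sketch for a Linux workstation: it assumes jq and unzip are installed, uses the MKE /auth/login and /api/clientbundle endpoints to download the client bundle, and installs a kubectl binary whose version (1.17.6 here is an assumption) should match the Kubernetes version shipped with your MKE release. Substitute your own values for the <MKE host>, <admin>, and <password> placeholders.

# Download the MKE client bundle (client certificates plus env.sh and kube.yml).
AUTHTOKEN=$(curl -sk -d '{"username":"<admin>","password":"<password>"}' \
  https://<MKE host>/auth/login | jq -r .auth_token)
curl -sk -H "Authorization: Bearer $AUTHTOKEN" \
  https://<MKE host>/api/clientbundle -o client-bundle.zip
unzip client-bundle.zip -d mke-bundle

# Point the docker and kubectl CLIs at the cluster by sourcing the bundle.
cd mke-bundle
eval "$(<env.sh)"

# Install kubectl (version is an assumption; match your cluster's version).
curl -LO https://dl.k8s.io/release/v1.17.6/bin/linux/amd64/kubectl
chmod +x kubectl
sudo mv kubectl /usr/local/bin/

# Both CLIs should now talk to the MKE cluster.
docker version
kubectl get nodes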

Installing the GPU drivers

GPU drivers are required for GPU support. You can install these drivers either before or after installing MKE.

The GPU driver installation procedure installs the NVIDIA driver by way of a runfile on your Linux host. Note that this procedure uses version 440.59, which was the latest available and verified version at the time of this writing.

Note

This procedure describes how to install the drivers manually. It is recommended, however, that you use a pre-existing automation system to automate the installation and patching of the drivers, along with the kernel and other host software.

  1. Verify that your system has an NVIDIA GPU:

    lspci | grep -i nvidia

  2. Confirm that the GPU model reported is a supported NVIDIA GPU Product.

  3. Install dependencies.

  4. Verify that your system is up-to-date, and you are running the latest kernel.

  5. Install the following packages, depending on your OS:

    • Ubuntu:

      sudo apt-get install -y gcc make curl linux-headers-$(uname -r)
      
    • RHEL:

      sudo yum install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc make curl elfutils-libelf-devel
      
  6. Ensure that the i2c_core and ipmi_msghandler kernel modules are loaded:

    sudo modprobe -a i2c_core ipmi_msghandler
    

    To persist the change across reboots:

    echo -e "i2c_core\nipmi_msghandler" | sudo tee /etc/modules-load.d/nvidia.conf
    

    Ensure that all of the NVIDIA libraries are installed under a dedicated directory on the host, and that the dynamic linker can find them:

    NVIDIA_OPENGL_PREFIX=/opt/kubernetes/nvidia
    sudo mkdir -p $NVIDIA_OPENGL_PREFIX/lib
    echo "${NVIDIA_OPENGL_PREFIX}/lib" | sudo tee /etc/ld.so.conf.d/nvidia.conf
    sudo ldconfig
    
  7. Download and run the driver installer:

    NVIDIA_DRIVER_VERSION=440.59
    curl -LSf https://us.download.nvidia.com/XFree86/Linux-x86_64/${NVIDIA_DRIVER_VERSION}/NVIDIA-Linux-x86_64-${NVIDIA_DRIVER_VERSION}.run -o nvidia.run
    sudo sh nvidia.run --opengl-prefix="${NVIDIA_OPENGL_PREFIX}"


    Note

    The --opengl-prefix option must be set to /opt/kubernetes/nvidia (the NVIDIA_OPENGL_PREFIX value set in the previous step).
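
    As a quick check (not part of the original procedure), you can confirm that the driver installed and loaded correctly by querying the GPU:

    # nvidia-smi ships with the driver and lists the detected GPUs.
    nvidia-smi
    # The loaded driver version is also reported by the kernel.
    cat /proc/driver/nvidia/version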

  8. Load the NVIDIA Unified Memory kernel module and create device files for the module on startup:

    sudo tee /etc/systemd/system/nvidia-modprobe.service << END
    [Unit]
    Description=NVIDIA modprobe
    
    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/usr/bin/nvidia-modprobe -c0 -u
    
    [Install]
    WantedBy=multi-user.target
    END
    
    sudo systemctl enable nvidia-modprobe
    sudo systemctl start nvidia-modprobe
    
  9. Enable the NVIDIA persistence daemon to initialize GPUs and keep them initialized:

    sudo tee /etc/systemd/system/nvidia-persistenced.service << END
    [Unit]
    Description=NVIDIA Persistence Daemon
    Wants=syslog.target
    
    [Service]
    Type=forking
    PIDFile=/var/run/nvidia-persistenced/nvidia-persistenced.pid
    Restart=always
    ExecStart=/usr/bin/nvidia-persistenced --verbose
    ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced
    
    [Install]
    WantedBy=multi-user.target
    END
    
    sudo systemctl enable nvidia-persistenced
    sudo systemctl start nvidia-persistenced
    

See Driver Persistence <https://docs.nvidia.com/deploy/driver-persistence/index.html> for more information.
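
As an optional check (not part of the original procedure), confirm that both units are enabled and active:

systemctl is-enabled nvidia-modprobe nvidia-persistenced
systemctl is-active nvidia-modprobe nvidia-persistenced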

Deploying the device plugin

MKE includes a GPU device plugin that instruments your GPUs and advertises them to Kubernetes as the nvidia.com/gpu extended resource, which is necessary for GPU support. Once the drivers are installed on a node, verify that nvidia.com/gpu appears in the node's capacity and allocatable resources:

kubectl describe node <node-name>

...
Capacity:
  cpu:                8
  ephemeral-storage:  40593612Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             62872884Ki
  nvidia.com/gpu:     1
  pods:               110
Allocatable:
  cpu:                7750m
  ephemeral-storage:  36399308Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             60775732Ki
  nvidia.com/gpu:     1
  pods:               110
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource        Requests    Limits
  --------        --------    ------
  cpu             500m (6%)   200m (2%)
  memory          150Mi (0%)  440Mi (0%)
  nvidia.com/gpu  0           0
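
MKE labels GPU nodes with com.docker.ucp.gpu.nvidia (the same label used later in this exercise to compute total GPU capacity), so you can also list just the GPU-capable nodes:

kubectl get nodes -l com.docker.ucp.gpu.nvidia=true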

Scheduling GPU workloads

To consume GPUs from your container, request nvidia.com/gpu in the limits section. The following example shows how to deploy a simple workload that reports detected NVIDIA CUDA devices.

  1. Create the example deployment:

    kubectl apply -f- <<EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      creationTimestamp: null
      labels:
        run: gpu-test
      name: gpu-test
    spec:
      replicas: 1
      selector:
        matchLabels:
          run: gpu-test
      template:
        metadata:
          labels:
            run: gpu-test
        spec:
          containers:
          - command:
            - sh
            - -c
            - "deviceQuery && sleep infinity"
            image: kshatrix/gpu-example:cuda-10.2
            name: gpu-test
            resources:
              limits:
                nvidia.com/gpu: "1"
    EOF
    
  2. If there are GPUs available in your cluster, the Pod will be scheduled on one of them. After a short time, the Pod should be in the Running state:

    kubectl get po
    NAME                        READY   STATUS    RESTARTS   AGE
    gpu-test-747d746885-hpv74   1/1     Running   0          14m
    
  3. Check the logs and look for Result = PASS to verify successful completion:

    kubectl logs <name of the pod>
    
    deviceQuery Starting...
    
    CUDA Device Query (Runtime API) version (CUDART static linking)
    
    Detected 1 CUDA Capable device(s)
    
    Device 0: "Tesla V100-SXM2-16GB"
    ...
    
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
    Result = PASS
    
  4. Determine the overall GPU capacity of your cluster by summing the allocatable nvidia.com/gpu resources across its GPU nodes:

    echo $(kubectl get nodes -l com.docker.ucp.gpu.nvidia="true" -o jsonpath="0{range .items[*]}+{.status.allocatable['nvidia\.com/gpu']}{end}") | bc
    
  5. Scale the deployment so that the replica count (N) equals the total GPU capacity, acquiring all available GPUs:

    kubectl scale deployment/gpu-test --replicas N
    
  6. Verify that all of the replicas are scheduled:

    kubectl get po
    NAME                        READY   STATUS    RESTARTS   AGE
    gpu-test-747d746885-hpv74   1/1     Running   0          12m
    gpu-test-747d746885-swrrx   1/1     Running   0          11m
    

If you attempt to add one more replica, it will result in a FailedScheduling error with an Insufficient nvidia.com/gpu message:

kubectl scale deployment/gpu-test --replicas N+1

kubectl get po
NAME                        READY   STATUS    RESTARTS   AGE
gpu-test-747d746885-hpv74   1/1     Running   0          14m
gpu-test-747d746885-swrrx   1/1     Running   0          13m
gpu-test-747d746885-zgwfh   0/1     Pending   0          3m26s

Run kubectl describe po gpu-test-747d746885-zgwfh to see why the additional Pod could not be scheduled:

...
Events:
Type     Reason            Age        From               Message
----     ------            ----       ----               -------
Warning  FailedScheduling  <unknown>  default-scheduler  0/2 nodes are available: 2 Insufficient nvidia.com/gpu.

Remove the deployment and corresponding pods:

kubectl delete deployment gpu-test
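
The Deployment and its Pods carry the run=gpu-test label from the manifest above, so you can confirm that everything is gone (Pods may briefly show a Terminating status):

kubectl get deployment,pods -l run=gpu-test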