NVIDIA GPU Workloads#
Mirantis Kubernetes Engine (MKE) 4k supports running workloads on GPU nodes. Currently, only NVIDIA GPUs are supported.
Info
GPU Feature Discovery (GFD) is enabled by default.
You can run the following command to verify your GPU specifications:
sudo lspci | grep -i nvidia
Example output:
00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
NVIDIA GPU Operator#
To manage your GPU resources and enable GPU support, MKE 4k installs the NVIDIA GPU Operator on your cluster.
Important
Before you enable the NVIDIA GPU Operator, ensure that your worker nodes meet the kernel tuning requirements, as the default Linux limits may not be sufficient for the nvidia-device-plugin, which can lead to startup crashes.
Example error message, as it appears in the nvidia-device-plugin-daemonset Pod:
1 main.go:173] failed to create FS watcher for /var/lib/kubelet/device-plugins/: too many open files
To enable NVIDIA GPU Operator:
- Edit /etc/sysctl.conf on all GPU nodes as follows:

  # Required for NVIDIA Device Plugin to function correctly
  fs.inotify.max_user_instances = 8192
  fs.inotify.max_user_watches = 524288

- Apply the modification:

  sudo sysctl -p
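To confirm that the new limits are in effect, you can query the same keys with sysctl. This is a minimal check, assuming the values set above:

sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches

Expected output:

fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288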
The NVIDIA GPU Operator installs and configures the following resources on each GPU node, as reflected in the Pods listed in the Verification section below:
- NVIDIA drivers
- NVIDIA Container Toolkit
- NVIDIA Kubernetes Device Plugin
- NVIDIA DCGM Exporter
- GPU Feature Discovery (GFD)
- operator and CUDA validators
Tip
To verify your GPU specifications:
sudo lspci | grep -i nvidia
Example output:
00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
For a comprehensive list of the devices supported by the NVIDIA GPU Operator, refer to Platform Support in the official NVIDIA documentation.
Important
- For air-gapped MKE 4k clusters, you must deploy a package registry and mirror the drivers to it, as described in the official NVIDIA documentation, Install NVIDIA GPU Operator in Air-Gapped Environments - Local Package Repository.
- GPU Operator is not verified on Rocky OS. As such, the use of GPU Operator on Rocky OS might require community-supported drivers.
- GPU Operator is not currently supported on RHEL.
Configuration#
NVIDIA GPU support is disabled in MKE 4k by default. To enable NVIDIA GPU support:
- Obtain the mke4.yaml configuration file:

  mkectl init > mke4.yaml

- Navigate to the devicePlugins.nvidiaGPU section of the mke4.yaml configuration file, and set the enabled parameter to true:

  devicePlugins:
    nvidiaGPU:
      enabled: true

- Apply the new configuration setting:

  mkectl apply -f mke4.yaml
Important
Pod startup time varies with node performance. During startup, the Pods may appear to be in a failed state.
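While the GPU Operator components start, you can watch the Pods until they stabilize. The following is a minimal sketch, assuming the components run in the mke namespace used in the Verification section below; press Ctrl+C to stop watching once all Pods report Running or Completed:

kubectl get pods -n mke -w | grep -E 'nvidia|gpu'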
Verification#
Once the NVIDIA GPU support configuration is complete, you can verify your setup.
Ensure Initialization of GPU Operator and NVIDIA Drivers#
Important
All listed Pods must show Running or Completed status. Do not proceed if any status is Init, ContainerCreating, or Pending.
Filter for the relevant pods:
kubectl get pods -n mke | grep -E 'NAME|nvidia|gpu'
The nvidia-cuda-validator Pod shows Completed status, which indicates a successful validation.
Example output: Initialization in Progress (DO NOT PROCEED):
NAME                                       READY   STATUS      RESTARTS
gpu-feature-discovery-2lwcw                0/1     Init:0/2    0
gpu-operator-785554d7b6-wqjrb              1/1     Running     1
nvidia-container-toolkit-daemonset-xh4sb   0/1     Init:0/1    0
nvidia-dcgm-exporter-fmpqm                 0/1     Init:0/1    0
nvidia-device-plugin-daemonset-jmrq8       0/1     Init:0/1    0
nvidia-driver-daemonset-hcxkn              0/1     Running     0
nvidia-operator-validator-tbxp9            0/1     Init:0/4    0
Example output: Initialization complete (safe to proceed):
NAME                                       READY   STATUS      RESTARTS
gpu-feature-discovery-2lwcw                1/1     Running     0
gpu-operator-785554d7b6-wqjrb              1/1     Running     1
nvidia-container-toolkit-daemonset-xh4sb   1/1     Running     0
nvidia-cuda-validator-q89wk                0/1     Completed   0
nvidia-dcgm-exporter-fmpqm                 1/1     Running     0
nvidia-device-plugin-daemonset-jmrq8       1/1     Running     0
nvidia-driver-daemonset-hcxkn              1/1     Running     0
nvidia-operator-validator-tbxp9            1/1     Running     0
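For additional confirmation, you can inspect the logs of the nvidia-cuda-validator Pod. The following is a minimal sketch that reuses the Pod name from the example output above; substitute the name from your own cluster:

kubectl logs -n mke nvidia-cuda-validator-q89wk

The logs should indicate that the CUDA workload validation completed successfully.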
Detect NVIDIA GPU Devices#
- Run a simple GPU workload that reports detected NVIDIA GPU devices:

  cat <<EOF | kubectl apply -f -
  apiVersion: v1
  kind: Pod
  metadata:
    name: gpu-pod
  spec:
    restartPolicy: Never
    containers:
      - name: cuda-container
        image: nvcr.io/nvidia/cloud-native/gpu-operator-validator:v22.9.0
        resources:
          limits:
            nvidia.com/gpu: 1 # requesting 1 GPU
    tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
  EOF

- Verify the successful completion of the Pod:

  kubectl get pods | grep "gpu-pod"

  Example output:

  NAME      READY   STATUS      RESTARTS   AGE
  gpu-pod   0/1     Completed   0          7m56s
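- Optionally, delete the test Pod once you have confirmed that it completed. The gpu-pod name below matches the manifest above:

  kubectl delete pod gpu-pod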
Run a GPU Workload#
- Create the workload:

  cat <<EOF | kubectl apply -f -
  apiVersion: v1
  kind: Pod
  metadata:
    name: cuda-vectoradd
  spec:
    restartPolicy: OnFailure
    containers:
      - name: cuda-vectoradd
        image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
        resources:
          limits:
            nvidia.com/gpu: 1
  EOF

- Run the following command once the Pod has reached Completed status:

  kubectl logs pod/cuda-vectoradd

  Example output:

  [Vector addition of 50000 elements]
  Copy input data from the host memory to the CUDA device
  CUDA kernel launch with 196 blocks of 256 threads
  Copy output data from the CUDA device to the host memory
  Test PASSED
  Done

- Clean up the Pod:

  kubectl delete pod cuda-vectoradd
Count GPUs#
Run the following command once you have enabled the NVIDIA GPU Device Plugin and the Pods have stabilized:
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPUs:.metadata.labels.nvidia\.com/gpu\.count"
Example results, showing a cluster with 3 control-plane nodes and 3 worker nodes:
NAME                                           GPUs
ip-172-31-174-195.us-east-2.compute.internal   1
ip-172-31-228-160.us-east-2.compute.internal   <none>
ip-172-31-231-180.us-east-2.compute.internal   1
ip-172-31-26-15.us-east-2.compute.internal     <none>
ip-172-31-3-198.us-east-2.compute.internal     1
ip-172-31-99-105.us-east-2.compute.internal    <none>
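As a cross-check, you can also query the GPU capacity advertised in each node's allocatable resources. The following is a minimal sketch that reuses the custom-columns approach from the command above; nodes without GPUs report <none>:

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPUs:.status.allocatable.nvidia\.com/gpu"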