Local-first GPU Cluster with nvkind and Time Splitting

You have a brand new shiny GPU and want to start experimenting with it by running some sample workloads in Kubernetes, but where do you start? In this short tutorial, we go over how to use nvkind and the GPU operator to run some basic experiments on your new GPU. We assume that you already have Docker, Go, and the relevant NVIDIA drivers and tooling (nvidia-ctk, nvidia-smi, etc.) installed.

Installing nvkind

First, you want to make sure that your Docker runtime is properly configured to run NVIDIA workloads. To check, run the following:

docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all ubuntu:20.04 nvidia-smi -L

That should have an output like the following:

GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-cb076aca-f9ce-5a42-7e1e-a5ad60e69baf)

If you run into any issues, configure the Docker backend to use the NVIDIA runtime:

sudo nvidia-ctk runtime configure --runtime=docker --set-as-default --cdi.enabled
sudo nvidia-ctk config --set accept-nvidia-visible-devices-as-volume-mounts=true --in-place
sudo systemctl restart docker

If all of these steps have passed, you can now install nvkind. There are two methods: installing with go, or building the binary from source. In this tutorial, we will install it with go, which is a single command. Running nvkind afterwards verifies that it was installed successfully.

go install github.com/NVIDIA/nvkind/cmd/nvkind@latest
nvkind

NAME:
   nvkind - kind for use with NVIDIA GPUs

USAGE:
   nvkind [global options] command [command options]

VERSION:
   devel

COMMANDS:
   cluster  perform operations on cluster with NVIDIA GPUs
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --help, -h     show help
   --version, -v  print the version

Now, there is no fundamental difference between regular kind and nvkind: the underlying concepts and functionality are the same. nvkind simply adds native support for exposing the host's NVIDIA GPUs to kind worker nodes.
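For example, nvkind can render its cluster configuration as a Go template, which makes it easy to stamp out one worker node per GPU. The sketch below is based on the templating support described in the nvkind README; helper names such as numGPUs and the exact flag spelling may differ between versions:

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
{{- range $gpu := until numGPUs }}
- role: worker
  extraMounts:
    # expose exactly one host GPU to each worker
    - hostPath: /dev/null
      containerPath: /var/run/nvidia-container-devices/{{ $gpu }}
{{- end }}
```

You would then pass this file to nvkind cluster create via its --config-template option rather than --config.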

Deploying your first cluster with nvkind

To deploy your first cluster with nvkind, you can use the following configuration file. It creates a cluster with a single control-plane node and a single worker node, and the extraMounts entry exposes all of the host's GPUs to the worker.

kind-gpu.yaml

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  extraMounts:
    - hostPath: /dev/null
      containerPath: /var/run/nvidia-container-devices/all

And you can deploy the cluster with the following command:

nvkind cluster create --config kind-gpu.yaml

After a while, you should be able to see the cluster running with kubectl commands.

kubectl get nodes
NAME                 STATUS   ROLES           AGE     VERSION
kind-control-plane   Ready    control-plane   3m25s   v1.27.0
kind-worker          Ready    <none>          3m17s   v1.27.0
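Rather than polling with kubectl get nodes, you can also block until every node reports Ready (a standard kubectl command, assuming your kubeconfig already points at the new cluster):

```shell
# wait up to five minutes for all nodes to become Ready
kubectl wait --for=condition=Ready nodes --all --timeout=300s
```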

To confirm that the GPU is exposed to the cluster, inspect the worker node's allocatable resources:

kubectl describe node kind-worker
Allocatable:
  cpu:                16
  ephemeral-storage:  960300048Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             63414596Ki
  nvidia.com/gpu:     1
  pods:               110

Running your first GPU workload with a sample pod

Now you are ready to run your first GPU workload. You can use the following manifest to run a pod that requests a GPU.

pod-gpu.yaml

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vectoradd
    image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
    resources:
      limits:
        nvidia.com/gpu: 1

This manifest runs the vectoradd sample from the cuda-sample image in a pod that requests one GPU. You can apply it with the following command:

kubectl apply -f pod-gpu.yaml

After a while, you should be able to see the pod running with the following command:

kubectl get pods
NAME             READY   STATUS      RESTARTS   AGE
cuda-vectoradd   0/1     Completed   0          10s

And you can also verify that the GPU is being used with the following command:

kubectl logs cuda-vectoradd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

Why limit yourself to a single GPU?

Now you are able to run your first GPU workload with a sample pod. However, you might be wondering: why limit yourself to a single pod per GPU? A limitation of Kubernetes is that, without any special configuration, you cannot run multiple pods on the same GPU. That means that if you had 1 GPU available and wanted to run 4 pods on it, you would only be able to run 1 pod at a time. This is inefficient, since you are leaving compute time on the table and much of the GPU idle. That is where time splitting (which NVIDIA calls time-slicing) comes in. According to the NVIDIA documentation:

The NVIDIA GPU Operator enables oversubscription of GPUs through a set of extended options for the NVIDIA Kubernetes Device Plugin. GPU time-slicing enables workloads that are scheduled on oversubscribed GPUs to interleave with one another.
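The rest of this tutorial assumes the NVIDIA GPU Operator is already installed in the cluster. If it is not, a typical Helm-based install looks like the following; the repo and chart names come from NVIDIA's documentation, and since a kind node reuses the host's driver, the driver container is usually disabled:

```shell
# install the NVIDIA GPU Operator via Helm, skipping the driver container
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  --set driver.enabled=false
```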

For the rest of this tutorial, we will be using time splitting to “divide” up our GPU resource across multiple pods that run concurrently. First, you want to create a configmap that holds the configuration options for the GPU operator's device plugin:

gpu-configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config-all
  namespace: gpu-operator
data:
  any: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4

This configmap tells the GPU operator to advertise 4 replicas of each physical GPU, so up to 4 pods can share it. Note that it lives in the gpu-operator namespace so the device plugin can read it. You can apply the configuration with the following command:

kubectl apply -f gpu-configmap.yaml

Next, we need to update the cluster's device plugin to pick up the new configuration. You can do so by running the following command, which patches the cluster policy and thereby restarts the device plugin and GPU feature discovery pods:

kubectl patch clusterpolicies.nvidia.com/cluster-policy \
    -n gpu-operator --type merge \
    -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config-all", "default": "any"}}}}'

After a while, you should see that the device plugin and GPU feature discovery pods have restarted:

kubectl get pods -n gpu-operator
NAME                                               READY   STATUS    RESTARTS   AGE
gpu-operator-controller-manager-5665f76b7d-9p25z   2/2     Running   0          10m
gpu-operator-gpu-feature-discovery-2l5bf           1/1     Running   0          10m
gpu-operator-gpu-feature-discovery-j2rmq           1/1     Running   0          10m
gpu-operator-gpu-feature-discovery-v464l           1/1     Running   0          10m
gpu-operator-gpu-feature-discovery-x996s           1/1     Running   0          10m
gpu-operator-nvidia-driver-daemonset-q6vq2         1/1     Running   0          10m

Now, when you run kubectl describe node kind-worker, you will see that the number of allocatable GPUs is 4, showing that we have successfully “created” a configuration where the cluster thinks we have 4 GPUs available.

Allocatable:
  ...
  nvidia.com/gpu:     4
  ...
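If you just want the number itself, a jsonpath query works too (standard kubectl; the backslash escapes the dots inside the resource name):

```shell
# print the allocatable GPU count for the worker node
kubectl get node kind-worker \
  -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
```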

Running multiple pods with GPU resources

The main source for this section is the AWS documentation on running a sample workload with GPU time-slicing. We will use the same sample workload, a CIFAR-10 classification task, and run it across 4 pods.

pod-gpu-cifar10.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow-cifar10-deployment
  labels:
    app: tensorflow-cifar10
spec:
  replicas: 4
  selector:
    matchLabels:
      app: tensorflow-cifar10
  template:
    metadata:
      labels:
        app: tensorflow-cifar10
    spec:
      containers:
      - name: tensorflow-cifar10
        image: public.ecr.aws/r5m2h0c9/cifar10_cnn:v2
        resources:
          limits:
            nvidia.com/gpu: 1

This deployment creates 4 pods, each requesting one nvidia.com/gpu; thanks to time splitting, all four can run concurrently on the single physical GPU. You can verify this by running the following command:

kubectl get pods
NAME                                             READY   STATUS    RESTARTS   AGE
tensorflow-cifar10-deployment-5d6bb86d9d-6c6h6   1/1     Running   0          2m24s
tensorflow-cifar10-deployment-5d6bb86d9d-7h7h7   1/1     Running   0          2m24s
tensorflow-cifar10-deployment-5d6bb86d9d-8h8h8   1/1     Running   0          2m24s
tensorflow-cifar10-deployment-5d6bb86d9d-9h9h9   1/1     Running   0          2m24s
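To confirm that the four pods are actually sharing one physical GPU rather than four separate devices, you can list the compute processes on the host; with time splitting active, you should see several processes attached to the same GPU:

```shell
# list every compute process currently running on the GPU
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```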

Conclusion

In this tutorial, we have learned how to use nvkind to deploy a cluster with a GPU and run a sample workload on it. We have also learned how to use time splitting to “divide” up the GPU resource across multiple pods that run concurrently. This allows you to maximize the utilization of your GPU resources and run more workloads on it.