Local-first GPU Cluster with nvkind and Time Splitting
You have a brand new shiny GPU and want to start experimenting with it by running some sample workloads in Kubernetes, but how do you start? In this short tutorial, we go over how to use nvkind and the GPU Operator to run some basic experiments on your new GPU. We assume that the reader already has Docker, Go, and the relevant NVIDIA drivers and tools (nvidia-ctk, nvidia-smi, etc.) installed.
Installing nvkind
First, you want to make sure that your Docker runtime is properly configured to run NVIDIA workloads. To check, run the following:
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all ubuntu:20.04 nvidia-smi -L
That should have an output like the following:
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-cb076aca-f9ce-5a42-7e1e-a5ad60e69baf)
If you do run into any issues, make sure to configure the Docker backend to use Nvidia:
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default --cdi.enabled
sudo nvidia-ctk config --set accept-nvidia-visible-devices-as-volume-mounts=true --in-place
sudo systemctl restart docker
If all of these steps have passed, you are ready to install nvkind. There are two ways to do so: installing with go or building the binary from source. In this tutorial, we will install it with go, which is a single command. Running nvkind afterwards should confirm that it has been installed successfully.
go install github.com/NVIDIA/nvkind/cmd/nvkind@latest
nvkind
NAME:
nvkind - kind for use with NVIDIA GPUs
USAGE:
nvkind [global options] command [command options]
VERSION:
devel
COMMANDS:
cluster perform operations on cluster with NVIDIA GPUs
help, h Shows a list of commands or help for one command
GLOBAL OPTIONS:
--help, -h show help
--version, -v print the version
Now, there is no fundamental difference between plain kind and nvkind; the underlying concepts and functionality are the same. nvkind, however, has better native integration with GPUs.
Deploying your first cluster with nvkind
To deploy your first cluster with nvkind, you can use the following configuration file. It creates a cluster with a single control-plane node and a single worker node, with the worker node configured to have a GPU mounted into it.
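A minimal kind-gpu.yaml might look like the following. This is a sketch based on the nvkind examples; the extraMounts entry relies on the accept-nvidia-visible-devices-as-volume-mounts=true setting we configured earlier to inject all host GPUs into the worker node:

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  extraMounts:
  # Mounting /dev/null at this path asks the NVIDIA container runtime
  # to inject all host GPUs into the worker node's container.
  - hostPath: /dev/null
    containerPath: /var/run/nvidia-container-devices/all
```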
nvkind cluster create --config kind-gpu.yaml
After a while, you should be able to see the cluster running with kubectl commands.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
kind-control-plane Ready control-plane 3m25s v1.27.0
kind-worker Ready <none> 3m17s v1.27.0
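Note that the nvidia.com/gpu resource shown below only appears once the NVIDIA GPU Operator is running in the cluster. Assuming Helm is installed, it can be deployed roughly as follows, per the standard GPU Operator install flow:

```shell
# Add the NVIDIA Helm repository (one-time setup)
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

# Install the GPU Operator into its own namespace
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator
```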
To see the GPU exposed as an allocatable resource on the worker node, you can use the following command:
kubectl describe node kind-worker
Allocatable:
cpu: 16
ephemeral-storage: 960300048Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 63414596Ki
nvidia.com/gpu: 1
pods: 110
Running your first GPU workload with a sample pod
Now, you are able to run your first GPU workload with a sample pod. You can use the following configuration file to run a pod that requests a GPU.
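A pod-gpu.yaml along these lines should work; this sketch follows the vectoradd example from the GPU Operator documentation (the image tag is one commonly used there and may need updating):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vectoradd
    image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
    resources:
      limits:
        # Request a single GPU for this pod
        nvidia.com/gpu: 1
```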
The pod is configured to run the vectoradd sample from the cuda-sample image. You can apply the configuration with the following command:
kubectl apply -f pod-gpu.yaml
After a moment, you should see that the pod has run to completion:
kubectl get pods
NAME READY STATUS RESTARTS AGE
cuda-vectoradd 0/1 Completed 0 10s
And you can also verify that the GPU was used by checking the pod logs:
kubectl logs cuda-vectoradd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
Why limit yourself to a single GPU?
You are now able to run a GPU workload in a sample pod, but why limit yourself to one pod per GPU? A limitation of Kubernetes is that, without special configuration, you cannot run multiple pods on the same GPU. That means that if you had 1 GPU available and wanted to run 4 pods on it, you would only be able to run 1 pod at a time. This is inefficient, since you are leaving compute time on the table and much of the GPU's resources idle. That is where time splitting comes in. According to the NVIDIA documentation:
The NVIDIA GPU Operator enables oversubscription of GPUs through a set of extended options for the NVIDIA Kubernetes Device Plugin. GPU time-slicing enables workloads that are scheduled on oversubscribed GPUs to interleave with one another.
For the rest of this tutorial, we will use time splitting (time-slicing) to “divide” our GPU resource across multiple pods that run concurrently. First, you want to create a ConfigMap detailing the configuration options needed by the GPU Operator:
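A gpu-configmap.yaml along the lines of the example in the NVIDIA time-slicing documentation should work here; the ConfigMap name (time-slicing-config-all) and data key (any) must match what we pass to the cluster policy below:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config-all
  namespace: gpu-operator
data:
  any: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        resources:
        # Advertise each physical GPU as 4 schedulable replicas
        - name: nvidia.com/gpu
          replicas: 4
```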
This ConfigMap tells the GPU Operator to divide each GPU resource across 4 pods. You can apply the configuration with the following command:
kubectl apply -f gpu-configmap.yaml
Next, we need to update the cluster's device plugin to pick up the new configuration. You can do so by running the following command, which patches the cluster policy and restarts the device plugin and GPU feature discovery pods:
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
-n gpu-operator --type merge \
-p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config-all", "default": "any"}}}}'
After a while, you should be able to see the plugin and gpu-discovery pods restarted with the following command:
kubectl get pods -n gpu-operator
NAME READY STATUS RESTARTS AGE
gpu-operator-controller-manager-5665f76b7d-9p25z 2/2 Running 0 10m
gpu-operator-gpu-feature-discovery-2l5bf 1/1 Running 0 10m
gpu-operator-gpu-feature-discovery-j2rmq 1/1 Running 0 10m
gpu-operator-gpu-feature-discovery-v464l 1/1 Running 0 10m
gpu-operator-gpu-feature-discovery-x996s 1/1 Running 0 10m
gpu-operator-nvidia-driver-daemonset-q6vq2 1/1 Running 0 10m
Now, when you run kubectl describe node kind-worker, you will see that the number of available GPUs is 4, showing that we have successfully “created” a configuration where the cluster thinks 4 GPUs are available.
Allocatable:
...
nvidia.com/gpu: 4
...
Running multiple pods with GPU resources
The main source for this section is the AWS documentation on running a sample ML workload. In this section, we will run the same sample workload with 4 pods, training on the CIFAR-10 classification task.
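A deployment along these lines produces the four pods shown below. The image and command are placeholders, since the exact values come from the AWS sample; for scheduling purposes, any CUDA-enabled training image requesting nvidia.com/gpu: 1 per replica behaves the same:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow-cifar10-deployment
  labels:
    app: tensorflow-cifar10
spec:
  replicas: 4
  selector:
    matchLabels:
      app: tensorflow-cifar10
  template:
    metadata:
      labels:
        app: tensorflow-cifar10
    spec:
      containers:
      - name: tensorflow-cifar10
        # Placeholder image: substitute the CIFAR-10 training image from the AWS sample
        image: tensorflow/tensorflow:latest-gpu
        resources:
          limits:
            # With time-slicing (replicas: 4), all 4 pods fit on one physical GPU
            nvidia.com/gpu: 1
```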
Running this deployment will create 4 pods, each requesting a GPU resource, all backed by the same physical GPU. You can verify this by running the following command:
kubectl get pods
NAME READY STATUS RESTARTS AGE
tensorflow-cifar10-deployment-5d6bb86d9d-6c6h6 1/1 Running 0 2m24s
tensorflow-cifar10-deployment-5d6bb86d9d-7h7h7 1/1 Running 0 2m24s
tensorflow-cifar10-deployment-5d6bb86d9d-8h8h8 1/1 Running 0 2m24s
tensorflow-cifar10-deployment-5d6bb86d9d-9h9h9 1/1 Running 0 2m24s
Conclusion
In this tutorial, we have learned how to use nvkind to deploy a cluster with a GPU and run a sample workload on it. We have also learned how to use time splitting to “divide” up the GPU resource across multiple pods that run concurrently. This allows you to maximize the utilization of your GPU resources and run more workloads on it.