GPU-enabled nodes

Kubernetes includes support for managing graphics processing units (GPUs) across different nodes in your cluster, using device plugins.

In ArcGIS Enterprise on Kubernetes, you can implement a device plugin to enable GPU nodes in your cluster and optimize GIS workflows such as raster analytics and deep learning. By default, capabilities such as raster analytics are configured to run in CPU mode, but they also provide the flexibility to run in GPU mode when these resources are available.

Providing and using GPUs in your cluster is optional, as doing so incurs additional cost.

To enable GPU, the NVIDIA device plugin for Kubernetes is required. The NVIDIA device plugin for Kubernetes is a daemonset that exposes the number of GPUs on each node of your cluster, allows you to run GPU-enabled containers, and tracks the health of the GPUs.

Note:

At this release, ArcGIS Enterprise on Kubernetes is only supported with NVIDIA GPUs.
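
Once the device plugin is running, each GPU node advertises an nvidia.com/gpu resource that Kubernetes can schedule against. As a quick check (a minimal example; the node name is a placeholder), confirm that a node reports GPU capacity:

    # A nonzero nvidia.com/gpu value in the node's capacity and allocatable sections
    # indicates that the device plugin is exposing GPUs on that node.
    kubectl describe node <your-node-name> | grep -i nvidia.com/gpu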

Enable GPU

The steps to enable GPU for your organization are specific to your environment and preferences and include the following:

  1. Complete steps to configure raster analytics or another capability for which you want to use GPU-enabled nodes.
  2. Verify whether your instance has the device plugin installed.

    Many cloud environments are preconfigured with GPU nodes. If the device plugin is not installed, see the NVIDIA device plugin for Kubernetes documentation for details and installation steps. If you've deployed on-premises, your administrator must enable GPU on each node in your cluster. A quick way to check whether the plugin is installed is shown in the example after these steps.

  3. To use GPU-enabled nodes for your organization's GIS workflows, set requests and limits for GPU.
  4. Optionally, if you want to run GPU workloads exclusively on GPU nodes, configure node affinity and tolerations.
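
A minimal check for step 2, assuming the NVIDIA device plugin was deployed as a daemonset (the daemonset name and namespace vary by installation method and are shown here only as an example):

    # Look for the NVIDIA device plugin daemonset; the name and namespace depend on
    # how the plugin was installed (static manifest, Helm chart, or cloud add-on).
    kubectl get daemonsets --all-namespaces | grep -i nvidia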

Set requests and limits for GPU

Use the ArcGIS Enterprise Administrator API Directory to set requests and limits for GPU for each of the following deployments:

  • system-rasteranalysistools-gpsyncserver (used for training models)
  • system-rasterprocessinggpu-dpserver (used for processing)

  1. Sign in to the ArcGIS Enterprise Administrator API Directory as an administrator.
  2. Click System > Deployments.
  3. Locate the system-rasteranalysistools-gpsyncserver deployment and click its corresponding ID.
  4. Click Edit Deployment.
  5. In the deployment JSON, locate the resources section for the deployment and the customResources parameter.
              
    "containers": [
          {
            "name": "main-container",
            "resources": {
              "memoryMin": "4Gi",
              "memoryMax": "8Gi",
              "cpuMin": "0.125",
              "customResources": {
                "limits":{"nvidia.com/gpu": "1"},
                "requests":{"nvidia.com/gpu": "1"}
              },
              "cpuMax": "2"
            },
    
  6. Update the customResources parameter for each container listed to include requests and limits for GPU.
  7. Click Submit to save edits to the deployment.
  8. Repeat steps for the system-rasterprocessinggpu-dpserver deployment.

Learn how to edit system deployments in the ArcGIS Enterprise Administrator API Directory documentation.
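
If you also have kubectl access to the cluster, you can optionally confirm that the new GPU requests and limits were applied to the underlying Kubernetes deployment (a sketch; the deployment and namespace names are placeholders for the ones used in your cluster):

    # Print the resources section of the deployment's container spec; the output
    # should include the nvidia.com/gpu requests and limits you set above.
    kubectl get deployment <gpu-deployment-name> -n <your-namespace> \
      -o jsonpath='{.spec.template.spec.containers[0].resources}'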

Configure node affinity and tolerations

GPU nodes can run both CPU and GPU workloads. If your CPU workloads are allowed to run on GPU nodes, no further steps are needed. However, if you want to ensure that GPU workloads run exclusively on GPU nodes, your administrator must take additional steps to configure node affinity and tolerations. This involves tainting the GPU nodes and applying tolerations to the applicable services so that they can be scheduled on the tainted nodes.

  1. To ensure GPU workloads are scheduled exclusively on GPU nodes, taint the GPU nodes.

    kubectl taint nodes <your-node-name> nvidia.com/gpu=Exists:NoExecute
    

  2. Label the GPU nodes. Alternatively, use an existing label that's already specified on the node.

    kubectl label nodes <your-node-name> raster=GPU
    

  3. Edit the service placement policy for the RasterProcessingGPU (DPServer) service under System, so that it uses node affinity and tolerations.

          
    
    "podPlacementPolicy": {
              "tolerations": [{
                 "effect": "NoExecute",
                 "key": "nvidia.com/gpu",
                 "operator": "Exists"
              }],
              "nodeAffinity": {
                 "requiredDuringSchedulingIgnoredDuringExecution": {
                      "nodeSelectorTerms": [{
                           "matchExpressions": [{
                              "key": "raster",
                              "operator": "In",
                              "values": ["GPU"]
                            }]
                       }]
                  }
              }
          }
    

  4. Verify that GPU pods are running on the GPU nodes.
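
    For example, list the pods along with the nodes they were scheduled on (a minimal check; the namespace and pod name filter are placeholders for your deployment):

    # GPU workloads, such as the raster processing pods, should appear only on the
    # labeled and tainted GPU nodes.
    kubectl get pods -n <your-namespace> -o wide | grep rasterprocessinggpu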

You can begin to use raster analysis tools and host imagery in your organization. Additionally, see recommendations for Tuning raster analytics.