Configure GPU-enabled nodes

ArcGIS 11.5 | Help archive

Kubernetes includes support for managing graphics processing units (GPUs) across different nodes in a cluster, using device plugins.

In ArcGIS Enterprise on Kubernetes, you can implement a device plugin to make GPU nodes in a cluster available to GIS workflows such as raster analytics and deep learning. By default, capabilities such as raster analytics run in CPU mode, but they can also run in GPU mode when GPU resources are available.

Making GPUs available in a cluster is optional, as they incur additional cost.

To enable GPU support, the NVIDIA device plugin for Kubernetes is required. The device plugin is a DaemonSet that exposes the number of GPUs on each node of a cluster, allows you to run GPU-enabled containers, and tracks the health of the GPUs.

Note:

At this release, ArcGIS Enterprise on Kubernetes supports only NVIDIA GPUs.

Enable GPU

To enable GPU for your organization, complete the following steps, which are specific to your environment and preferences:

  1. Complete steps to configure raster analytics, notebook services, or another capability for which you want to use GPU-enabled nodes.
  2. Verify whether your instance has the device plugin installed.

    Many cloud environments are preconfigured with GPU nodes. If the device plugin is not installed, see the NVIDIA device plugin for Kubernetes documentation for details and installation steps. If you've deployed on-premises, your administrator must enable GPU on each node in a cluster.

  3. To use GPU-enabled nodes for your organization's GIS workflows, configure access to GPU resources.
  4. Optionally, to run GPU workloads exclusively on GPU nodes, configure node affinity and tolerations.
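Once the device plugin is running, each GPU node advertises an nvidia.com/gpu resource in its capacity, which you can inspect with `kubectl get nodes -o json`. As a rough sketch of what to look for (the node names and counts below are illustrative, not from a real cluster):

```python
def gpu_capacity(nodes_json):
    """Return {node name: GPU count} for nodes that advertise
    the nvidia.com/gpu resource in their capacity."""
    result = {}
    for node in nodes_json["items"]:
        capacity = node["status"].get("capacity", {})
        gpus = int(capacity.get("nvidia.com/gpu", 0))
        if gpus:
            result[node["metadata"]["name"]] = gpus
    return result

# Sample shaped like `kubectl get nodes -o json` output
# (names and counts are illustrative).
sample = {"items": [
    {"metadata": {"name": "gpu-node-1"},
     "status": {"capacity": {"cpu": "8", "nvidia.com/gpu": "1"}}},
    {"metadata": {"name": "cpu-node-1"},
     "status": {"capacity": {"cpu": "8"}}},
]}

print(gpu_capacity(sample))  # nodes without the resource are skipped
```

If a node you expect to have GPUs does not report the resource, the device plugin is likely not installed or not running on that node.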

Configure access to GPU resources

If you are enabling GPU for notebook services, see View and edit runtimes for information on setting GPU units per node.

If you are enabling GPU for raster analytics, complete the following steps to use the ArcGIS Enterprise Administrator API Directory to set GPU requests and limits for each of the following deployments:

  • system-rasteranalysistools-gpsyncserver (used for training models)
  • system-rasterprocessinggpu-dpserver (used for processing)

  1. Sign in to the ArcGIS Enterprise Administrator API Directory as an administrator.
  2. Click System > Deployments.
  3. Locate the system-rasteranalysistools-gpsyncserver deployment, and click its corresponding ID.
  4. Click Edit Deployment.
  5. In the deployment JSON, locate the resources section for the deployment and the customResources parameter.
              
    "containers": [
          {
            "name": "main-container",
            "resources": {
              "memoryMin": "4Gi",
              "memoryMax": "8Gi",
              "cpuMin": "0.125",
              "customResources": {
                "limits":{"nvidia.com/gpu": "1"},
                "requests":{"nvidia.com/gpu": "1"}
              },
              "cpuMax": "2"
            },
    
  6. Update the customResources parameter for each container listed to include requests and limits for GPU.
  7. Click Submit to save edits to the deployment.
  8. Repeat the steps for the system-rasterprocessinggpu-dpserver deployment.
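The edit in step 6 amounts to adding matching GPU requests and limits under each container's customResources parameter. A minimal sketch of that JSON shape (the one-GPU values mirror the excerpt above; adjust the count to your hardware):

```python
import json

def with_gpu(resources, count=1):
    """Return a copy of a container's resources section with
    nvidia.com/gpu requests and limits added under customResources."""
    patched = dict(resources)
    patched["customResources"] = {
        "limits": {"nvidia.com/gpu": str(count)},
        "requests": {"nvidia.com/gpu": str(count)},
    }
    return patched

# Resources section as shown in the deployment JSON excerpt above.
resources = {"memoryMin": "4Gi", "memoryMax": "8Gi",
             "cpuMin": "0.125", "cpuMax": "2"}
print(json.dumps(with_gpu(resources), indent=2))
```

Keeping requests and limits equal, as shown, ensures the pod is only scheduled where a whole GPU is available.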

Learn how to edit system deployments in the ArcGIS Enterprise Administrator API Directory documentation.

Configure node affinity and tolerations

GPU nodes can have both CPU and GPU workloads running on them. If your CPU workloads are allowed to run on a GPU node, no further steps are needed. However, to ensure that GPU workloads are run exclusively on GPU nodes, your administrator must take additional steps to configure node affinity and tolerations. Complete the following steps to taint the nodes and apply tolerations to applicable services so that they can be scheduled on a tainted node:

  1. To ensure GPU workloads are scheduled exclusively on GPU nodes, taint the GPU nodes.

    kubectl taint nodes <your-node-name> nvidia.com/gpu=Exists:NoExecute
    

  2. Label the GPU nodes. Alternatively, use an existing label that's already specified on the node.

    To label a node with the key raster and the value GPU for raster analytics, use the following command:

    kubectl label nodes <your-node-name> raster=GPU
    

    To label a node with the key notebook and the value NotebooksGPUNode for notebook services, use the following command:

    kubectl label nodes <your-node-name> notebook=NotebooksGPUNode
    

  3. Sign in to ArcGIS Enterprise Manager as an administrator.
  4. Open the pod placement settings.

    If you are enabling GPU for notebook services, see View and edit runtimes for information on accessing pod placement settings.

    To access the pod placement settings for raster analytics, complete the following steps:

    1. Click the Services button on the sidebar.
    2. Click System services, and select the RasterProcessingGPU service.
    3. Click the Pod placement tab.
  5. To apply a node affinity rule that ensures the service's pods are scheduled on GPU nodes, provide the following information in the Node Affinity section and click Add:
    • Type—Required
    • Key—Specify the key used for labeling the GPU node, for example raster.
    • Operator—In
    • Value—Specify the value used for labeling the GPU node, for example GPU.
  6. To apply a toleration that allows the pods to run on the nodes you will taint, provide the following information in the Tolerations section and click Add:
    • Effect—No Execute
    • Key—nvidia.com/gpu
    • Operator—Exists

  7. Click Save.
  8. Verify that GPU pods are running on the GPU nodes.
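Under the hood, the pod placement settings entered above correspond to a standard Kubernetes nodeAffinity rule and toleration on the service's pods. The following sketch shows the equivalent pod-spec fragment, using the raster=GPU label and nvidia.com/gpu taint from the earlier steps (field names follow the Kubernetes pod spec; the exact JSON ArcGIS Enterprise produces may differ):

```python
# Pod-spec fragment equivalent to the settings entered above.
placement = {
    "affinity": {
        "nodeAffinity": {
            # "Required" in the UI maps to this Kubernetes field.
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{
                    "matchExpressions": [{
                        "key": "raster",
                        "operator": "In",
                        "values": ["GPU"],
                    }]
                }]
            }
        }
    },
    "tolerations": [{
        # Covers the NoExecute taint applied with kubectl earlier;
        # operator Exists tolerates any value for the key.
        "key": "nvidia.com/gpu",
        "operator": "Exists",
        "effect": "NoExecute",
    }],
}

def tolerates(tolerations, taint_key, taint_effect):
    """Simplified check: does any Exists toleration cover this taint?"""
    return any(
        t["key"] == taint_key and t.get("effect") == taint_effect
        and t["operator"] == "Exists"
        for t in tolerations
    )

print(tolerates(placement["tolerations"], "nvidia.com/gpu", "NoExecute"))
```

The affinity rule keeps the service's pods on labeled GPU nodes, while the toleration lets them remain on nodes carrying the NoExecute taint that evicts other workloads.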

You can begin to use GPU resources for your workloads. Additionally, see recommendations for tuning raster analytics.