Configure ephemeral volumes

Ephemeral volumes are ideal for workflows, such as raster analytics, that need to load data to a temporary space for processing. Administrators can create persistent volume templates that generate multiple ephemeral volumes on demand for the pods of a specified service deployment. Each pod in the deployment can then use its own ephemeral volume, providing each pod with its own resources to draw from. Once the pods are deleted and the ephemeral volumes are no longer needed, the volumes are also removed.

Ephemeral volumes for raster analytics

Some raster analysis tools distribute computation across multiple worker pods and write temporary data while performing analysis. When processing large amounts of data, it is recommended that you configure ephemeral volumes to provide increased disk space to store temporary data as it is processed.

The disk space needed to store the temporary data varies by raster analysis capability. In all cases, it is proportional to the number of cells in the raster being processed and is distributed evenly across the configured worker pods. More complex analyses and larger inputs may require more disk space.

The cluster must have enough disk space allocated to store the temporary files associated with running a given analysis tool. Each tool manages its temporary data internally and deletes it after processing has finished. The ephemeral storage configuration that you provide is used as a persistent volume template that is applied to the raster processing service deployment.

Storage guidelines for raster analytics

To support distributed raster analysis, particularly when running large analyses, it is recommended that you configure ephemeral volumes. The amount of disk space to allocate depends on the number of raster cells to process and the number of configured worker pods.

For example, to process a raster of 2.5 billion cells (50,000 rows and 50,000 columns), you may need 30 GB of disk space when using the Fill tool. To process a raster of 1 billion cells (approximately 30,000 rows and 30,000 columns), you may need 12 GB of disk space with the same tool. In both cases, the total disk space required is distributed evenly across the configured pods. If 10 pods are allocated for the RasterProcessing service, each pod requires 3 GB to process 2.5 billion cells or 1.2 GB to process 1 billion cells, so the ephemeral storage for the RasterProcessing service must be configured with 3 GB or 1.2 GB of disk space, respectively. Each pod that starts receives this amount of ephemeral (temporary) disk space. If you are using auto scaling, use the maximum number of pods as the basis for this calculation.

The total disk space required to process a raster of 2.5 billion cells for different types of tools is as follows:

  • 17 to 35 GB for hydrology analysis
  • 20 to 80 GB for distance analysis
  • 30 to 33 GB for generalization analysis

Some use cases may require more disk space based on the complexity of the analysis and the additional inputs and outputs specified. In such cases, to process an input raster of 2.5 billion cells, up to 90 GB of disk space may be used in a hydrology analysis workflow, and up to 170 GB in a distance analysis workflow.
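
Because the required disk space is described as proportional to the number of cells, the 2.5-billion-cell figures above can be scaled linearly to get a rough range for other raster sizes. The following Python sketch illustrates this. The function name estimate_total_gb and the category keys are hypothetical, and the linear scaling is an approximation based on the proportionality statement, not a documented formula.

```python
# Rough total disk space ranges (GB) for a 2.5-billion-cell raster,
# taken from the list above. The dictionary keys are illustrative labels only.
BASELINE_CELLS = 2_500_000_000
BASELINE_RANGES_GB = {
    "hydrology": (17, 35),
    "distance": (20, 80),
    "generalization": (30, 33),
}


def estimate_total_gb(category, rows, columns):
    """Scale the baseline range linearly by cell count (an assumption)."""
    low, high = BASELINE_RANGES_GB[category]
    factor = (rows * columns) / BASELINE_CELLS
    return low * factor, high * factor


if __name__ == "__main__":
    low, high = estimate_total_gb("distance", 25_000, 25_000)
    print(f"distance, 25,000 x 25,000 cells: {low:.1f} to {high:.1f} GB")
```

For complex workflows, substitute the larger upper bounds noted above (up to 90 GB for hydrology or 170 GB for distance analysis at 2.5 billion cells) before scaling.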

When determining your organization's requirements for configuring ephemeral volumes, consider the following recommendations:

  • First, determine how many pods are needed to support the workflow. For example, if the RasterProcessing service is configured with 10 pods, distribute the total space needed across these pods. If the service is enabled with auto scaling, use the maximum number of pods as the basis for this calculation.
  • Next, determine an approximate amount of disk space needed for processing based on the size (number of rows and columns) of the raster dataset.
  • Divide the total disk space approximation by the number of pods that are allocated for the RasterProcessing service. This number is a good general estimate for the amount of storage needed to configure the ephemeral volume. When the ephemeral volume is attached to the RasterProcessing service, each pod will dynamically request this storage when it spins up. For example, if you have a 30 GB total disk space requirement and 10 pods running in the RasterProcessing service deployment, configure 3 GB for each ephemeral volume.
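
The recommendations above amount to a single division. The following Python sketch is one way to express it. The function name per_pod_storage is hypothetical, and rounding up to a whole Gi value is an assumption made here to leave some headroom, not a documented requirement.

```python
import math


def per_pod_storage(total_gb, max_pods):
    """Return a per-pod size string for the ephemeral volume specification.

    total_gb: estimated total disk space for the analysis.
    max_pods: configured pod count, or the auto scaling maximum.
    """
    if max_pods < 1:
        raise ValueError("max_pods must be at least 1")
    # Divide the total across the pods and round up to a whole number
    # (assumption: a slightly larger request is preferable to a smaller one).
    per_pod = math.ceil(total_gb / max_pods)
    return f"{per_pod}Gi"


if __name__ == "__main__":
    # 30 GB total across 10 pods, as in the example above.
    print(per_pod_storage(30, 10))
```

The returned string can be used as the storage size when you create the volume configuration.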

Configure ephemeral volumes for raster analytics

To configure ephemeral volumes in support of raster analytics, complete the steps below. You will use the ArcGIS Enterprise Administrator API Directory to create a persistent volume template and apply it to the RasterProcessing service deployment.

  1. Sign in to the ArcGIS Enterprise Administrator API Directory as an administrator.
  2. Click System > Volumes > Configurations.
  3. Click Create Volume Configuration.
  4. In the volume configuration JSON, include the specification for the ephemeral volume. Work with your IT administrator for this specification if necessary.

    {
      "name": "<user-provided-name>",
      "type": "PVC_TEMPLATE",
      "spec": {
        "storageClassName": "<user-provided-storageclass-name>",
        "resources": {"requests": {"storage": "<user-provided-size, e.g. 3Gi>"}},
        "accessModes": ["ReadWriteOnce"],
        "volumeMode": "Filesystem",
        "volumeName": "<user-provided-optional-volume-name>"
      }
    }
    

  5. Once the volume configuration is created, locate and copy its associated VolumeID value.
  6. From the ArcGIS Enterprise Administrator API root, click Services > System > RasterProcessing (DPServer) > Scaling.
  7. Copy the deploymentId from the RasterProcessing (DPServer) service scaling JSON.
  8. From the root of the ArcGIS Enterprise Administrator API, click System > Deployments, and search for the deploymentId referenced above.
  9. Click the deploymentId for the RasterProcessing (DPServer) service.
  10. Click Edit Deployment.
  11. In the JSON, locate the replicas property.
  12. After the replicas property, add the volume specification, including the VolumeID value that you copied when creating the volume configuration:
     
    "volumes": [{
      "purpose": "GIS_SERVICE_TEMP",
      "volumeConfigId": "<volumeId>"
    }],
    
  13. Click Submit.

    The service deployment will take a few minutes to restart.

  14. Optionally, enable the Run asynchronously option.
  15. With your cluster admin, review the new persistent volume claims (PVCs) that were created for each pod to verify that the ephemeral volume was configured successfully.

    These PVCs bind to persistent volumes (PVs) that are dynamically created according to the volume configuration registered in the cluster.
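
The review in the last step can be partly automated. The following Python sketch assumes that your cluster admin has exported the claims as JSON, for example with kubectl get pvc -n <namespace> -o json, and it lists any claim that is not yet bound. The function name unbound_pvcs and the claim names in the sample are hypothetical.

```python
import json


def unbound_pvcs(pvc_list_json):
    """Return the names of PVCs whose status.phase is not "Bound"."""
    items = json.loads(pvc_list_json).get("items", [])
    return [
        item["metadata"]["name"]
        for item in items
        if item.get("status", {}).get("phase") != "Bound"
    ]


if __name__ == "__main__":
    # Made-up sample shaped like a Kubernetes PersistentVolumeClaim list.
    sample = json.dumps({
        "items": [
            {"metadata": {"name": "temp-pod-0"}, "status": {"phase": "Bound"}},
            {"metadata": {"name": "temp-pod-1"}, "status": {"phase": "Pending"}},
        ]
    })
    print(unbound_pvcs(sample))
```

An empty result suggests that every claim found a volume. A claim that stays in the Pending phase usually points to a storage class or capacity issue to raise with your cluster admin.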

Use the ephemeral volumes to store temporary data for your raster analytics workflows.
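
If you prefer to generate the JSON for the steps above rather than edit it by hand, the following Python sketch builds both the volume configuration and the volumes fragment for the deployment. The function names and the sample values (raster-temp, standard, example-volume-id) are hypothetical. Omitting volumeName when it is not supplied is an assumption based on the placeholder describing it as optional.

```python
import json


def volume_config(name, storage_class, size, volume_name=None):
    """Build the volume configuration JSON body for an ephemeral volume."""
    spec = {
        "storageClassName": storage_class,
        "resources": {"requests": {"storage": size}},
        "accessModes": ["ReadWriteOnce"],
        "volumeMode": "Filesystem",
    }
    if volume_name:
        # Assumption: the key is left out when no volume name is given.
        spec["volumeName"] = volume_name
    return {"name": name, "type": "PVC_TEMPLATE", "spec": spec}


def deployment_volumes(volume_id):
    """Build the volumes property to add to the deployment JSON."""
    return {
        "volumes": [
            {"purpose": "GIS_SERVICE_TEMP", "volumeConfigId": volume_id}
        ]
    }


if __name__ == "__main__":
    print(json.dumps(volume_config("raster-temp", "standard", "3Gi"), indent=2))
    print(json.dumps(deployment_volumes("example-volume-id"), indent=2))
```

Paste the first output into the Create Volume Configuration form, and merge the volumes property from the second output into the deployment JSON after the replicas property.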