Data loss and downtime minimization—ArcGIS Enterprise on Kubernetes

With ArcGIS Enterprise, your organization can create and share geographic information, maps, and applications, and perform geographic analyses in a collaborative environment. Because the content and tools in an ArcGIS Enterprise deployment are necessary for your organization's functions, ensure that the deployment is available to your users as much as possible with minimal data loss in the event of a disaster. You can do this with a disaster recovery or failover strategy.

You can do the following using built-in capabilities in ArcGIS Enterprise:

Select an architecture profile that meets your organization's requirements for redundancy and availability.
Maintain backups of the ArcGIS Enterprise deployment so that you can restore the deployment from a backup if a disaster occurs.

When deciding which approach to use, take into account your organization's requirements as well as your familiarity with each approach. Consider the following questions:

How much downtime is acceptable (if any)?
How much data loss is acceptable (if any)?
How many resources, such as hardware, licensing, and staff, can your organization devote to data loss and downtime prevention?
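
One way to reason about the first two questions is to compare a backup schedule against the limits your organization sets. The following is a minimal sketch, not part of ArcGIS Enterprise: the function name and parameters are hypothetical, and it assumes that the worst-case data loss equals the time between two backups and that downtime equals the time needed to restore.

```python
def meets_targets(backup_interval_hours, restore_hours,
                  max_data_loss_hours, max_downtime_hours):
    """Check a backup-and-restore plan against acceptable loss and downtime.

    Assumption: in the worst case a disaster strikes just before the next
    backup, so up to one full backup interval of changes is lost.
    """
    worst_case_loss = backup_interval_hours
    return {
        "data_loss_ok": worst_case_loss <= max_data_loss_hours,
        "downtime_ok": restore_hours <= max_downtime_hours,
    }


# Hypothetical numbers: daily backups, a 4-hour restore, and limits of
# 8 hours of data loss and 6 hours of downtime.
plan = meets_targets(backup_interval_hours=24, restore_hours=4,
                     max_data_loss_hours=8, max_downtime_hours=6)
print(plan)
```

With these example values the downtime limit is met but the data loss limit is not, which would suggest more frequent backups or a more redundant architecture profile.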

High availability in ArcGIS Enterprise on Kubernetes

ArcGIS Enterprise on Kubernetes provides resilience and built-in high availability using the native behavior of the Kubernetes cluster. Having Kubernetes organize and manage pods and container life cycles ensures that the pods that enable the deployment can be recovered quickly following a failure.

All containers in each pod are configured with liveness and readiness probes so that the cluster can monitor the pods and manage their life cycle. Together with anti-affinity settings, this allows ArcGIS Enterprise on Kubernetes to adhere to the basic principles of high availability, such as no single points of failure and automatic detection and recovery of failed components.
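
To illustrate what probes and anti-affinity look like in general, the sketch below builds a pod template as a plain Python dictionary using standard Kubernetes field names (livenessProbe, readinessProbe, podAntiAffinity). It is not the manifest that ArcGIS Enterprise on Kubernetes uses; the label, image, port, and timing values are placeholder assumptions.

```python
import json


def build_pod_template(app_label, image, port):
    """Return a pod template with probes and a pod anti-affinity preference."""
    return {
        "metadata": {"labels": {"app": app_label}},
        "spec": {
            # Prefer not to place two pods with the same label on one node,
            # so a single node failure does not take out every replica.
            "affinity": {
                "podAntiAffinity": {
                    "preferredDuringSchedulingIgnoredDuringExecution": [{
                        "weight": 100,
                        "podAffinityTerm": {
                            "labelSelector": {"matchLabels": {"app": app_label}},
                            "topologyKey": "kubernetes.io/hostname",
                        },
                    }]
                }
            },
            "containers": [{
                "name": app_label,
                "image": image,
                "ports": [{"containerPort": port}],
                # Liveness: the container is restarted if this check keeps failing.
                "livenessProbe": {
                    "tcpSocket": {"port": port},
                    "initialDelaySeconds": 30,
                    "periodSeconds": 10,
                    "failureThreshold": 3,
                },
                # Readiness: the pod receives traffic only while this check passes.
                "readinessProbe": {
                    "tcpSocket": {"port": port},
                    "periodSeconds": 5,
                },
            }],
        },
    }


# Placeholder values, chosen only for illustration.
template = build_pod_template("sample-service", "example.com/sample-service:1.0", 8443)
print(json.dumps(template, indent=2))
```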

Failover

When ArcGIS Enterprise on Kubernetes is deployed using a highly available architecture profile, the level of redundancy across pods is increased and the risk of unplanned downtime is minimized. All essential and critical pods are deployed in either stateful sets or replica sets. This behavior allows the Kubernetes cluster to automatically reschedule all failed pods on the same or different nodes in the cluster without administrator intervention.
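
The rescheduling behavior can be pictured as a loop that keeps the number of running pods equal to the desired number of replicas. The sketch below is a deliberately simplified model, not the Kubernetes scheduler: it picks the least-loaded healthy node, whereas the real scheduler also weighs resources, affinity rules, and other constraints. The function, pod, and node names are hypothetical, and at least one healthy node is assumed.

```python
from collections import Counter
from itertools import count


def reconcile(desired_replicas, running_pods, healthy_nodes):
    """Return the pod-to-node assignment after one reconcile pass.

    running_pods maps pod name -> node name. Pods on nodes that are no
    longer healthy are treated as failed. Missing replicas are re-created
    on the least-loaded healthy node, which may be the same node as before
    or a different one.
    """
    pods = {name: node for name, node in running_pods.items()
            if node in healthy_nodes}
    load = Counter({node: 0 for node in healthy_nodes})
    load.update(pods.values())
    new_ids = count(1)
    while len(pods) < desired_replicas:
        target = min(healthy_nodes, key=lambda n: load[n])
        pods[f"replacement-{next(new_ids)}"] = target
        load[target] += 1
    return pods


# Three replicas on three nodes; node-3 then fails.
before = {"pod-a": "node-1", "pod-b": "node-2", "pod-c": "node-3"}
after = reconcile(3, before, healthy_nodes=["node-1", "node-2"])
print(after)
```

In this model the two surviving pods keep serving requests while the third replica is re-created, without any administrator action.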

The failure of a pod in a stateful set or a replica set containing two or more replicas typically has no impact on users' access to the deployment. While healthy pods continue to function, the Kubernetes cluster automatically reschedules the failed pods. However, the following two situations may impact business operations:

Detecting a failure of the primary relational data store, and then either recovering it or promoting a standby relational data store, may take a few minutes. During this time, users may not have access to the deployment.
If Standard availability is selected as the architecture profile, the failure of one of the object store pods, or of half or more of the persistent volumes associated with the object store, puts the deployment in read-only mode. When the deployment is in read-only mode, users can access existing content but cannot create or modify content.
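
The read-only condition for the Standard availability profile can be written as a small predicate. This is only a sketch of the rule as stated above; the function name and parameters are hypothetical, and the code does not query a real deployment.

```python
def object_store_read_only(failed_pods, failed_volumes, total_volumes):
    """Return True if the stated read-only condition holds.

    The condition is met when at least one object store pod has failed,
    or when half or more of the object store's persistent volumes have failed.
    """
    if total_volumes <= 0:
        raise ValueError("total_volumes must be positive")
    return failed_pods >= 1 or 2 * failed_volumes >= total_volumes


# Illustrative cases with four persistent volumes.
print(object_store_read_only(0, 1, 4))  # one of four volumes failed
print(object_store_read_only(0, 2, 4))  # half of the volumes failed
print(object_store_read_only(1, 0, 4))  # one pod failed
```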
