Business Continuity Planning and Disaster Recovery

You're viewing Apigee Edge documentation.
Go to the Apigee X documentation.

Apigee is a multitenant, self-service, cloud-based platform that runs in a fully redundant (live/live) configuration across multiple data centers in multiple regions of the globe. Apigee uses Google Cloud Platform (GCP) and Amazon Web Services (AWS) for our cloud-based platform. As part of the services we build on GCP and AWS, we use multiple data centers within each region and serve live customer traffic across these data centers. We do not have a "live" data center and a "standby" (or "secondary" or "failover") data center. Instead, two or more data centers in each region constantly and simultaneously serve customer traffic.

BCP/DR plan

Apigee Business Continuity Planning and Disaster Recovery (BCP/DR) is a platform-wide plan and does not contain detailed tasks for individual customers. Rather, the platform is configured to continue processing customer data requests despite disruptions and outages. Data continues to flow even if an entire data center is offline. If an entire region were to go offline, however, a single-region customer could experience an outage of API processing services. For customers who need more than in-region redundancy, Apigee offers global redundancy, in which traffic can be served from data centers in multiple regions or countries, so that data still flows if an entire region goes offline.
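The in-region versus multi-region reasoning above can be illustrated with a small Python model. It is a hypothetical sketch, not how Apigee routes traffic: the `DataCenter` and `Deployment` classes, the `can_serve` function, and the region names are invented for this example, and availability is reduced to a single online/offline flag per data center.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    """Hypothetical model of one data center that serves live traffic."""
    name: str
    region: str
    online: bool = True


@dataclass
class Deployment:
    """Hypothetical customer footprint: the regions traffic may be served from."""
    regions: set[str]


def can_serve(deployment: Deployment, data_centers: list[DataCenter]) -> bool:
    # Live/live model: there is no standby, so traffic keeps flowing as long as
    # at least one online data center sits in a region the customer uses.
    return any(dc.online and dc.region in deployment.regions for dc in data_centers)


if __name__ == "__main__":
    dcs = [
        DataCenter("dc-1", "region-a"),
        DataCenter("dc-2", "region-a"),
        DataCenter("dc-3", "region-b"),
        DataCenter("dc-4", "region-b"),
    ]
    single_region = Deployment({"region-a"})
    multi_region = Deployment({"region-a", "region-b"})

    dcs[0].online = False  # one data center in region-a goes offline
    print(can_serve(single_region, dcs))  # True: dc-2 still serves region-a

    dcs[1].online = False  # now all of region-a is offline
    print(can_serve(single_region, dcs))  # False: no other region is used
    print(can_serve(multi_region, dcs))   # True: region-b still serves
```

In this model, losing one data center never interrupts a single-region customer, but losing a whole region does, unless the customer's footprint spans more than one region.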

Single-region customer services are not automatically transferred to another region because of possible geographic restrictions on data processing and access. Apigee hosts services for each customer in the region the customer identifies. Because there may be specific regulations, or commitments that customers have made to their users, about where data is located, Apigee will not automatically move services to an alternate region. Doing so could compromise Apigee's commitments to its customers or Apigee customers' commitments to their customers.

Apigee does not share the full BCP/DR plan with any individual customer, as it contains sensitive internal Apigee information and references to our customers. Our privacy policy prevents us from sharing the platform BCP/DR plan with individual customers, because doing so could expose the names of other customers. We offer this same level of privacy to each customer.

BCP/DR Management

The Apigee Information Security team is responsible for oversight of the Business Resiliency program, while a rotating Incident Commander is responsible for managing and resolving all incidents. The Incident Commander has operational and engineering personnel on call at all times, along with playbooks for all actions that may need to be taken.

BCP/DR Testing

Apigee performs operational processes that support BCP/DR testing of the platform more frequently than our full annual BCP/DR tabletop testing. Each month, Apigee performs load swings within our live/live environment while we update the systems running the service. This process involves taking down one entire data center's worth of systems while the peer data center handles the load. After the updates are performed, the first data center is brought back up and services run live/live again to verify that no issues were introduced. The peer data center is then brought down for the same updates and brought back online. Apigee uses tools and techniques to drain traffic and to send a small percentage of traffic to recently updated services, checking for issues or errors before returning to full load processing.
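The monthly load swing described above can be sketched as a rolling update over a pair of peer data centers. This is a hypothetical Python model, not Apigee's tooling: `PeerDataCenter`, `load_swing`, the traffic weights, the 5% default `canary_fraction`, and the `healthy` callback are all assumptions for illustration. The text does not say what happens when a canary check fails; re-draining the target and stopping the rollout is an assumed policy.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PeerDataCenter:
    """Hypothetical model of one data center in a live/live pair."""
    name: str
    version: str
    weight: float = 0.5  # share of live traffic routed to this data center


def _snapshot(dcs: list[PeerDataCenter]) -> dict[str, float]:
    return {dc.name: dc.weight for dc in dcs}


def _route(dcs: list[PeerDataCenter], target: PeerDataCenter, target_weight: float) -> None:
    # Give `target` the requested share and split the remainder evenly
    # across its peers, so the weights always sum to 1.
    peers = [dc for dc in dcs if dc is not target]
    target.weight = target_weight
    for peer in peers:
        peer.weight = (1.0 - target_weight) / len(peers)


def load_swing(
    dcs: list[PeerDataCenter],
    new_version: str,
    canary_fraction: float = 0.05,  # hypothetical "small percentage"
    healthy: Callable[[PeerDataCenter], bool] = lambda dc: True,
) -> list[tuple[str, dict[str, float]]]:
    """Update each data center in turn while its peers carry the load."""
    log = []
    for target in dcs:
        _route(dcs, target, 0.0)  # 1. drain: peers handle all traffic
        log.append((f"drain {target.name}", _snapshot(dcs)))

        target.version = new_version  # 2. update the drained systems

        _route(dcs, target, canary_fraction)  # 3. canary a small share
        log.append((f"canary {target.name}", _snapshot(dcs)))
        if not healthy(target):
            # Assumed policy: re-drain the target and stop the rollout.
            _route(dcs, target, 0.0)
            raise RuntimeError(f"{target.name} failed its canary check")

        for dc in dcs:  # 4. back to live/live with an even split
            dc.weight = 1.0 / len(dcs)
        log.append(("live/live", _snapshot(dcs)))
    return log
```

Calling `load_swing([PeerDataCenter("dc-a", "v1"), PeerDataCenter("dc-b", "v1")], "v2")` updates both data centers in turn and returns the routing snapshot after each step. In this model, every data center is drained before it is changed, and an updated data center returns to an even live/live share only after its canary check passes.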

This consistent operational process exceeds the industry-standard bi-annual resiliency "testing" of our service by making such testing a routine operational task that occurs more frequently.

In addition to the operational processes described above, Apigee conducts tabletop BCP/DR exercises at least annually, in which engineering and operations team members are brought together with other Apigee business units to logically simulate and walk through issues, responses, and the impact of decisions made in a mock disaster scenario. These exercises give our personnel additional training and experience with our broader BCP/DR plans for the enterprise as a whole, not just for the service itself.

The BCP/DR testing done by Apigee does not use "failover exercises" or "secondary locations" because that redundancy is built into the running system.

Apigee maintains playbooks for use by all operational and engineering teams. These playbooks are reviewed and updated at least annually and are used in all of our BCP/DR testing and training exercises.

Apigee does not share BCP/DR test reports with individual customers, because these tests are done at a platform level, not a customer level. We share the results of our operational tasks and annual tabletop exercise test reports with our third-party auditors, and these form the basis for the auditor's review of our compliance with PCI, HIPAA, contractual, and other requirements.

Customer BCP/DR tests

Customers are encouraged to incorporate Apigee Edge services into their own DR plans. Customers can and should consider how Apigee can redirect traffic as needed to maintain end-user services even during a customer data center outage or other disaster event. However, this level of testing is outside the scope of the Apigee DR plan. We encourage customers to perform BCP/DR testing on their own applications and to include Apigee Edge in those tests.

RTO/RPO

Apigee does not have recovery point and recovery time objectives (RPO/RTO) for our customers or in our contracts related to BCP/DR activities. Our SLAs are the cloud equivalent of the RTO/RPO data points. Because Apigee is a redundant cloud-based service, with both management and runtime services architected as redundant live services, RTO and RPO can both be seen as "real-time". Single-region customers receive, at minimum, redundant services in different data centers within the same region. Customers desiring higher levels of redundancy can opt for multi-region services.

Pandemic plan

Apigee includes a pandemic plan as part of our overall BCP/DR plan and processes. Because Apigee is a cloud-hosted service, individuals are not required to manage the data center. For business operations such as support, Apigee operates a 24x7 global support team across multiple offices and remote locations. If a pandemic in one area of the globe impacts one of our support locations, personnel in other offices will be alerted and will cover the shifts normally handled by the impacted office. For other business services, such as sales, the workforce is globally distributed. All teams at Apigee are equipped to work remotely if needed. Tools used within Apigee are cloud-based and lend themselves naturally to a pandemic response plan.

Updates

Apigee reviews and updates our BCP/DR plan at least annually. Information gathered from incidents, product changes, industry standards, risk analysis activities, and BCP/DR testing is used to update the plan.

Business Impact Analysis and Risk Assessments

Google conducts a business impact analysis (BIA) and a risk assessment (RA) annually. Results of the BIA and the RA are prioritized and documented in the issue tracking system.