Multi-region deployment on GKE and GKE on-prem

This topic describes a multi-region deployment for Apigee hybrid on GKE and Anthos GKE deployed on-prem.

Topologies for multi-region deployment include the following:

  • Active-Active: When you have applications deployed in multiple geographic locations and you require low latency API response for your deployments. You have the option to deploy hybrid in multiple geographic locations nearest to your clients. For example: US West Coast, US East Coast, Europe, APAC.
  • Active-Passive: When you have a primary region and a failover or disaster recovery region.

The regions in a multi-region hybrid deployment communicate via Cassandra, as the following image shows:

Load balancing the MART connection

Each regional cluster must have its own MART IP and hostname; however, you only need to connect the management plane to one of them. Cassandra propagates information to all of the clusters. The best option for high availability for MART is to load balance the individual MART IP addresses and configure your organization to talk to the load balanced MART URL.

Prerequisites

Before configuring hybrid for multiple regions, you must complete the following prerequisites:

  • Set up Kubernetes clusters in multiple regions with different CIDR blocks
  • Set up cross-region communication
  • Whitelist Cassandra ports 7000 and 7001 between Kubernetes clusters across all regions (7000 may be used as a backup option during troubleshooting). See also Configure ports.

For detailed information, see Kubernetes documentation.

Configure the multi-region seed host

This section describes how to expand the existing Cassandra cluster to a new region. This setup allows the new region to bootstrap the cluster and join the existing data center. Without this configuration, the multi-region Kubernetes clusters would not know about each other.

  1. Run the following kubectl command to identify a seed host address for Cassandra in the current region.

    A seed host address allows a new regional instance to find the original cluster on the very first startup to learn the topology of the cluster. The seed host address is designated as the contact point in the cluster.

    kubectl get pods -o wide -n apigee
    
    NAME                      READY   STATUS      RESTARTS   AGE   IP          NODE                                          NOMINATED NODE
    apigee-cassandra-0        1/1     Running     0          5d    10.0.0.11   gke-k8s-dc-2-default-pool-a2206492-p55d
    apigee-cassandra-1        1/1     Running     0          5d    10.0.2.4    gke-k8s-dc-2-default-pool-e9daaab3-tjmz
    apigee-cassandra-2        1/1     Running     0          5d    10.0.3.5    gke-k8s-dc-2-default-pool-e589awq3-kjch
  2. Decide which of the IPs returned from the previous command will be the multi-region seed host.
  3. The configuration in this step depends on whether you are on GKE or GKE on-prem:

    GKE Only: In data center 2, configure cassandra.multiRegionSeedHost and cassandra.datacenter in overrides.yaml, where multiRegionSeedHost is one of the IPs returned by the previous command:

    cassandra:
      multiRegionSeedHost: seed_host_IP
      datacenter: data_center_name
      rack: rack_name

    For example:

    cassandra:
      multiRegionSeedHost: 10.0.0.11
      datacenter: "dc-2"
      rack: "ra-1"

    GKE on-prem Only: In data center 2, configure cassandra.multiRegionSeedHost in your overrides file, where multiRegionSeedHost is one of the IPs returned by the previous command:

    cassandra:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      multiRegionSeedHost: seed_host_IP
    

    For example:

    cassandra:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      multiRegionSeedHost: 10.0.0.11
    

  4. In the new data center/region, before you install hybrid, set the same TLS certificates and credentials in overrides.yaml as you set in the first region.

Set up the new region

After you configure the seed host, you can set up the new region.

To set up the new region:

  1. Copy your certificate from the existing cluster to the new cluster. The new CA root is used by Cassandra and other hybrid components for mTLS. Therefore, it is essential to have consistent certificates across the cluster.
    1. Set the context to the original namespace:
      kubectl config use-context original-cluster-name
    2. Export the current namespace configuration to a file:
      $ kubectl get namespace  -o yaml > apigee-namespace.yaml
    3. Export the apigee-ca secret to a file:
      kubectl -n cert-manager get secret apigee-ca -o yaml > apigee-ca.yaml
    4. Set the context to the new region's cluster name:
      kubectl config use-context new-cluster-name
    5. Import the namespace configuration to the new cluster. Be sure to update the "namespace" in the file if you're using a different namespace in the new region:
      kubectl apply -f apigee-namespace.yaml
    6. Import the secret to the new cluster:

      kubectl -n cert-manager apply -f apigee-ca.yaml
  2. Install hybrid in the new region. Be sure that the overrides-DC_name.yaml file includes the same TLS certificates that are configured in the first region, as explained in the previous section.

    Execute the following two commands to install hybrid in the new region:

    apigeectl init -f overrides/overrides-DC_name.yaml
    apigeectl apply -f overrides/overrides-DC_name.yaml
  3. Expand all apigee keyspaces.

    The following steps expand the Cassandra data to the new data center:

    1. Open a shell in the Cassandra pod:
      kubectl run -i --tty --restart=Never --rm --image google/apigee-hybrid-cassandra-client:1.0.0 cqlsh
    2. Connect to the Cassandra server:
      cqlsh apigee-cassandra-0.apigee-cassandra.apigee.svc.cluster.local -u ddl_user --ssl
      Password:
      
      Connected to apigeecluster at apigee-cassandra-0.apigee-cassandra.apigee.svc.cluster.local:9042.
      [cqlsh 5.0.1 | Cassandra 3.11.3 | CQL spec 3.4.4 | Native protocol v4]
      Use HELP for help.
    3. Get the available keyspaces:
      SELECT * from system_schema.keyspaces ;
       keyspace_name              | durable_writes | replication
      ----------------------------+----------------+--------------------------------------------------------------------------------------------------------
                      system_auth |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'dc-1': '1', 'dc-2': '1'}
                    system_schema |           True |                                                {'class': 'org.apache.cassandra.locator.LocalStrategy'}
       cache_hybrid_test_7_hybrid |           True |                  {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'dc-1': '3'}
         kms_hybrid_test_7_hybrid |           True |                  {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'dc-1': '3'}
         kvm_hybrid_test_7_hybrid |           True |                  {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'dc-1': '3'}
               system_distributed |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'dc-1': '1', 'dc-2': '1'}
                           system |           True |                                                {'class': 'org.apache.cassandra.locator.LocalStrategy'}
                           perses |           True |                  {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'dc-1': '3'}
       quota_hybrid_test_7_hybrid |           True |                  {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'dc-1': '3'}
                    system_traces |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'dc-1': '1', 'dc-2': '1'}
      
      (10 rows)
    4. Update/expand the apigee keyspaces:
      ALTER KEYSPACE cache_hybrid_test_7_hybrid WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-1':3, 'dc-2':3};
      ALTER KEYSPACE kms_hybrid_test_7_hybrid WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-1':3, 'dc-2':3};
      ALTER KEYSPACE kvm_hybrid_test_7_hybrid WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-1':3, 'dc-2':3};
      ALTER KEYSPACE perses WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-1':3, 'dc-2':3};
      ALTER KEYSPACE quota_hybrid_test_7_hybrid  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-1':3, 'dc-2':3};
    5. Validate the keyspace expansion:
      SELECT * from system_schema.keyspaces ;
       keyspace_name              | durable_writes | replication
      ----------------------------+----------------+--------------------------------------------------------------------------------------------------------
                      system_auth |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'dc-1': '1', 'dc-2': '1'}
                    system_schema |           True |                                                {'class': 'org.apache.cassandra.locator.LocalStrategy'}
       cache_hybrid_test_7_hybrid |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'dc-1': '3', 'dc-2': '3'}
         kms_hybrid_test_7_hybrid |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'dc-1': '3', 'dc-2': '3'}
         kvm_hybrid_test_7_hybrid |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'dc-1': '3', 'dc-2': '3'}
               system_distributed |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'dc-1': '1', 'dc-2': '1'}
                           system |           True |                                                {'class': 'org.apache.cassandra.locator.LocalStrategy'}
                           perses |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'dc-1': '3', 'dc-2': '3'}
       quota_hybrid_test_7_hybrid |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'dc-1': '3', 'dc-2': '3'}
                    system_traces |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'dc-1': '1', 'dc-2': '1'}
      
      (10 rows)
      ddl@cqlsh>
  4. Run nodetool rebuild sequentially on all the pods in new data center. This may take a few minutes to a few hours depending on the data size.
    kubectl exec apigee-cassandra-0 -n apigee  -- nodetool rebuild -- dc-1
  5. Verify the rebuild processes from the logs. Also, verify the data size using the nodetool status command:
    kubectl logs apigee-cassandra-0 -f -n apigee

    The following example shows example log entries:

    INFO  01:42:24 rebuild from dc: dc-1, (All keyspaces), (All tokens)
    INFO  01:42:24 [Stream #3a04e810-580d-11e9-a5aa-67071bf82889] Executing streaming plan for Rebuild
    INFO  01:42:24 [Stream #3a04e810-580d-11e9-a5aa-67071bf82889] Starting streaming to /10.12.1.45
    INFO  01:42:25 [Stream #3a04e810-580d-11e9-a5aa-67071bf82889, ID#0] Beginning stream session with /10.12.1.45
    INFO  01:42:25 [Stream #3a04e810-580d-11e9-a5aa-67071bf82889] Starting streaming to /10.12.4.36
    INFO  01:42:25 [Stream #3a04e810-580d-11e9-a5aa-67071bf82889 ID#0] Prepare completed. Receiving 1 files(0.432KiB), sending 0 files(0.000KiB)
    INFO  01:42:25 [Stream #3a04e810-580d-11e9-a5aa-67071bf82889] Session with /10.12.1.45 is complete
    INFO  01:42:25 [Stream #3a04e810-580d-11e9-a5aa-67071bf82889, ID#0] Beginning stream session with /10.12.4.36
    INFO  01:42:25 [Stream #3a04e810-580d-11e9-a5aa-67071bf82889] Starting streaming to /10.12.5.22
    INFO  01:42:26 [Stream #3a04e810-580d-11e9-a5aa-67071bf82889 ID#0] Prepare completed. Receiving 1 files(0.693KiB), sending 0 files(0.000KiB)
    INFO  01:42:26 [Stream #3a04e810-580d-11e9-a5aa-67071bf82889] Session with /10.12.4.36 is complete
    INFO  01:42:26 [Stream #3a04e810-580d-11e9-a5aa-67071bf82889, ID#0] Beginning stream session with /10.12.5.22
    INFO  01:42:26 [Stream #3a04e810-580d-11e9-a5aa-67071bf82889 ID#0] Prepare completed. Receiving 3 files(0.720KiB), sending 0 files(0.000KiB)
    INFO  01:42:26 [Stream #3a04e810-580d-11e9-a5aa-67071bf82889] Session with /10.12.5.22 is complete
    INFO  01:42:26 [Stream #3a04e810-580d-11e9-a5aa-67071bf82889] All sessions completed
  6. Update the seed hosts. Remove multiRegionSeedHost: 10.0.0.11 from overrides-DC_name.yaml and reapply.
    apigeectl apply -f overrides/overrides-DC_name.yaml

Check the Cassandra cluster status

The following command is useful to see if the cluster setup is successful in two data centers. The command checks the nodetool status for the two regions.

kubectl exec apigee-cassandra-0 -n apigee -- nodetool status


Datacenter: us-central1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.12.1.45  112.09 KiB  256          100.0%            3c98c816-3f4d-48f0-9717-03d0c998637f  ra-1
UN  10.12.4.36  95.27 KiB  256          100.0%            0a36383d-1d9e-41e2-924c-7b62be12d6cc  ra-1
UN  10.12.5.22  88.7 KiB   256          100.0%            3561f4fa-af3d-4ea4-93b2-79ac7e938201  ra-1
Datacenter: us-west1
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.0.4.33   78.69 KiB  256          0.0%              a200217d-260b-45cd-b83c-182b27ff4c99  ra-1
UN  10.0.0.21   78.68 KiB  256          0.0%              9f3364b9-a7a1-409c-9356-b7d1d312e52b  ra-1
UN  10.0.1.26   15.46 KiB  256          0.0%              1666df0f-702e-4c5b-8b6e-086d0f2e47fa  ra-1