Roll back Apigee Edge 4.53.00

If you encounter an error during an update to Edge 4.53.00, you can roll back the component that caused the error and then try the update again.

You can roll back Edge 4.53.00 to the following minor release version:

  • Version 4.52.02

Rolling back a version involves rolling back every component that you upgraded. Rolling back Cassandra requires special consideration, because the Cassandra version shipped with 4.53.00 differs from the one shipped with 4.52.02.

There are two scenarios where you might want to perform a rollback:

  1. Roll back to a previous major or minor release. For example, from 4.53.00 to 4.52.02.
  2. Roll back to a previous patch release in the same release. For example, from 4.53.00.01 to 4.53.00.00.

For more information, see Apigee Edge release process.

Order of rollback

Roll back components in the reverse of the order in which they were upgraded, with one exception: roll back Management Servers after Cassandra.

A typical rollback order for Private Cloud 4.53.00 is:

  1. Roll back Postgres, Qpid, and other analytics-related components
  2. Roll back Routers and Message Processors
  3. Roll back Cassandra and ZooKeeper
  4. Roll back Management Servers

For example, suppose you upgraded the entire Cassandra cluster, all your Management Servers, and a few Routers and Message Processors (RMPs) from version 4.52.02 to version 4.53.00 and now want to roll back. In this case, you would:

  1. Roll back all RMPs, one by one
  2. Roll back the entire Cassandra cluster using backups
  3. Roll back the Edge Management Server nodes, one by one
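
The ordering above can be sketched as a small shell helper that sorts component names into rollback stages. This is only an illustration: the function names rollback_stage and rollback_order are hypothetical, and the component name apigee-zookeeper is an assumption, since this page does not list it.

```shell
#!/bin/sh
# Map a component name to its rollback stage (1 = first, 4 = last),
# following the typical order listed above. apigee-zookeeper is an
# assumed name; adjust the patterns to match your installation.
rollback_stage() {
  case "$1" in
    edge-postgres-server|edge-qpid-server) echo 1 ;;
    edge-router|edge-message-processor)    echo 2 ;;
    apigee-cassandra|apigee-zookeeper)     echo 3 ;;
    edge-management-server)                echo 4 ;;
    *)                                     return 1 ;;
  esac
}

# Read component names (one per line) on stdin and print them in
# rollback order. Unknown names are reported on stderr and skipped.
rollback_order() {
  while IFS= read -r comp; do
    [ -n "$comp" ] || continue
    stage=$(rollback_stage "$comp") || {
      echo "unknown component: $comp" >&2
      continue
    }
    printf '%s %s\n' "$stage" "$comp"
  done | sort -t ' ' -k1,1n -k2,2 | cut -d ' ' -f 2
}

# Example: the scenario described above (RMPs, Cassandra, Management Server).
printf '%s\n' edge-management-server apigee-cassandra edge-router edge-message-processor | rollback_order
```

The helper only prints an order; it does not stop, uninstall, or install anything.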

Who can perform a rollback

The user performing a rollback should be the same as the user who originally updated Edge, or a user running as root.

By default, Edge components run as the user "apigee". In some cases, you might be running Edge components as different users. For example, if the Router has to access privileged ports (those below 1024), you have to run the Router as root or as a user with access to those ports. You might also run one component as one user and another component as a different user.

Components with common code

The following Edge components share common code. Therefore, to roll back any one of these components on a node, you must roll back all of these components that are on that node.

  • edge-management-server (Management Server)
  • edge-message-processor (Message Processor)
  • edge-router (Router)
  • edge-postgres-server (Postgres Server)
  • edge-qpid-server (Qpid Server)

For example, if you have the Management Server, Router, and Message Processor installed on the node, to roll back any one of them you must roll back all three.
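
As a rough sketch of the rule above, the following shell helper computes which components must be rolled back together on a node. The names SHARED, is_shared, and rollback_set are hypothetical; the list of installed components is passed in by the caller rather than detected.

```shell
#!/bin/sh
# Components that share common code, as listed above.
SHARED="edge-management-server edge-message-processor edge-router edge-postgres-server edge-qpid-server"

# Succeed if the given component belongs to the shared-code group.
is_shared() {
  case " $SHARED " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Usage: rollback_set TARGET INSTALLED_COMPONENT...
# Prints every component that has to be rolled back along with TARGET.
rollback_set() {
  target=$1
  shift
  if is_shared "$target"; then
    # Rolling back one shared-code component means rolling back all
    # shared-code components installed on this node.
    for comp in "$@"; do
      if is_shared "$comp"; then
        echo "$comp"
      fi
    done
  else
    echo "$target"
  fi
}

# Example: a node hosting the Management Server, Router, and Message Processor.
rollback_set edge-router edge-management-server edge-router edge-message-processor
```

In the example, asking to roll back only the Router yields all three shared-code components on that node.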

Rollback of Cassandra

When a major upgrade of Cassandra is performed on a specific node, Cassandra modifies the schema of the data stored on that node. As a result, a direct in-place rollback is not feasible.

Rollback scenarios

Cassandra 4.0.X, available with Edge for Private Cloud 4.53.00, is compatible with other components of Private Cloud 4.52.02.

Please refer to the table below for a summary of the various rollback strategies you can use:

Scenario                                                   | Rollback strategy
Single DC, some Cassandra nodes upgraded                   | Use backups.
Single DC, all Cassandra nodes upgraded                    | Do not roll back Cassandra. Other components can be rolled back.
Single DC, all nodes (Cassandra and others) upgraded       | Do not roll back Cassandra. Other components can be rolled back.
Multiple DC, some nodes in one DC upgraded                 | Rebuild from an existing DC.
Multiple DC, all Cassandra nodes in some DCs upgraded      | Rebuild from an existing DC.
Multiple DC, Cassandra nodes of the last DC being upgraded | Try to finish the upgrade. If that is not feasible, roll back one DC using backups, then rebuild the remaining DCs from the rolled-back DC.
Multiple DC, all Cassandra nodes upgraded                  | Do not roll back Cassandra. Other components can be rolled back.
Multiple DC, all nodes (Cassandra and others) upgraded     | Do not roll back Cassandra. Other components can be rolled back.
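
For runbooks or automation, the table can be encoded as a simple lookup. The sketch below is illustrative only: the function name and the argument tokens (single, multi, some, all, last-dc) are hypothetical labels for the scenarios in the table, not Apigee terminology.

```shell
#!/bin/sh
# Usage: cassandra_rollback_strategy TOPOLOGY UPGRADE_STATE
#   TOPOLOGY:      single | multi   (number of data centers)
#   UPGRADE_STATE: some    (some Cassandra nodes or some DCs upgraded)
#                  last-dc (nodes of the last DC are being upgraded)
#                  all     (every Cassandra node upgraded)
cassandra_rollback_strategy() {
  case "$1:$2" in
    single:some)          echo "use-backups" ;;
    multi:some)           echo "rebuild-from-existing-dc" ;;
    multi:last-dc)        echo "finish-upgrade-or-restore-one-dc-then-rebuild" ;;
    single:all|multi:all) echo "do-not-roll-back-cassandra" ;;
    *)
      echo "unknown scenario: $1 $2" >&2
      return 1
      ;;
  esac
}

# Example: two data centers, only some nodes upgraded so far.
cassandra_rollback_strategy multi some
```

The lookup returns a label only; the actual procedures are the rebuild and backup sections of this page.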

General considerations

When considering a rollback, keep the following in mind:

  • Rollback of runtime or management components: If you want to roll back components such as edge-management-server, edge-message-processor, or any other non-Cassandra component to Private Cloud version 4.52.02, it is recommended that you do NOT roll back Cassandra. The Cassandra version shipped with Private Cloud 4.53.00 is compatible with all non-Cassandra components of Edge for Private Cloud 4.52.02. You can roll back non-Cassandra components using the methodology listed here while Cassandra remains on version 4.0.13.
  • Rollback after the entire Cassandra cluster is upgraded to 4.0.X: If your entire Cassandra cluster was upgraded to version 4.0.X as part of the upgrade to Private Cloud version 4.53.00, it is recommended that you keep this cluster setup and NOT roll back Cassandra. Components of Private Cloud version 4.52.02, such as edge-management-server, edge-message-processor, and edge-router, are compatible with Cassandra version 4.0.X.
  • Rollback of Cassandra during the Cassandra upgrade: If you encounter issues during the Cassandra upgrade, you may want to consider a rollback. Choose one of the rollback strategies listed in this article based on the state you are in during the upgrade process.
  • Rollback using backups: Backups taken from Cassandra 4.0.X are not compatible with Cassandra 3.11.X. To roll back Cassandra by restoring backups, you must take backups of Cassandra 3.11.X before attempting the upgrade.

Rollback Cassandra using rebuild

Prerequisites

  • You are operating an Edge for Private Cloud 4.52.02 cluster across multiple data centers.
  • You are in the process of upgrading Cassandra from 3.11.X to 4.0.X and have encountered issues during the upgrade.
  • You have at least one fully functional data center in the cluster still running the older version of Cassandra (Cassandra 3.11.X).

This procedure relies on streaming data from an existing data center. It could take a significant amount of time, depending on how much data is stored in Cassandra. You should be prepared to divert your runtime traffic away from this data center while the rollback is ongoing.

High-level steps

  1. Select one data center (either partially or fully upgraded) that you’d like to roll back. Divert runtime traffic to a different functioning data center.
  2. Identify the seed nodes in the data center and start with one of them.
  3. Stop, uninstall, and clean up the Cassandra node.
  4. Install the older version of Cassandra on the node and configure it as needed.
  5. Remove the extra configurations that were added earlier.
  6. Repeat the above steps for all seed nodes in the data center, one by one.
  7. Repeat the above steps for all remaining Cassandra nodes in the data center, one by one.
  8. Rebuild the nodes from the existing functional data center, one by one.
  9. Restart all edge-* components in the data center that are connected to Cassandra.
  10. Test and divert traffic back to this data center.
  11. Repeat the steps for each data center, one by one.

Detailed steps

  1. Pick one data center where all or some Cassandra nodes are upgraded. Divert all runtime proxy traffic and management traffic away from this data center while its Cassandra nodes are being rolled back. Ensure that all Cassandra nodes report the UN (Up/Normal) state when you run the nodetool status command on the nodes. If certain nodes are down, troubleshoot the issue and bring those nodes back up before continuing.

    See the example below:

    /opt/apigee/apigee-cassandra/bin/nodetool status
    Datacenter: dc-1
    ================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
    UN  DC-1-IP1  456.41 KiB  1            100.0%            78fc4ddd-2ed9-4a8c-98a2-63a38c2f1920  ra-1
    UN  DC-1-IP2  870.93 KiB  1            100.0%            160db01a-64ab-43a7-b9ea-3b7f8f66d52b  ra-1
    UN  DC-1-IP3  824.08 KiB  1            100.0%            21d61543-d59e-403a-bf5d-bfe7f664baa6  ra-1
    Datacenter: dc-2
    ================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
    UN  DC-2-IP1   802.08 KiB  1            100.0%            583e0576-336d-4ce7-9729-2ae74e0abde2  ra-1
    UN  DC-2-IP2   844.4 KiB   1            100.0%            fef794d5-f4c2-4a4e-bb05-9adaeb4aea4b  ra-1
    UN  DC-2-IP3   878.12 KiB  1            100.0%            3894b3d9-1f5a-444d-83db-7b1e338bbfc9  ra-1

    You can run nodetool describecluster on the nodes to understand the current state of the entire cluster. For example, the following shows an instance of a 2-data-center cluster where all DC-1 nodes are on Cassandra version 4, whereas all DC-2 nodes are on Cassandra version 3:

    # On nodes where Cassandra is upgraded
    /opt/apigee/apigee-cassandra/bin/nodetool describecluster
    Cluster Information:
        Name: Apigee
        Snitch: org.apache.cassandra.locator.PropertyFileSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.RandomPartitioner
        Schema versions:
            2eadcd74-0245-309a-9992-3625afa70038: [DC-1-IP1, DC-1-IP2, DC-1-IP3]
            129dc15e-198e-3c11-b64c-701044a3a1ad: [DC-2-IP1, DC-2-IP2, DC-2-IP3]
    
    Stats for all nodes:
        Live: 6
        Joining: 0
        Moving: 0
        Leaving: 0
        Unreachable: 0
    
    Data Centers:
        dc-1 #Nodes: 3 #Down: 0
        dc-2 #Nodes: 3 #Down: 0
    
    Database versions:
        4.0.13: [DC-1-IP1:7000, DC-1-IP2:7000, DC-1-IP3:7000]
        3.11.16: [DC-2-IP1:7000, DC-2-IP2:7000, DC-2-IP3:7000]
    
    Keyspaces:
        system_schema -> Replication class: LocalStrategy {}
        system -> Replication class: LocalStrategy {}
        auth -> Replication class: NetworkTopologyStrategy {dc-2=3, dc-1=3}
        cache -> Replication class: NetworkTopologyStrategy {dc-2=3, dc-1=3}
        devconnect -> Replication class: NetworkTopologyStrategy {dc-2=3, dc-1=3}
        dek -> Replication class: NetworkTopologyStrategy {dc-2=3, dc-1=3}
        user_settings -> Replication class: NetworkTopologyStrategy {dc-2=3, dc-1=3}
        apprepo -> Replication class: NetworkTopologyStrategy {dc-2=3, dc-1=3}
        kms -> Replication class: NetworkTopologyStrategy {dc-2=3, dc-1=3}
        identityzone -> Replication class: NetworkTopologyStrategy {dc-2=3, dc-1=3}
        audit -> Replication class: NetworkTopologyStrategy {dc-2=3, dc-1=3}
        analytics -> Replication class: NetworkTopologyStrategy {dc-2=3, dc-1=3}
        keyvaluemap -> Replication class: NetworkTopologyStrategy {dc-2=3, dc-1=3}
        counter -> Replication class: NetworkTopologyStrategy {dc-2=3, dc-1=3}
        apimodel_v2 -> Replication class: NetworkTopologyStrategy {dc-2=3, dc-1=3}
        system_distributed -> Replication class: SimpleStrategy {replication_factor=3}
        system_traces -> Replication class: SimpleStrategy {replication_factor=2}
        system_auth -> Replication class: SimpleStrategy {replication_factor=1}
    
    # On nodes where Cassandra is not upgraded
    /opt/apigee/apigee-cassandra/bin/nodetool describecluster
    Cluster Information:
        Name: Apigee
        Snitch: org.apache.cassandra.locator.PropertyFileSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.RandomPartitioner
        Schema versions:
            2eadcd74-0245-309a-9992-3625afa70038: [DC-1-IP1, DC-1-IP2, DC-1-IP3]
            129dc15e-198e-3c11-b64c-701044a3a1ad: [DC-2-IP1, DC-2-IP2, DC-2-IP3]
            
  2. Identify the seed nodes in the data center. Refer to the section How to identify seed nodes in the Appendix. Execute the steps below on one of the seed nodes.
  3. Stop Cassandra on the node, uninstall it, and clean up its data. Start with the first seed node in this data center that is on Cassandra version 4:
    # Stop Cassandra service on the node
    /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra stop
    
    # Uninstall Cassandra software
    /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra uninstall
    
    # Wipe out Cassandra data
    rm -rf /opt/apigee/data/apigee-cassandra
            
  4. Install the older Cassandra software on the node and set some configurations. Execute the bootstrap file of Edge for Private Cloud 4.52.02:
    # Download bootstrap of 4.52.02
    curl https://software.apigee.com/bootstrap_4.52.02.sh -o /tmp/bootstrap_4.52.02.sh -u uName:pWord
    
    # Execute bootstrap of 4.52.02
    sudo bash /tmp/bootstrap_4.52.02.sh apigeeuser=uName apigeepassword=pWord
        

Set Cassandra configs

  1. Create or edit the file /opt/apigee/customer/application/cassandra.properties.
  2. Add the following contents to the file. ipOfNode is the IP address of the node that Cassandra uses to communicate with other Cassandra nodes:
    conf_jvm_options_custom_settings=-Dcassandra.replace_address=ipOfNode -Dcassandra.allow_unsafe_replace=true
  3. Ensure the file is owned and readable by the apigee user:
    chown apigee:apigee /opt/apigee/customer/application/cassandra.properties
  4. Install and set up Cassandra:
    • Install Cassandra version 3.11.X:
      /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra install
    • Set up Cassandra by passing the standard configuration file:
      /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra setup -f configFile
    • Ensure that Cassandra 3.11.X is installed and the service is running:
      /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra version
      /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra status
  5. Verify that the node has started. Run the following command on this node and on other nodes in the cluster. The node should be reported in the "UN" (Up/Normal) state:
    /opt/apigee/apigee-cassandra/bin/nodetool status
  6. Remove the extra configurations added earlier from the file /opt/apigee/customer/application/cassandra.properties.
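  7. Repeat the procedure, from step 3 of Detailed steps (stop, uninstall, and clean up Cassandra) through step 6 of this list (remove the extra configurations), on all Cassandra seed nodes in the data center, one by one.
  8. Repeat the same procedure on all remaining Cassandra nodes in the data center, one by one.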
  9. Rebuild all the nodes in the data center from a data center running the older Cassandra version. Perform this step one node at a time:
    /opt/apigee/apigee-cassandra/bin/nodetool rebuild -dc <name of working DC>
    This procedure may take some time. If necessary, you can adjust the streaming throughput, for example with nodetool setstreamthroughput. Check the status using:
    /opt/apigee/apigee-cassandra/bin/nodetool netstats
  10. Restart all edge-* components in the data center, one by one:
    /opt/apigee/apigee-service/bin/apigee-service edge-message-processor restart
    /opt/apigee/apigee-service/bin/apigee-service edge-router restart
    /opt/apigee/apigee-service/bin/apigee-service edge-management-server restart
    /opt/apigee/apigee-service/bin/apigee-service edge-qpid-server restart
    /opt/apigee/apigee-service/bin/apigee-service edge-postgres-server restart
  11. Validate and divert traffic back to this data center. Run some validations for runtime traffic and management APIs in this data center, and start rerouting proxy and management API traffic back to it.
  12. Repeat the above steps for each data center you want to roll back.
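
Several of the steps above depend on every node reporting the UN (Up/Normal) state. A minimal sketch of that check follows, assuming nodetool status output shaped like the example earlier in this section; the function name non_un_nodes is hypothetical.

```shell
#!/bin/sh
# Read `nodetool status` output on stdin and print the address of every
# node whose two-letter status/state code is not "UN". Header lines
# (Datacenter:, ====, Status=, |/ State=, -- Address) are ignored because
# their first field is not a status/state code.
non_un_nodes() {
  awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $2 }'
}

# Example with canned output; DC-1-IP2 is shown as down (DN) for illustration.
printf '%s\n' \
  'Datacenter: dc-1' \
  '================' \
  'Status=Up/Down' \
  '|/ State=Normal/Leaving/Joining/Moving' \
  '--  Address   Load        Tokens  Owns (effective)  Host ID                               Rack' \
  'UN  DC-1-IP1  456.41 KiB  1       100.0%            78fc4ddd-2ed9-4a8c-98a2-63a38c2f1920  ra-1' \
  'DN  DC-1-IP2  870.93 KiB  1       100.0%            160db01a-64ab-43a7-b9ea-3b7f8f66d52b  ra-1' \
  | non_un_nodes
```

On a live node, you would pipe the real command into the function instead, for example /opt/apigee/apigee-cassandra/bin/nodetool status | non_un_nodes. Empty output means that every listed node is in the UN state.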

Rollback Cassandra using backup

Prerequisites

  1. You are in the process of upgrading Cassandra from 3.11.X to 4.0.X and have encountered issues during the upgrade.
  2. You have backups for the node you are rolling back. The backup was taken before the upgrade from 3.11.X to 4.0.X was attempted.

Steps

  1. Select one node you want to roll back. If you are rolling back all nodes in a data center using backups, start with the seed nodes. Refer to the section How to identify seed nodes in the Appendix.

  2. Stop, uninstall, and clean up the Cassandra node:

    # Stop Cassandra service on the node
    /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra stop
    
    # Uninstall Cassandra software
    /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra uninstall
    
    # Wipe Cassandra data
    rm -rf /opt/apigee/data/apigee-cassandra
  3. Install the older Cassandra software on the node and configure it:

    • Execute the bootstrap file for Edge for Private Cloud 4.52.02:
      # Download bootstrap for 4.52.02
      curl https://software.apigee.com/bootstrap_4.52.02.sh -o /tmp/bootstrap_4.52.02.sh -u uName:pWord
      
      # Execute bootstrap for 4.52.02
      sudo bash /tmp/bootstrap_4.52.02.sh apigeeuser=uName apigeepassword=pWord
    • Create or edit the file /opt/apigee/customer/application/cassandra.properties and add the following line. ipOfNode is the IP address of the node that Cassandra uses to communicate with other Cassandra nodes:
      conf_jvm_options_custom_settings=-Dcassandra.replace_address=ipOfNode -Dcassandra.allow_unsafe_replace=true
    • Ensure the file is owned by the apigee user and is readable:
      chown apigee:apigee /opt/apigee/customer/application/cassandra.properties
    • Install and set up Cassandra:
      # Install Cassandra version 3.11.X
      /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra install
      
      # Set up Cassandra with the standard configuration file
      /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra setup -f configFile
      
      # Verify Cassandra version and check service status
      /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra version
      /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra status

    Verify that the node has started. Run the following command on this node and on other nodes in the cluster. All nodes should report that this node is in the "UN" (Up/Normal) state:

    /opt/apigee/apigee-cassandra/bin/nodetool status
  4. Stop the Cassandra service and restore the backup. Refer to the backup and restore documentation for more details:

    # Stop Cassandra service on the node
    /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra stop
    
    # Wipe the data directory in preparation for restore
    rm -rf /opt/apigee/data/apigee-cassandra/data
    
    # Restore the backup taken before the upgrade attempt
    /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra restore backupFile
            
  5. Once the backup is restored, remove the additional configurations:

    Remove the configuration added earlier from the file /opt/apigee/customer/application/cassandra.properties.

  6. Start the Cassandra service on the node:

    /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra start
  7. Repeat the steps on each Cassandra node you wish to roll back using backups, one at a time.

  8. Once all Cassandra nodes are restored, restart all edge-* components one by one:

    /opt/apigee/apigee-service/bin/apigee-service edge-message-processor restart
    /opt/apigee/apigee-service/bin/apigee-service edge-router restart
    /opt/apigee/apigee-service/bin/apigee-service edge-management-server restart
    /opt/apigee/apigee-service/bin/apigee-service edge-qpid-server restart
    /opt/apigee/apigee-service/bin/apigee-service edge-postgres-server restart
            

Backup optimizations (advanced option)

You can potentially minimize (or eliminate) data loss while restoring backups if you have replicas available that contain the latest data. If replicas are available, after restoring the backup, run a repair on the node that was restored.

Appendix

How to identify seed nodes

On any Cassandra node in a data center, run the following command:

/opt/apigee/apigee-service/bin/apigee-service apigee-cassandra configure -search conf_cassandra_seeds

The command outputs multiple lines. Look for the last line that starts with "Found key conf_cassandra_seeds". The IP addresses listed in that line are the seed nodes. In the example below, DC-1-IP1, DC-1-IP2, DC-2-IP1, and DC-2-IP2 are the seed node IPs:

Found key conf_cassandra_seeds, with value, "127.0.0.1", in /opt/apigee/apigee-cassandra/token/default.properties

Found key conf_cassandra_seeds, with value, 127.0.0.1, in /opt/apigee/apigee-cassandra/token/application/cassandra.properties

Found key conf_cassandra_seeds, with value, "DC-1-IP1, DC-1-IP2, DC-2-IP1, DC-2-IP2", in /opt/apigee/token/application/cassandra.properties
apigee-configutil: apigee-cassandra: # OK

Roll back to a previous major or minor release

To roll back to a previous major or minor release, do the following on each node that hosts the component:

  1. Download the bootstrap.sh file for the version to which you want to roll back:

    • To roll back to 4.52.02, download bootstrap_4.52.02.sh:
      curl https://software.apigee.com/bootstrap_4.52.02.sh -o /tmp/bootstrap_4.52.02.sh -u uName:pWord
  2. Stop the component to roll back:
    1. To roll back any of the components with common code on the node, you must stop them all, as the following example shows:
      /opt/apigee/apigee-service/bin/apigee-service edge-management-server stop
      /opt/apigee/apigee-service/bin/apigee-service edge-router stop
      /opt/apigee/apigee-service/bin/apigee-service edge-message-processor stop
      /opt/apigee/apigee-service/bin/apigee-service edge-qpid-server stop
      /opt/apigee/apigee-service/bin/apigee-service edge-postgres-server stop
    2. To roll back any other component on the node, stop just that component:
      /opt/apigee/apigee-service/bin/apigee-service component stop
  3. If you are rolling back Monetization, uninstall it from all Management Server and Message Processor nodes:
    /opt/apigee/apigee-service/bin/apigee-service edge-mint-gateway uninstall
  4. Uninstall the component to roll back on the node:
    1. To roll back any of the components with common code on the node, you must uninstall them all by uninstalling the edge-gateway component group, as the following example shows:
      /opt/apigee/apigee-service/bin/apigee-service edge-gateway uninstall
    2. To roll back any other component on the node, uninstall just that component, as the following example shows:
      /opt/apigee/apigee-service/bin/apigee-service component uninstall

      Where component is the component name.

    3. To roll back the Edge Router, you must delete the contents of the /opt/nginx/conf.d directory in addition to uninstalling the edge-gateway component group:
      rm -rf /opt/nginx/conf.d/*
  5. Uninstall the 4.53.00 version of apigee-setup:
    /opt/apigee/apigee-service/bin/apigee-service apigee-setup uninstall
  6. Install the 4.52.02 version of the apigee-service utility and its dependencies by running the bootstrap file that you downloaded in step 1:
    sudo bash /tmp/bootstrap_4.52.02.sh apigeeuser=uName apigeepassword=pWord

    Where uName and pWord are the username and password you received from Apigee. If you omit pWord, you will be prompted to enter it.

    If you get an error, be sure you downloaded the bootstrap.sh file in step 1.

  7. Install apigee-setup:
    /opt/apigee/apigee-service/bin/apigee-service apigee-setup install
  8. Install the older version of the component:
    /opt/apigee/apigee-setup/bin/setup.sh -p component -f configFile

    Where component is the component to install and configFile is your configuration file for the older version.

  9. If you are rolling back Qpid, flush iptables:
    sudo iptables -F
  10. Repeat this process for each node that hosts the component you are rolling back.

Roll back to a previous patch release

To roll back a component to a specific patch release, do the following on each node that hosts the component:

  1. Download the specific component version:
    /opt/apigee/apigee-service/bin/apigee-service component_version install

    Where component_version is the component and patch release to install. For example:

    /opt/apigee/apigee-service/bin/apigee-service edge-ui-4.53.00-0.0.20254 install

    If you are using the Apigee online repo, you can determine the available component versions by using the following command, where comp is the component name:

    yum --showduplicates list comp

    For example:

    yum --showduplicates list edge-ui
  2. Use apigee-setup to install the component:
    /opt/apigee/apigee-setup/bin/setup.sh -p comp -f configFile

    For example:

    /opt/apigee/apigee-setup/bin/setup.sh -p ui -f configFile

    Note that you specify only the component name when you install it, not the version.

  3. Repeat this process for each node that hosts the component you are rolling back.

Roll back mTLS

To roll back the mTLS update, perform the following steps on all hosts:

  1. Stop Apigee:
    apigee-all stop
  2. Uninstall mTLS:
    apigee-service apigee-mtls uninstall
  3. Reinstall mTLS:
    apigee-service apigee-mtls install
    apigee-service apigee-mtls setup -f /opt/silent.conf