Cassandra backup and recovery

This topic discusses how to configure backups and perform data recovery for the Cassandra database that runs in the Apigee hybrid runtime plane.

What you need to know about Cassandra backups

  • Backups for Cassandra are not enabled by default for hybrid.
  • Cassandra is a replicated database. For hybrid, it is configured to have at least 3 copies of the data in each region or data center.
  • Cassandra uses streaming replication and read repairs to maintain the data replicas in each region or data center at any given point.

Why do you need backups?

Replication protects against the loss of a node, but a deletion is replicated to every copy of the data. You need Cassandra backups primarily to recover data that was accidentally deleted.

What is backed up?

When you configure Cassandra backups, as described in this topic, the following is backed up:

  • Cassandra schema including the user schema (apigee keyspace definitions)
  • Cassandra partition token information (per node)
  • A snapshot of the Cassandra data

Where is backup data stored?

Backup data is stored in a Google Cloud Storage (GCS) bucket. You must create this bucket before you configure backups, as explained in this topic.

Setting up Cassandra backups

This section explains how to configure Cassandra backups for your hybrid runtime plane. This configuration uses Google Cloud Storage (GCS) to store the backed up data.

Prerequisites

To set up your backups, you must have or create:

  • A GCS storage bucket. You will need the storage bucket name and storage class to store the backups.
  • A GCP service account with permission to write to the GCS storage bucket.
  • Your Cassandra database username/password. This credential is required to dump the database schema. Apigee recommends that you use the admin_user and password credentials you created when setting up Cassandra TLS. See Configuring TLS for Cassandra.

Detailed steps for scheduling backups

To schedule Cassandra backups, do the following:

  1. Create a GCP service account with the roles/storage.objectAdmin role. You can create the service account using the hybrid CLI command create-service-account:
    ./tools/create-service-account my-cassandra-svc-account apigee-cassandra
    For more information about GCP service accounts, see Creating and managing service accounts.
  2. The create-service-account command saves a key on your system as a .json file. Note the path to the file. You will need the path in the following steps.
  3. Create a GCS bucket. For example, apigee_cassandra_backup.
  4. Set the data retention policy. Apigee recommends a period of 15 days.
  5. Open your overrides.yaml file.
  6. Backup is disabled by default. Enable it with this configuration:
    cassandra:
      enableBackup: true
  7. Make the following configurations for backup:
    cassandra:
      enableBackup: true
      backup:
        serviceAccountPath: sa_json_file_path
        schedule: backup_schedule_code
        dbStorageBucket: gcs_bucket_path
        user: cassandra_backup_username
        password: cassandra_backup_password
    where:
    • serviceAccountPath: The path on your filesystem to the service account JSON key file that the create-service-account command saved (see step 2).
    • schedule: The cron expression that defines when the backup starts. Default: 0 2 * * * (with the default, the backup starts once a day at 02:00).
    • dbStorageBucket: GCS storage bucket path in this format: gs://bucket_name. The gs:// is required.
    • user: Cassandra backup user's username. If not provided, reuses the value from cassandra.auth.admin.user
    • password: Cassandra backup user's password. If not provided, reuses the value from cassandra.auth.admin.password
  8. Apply the configuration changes. For example:
    apply all -c 2_cassandra -v beta
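
Before you apply the configuration, it can help to sanity-check the values you entered for dbStorageBucket and schedule. The following is a minimal sketch, not part of the hybrid CLI: the function names are hypothetical, and the checks only cover the rules stated above (the required gs:// prefix and a five-field cron expression).

```shell
#!/bin/sh
# Hypothetical pre-flight checks for the cassandra.backup values in overrides.yaml.
# These helpers do not ship with Apigee hybrid; they only illustrate the rules above.

# Succeeds when the bucket path starts with the required gs:// prefix
# followed by at least one character.
is_gcs_path() {
  case "$1" in
    gs://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds when the schedule has exactly five whitespace-separated cron fields
# (minute, hour, day of month, month, day of week).
is_five_field_cron() {
  # set -f stops the shell from expanding "*" into file names while splitting.
  ( set -f; set -- $1; [ "$#" -eq 5 ] )
}

# Prints the hour and minute fields of a five-field cron schedule.
cron_start_time() {
  ( set -f; set -- $1; printf 'hour=%s minute=%s\n' "$2" "$1" )
}
```

For example, cron_start_time "0 2 * * *" prints hour=2 minute=0, which matches the default schedule described above.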

Restoring backups

Restoration takes the files with the specified timestamp label from the backup location and restores them into a new Cassandra cluster with the same number of pods. The pods in the new Cassandra cluster should have the same names as the pods in the old one. For details on creating and configuring a new Cassandra ring, see Configure Cassandra.
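
To pick a timestamp label, you can list the objects in the backup bucket. The sketch below assumes that backup objects are named backup_<timestamp>_..., a pattern inferred only from the sample restore log later in this topic (backup_20190405011309_schema.tgz). The function name list_backup_labels is hypothetical.

```shell
#!/bin/sh
# Reads object names on stdin (one per line) and prints the distinct
# timestamp labels found in them.
# Assumption: names follow the backup_<timestamp>_... pattern from the sample log.
list_backup_labels() {
  sed -n 's|.*backup_\([0-9][0-9]*\)_.*|\1|p' | sort -u
}
```

A possible usage, assuming the bucket/cluster/data-center layout shown in the sample log:

```shell
gsutil ls gs://your-project-dbbackup/apigeecluster/dc-1/ | list_backup_labels
```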

Detailed steps for restoring backups

To restore Cassandra backups:

  1. Create a GCP service account. For example, you could name it dbbackup-svc.
  2. Grant the service account permission to view storage objects in the backup bucket, for example with the roles/storage.objectViewer role.
  3. Execute this command to generate a secret using the downloaded key file. You can use the same service account key file that you generated when you configured the backup:
     kubectl create secret generic dbbackup-credentials \
        --from-file=dbbackup_key.json=your_downloaded-backup-service-credentials.json
  4. Create a secret with the Cassandra database username/password to use for dumping the Cassandra schema:
     kubectl create secret generic apigee-dba-credentials \
      --from-literal=username=cassandra --from-literal=password=cassandra
  5. Create a config file called dbrestore-job.yaml as follows:
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: dbbackup
      namespace: default
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: dbbackup-clusterrole
      namespace: default
    rules:
    - apiGroups: [""]
      resources: ["pods"]
      verbs: ["get", "list"]
    - apiGroups: [""]
      resources: ["pods/exec"]
      verbs: ["create"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: dbbackup-clusterrole-binding
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: dbbackup-clusterrole
    subjects:
    - kind: ServiceAccount
      name: dbbackup
      namespace: default
      apiGroup: ""
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: dbrestore
      namespace: default
    spec:
      template:
        spec:
          serviceAccountName: dbbackup
          containers:
            - name: dbrestore
              command:
                - /bin/bash
                - -c
                - /usr/bin/RestoreDBMaster.sh
              image: gcr.io/APIGEE-HYBRID-ALPHA/apigee-cassandra-backup-utility:beta
              imagePullPolicy: Always
              env:
                - name: CASSANDRA_CLUSTER_NAME
                  value: apigeecluster
                - name: CASSANDRA_DC
                  value: dc-1
                - name: APIGEE_CLOUDPROVIDER
                  value: "GCP"
                - name: DBSTORAGE_BUCKET
                  value: "gs://your-project-dbbackup"
                - name: BACKUP_SNAPSHOT_TIMESTAMP
                  #the dataset label you want to restore
                  value: "dataset_label"
                - name: CASSANDRA_DB_USER
                  valueFrom:
                    secretKeyRef:
                      name: apigee-dba-credentials
                      key: username
                - name: CASSANDRA_DB_PASS
                  valueFrom:
                    secretKeyRef:
                      name: apigee-dba-credentials
                      key: password
              volumeMounts:
                - name: dbbackup-key
                  mountPath: /var/secrets/google
          volumes:
            - name: dbbackup-key
              secret:
                secretName: dbbackup-credentials
          restartPolicy: OnFailure
  6. Run the restore job:
    kubectl apply -f dbrestore-job.yaml
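
Instead of editing the dataset_label placeholder by hand, you could substitute it when you apply the job. This is a minimal sketch under two assumptions: the label consists of digits only, as in the sample logs in this topic, and the manifest still contains the literal placeholder value: "dataset_label". The function name render_restore_job is hypothetical.

```shell
#!/bin/sh
# Prints the job manifest with the "dataset_label" placeholder replaced
# by the given label. The manifest file itself is not modified.
# Usage: render_restore_job <manifest-file> <label>
render_restore_job() {
  # Assumption: labels are digits only, like 20190405011309 in the sample logs.
  case "$2" in
    ''|*[!0-9]*) echo "label must contain digits only" >&2; return 1 ;;
  esac
  sed "s|value: \"dataset_label\"|value: \"$2\"|" "$1"
}
```

For example, the rendered manifest can be piped to kubectl, which reads a manifest from standard input when the file name is a dash:

```shell
render_restore_job dbrestore-job.yaml 20190405011309 | kubectl apply -f -
```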

Viewing the restore logs

To make sure the restore completed without errors, check the restore job logs and grep them for error. Then use cqlsh.sh to verify that your restored dataset is intact.
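
The grep check can be wrapped in a small helper. This is only a sketch: the function name count_error_lines is hypothetical, and it matches the word error in any letter case, which may also count harmless lines that happen to contain that word.

```shell
#!/bin/sh
# Reads log text on stdin and prints the number of lines that mention
# "error" (case-insensitive). Prints 0 when there are none.
count_error_lines() {
  # grep -c exits non-zero when the count is 0, so ignore its exit status.
  grep -i -c 'error' || true
}
```

For example, kubectl logs dbrestore-b4lgf | count_error_lines prints 0 for a log with no matching lines.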

Verify the restore completed

To check if the restore operation completed:

kubectl get pods

NAME                 READY     STATUS      RESTARTS   AGE
apigee-cassandra-0   1/1       Running     0          1h
apigee-cassandra-1   1/1       Running     0          1h
apigee-cassandra-2   1/1       Running     0          59m
dbrestore-b4lgf      0/1       Completed   0          51m
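
If you want to check the status from a script, you can read the STATUS column for the restore pod. The sketch below parses the same tabular output shown above; the function name pod_status is hypothetical.

```shell
#!/bin/sh
# Reads `kubectl get pods` output on stdin and prints the STATUS column
# for every pod whose name starts with the given prefix.
# Usage: kubectl get pods | pod_status dbrestore
pod_status() {
  awk -v prefix="$1" 'index($1, prefix) == 1 { print $3 }'
}
```

With the sample output above, kubectl get pods | pod_status dbrestore prints Completed. As an alternative, kubectl wait --for=condition=complete job/dbrestore blocks until the job reports completion or the wait times out.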

View the restore logs

To view the restore logs:

kubectl logs -f dbrestore-b4lgf

Restore Logs:

Activated service account credentials for: [dbbackup-svc@gce-myusername.iam.gserviceaccount.com]
to download file gs://gce-myusername-dbbackup/apigeecluster/dc-1/backup_20190405011309_schema.tgz
INFO: download sucessfully extracted the backup files from gs://gce-myusername-dbbackup/apigeecluster/dc-1
finished downloading schema.cql
to create schema from 10.32.0.28

Warnings :
dclocal_read_repair_chance table option has been deprecated and will be removed in version 4.0

dclocal_read_repair_chance table option has been deprecated and will be removed in version 4.0


Warnings :
dclocal_read_repair_chance table option has been deprecated and will be removed in version 4.0

dclocal_read_repair_chance table option has been deprecated and will be removed in version 4.0

INFO: the schema has been restored
starting apigee-cassandra-0 in default
starting apigee-cassandra-1 in default
starting apigee-cassandra-2 in default
84 95 106
waiting on waiting nodes $pid to finish  84
Activated service account credentials for: [dbbackup-svc@gce-myusername.iam.gserviceaccount.com]
Activated service account credentials for: [dbbackup-svc@gce-myusername.iam.gserviceaccount.com]
Activated service account credentials for: [dbbackup-svc@gce-myusername.iam.gserviceaccount.com]
INFO: restore downloaded  tarball and extracted the file from  gs://gce-myusername-dbbackup/apigeecluster/dc-1
INFO: restore downloaded  tarball and extracted the file from  gs://gce-myusername-dbbackup/apigeecluster/dc-1
INFO: restore downloaded  tarball and extracted the file from  gs://gce-myusername-dbbackup/apigeecluster/dc-1
INFO  12:02:28 Configuration location: file:/etc/cassandra/cassandra.yaml
…...

INFO  12:02:41 [Stream #e013ee80-5863-11e9-8458-353e9e3cb7f9] All sessions completed

Summary statistics:
   Connections per host    : 3
   Total files transferred : 2
   Total bytes transferred : 0.378KiB
   Total duration          : 5048 ms
   Average transfer rate   : 0.074KiB/s
   Peak transfer rate      : 0.075KiB/s

progress: [/10.32.1.155]0:1/1 100% 1:1/1 100% [/10.32.0.28]1:1/1 100% 0:1/1 100% [/10.32.3.220]0:1/1 100% 1:1/1 100% total: 100% 0.000KiB/s (avg: 0.074KiB/s)
INFO  12:02:41 [Stream #e013ee80-5863-11e9-8458-353e9e3cb7f9] All sessions completed
progress: [/10.32.1.155]0:1/1 100% 1:1/1 100% [/10.32.0.28]1:1/1 100% 0:1/1 100% [/10.32.3.220]0:1/1 100% 1:1/1 100% total: 100% 0.000KiB/s (avg: 0.074KiB/s)
INFO  12:02:41 [Stream #e013ee80-5863-11e9-8458-353e9e3cb7f9] All sessions completed
INFO  12:02:41 [Stream #e013ee80-5863-11e9-8458-353e9e3cb7f9] All sessions completed
INFO: ./apigee/data/cassandra/data/ks1/user-9fbae960571411e99652c7b15b2db6cc restored successfully
INFO: Restore 20190405011309 completed
INFO: ./apigee/data/cassandra/data/ks1/user-9fbae960571411e99652c7b15b2db6cc restored successfully
INFO: Restore 20190405011309 completed
waiting on waiting nodes $pid to finish  106
Restore finished

Verify backup job

After the backup cronjob has been scheduled, you can verify the backup job. The pod list should look something like this:

kubectl get pods
NAME                        READY     STATUS      RESTARTS   AGE
apigee-cassandra-0          1/1       Running     0          2h
apigee-cassandra-1          1/1       Running     0          2h
apigee-cassandra-2          1/1       Running     0          2h
dbbackup-1554515580-pff6s   0/1       Running     0          54s

Check the backup logs

The backup job:

  • Creates a schema.cql file.
  • Uploads it to your storage bucket.
  • Starts the data backup on each node and uploads the data at the same time.
  • Waits until all of the data is uploaded.

To view the backup logs:

kubectl logs -f dbbackup-1554515580-pff6s

Backup Logs (from pod dbbackup-1554577680-f9sc4):
starting apigee-cassandra-0 in default
starting apigee-cassandra-1 in default
starting apigee-cassandra-2 in default
35 46 57
waiting on process  35
Activated service account credentials for: [dbbackup-svc@gce-myusername.iam.gserviceaccount.com]
Activated service account credentials for: [dbbackup-svc@gce-myusername.iam.gserviceaccount.com]
Activated service account credentials for: [dbbackup-svc@gce-myusername.iam.gserviceaccount.com]
Requested creating snapshot(s) for [all keyspaces] with snapshot name [20190406190808] and options {skipFlush=false}
Snapshot directory: 20190406190808
INFO: backup created cassandra snapshot 20190406190808
tar: Removing leading `/' from member names
/apigee/data/cassandra/data/ks1/mytest3-37bc2df0587811e98e8d875b0ed64754/snapshots/
/apigee/data/cassandra/data/ks1/mytest3-37bc2df0587811e98e8d875b0ed64754/snapshots/20190406190808/
/apigee/data/cassandra/data/ks1/mytest3-37bc2df0587811e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Data.db
Requested creating snapshot(s) for [all keyspaces] with snapshot name [20190406190808] and options {skipFlush=false}
Requested creating snapshot(s) for [all keyspaces] with snapshot name [20190406190808] and options {skipFlush=false}
Snapshot directory: 20190406190808
INFO: backup created cassandra snapshot 20190406190808
tar: Removing leading `/' from member names
/apigee/data/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/snapshots/
/apigee/data/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/snapshots/20190406190808/
/apigee/data/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/snapshots/20190406190808/manifest.json
/apigee/data/cassandra/data/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/snapshots/
/apigee/data/cassandra/data/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/snapshots/20190406190808/
/apigee/data/cassandra/data/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/snapshots/20190406190808/manifest.json
/apigee/data/cassandra/data/system/range_xfers-55d764384e553f8b9f6e676d4af3976d/snapshots/
/apigee/data/cassandra/data/system/range_xfers-55d764384e553f8b9f6e676d4af3976d/snapshots/20190406190808/
/apigee/data/cassandra/data/system/range_xfers-55d764384e553f8b9f6e676d4af3976d/snapshots/20190406190808/manifest.json
/apigee/data/cassandra/data/system/peer_events-59dfeaea8db2334191ef109974d81484/snapshots/
/apigee/data/cassandra/data/system/peer_events-59dfeaea8db2334191ef109974d81484/snapshots/20190406190808/
/apigee/data/cassandra/data/system/peer_events-59dfeaea8db2334191ef109974d81484/snapshots/20190406190808/manifest.json
/apigee/data/cassandra/data/system/built_views-4b3c50a9ea873d7691016dbc9c38494a/snapshots/
/apigee/data/cassandra/data/system/built_views-4b3c50a9ea873d7691016dbc9c38494a/snapshots/20190406190808/
/apigee/data/cassandra/data/system/built_views-4b3c50a9ea873d7691016dbc9c38494a/snapshots/20190406190808/manifest.json
……
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Filter.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-CompressionInfo.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Index.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Statistics.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Data.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Index.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Statistics.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-TOC.txt
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Statistics.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Summary.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Filter.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Summary.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Index.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/manifest.json
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Filter.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Digest.crc32
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Summary.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Data.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-TOC.txt
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/schema.cql
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-CompressionInfo.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Digest.crc32
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-TOC.txt
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Data.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Digest.crc32
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-CompressionInfo.db
……
/tmp/tokens.txt
/ [1 files][    0.0 B/    0.0 B]
Operation completed over 1 objects.
/ [1 files][    0.0 B/    0.0 B]
Operation completed over 1 objects.
INFO: backup created tarball and transfered the file to gs://gce-myusername-dbbackup/apigeecluster/dc-1
INFO: removing cassandra snapshot
INFO: backup created tarball and transfered the file to gs://gce-myusername-dbbackup/apigeecluster/dc-1
INFO: removing cassandra snapshot
Requested clearing snapshot(s) for [all keyspaces]
INFO: Backup 20190406190808 completed
waiting on process  46
Requested clearing snapshot(s) for [all keyspaces]
INFO: Backup 20190406190808 completed
Requested clearing snapshot(s) for [all keyspaces]
waiting on process  57
INFO: Backup 20190406190808 completed
waiting result
to get schema from 10.32.0.28
INFO: /tmp/schema.cql has been generated
Activated service account credentials for: [dbbackup-svc@gce-myusername.iam.gserviceaccount.com]
tar: removing leading '/' from member names
tmp/schema.cql
Copying from ...
/ [1 files][    0.0 B/    0.0 B]
Operation completed over 1 objects.
INFO: backup created tarball and transfered the file to gs://gce-myusername-dbbackup/apigeecluster/dc-1
finished uploading schema.cql
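
In the sample log above, each Cassandra node writes one "INFO: Backup <timestamp> completed" line. Assuming that message format holds for your version of the backup utility, the following sketch counts those lines per snapshot so you can compare the count with the number of Cassandra pods. The function name backup_summary is hypothetical.

```shell
#!/bin/sh
# Reads backup job logs on stdin and prints one line per snapshot label:
# "<label> <number of 'Backup <label> completed' lines>".
# Assumption: the message format matches the sample log in this topic.
backup_summary() {
  awk '$1 == "INFO:" && $2 == "Backup" && $4 == "completed" { n[$3]++ }
       END { for (label in n) print label, n[label] }'
}
```

For the sample log, kubectl logs dbbackup-1554577680-f9sc4 | backup_summary would print 20190406190808 3, one completion per node in the three-node cluster.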