Export data from Analytics

Apigee Analytics collects and analyzes a broad spectrum of data that flows across your APIs and provides visualization tools, including interactive dashboards, custom reports, and other tools that identify trends in API proxy performance.

Now, you can unlock this rich content by exporting analytics data from Apigee Analytics to your own data repository, such as Google Cloud Storage or BigQuery. You can then take advantage of the powerful query and machine learning capabilities offered by Google Cloud BigQuery and TensorFlow to perform your own data analysis. You can also combine the exported analytics data with other data, such as web logs, to gain new insights into your users, APIs, and applications.

Export data format

Export analytics data to one of the following formats:

  • Comma-separated values (CSV)

    The default delimiter is the comma (,) character. Supported delimiter characters include comma (,), pipe (|), and tab (\t). Configure the delimiter using the csvDelimiter property, as described in Export request property reference.

  • JSON

The exported data includes all the analytics metrics and dimensions built into Edge, and any custom analytics data that you add. For a description of the exported data, see Analytics metrics, dimensions, and filters reference.

You can export analytics data to the following data repositories:

  • Google Cloud Storage (GCS)
  • Google BigQuery

Overview of the export process

The following steps summarize the process used to export your analytics data:

  1. Configure your data repository (GCS or BigQuery) for data export. You must ensure that your data repository has been configured correctly, and that the service account used to write data to the data repository has the correct permissions.

  2. Create a data store that defines the properties of the data repository (GCS or BigQuery) where you export your data, including the credentials used to access the data repository.

    When you create a data store, you upload the data repository credentials to the Edge Credentials Vault to securely store them. The data export mechanism then uses those credentials to write data to your data repository.

  3. Use the data export API to initiate the data export. The data export runs asynchronously in the background.

  4. Use the data export API to determine when the export completes.

  5. When the export completes, access the exported data in your data repository.

The following sections describe these steps in more detail.

Configure your data repository

The analytics data export mechanism writes data to GCS or BigQuery. In order for that write to occur, you must:

  • Create a Google Cloud Platform service account.
  • Set the role of the service account so that it can access GCS or BigQuery.

Create a service account for GCS or BigQuery

A service account is a type of Google account that belongs to your application instead of to an individual user. Your application then uses the service account to access a service.

A service account has a service account key represented by a JSON string. When you create the Edge data store that defines the connection to your data repository, you pass it this key. The data export mechanism then uses the key to access your data repository.

The service account associated with the key must be a Google Cloud Platform project owner and have write access to the Google Cloud Storage bucket. To create a service key and download the required payload, see Creating and Managing Service Account Keys in the Google Cloud Platform documentation.

For example, when you first download your key it will be formatted as a JSON object:

{ 
  "type": "service_account", 
  "project_id": "myProject", 
  "private_key_id": "12312312", 
  "private_key": "-----BEGIN PRIVATE KEY-----\n...", 
  "client_email": "client_email@developer.gserviceaccount.com", 
  "client_id": "879876769876", 
  "auth_uri": "https://accounts.google.com/o/oauth2/auth", 
  "token_uri": "https://oauth2.googleapis.com/token", 
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs", 
  "client_x509_cert_url": "https://www.googleapis.com" 
}
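
One way to create the service account and download such a key is with the gcloud CLI. The following is a sketch; the project name, account name, and key filename are placeholders, not values from this document:

```shell
# Sketch: create a service account and download its JSON key with gcloud.
# "my-project" and "edge-export" are placeholder names.
gcloud iam service-accounts create edge-export \
  --project my-project \
  --display-name "Edge analytics export"

# Download the JSON key that you later paste into the Edge data store.
gcloud iam service-accounts keys create edge-export-key.json \
  --iam-account edge-export@my-project.iam.gserviceaccount.com
```

The downloaded edge-export-key.json file has the JSON shape shown above.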

Configure Google Cloud Storage

Before you can export data to Google Cloud Storage:

  • Ensure that the BigQuery API is enabled in your Google Cloud Platform project. See Enabling and Disabling APIs for instructions. The BigQuery API is necessary for export to GCS because Apigee leverages BigQuery export features.
  • Ensure that the service account is assigned to the following roles:

    • BigQuery Job User
    • Storage Object Creator
    • Storage Admin (required only for testing the data store as described in Test a data store configuration. If this role is too broad, you can add the storage.buckets.get permission to an existing role instead.)

    Alternatively, if you want to modify an existing role, or create a custom role, add the following permissions to the role:

    • bigquery.jobs.create
    • storage.objects.create
    • storage.buckets.get (required only for testing the data store as described in Test a data store configuration)
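
Granting these roles can be sketched with gcloud; the project and service account names below are placeholders:

```shell
# Sketch: grant the GCS export roles to the service account (placeholder names).
PROJECT=my-project
SA=edge-export@my-project.iam.gserviceaccount.com

gcloud projects add-iam-policy-binding "$PROJECT" \
  --member "serviceAccount:$SA" --role roles/bigquery.jobUser
gcloud projects add-iam-policy-binding "$PROJECT" \
  --member "serviceAccount:$SA" --role roles/storage.objectCreator
# Storage Admin is needed only if you use "Test connection" on the data store.
gcloud projects add-iam-policy-binding "$PROJECT" \
  --member "serviceAccount:$SA" --role roles/storage.admin
```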

Configure Google BigQuery

Before you can export data to Google BigQuery:

  • Ensure that you have enabled BigQuery in your Google Cloud Platform project.
  • Ensure that the BigQuery API is enabled in your Google Cloud Platform project. See Enabling and Disabling APIs for instructions.
  • Ensure that the service account is assigned to the following roles:

    • BigQuery Job User
    • BigQuery Data Editor

    If you want to modify an existing role, or create a custom role, add the following permissions to the role:

    • bigquery.datasets.create
    • bigquery.datasets.get
    • bigquery.jobs.create
    • bigquery.tables.create
    • bigquery.tables.get
    • bigquery.tables.updateData
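
As with GCS, granting these roles can be sketched with gcloud; the project and service account names are placeholders:

```shell
# Sketch: grant the BigQuery export roles to the service account (placeholder names).
PROJECT=my-project
SA=edge-export@my-project.iam.gserviceaccount.com

gcloud projects add-iam-policy-binding "$PROJECT" \
  --member "serviceAccount:$SA" --role roles/bigquery.jobUser
gcloud projects add-iam-policy-binding "$PROJECT" \
  --member "serviceAccount:$SA" --role roles/bigquery.dataEditor
```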

Create a data store

The data store defines the connection to your export data repository (GCS, BigQuery), including the credentials used to access the data repository.

About the Edge Credentials Vault

Edge uses the Credentials Vault to securely store the credentials used to access your export data repository. For a service to be able to access the credentials in the Edge Credentials Vault, you must define a credential consumer.

When creating a data store by using the Edge UI, as described below, Edge automatically creates the consumer used to access the credentials.

Test a data store configuration

When you create the data store, Edge does not test or validate that your credentials and data repository configuration are valid. That means you can create the data store and not detect any errors until you run your first data export.

Alternatively, test the data store configuration before creating it. Testing is useful because a large data export process can take a long time to execute. By testing your credentials and data store configuration before you start downloading large amounts of data, you can quickly fix any issues with your settings.

If the test succeeds, create the data store. If the test fails, fix the errors and retest the configuration; create the data store only after the test passes.

To enable the test feature you must:

  • Ensure that the Cloud Resource Manager API is enabled in your Google Cloud Platform project. See Enabling and Disabling APIs for instructions.

Create a data store

To create a data store in the UI:

  1. Log in to https://apigee.com/edge as an org administrator and select your organization.

    Note: You must be an Edge org administrator to be able to create a data store.

  2. Select Admin > Analytics Datastores from the left navigation bar. The Analytics Datastores page displays.

  3. Select the + Add Datastore button. You are prompted to select the data store type.

  4. Choose an export data target type:

    • Google Cloud Storage
    • Google BigQuery

    The configuration page appears.

  5. Enter the data store Name.

  6. Select a credential used to access the data repository. A drop-down list of available credentials appears.

    The credentials are specific to a data repository type. See Create a service account for GCS or BigQuery for more.

    • If you have already uploaded the credentials, select the credentials from the drop-down list. Ensure that you select credentials appropriate for the data repository type.

    • If you are adding new credentials to the data store, select Add new. In the dialog box, enter:

      1. Enter the Credentials name.
      2. For Credentials content, enter the JSON service account key specific to your data repository, as described in Create a service account for GCS or BigQuery.
      3. Select Create.
  7. Enter the properties specific to the data repository type:

    • For Google Cloud Storage:

      • Project ID (required): Google Cloud Platform project ID. To create a Google Cloud Platform project, see Creating and Managing Projects in the Google Cloud Platform documentation.
      • Bucket Name (required): Name of the Google Cloud Storage bucket to which you want to export analytics data. The bucket must exist before you perform a data export. To create a bucket, see Creating Storage Buckets in the Google Cloud Platform documentation.
      • Path (required): Directory in which to store the analytics data in the Google Cloud Storage bucket. If not specified, defaults to the bucket root.

    • For BigQuery:

      • Project ID (required): Google Cloud Platform project ID. To create a Google Cloud Platform project, see Creating and Managing Projects in the Google Cloud Platform documentation.
      • Dataset Name (required): Name of the BigQuery dataset to which you want to export analytics data. The dataset must exist before you request a data export. To create a BigQuery dataset, see Creating and Using Datasets in the Google Cloud Platform documentation.
      • Table Prefix (required): Prefix for the names of the tables created for the analytics data in the BigQuery dataset.
  8. Select Test connection to ensure that the credentials can be used to access the data repository.

    If the test is successful, save your data store.

    If the test fails, fix any issues and retry the test. Move the mouse over the error message in the UI to display additional information in a tooltip.

  9. After the connection test passes, select Save to save the data store.

Modify a data store

To modify a data store:

  1. Log in to https://apigee.com/edge as an org administrator and select your organization.

  2. Select Admin > Analytics Datastores from the left navigation bar. The Analytics Datastores page displays.

  3. Move the mouse pointer over the Modified column of the data store to modify. Edit and delete icons appear.

  4. Edit or delete the data store.

  5. If you edited the data store, select Test connection to ensure that the credentials can be used to access the data store.

    If the test is successful, you can view the sample data in your data repository.

    If the test fails, fix any issues and retry the test.

  6. After the connection test passes, select Update to update the data store.

Export analytics data

To export analytics data, issue a POST request to the /analytics/exports API. Pass the following information in the request body:

  • Name and description of the export request
  • Date range of exported data
  • Format of exported data
  • Data store name
  • Whether monetization is enabled on the organization

For a complete description of the request body properties, see Export request property reference.

The response from the POST is in the form:

{
    "self": "/organizations/myorg/environments/test/analytics/exports/a7c2f0dd-1b53-4917-9c42-a211b60ce35b",
    "created": "2017-09-28T12:39:35Z",
    "state": "enqueued"
}

Note that the state property in the response is set to enqueued. The POST request is asynchronous: it continues to run in the background after returning a response. Possible values of state are enqueued, running, completed, and failed.

Use the URL returned in the self property to view the status of the data export request, as described in Viewing the status of an analytics export request. When the request completes, the value of the state property in the response is set to completed. You can then access the analytics data in your data repository.
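
Polling by hand can be scripted. The following is a minimal sketch, not official tooling; the credentials are the placeholders used in this topic's examples, and the URL argument is built from the self path returned by the POST:

```shell
#!/usr/bin/env bash
# Sketch: poll an export job until it finishes.

# Pull the value of the "state" property out of an export-status response.
extract_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Poll the export status every 30 seconds until it completes or fails.
poll_export() {
  local url="$1"
  local state
  while true; do
    state=$(curl -s -u orgAdminEmail:password "$url" | extract_state)
    echo "state: $state"
    case "$state" in
      completed) return 0 ;;
      failed)    return 1 ;;
      *)         sleep 30 ;;
    esac
  done
}
```

For example: poll_export "https://api.enterprise.apigee.com/v1/organizations/myorg/environments/test/analytics/exports/a7c2f0dd-1b53-4917-9c42-a211b60ce35b".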

Example 1: Export data to Google Cloud Storage

The following request exports a complete set of raw data for the last 24 hours from the test environment in the myorg organization. The content is exported to Google Cloud Storage in JSON:

curl -X POST -H "Content-Type:application/json" \
"https://api.enterprise.apigee.com/v1/organizations/myorg/environments/test/analytics/exports" \
  -d \
  '{
    "name": "Export raw results to GCS",
    "description": "Export raw results to GCS for last 24 hours",
    "dateRange": {
      "start": "2018-06-08", 
      "end": "2018-06-09"
    },
    "outputFormat": "json",
    "datastoreName": "My gcs data repository"
  }' \
  -u orgAdminEmail:password

Use the URI specified by the self property to monitor the job status as described in Viewing the status of an analytics export request.

Example 2: Export data to BigQuery

The following request exports a comma-delimited CSV file to BigQuery:

curl -X POST -H "Content-Type:application/json"  \
  "https://api.enterprise.apigee.com/v1/organizations/myorg/environments/test/analytics/exports" \
  -d \
  '{
    "name": "Export query results to BigQuery",
    "description": "One-time export to BigQuery",
    "dateRange": {
      "start": "2018-06-08", 
      "end": "2018-06-09"
    },
    "outputFormat": "csv",
    "csvDelimiter": ",", 
    "datastoreName": "My bq data repository"
  }' \
  -u orgAdminEmail:password

Use the URI specified by the self property to monitor the job status as described in Viewing the status of an analytics export request.

Example 3: Export monetization data

If monetization is enabled on an environment in the organization, you can perform two types of data exports:

  • Standard data export as shown in the previous two examples.
  • Monetization data export to export data specific to monetization.

To perform a monetization data export, specify "dataset":"mint" in the request payload. The organization and environment must support monetization to set this option; otherwise, omit the dataset property from the payload:

  '{
    "name": "Export raw results to GCS",
    "description": "Export raw results to GCS for last 24 hours",
    "dateRange": {
      "start": "2018-06-08", 
      "end": "2018-06-09"
    },
    "outputFormat": "json",
    "datastoreName": "My gcs data repository",
    "dataset":"mint"
  }'
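
Wrapped in the same call shape as Example 1 (the credentials and data store name are placeholders), the full request might look like:

```shell
curl -X POST -H "Content-Type:application/json" \
  "https://api.enterprise.apigee.com/v1/organizations/myorg/environments/test/analytics/exports" \
  -d \
  '{
    "name": "Export raw results to GCS",
    "description": "Export raw results to GCS for last 24 hours",
    "dateRange": {
      "start": "2018-06-08",
      "end": "2018-06-09"
    },
    "outputFormat": "json",
    "datastoreName": "My gcs data repository",
    "dataset": "mint"
  }' \
  -u orgAdminEmail:password
```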

About export API quotas

To prevent overuse of expensive data export API calls, Edge enforces a quota on calls to the /analytics/exports API:

  • For organizations and environments that do not have monetization enabled, the quota is:

    • 70 calls per month per organization/environment.

    For example, if you have two environments in your org, prod and test, you can make 70 API calls per month for each environment.

  • For organizations and environments with monetization enabled, the quota is:

    • 70 calls per month for each organization and environment for standard data.
    • 70 calls per month for each organization and environment for monetization data.

    For example, if you enable monetization on your prod org, you can make 70 API calls for standard data and 70 additional API calls for monetization data.

If you exceed the call quota, the API returns an HTTP 429 response.
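
A caller can detect the quota error by checking the HTTP status code before parsing the response body. This is a sketch; export-request.json is a hypothetical file holding the request body, and the credentials are placeholders:

```shell
# Sketch: detect quota exhaustion (HTTP 429) when starting an export.
# export-request.json is a hypothetical file containing the request body.
status=$(curl -s -o response.json -w "%{http_code}" -X POST \
  -H "Content-Type:application/json" \
  "https://api.enterprise.apigee.com/v1/organizations/myorg/environments/test/analytics/exports" \
  -d @export-request.json \
  -u orgAdminEmail:password)

if [ "$status" = "429" ]; then
  echo "Monthly export quota exceeded for this org/environment." >&2
fi
```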

Viewing the status of all analytics export requests

To view the status for all analytics export requests, issue a GET request to /analytics/exports.

For example, the following request returns the status of all analytics export requests for the test environment in the myorg organization:

curl -X GET \
  "https://api.enterprise.apigee.com/v1/organizations/myorg/environments/test/analytics/exports" \
  -u email:password

The following provides an example of the response listing two export requests, one enqueued (created and in the queue) and one completed:

[
  {
    "self": "/v1/organizations/myorg/environments/test/analytics/exports/e8b8db22-fe03-4364-aaf2-6d4f110444ba",
    "name": "Export results To GCS",
    "description": "One-time export to Google Cloud Storage",
    "userId": "my@email.com",
    "datastoreName": "My gcs data store",
    "executionTime": "36 seconds",
    "created": "2018-09-28T12:39:35Z",
    "updated": "2018-09-28T12:39:42Z",
    "state": "enqueued"
  },
  {
    "self": "/v1/organizations/myorg/environments/test/analytics/exports/9870987089fe03-4364-aaf2-6d4f110444ba",
    "name": "Export raw results to BigQuery",
    "description": "One-time export to BigQuery",
    ... 
  }
]

Viewing the status of an analytics export request

To view the status of a specific analytics export request, issue a GET request to /analytics/exports/{exportId}, where {exportId} is the ID associated with the analytics export request.

For example, the following request returns the status of the analytics export request with the ID 4d6d94ad-a33b-4572-8dba-8677c9c4bd98.

curl -X GET \
"https://api.enterprise.apigee.com/v1/organizations/myorg/environments/test/analytics/exports/4d6d94ad-a33b-4572-8dba-8677c9c4bd98" \
-u email:password

The following provides an example of the response:

{
  "self": "/v1/organizations/myorg/environments/test/analytics/exports/4d6d94ad-a33b-4572-8dba-8677c9c4bd98",
  "name": "Export results To GCS",
  "description": "One-time export to Google Cloud Storage",
  "userId": "my@email.com",
  "datastoreName": "My gcs data store",
  "executionTime": "36 seconds",
  "created": "2018-09-28T12:39:35Z",
  "updated": "2018-09-28T12:39:42Z",
  "state": "enqueued"
}

If the analytics export returns no analytics data, then executionTime is set to "0 seconds".

Export request property reference

The following describes the properties that you can pass in the request body in JSON format when exporting analytics data.

  • name (required): Name of the export request.

  • description (optional): Description of the export request.

  • dateRange (required): Start and end dates of the data to export, in the format yyyy-mm-dd. For example:

    "dateRange": {
        "start": "2018-07-29",
        "end": "2018-07-30"
    }

    The dateRange value can span only one day. The date range begins at 00:00:00 UTC on the start date and ends at 00:00:00 UTC on the end date.

  • outputFormat (required): Format of the exported data. Specify either json or csv.

  • csvDelimiter (optional): Delimiter used in the CSV output file when outputFormat is set to csv. Defaults to the comma (,) character. Supported delimiter characters include comma (,), pipe (|), and tab (\t).

  • datastoreName (required): Name of the data store that defines the connection to your data repository, as created in Create a data store.

For example:

{
  "name": "Export raw results to GCS",
  "description": "Export raw results to Google Cloud Storage for last 24 hours",
  "datastoreName": "My gcs data store"  
}