Set up alerts and notifications

Alert conditions define specific status code (2xx/4xx/5xx), latency, and fault code thresholds that, when exceeded, trigger visual alerts in the UI and send notifications through a variety of channels, such as email, Slack, PagerDuty, or webhooks. You can set up alerts at the environment, API proxy or target service, or region level. When an alert is triggered, you receive a notification through the method you defined when adding alerts and notifications.

For example, you may want to trigger an alert and send a notification to the Operations team when the 5xx error rate exceeds 23% for a period of 5 minutes for the orders-prod API proxy deployed to your production environment.
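To make the example concrete, the following Python sketch checks whether the 5xx error rate for one API proxy exceeded 23% over the last 5 minutes, given per-minute response counts. It is an illustration under stated assumptions, not how API Monitoring evaluates alerts: the `MinuteBucket` type and `error_rate_exceeded` function are hypothetical, and summing the counts across the whole window before computing the rate is an assumption.

```python
# Illustrative sketch only, not API Monitoring's implementation.
# Assumption: counts are summed across the whole window before the rate is computed.
from dataclasses import dataclass


@dataclass
class MinuteBucket:
    total: int        # all responses observed in this minute
    errors_5xx: int   # responses with a 5xx status code


def error_rate_exceeded(buckets: list[MinuteBucket], threshold_pct: float,
                        window_minutes: int) -> bool:
    """True if the 5xx rate over the last `window_minutes` buckets exceeds the threshold."""
    if len(buckets) < window_minutes:
        return False  # not enough history to cover the full duration
    window = buckets[-window_minutes:]
    total = sum(b.total for b in window)
    if total == 0:
        return False  # no traffic in the window, nothing to evaluate
    errors = sum(b.errors_5xx for b in window)
    return 100.0 * errors / total > threshold_pct


# Hypothetical traffic for orders-prod: 1,250 of 5,000 responses were 5xx (25%).
history = [MinuteBucket(1000, 200), MinuteBucket(1000, 250), MinuteBucket(1000, 260),
           MinuteBucket(1000, 240), MinuteBucket(1000, 300)]
print(error_rate_exceeded(history, threshold_pct=23.0, window_minutes=5))  # True
```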

The following figure shows how alerts display in the UI:

The following provides an example of an email notification that you may receive when an alert is triggered.

Within the body of the alert notification, click the following links for more information:

  • View details to view more details, including the alert settings and activity for each condition over the last hour.
  • Alert definition to view the definition of the alert.
  • Alert history to view more information about the particular alert.
  • View playbook to view the recommended actions, if provided.
  • View API Analytics Report to view a custom report for the alert condition.

The following sections describe how to set up and manage alerts and notifications.

View alert settings

To view alert settings that are currently defined, click Alerts on the API Monitoring dashboard.

The Alert page displays, as shown in the following figure:


As highlighted in the figure, the Alert page enables you to:

View the history of alerts that have been triggered for your organization

To view the full history of alerts that have been triggered for your organization, click Alerts on the API Monitoring dashboard and click the History tab.

The Alert History page is displayed.

Alert history

Click the name of the alert to view the details of the alert in the Investigate dashboard. You can filter the list by searching on all or part of the alert name.

Add alerts and notifications

To add alerts and notifications:

  1. Click Alerts on the API Monitoring dashboard.
  2. Click +Alert.
  3. Enter the following general information about the alert:
    Alert Name: Name of the alert. Use a name that describes the trigger and that will be meaningful to you. The name cannot exceed 128 characters.
    Description: Description of the alert.
    Environment: Select the environment from the drop-down list.
    Status: Toggle to enable or disable the alert.
  4. Define the metric, threshold, and dimension for the first condition that will trigger the alert.
    Metric: Select one of the following metrics:
    • Fault Code: Select a category, subcategory, and fault code from the list, or select one of the following within a category or subcategory:

      • All - The combined total across all fault codes in this category/subcategory must meet the metric criteria.
      • Any - A single fault code in this category/subcategory must meet the metric criteria.

      See Fault code reference for more information.

    • Latency: Select a latency percentile from the drop-down list: p50, p90, p95, or p99.
    • Status Code: Select a 2xx, 4xx, or 5xx HTTP status code from the list.

      Note: For rate limiting alerts (HTTP status code 429), set the metric to a Spike Arrest fault code.

      Note: In Apigee Edge, you can use the Assign Message policy to rewrite the HTTP response code, either from a proxy error or a target error. API Monitoring ignores any rewritten codes and logs the actual HTTP response codes.

    Threshold: Configure the threshold for the selected metric:

    • Fault Code: Set the threshold as a percentage rate, count, or transactions per second (TPS) over time.
    • Latency: Set the threshold as a total or target latency duration (ms) over time. In this case, an alert fires if the observed latency at the specified percentile, which is updated each minute when traffic is present, exceeds the threshold throughout the specified duration. That is, the latency is not aggregated over the full duration before it is compared with the threshold.
    • Status Code: Set the threshold as a percentage rate, count, or transactions per second (TPS) over time.
    Dimension: Click +Add Dimension and specify the dimension details for which to return results, including the API proxy, target service, or developer app, and the region.

    If you set a specific dimension to:

    • All - All entities in the dimension must meet the metric criteria. You cannot select All for a metric of type Latency.
    • Any - Applicable to region only. An entity in the dimension must meet the metric criteria for any single region.
      Note: For API proxies or target services, select a Collection to support Any functionality.
    • Collections - Select a collection from the list to specify the set of API proxies or target services. In this case, any entity in the collection must meet the criteria.
  5. Click Show condition data to show recent data for the condition over the last hour.
    The error rate in the graph displays red when it exceeds the alert condition threshold.
    Show conditions data

    Click Hide condition data to hide the data.

  6. Click + Add Condition to add additional conditions and repeat steps 4 and 5.

    Note: If you specify multiple conditions, the alert will be triggered when all the conditions are met.

  7. Click Create an API analytics reports based on alert conditions if you want to create a custom report based on the alert conditions that you configured. This option is greyed out if you are not an organization administrator.

    For more information, see Create a custom report from an alert.

    Note: You can modify the custom report after you save the alert, as described in Managing custom reports.

  8. Click + Add Notification to add an alert notification.
    Channel: Select the notification channel that you want to use: Email, Slack, PagerDuty, or Webhook.
    Destination: Specify the destination based on the selected channel type:
    • Email - Email address, such as joe@company.com
    • Slack - Slack channel URL, such as https://hooks.slack.com/services/T00000000/B00000000/XXXXX
    • PagerDuty - PagerDuty code, such as abcd1234efgh56789
    • Webhook - Webhook URL, such as https://apigee.com/test-webhook

      Note: You can specify only one destination per notification. To specify multiple destinations for the same channel type, add additional notifications.

  9. To add additional notifications, repeat step 8.
  10. If you added a notification, set the following fields:

    Playbook: (Optional) Free-form text field that provides a short description of recommended actions for resolving the alert when it fires. You can also specify a link to your internal wiki or community page where you reference best practices. The information in this field is included in the notification. The contents of this field cannot exceed 1500 characters.
    Throttle: Frequency with which to send notifications. Select a value from the drop-down list. Valid values are 15 minutes, 30 minutes, and 1 hour.
  11. Click Save.
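The constraints described in the steps above can be summarized in a small Python model. This is a hypothetical sketch, not the API Monitoring schema: the `Alert` and `Notification` classes and their field names are illustrative. It checks the documented limits (128-character name, 1500-character playbook, throttle of 15, 30, or 60 minutes, one destination per notification) and treats multiple conditions as a logical AND. Two further behaviors are assumptions: a disabled alert never fires, and the throttle allows at most one notification per interval.

```python
# Hypothetical model of an alert definition; class and field names are
# illustrative and do not reflect the API Monitoring request schema.
from dataclasses import dataclass, field
from typing import Callable, Optional

MAX_NAME_LENGTH = 128          # Alert Name limit (step 3)
MAX_PLAYBOOK_LENGTH = 1500     # Playbook limit (step 10)
THROTTLE_MINUTES = {15, 30, 60}
CHANNELS = {"Email", "Slack", "PagerDuty", "Webhook"}


@dataclass
class Notification:
    channel: str       # one of CHANNELS
    destination: str   # a single destination; add another Notification for more


@dataclass
class Alert:
    name: str
    environment: str
    conditions: list[Callable[[], bool]]  # each returns True when its threshold is met
    notifications: list[Notification] = field(default_factory=list)
    playbook: str = ""
    throttle_minutes: int = 15
    enabled: bool = True

    def __post_init__(self) -> None:
        if len(self.name) > MAX_NAME_LENGTH:
            raise ValueError("alert name cannot exceed 128 characters")
        if len(self.playbook) > MAX_PLAYBOOK_LENGTH:
            raise ValueError("playbook cannot exceed 1500 characters")
        if self.throttle_minutes not in THROTTLE_MINUTES:
            raise ValueError("throttle must be 15, 30, or 60 minutes")
        for n in self.notifications:
            if n.channel not in CHANNELS:
                raise ValueError(f"unsupported channel: {n.channel}")

    def triggered(self) -> bool:
        # Multiple conditions: the alert fires only when all of them are met.
        # Assumption: a disabled alert never fires.
        return self.enabled and bool(self.conditions) and all(c() for c in self.conditions)

    def should_notify(self, now_min: float, last_sent_min: Optional[float]) -> bool:
        # Assumption: the throttle allows at most one notification per interval.
        return last_sent_min is None or now_min - last_sent_min >= self.throttle_minutes


# Two email destinations require two notifications.
alert = Alert(
    name="orders-prod 5xx rate",
    environment="prod",
    conditions=[lambda: True, lambda: False],
    notifications=[Notification("Email", "joe@company.com"),
                   Notification("Email", "ops@company.com")],
    throttle_minutes=30,
)
print(alert.triggered())  # False: only one of the two conditions is met
```

Representing each condition as a callable keeps the AND logic independent of whether a condition uses a status code, latency, or fault code metric.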

Create a custom report from an alert

To create a custom report from an alert:

  1. When creating an alert, click Create an API analytics reports based on alert conditions, as described in Add alerts and notifications.

    After you save the alert, the UI displays the following message:

    Alert alertName saved successfully. To customize the report generated, click here.

    Click the message to open the report in a new tab with relevant fields pre-populated. By default, the custom report is named: API Monitoring Generated alertName

  2. Edit the custom report, as desired, and click Save.
  3. Click the name of the report on the list and run the custom report.

To manage the custom report created based on alert conditions:

  1. Click Alerts on the API Monitoring dashboard.
  2. Click the Settings tab.
  3. In the Reports column, click the custom report associated with the alert that you want to manage.

    The custom report page displays in a new tab. If the Reports column is blank, a custom report has not yet been created. You can edit the alert to add a custom report, if desired.

  4. Edit the custom report, as desired, and click Save.
  5. Click the name of the report on the list and run the custom report.

Enable or disable an alert

To enable or disable an alert:

  1. Click Alerts on the API Monitoring dashboard.
  2. Click the toggle in the Status column associated with the alert that you want to enable or disable.

Edit an alert

To edit an alert:

  1. Click Alerts on the API Monitoring dashboard.
  2. Click the name of the alert you want to edit.
  3. Edit the alert, as required.
  4. Click Save.

Delete an alert

To delete an alert:

  1. Click Alerts on the API Monitoring dashboard.
  2. Position the cursor over the alert you want to delete and click the delete icon in the actions menu.

Apigee recommends that you set up the following alerts to be notified about common issues.

  • 5xx status codes for all/any APIs
    • UI example: Set up a 5xx status code alert for an API proxy
    • API example: Set up a 5xx status code alert for an API proxy using the API
  • P95 latency for an API proxy
    • UI example: Set up a P95 latency alert for an API proxy
    • API example: Set up a P95 latency alert for an API proxy using the API
  • 404 (Application Not Found) status codes for all API proxies
    • UI example: Set up a 404 (Application Not Found) alert for all API proxies
    • API example: Set up a 404 (Application Not Found) alert for all API proxies using the API
  • API proxy count for mission-critical APIs
    • UI example: Set up an API proxy count alert for mission-critical APIs
    • API example: Set up an API proxy count alert for mission-critical APIs using the API
  • Error rates for mission-critical target services
    • UI example: Set up an error rate alert for mission-critical target services
    • API example: Set up an error rate alert for mission-critical target services using the API
  • Specific fault codes
    • UI example: Set up a fault code alert for mission-critical APIs
    • API example: Set up a fault code alert for mission-critical APIs using the API
    • Recommended fault codes include:
      • API protocol errors (typically 4xx)
        • UI: API Protocol > All
        • API:
          "faultCodeCategory": "API Protocol",
          "faultCodeSubCategory": "ALL"
      • Catch-all HTTP errors
        • UI: Gateway > Other > Gateway HTTPErrorResponseCode
        • API:
          "faultCodeCategory": "Gateway",
          "faultCodeSubCategory": "Others",
          "faultCodeName": "Gateway HTTPErrorResponseCode"
      • Java service callout execution errors
        • UI: Execution Policy > Java Callout > JavaCallout ExecutionFailed
        • API:
          "faultCodeCategory": "Execution Policy",
          "faultCodeSubCategory": "Java Callout",
          "faultCodeName": "JavaCallout ExecutionFailed"
      • Node script execution errors
        • UI: Execution Policy > Node Script > NodeScript ExecutionError
        • API:
          "faultCodeCategory": "Execution Policy",
          "faultCodeSubCategory": "Node Script",
          "faultCodeName": "NodeScript ExecutionError"
      • Quota violations
        • UI: Traffic Mgmt Policy > Quota > Quota Violation
        • API:
          "faultCodeCategory": "Traffic Mgmt Policy",
          "faultCodeSubCategory": "Quota",
          "faultCodeName": "Quota Violation"
      • Security policy errors
        • UI: Security policy > Any
        • API:
          "faultCodeCategory": "Security Policy",
          "faultCodeName": "Any"
      • Sense errors (if applicable)
        • UI: Sense > Sense > Sense RaiseFault
        • API:
          "faultCodeCategory": "Sense",
          "faultCodeSubCategory": "Sense",
          "faultCodeName": "Sense RaiseFault"
      • Service callout execution errors
        • UI: Execution Policy > Service Callout > ServiceCallout ExecutionFailed
        • API:
          "faultCodeCategory": "Execution Policy",
          "faultCodeSubCategory": "Service Callout",
          "faultCodeName": "ServiceCallout ExecutionFailed"
      • Target errors
        • UI: Gateway > Target > Gateway TimeoutWithTargetOrCallout
        • API:
          "faultCodeCategory": "Gateway",
          "faultCodeSubCategory": "Target",
          "faultCodeName": "Gateway TimeoutWithTargetOrCallout"
      • Target errors, no active targets
        • UI: Gateway > Target > Gateway TargetServerConfiguredInLoadBalancersIsDown
        • API:
          "faultCodeCategory": "Gateway",
          "faultCodeSubCategory": "Target",
          "faultCodeName": "Gateway TargetServerConfiguredInLoadBalancerIsDown"
      • Target errors, unexpected EOF
        • UI: Gateway > Target > Gateway UnexpectedEOFAtTarget
        • API:
          "faultCodeCategory": "Gateway",
          "faultCodeSubCategory": "Target",
          "faultCodeName": "Gateway UnexpectedEOFAtTarget"
      • Virtual host errors
        • UI: Gateway > Virtual Host > VirtualHost InvalidKeystoreOrTrustStore
        • API:
          "faultCodeCategory": "Gateway",
          "faultCodeSubCategory": "Virtual Host",
          "faultCodeName": "VirtualHost InvalidKeystoreOrTrustStore"
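The API fragments in the recommended fault code list use up to three keys. A hypothetical Python helper that assembles them, omitting any level that is not set (as in the Security policy entry, which has no subcategory), might look like the following. Only the three key names come from this page; the `fault_code_fields` helper, the `RECOMMENDED` mapping, and the omission rule are assumptions, and the fragment still has to be placed inside a complete alert condition as shown in the API examples.

```python
# Hypothetical helper: only the three faultCode* key names come from this page.
import json
from typing import Optional


def fault_code_fields(category: str, sub_category: Optional[str] = None,
                      name: Optional[str] = None) -> dict:
    """Return the faultCode* fields for one alert condition, omitting unset levels."""
    fields = {"faultCodeCategory": category}
    if sub_category is not None:
        fields["faultCodeSubCategory"] = sub_category
    if name is not None:
        fields["faultCodeName"] = name
    return fields


# A few entries from the recommended list, keyed by an illustrative label.
RECOMMENDED = {
    "api_protocol_errors": fault_code_fields("API Protocol", "ALL"),
    "quota_violations": fault_code_fields("Traffic Mgmt Policy", "Quota", "Quota Violation"),
    "security_policy_errors": fault_code_fields("Security Policy", name="Any"),
}

print(json.dumps(RECOMMENDED["quota_violations"]))
```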

Set up a 5xx status code alert for an API proxy

The following provides an example of how to set up an alert using the UI that is triggered when the transactions per second (TPS) of 5xx status codes for the hotels API proxy exceeds 100 for 10 minutes for any region. For more information, see Add alerts and notifications.

For information about using the API, see Set up a 5xx status code alert for an API proxy using the API.
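As a rough check on the numbers in the 5xx TPS example, an average of 100 TPS sustained for 10 minutes corresponds to 100 TPS × 600 seconds = 60,000 5xx responses. The Python sketch below applies that arithmetic to per-minute counts for the hotels API proxy. Averaging over the whole duration is an assumption, and the function names are hypothetical.

```python
# Sketch: convert per-minute 5xx counts into an average TPS over the alert duration.
# Assumption: TPS is averaged across the whole duration; function names are hypothetical.
def average_tps(per_minute_counts: list[int]) -> float:
    """Average transactions per second across one-minute buckets."""
    if not per_minute_counts:
        return 0.0
    return sum(per_minute_counts) / (60 * len(per_minute_counts))


def tps_exceeded(per_minute_counts: list[int], threshold_tps: float,
                 duration_minutes: int) -> bool:
    """True if the average TPS over the last `duration_minutes` exceeds the threshold."""
    if len(per_minute_counts) < duration_minutes:
        return False  # not enough history to cover the duration
    return average_tps(per_minute_counts[-duration_minutes:]) > threshold_tps


hotels_5xx = [6_500] * 10   # 65,000 5xx responses in 10 minutes
print(tps_exceeded(hotels_5xx, threshold_tps=100, duration_minutes=10))  # True
```

Exactly 60,000 responses in 10 minutes averages 100 TPS, which does not exceed a threshold of 100.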

Set up a P95 latency alert for an API proxy

The following provides an example of how to set up an alert using the UI that is triggered when total response latency for the 95th percentile is greater than 100 ms for 5 minutes for the hotels API proxy for any region. For more information, see Add alerts and notifications.

For information about using the API, see Set up a P95 latency alert for an API proxy using the API.
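The P95 example relies on the latency rule from step 4 of Add alerts and notifications: the percentile is updated each minute, and the threshold is not applied to an aggregate over the whole duration. The Python sketch below models that reading. The nearest-rank percentile method, treating a minute without traffic as not exceeding the threshold, and the function names are all assumptions for illustration.

```python
# Sketch of per-minute latency evaluation: a percentile is computed for each
# one-minute bucket, and the condition holds only if every minute in the duration
# exceeds the threshold. Nearest-rank percentiles and treating a minute without
# traffic as "not exceeded" are assumptions for illustration.
import math


def percentile(samples_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of latencies (ms)."""
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_exceeded(minutes: list[list[float]], pct: float, threshold_ms: float,
                     duration_minutes: int) -> bool:
    """minutes holds latency samples per one-minute bucket, oldest first."""
    window = minutes[-duration_minutes:]
    if len(window) < duration_minutes or any(not samples for samples in window):
        return False
    return all(percentile(samples, pct) > threshold_ms for samples in window)


slow_minute = [50.0] * 18 + [500.0] * 2   # P95 = 500 ms
fast_minute = [50.0] * 19 + [500.0]       # P95 = 50 ms
print(latency_exceeded([slow_minute] * 5, 95, 100, 5))                  # True
print(latency_exceeded([slow_minute] * 4 + [fast_minute], 95, 100, 5))  # False
```

In the second case, pooling all 100 samples would give a P95 of 500 ms, so an aggregated check would fire; the per-minute rule does not, because the last minute's P95 is 50 ms.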

Set up a 404 (Application Not Found) alert for all API proxies

The following provides an example of how to set up an alert using the UI that is triggered when the percentage of 404 status codes for all API proxies exceeds 5% for 5 minutes for any region. For more information, see Add alerts and notifications.

For information about using the API, see Set up a 404 (Application Not Found) alert for all API proxies using the API.
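"For any region" corresponds to the Any region option described in step 4 of Add alerts and notifications: an entity must meet the criteria in any single region. The Python sketch below evaluates the 404 percentage per region and reports the regions over the threshold. Region names and counts are placeholders, and combining all API proxies into one count per region is an assumption.

```python
# Sketch: with a region dimension of Any, the condition is checked per region and
# the alert fires if any single region meets it. Region names and counts are
# placeholders; combining all API proxies into one count per region is an assumption.
def rate_pct(matching: int, total: int) -> float:
    """Percentage of responses that match, or 0.0 when there was no traffic."""
    return 0.0 if total == 0 else 100.0 * matching / total


def regions_over_threshold(by_region: dict[str, tuple[int, int]],
                           threshold_pct: float) -> list[str]:
    """by_region maps region -> (404 count, total responses) over the 5-minute window.
    Returns the regions that exceed the threshold; an empty list means no alert."""
    return [region for region, (hits, total) in by_region.items()
            if rate_pct(hits, total) > threshold_pct]


window = {
    "region-a": (20, 4000),   # 0.5% of responses were 404
    "region-b": (80, 1000),   # 8.0%
}
print(regions_over_threshold(window, 5.0))  # ['region-b']
```

Combined across both regions the 404 rate is only 2% (100 of 5,000), which shows why a per-region check can fire when a global rate would not.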

Set up an API proxy count alert for mission-critical APIs

The following provides an example of how to set up an alert using the UI that is triggered when the 5xx code count for mission-critical APIs exceeds 200 for 5 minutes for any region. In this example, the mission-critical APIs are captured in the Critical API Proxies collection. For more information, see Add alerts and notifications.

For information about using the API, see Set up an API proxy count alert for mission-critical APIs using the API.
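When the dimension is a collection, any entity in the collection must meet the criteria (see Collections in step 4 of Add alerts and notifications). The Python sketch below applies that rule to 5xx counts over a 5-minute window. The collection members, the `catalog` proxy, and the counts are hypothetical.

```python
# Sketch: a collection names a set of API proxies, and any one member meeting the
# criteria satisfies the condition. The collection members and counts are hypothetical.
CRITICAL_API_PROXIES = {"orders-prod", "hotels", "payments"}


def members_over_count(counts_5xx: dict[str, int], collection: set[str],
                       threshold: int) -> set[str]:
    """counts_5xx maps proxy name -> 5xx count over the 5-minute window."""
    return {proxy for proxy in collection if counts_5xx.get(proxy, 0) > threshold}


window_counts = {"orders-prod": 150, "payments": 240, "hotels": 90, "catalog": 900}
# "catalog" is outside the collection, so its 900 errors are ignored.
print(members_over_count(window_counts, CRITICAL_API_PROXIES, 200))  # {'payments'}
```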

Set up an error rate alert for mission-critical target services

The following provides an example of how to set up an alert using the UI that is triggered when the 500 code rate for mission-critical target services exceeds 10% for 1 hour for any region. In this example, the mission-critical target services are captured in the Critical targets collection. For more information, see Add alerts and notifications.

For information about using the API, see Set up an error rate alert for mission-critical target services using the API.
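Because the target service example uses a one-hour duration, recomputing the rate from scratch every minute is wasteful. A common technique is a rolling window that adds the newest minute and subtracts the oldest; the Python sketch below applies it to one target service. The class name and the choice of 60 one-minute buckets are assumptions for illustration, not a description of how API Monitoring stores data.

```python
# Sketch of a rolling one-hour window for one target service, fed one minute at a
# time. 60 one-minute buckets in a deque is an illustrative design choice.
from collections import deque


class RollingErrorRate:
    """Rolling 500-error rate for one target service."""

    def __init__(self, window_minutes: int = 60) -> None:
        self.buckets: deque = deque(maxlen=window_minutes)  # (errors, total) per minute
        self.errors = 0
        self.total = 0

    def add_minute(self, errors_500: int, total: int) -> None:
        if len(self.buckets) == self.buckets.maxlen:
            # The oldest minute is about to be evicted by append(); drop it from the sums.
            old_errors, old_total = self.buckets[0]
            self.errors -= old_errors
            self.total -= old_total
        self.buckets.append((errors_500, total))
        self.errors += errors_500
        self.total += total

    def exceeds(self, threshold_pct: float) -> bool:
        # Assumption: evaluate only once the window covers the full duration.
        if len(self.buckets) < self.buckets.maxlen or self.total == 0:
            return False
        return 100.0 * self.errors / self.total > threshold_pct


payments_target = RollingErrorRate()
for _ in range(60):
    payments_target.add_minute(errors_500=120, total=1000)  # 12% each minute
print(payments_target.exceeds(10.0))  # True
for _ in range(60):
    payments_target.add_minute(errors_500=50, total=1000)   # 5%; old minutes roll out
print(payments_target.exceeds(10.0))  # False
```

Each update costs constant time regardless of the window length, which is the main reason to prefer this over summing 60 buckets on every evaluation.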

Set up a fault code alert for mission-critical APIs

The following provides an example of how to set up an alert using the UI that is triggered when the RaiseFaultException fault code count is greater than 10 for 5 minutes for mission-critical APIs. In this example, the mission-critical APIs are captured in the Critical API Proxies collection. For more information, see Add alerts and notifications.

For information about using the API, see Set up a fault code alert for mission-critical APIs using the API.
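The fault code example watches one specific fault code. When a condition instead selects All or Any within a category or subcategory, as described in step 4 of Add alerts and notifications, the same count threshold is applied differently. The Python sketch below contrasts the two options using two Gateway > Target fault codes from the recommended list; the function name and the counts are hypothetical.

```python
# Sketch of the All and Any options for fault codes within one subcategory.
# Counts are per fault code over the alert duration; the numbers are hypothetical.
def fault_codes_exceed(counts: dict[str, int], threshold: int, mode: str) -> bool:
    """All: the combined total must exceed the threshold.
    Any: a single fault code must exceed it on its own."""
    if mode == "All":
        return sum(counts.values()) > threshold
    if mode == "Any":
        return any(count > threshold for count in counts.values())
    raise ValueError(f"unknown mode: {mode!r}")


# Gateway > Target fault codes over a 5-minute window.
target_faults = {"Gateway TimeoutWithTargetOrCallout": 6, "Gateway UnexpectedEOFAtTarget": 7}
print(fault_codes_exceed(target_faults, 10, "All"))  # True: 6 + 7 = 13 > 10
print(fault_codes_exceed(target_faults, 10, "Any"))  # False: neither code exceeds 10 alone
```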