Monitoring alerts
Apigee Edge allows you to forward alerts to syslog or to external monitoring systems and tools when an error or failure occurs. These alerts can be system-level or application-level. Application-level alerts are mostly custom alerts created from generated events; the network administrator usually configures the custom conditions. For more information on alerts, contact Apigee Support.
Setting alert thresholds
Set a threshold beyond which an alert is generated. The value depends on your hardware configuration and should be set in relation to your capacity. For example, a threshold might be too low if you only have 6GB capacity. You can define a threshold with an equal to (=) or greater than (>) criterion. You can also specify a time interval between two consecutive alerts, in hours, minutes, or seconds.
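The interval between consecutive alerts can be illustrated with a small throttle. This is a minimal sketch, not part of Apigee Edge: the class name `AlertThrottle` and its method `should_send` are hypothetical, and it assumes alerts are identified by a plain string name.

```python
from datetime import datetime, timedelta


class AlertThrottle:
    """Suppress repeat alerts raised within a minimum interval (hypothetical helper)."""

    def __init__(self, hours=0, minutes=0, seconds=0):
        # The interval can be given in hours, minutes, seconds, or a mix.
        self.min_interval = timedelta(hours=hours, minutes=minutes, seconds=seconds)
        self.last_sent = {}  # alert name -> time the alert was last emitted

    def should_send(self, alert_name, now=None):
        """Return True if the alert may be emitted, and record the emission time."""
        now = now or datetime.now()
        last = self.last_sent.get(alert_name)
        if last is not None and now - last < self.min_interval:
            return False  # still inside the quiet period for this alert
        self.last_sent[alert_name] = now
        return True
```

Each alert name is tracked separately, so a suppressed "low memory" alert does not delay a "high load" alert.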
Criteria for Setting System-level Alerts
The following table describes the criteria:
Alert | Suggested Threshold | Description |
---|---|---|
Low memory | 500MB | Memory is too low to start a component |
Low disk space (/var/log) | 8GB | Disk space has fallen too low |
High load | 3+ | Processes waiting to run have increased unexpectedly |
Process stopped | N/A, a Boolean value of true or false | Apigee Java process in the system has stopped |
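The suggested thresholds can be expressed as a simple check. The following is a rough sketch under stated assumptions: the function and constant names are hypothetical, memory and disk alerts are assumed to fire when the free amount drops below the threshold, and the load alert is assumed to fire at 3 or higher. Adjust the constants to your own capacity.

```python
import os
import shutil

# Suggested thresholds from the table; tune them to your hardware.
LOW_MEMORY_MB = 500
LOW_DISK_GB = 8
HIGH_LOAD = 3.0


def evaluate_alerts(free_memory_mb, free_disk_gb, load_average, process_running):
    """Return the names of the system-level alerts that should fire."""
    alerts = []
    if free_memory_mb < LOW_MEMORY_MB:
        alerts.append("Low memory")
    if free_disk_gb < LOW_DISK_GB:
        alerts.append("Low disk space (/var/log)")
    if load_average >= HIGH_LOAD:
        alerts.append("High load")
    if not process_running:
        alerts.append("Process stopped")
    return alerts


def read_free_disk_gb(path="/var/log"):
    """Free space on the filesystem that holds `path`, in GB."""
    return shutil.disk_usage(path).free / 1024 ** 3


def read_load_average():
    """One-minute load average (available on Unix-like systems only)."""
    return os.getloadavg()[0]
```

For example, `evaluate_alerts(400, read_free_disk_gb(), read_load_average(), True)` reports at least the low memory alert; how you obtain free memory and process state depends on your platform and is left out here.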
Checking on Apigee-specific and Third-party Ports
Monitor the following ports to make sure they're active:
- Ports 4526, 4527 and 4528 on Management Server, Router and Message Processor
- Ports 1099, 1100 and 1101 on Management Server, Router and Message Processor
- Ports 8081 and 15999 on Routers
- Ports 8082 and 8998 on Message Processors
- Port 8080 on Management Server
Check the following third-party ports to make sure they’re active:
- Qpid port 5672
- Postgres port 5432
- Cassandra ports 7000, 7199, 9042, 9160
- ZooKeeper port 2181
- OpenLDAP port 10389
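One way to confirm that a port is active is to attempt a TCP connection to it. The sketch below uses only the Python standard library; the helper names are hypothetical, and the host you pass in must be the node where each service actually runs, since the components are usually spread across several machines.

```python
import socket

# Third-party ports from the list above.
THIRD_PARTY_PORTS = {
    "Qpid": [5672],
    "Postgres": [5432],
    "Cassandra": [7000, 7199, 9042, 9160],
    "ZooKeeper": [2181],
    "OpenLDAP": [10389],
}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports_by_service):
    """Map each (service, port) pair to whether the port accepted a connection."""
    return {
        (service, port): port_is_open(host, port)
        for service, ports in ports_by_service.items()
        for port in ports
    }
```

A successful connection only shows that something is listening on the port; it does not verify that the service behind it is healthy.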
To determine which port each Apigee component listens on for API calls, issue the following API calls to the Management Server (which generally listens on port 8080):
curl -v -u username:password "http://host:port/v1/servers?pod=gateway&region=dc-1"
curl -v -u username:password "http://host:port/v1/servers?pod=central&region=dc-1"
curl -v -u username:password "http://host:port/v1/servers?pod=analytics&region=dc-1"
The output of these commands contains sections similar to the one shown below. The http.management.port property gives the port number for the specified component.
{
  "externalHostName" : "localhost",
  "externalIP" : "111.222.333.444",
  "internalHostName" : "localhost",
  "internalIP" : "111.222.333.444",
  "isUp" : true,
  "pod" : "gateway",
  "reachable" : true,
  "region" : "default",
  "tags" : {
    "property" : [
      { "name" : "Profile", "value" : "Router" },
      { "name" : "rpc.port", "value" : "4527" },
      { "name" : "http.management.port", "value" : "8081" },
      { "name" : "jmx.rmi.port", "value" : "1100" }
    ]
  },
  "type" : [ "router" ],
  "uUID" : "2d4ec885-e20a-4173-ae87-10be38b35750"
}
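If you save the response as JSON, the port can be pulled out programmatically. This is a minimal sketch: the function name `management_ports` is hypothetical, and it assumes the response is either a single object shaped like the sample above or a JSON array of such objects.

```python
import json


def management_ports(servers_json):
    """Return (Profile, http.management.port) pairs from a /v1/servers response."""
    servers = json.loads(servers_json)
    if isinstance(servers, dict):  # tolerate a single server object
        servers = [servers]
    result = []
    for server in servers:
        # Flatten the name/value property list into a dictionary.
        properties = server.get("tags", {}).get("property", [])
        props = {p["name"]: p["value"] for p in properties}
        result.append((props.get("Profile"), props.get("http.management.port")))
    return result
```

Values are returned as strings, exactly as they appear in the response; entries without the property yield `None`.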
Viewing Logs
Log files keep track of messages about system events and operations. Messages appear in the log when processes begin and complete or when an error condition occurs. By viewing log files, you can obtain information about system components (for example, CPU, memory, disk, load, and processes) before and after a failure. This helps you identify and diagnose the source of current system problems and predict potential ones.
For example, a typical system log of a component contains entries such as the following:
TimeStamp = 25/01/13 19:25 ; NextDelay = 30 Memory HeapMemoryUsage = {used = 29086176}{max = 64880640} ; NonHeapMemoryUsage = {init = 24313856}{committed = 57278464} ; Threading PeakThreadCount = 53 ; ThreadCount = 53 ; OperatingSystem SystemLoadAverage = 0.25 ;
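Entries in this form can be parsed to track heap usage, thread count, and load over time. The sketch below is illustrative only: the function name is hypothetical, and it assumes every entry follows the exact `Name = value` layout of the sample above.

```python
import re


def parse_system_log_entry(entry):
    """Extract heap usage, thread count, and load average from one log entry."""
    # \b keeps these patterns from matching NonHeapMemoryUsage and PeakThreadCount.
    heap = re.search(r"\bHeapMemoryUsage = \{used = (\d+)\}\{max = (\d+)\}", entry)
    threads = re.search(r"\bThreadCount = (\d+)", entry)
    load = re.search(r"\bSystemLoadAverage = ([\d.]+)", entry)

    used = int(heap.group(1)) if heap else None
    maximum = int(heap.group(2)) if heap else None
    return {
        "heap_used": used,
        "heap_max": maximum,
        # Percentage of the maximum heap currently in use, when both are known.
        "heap_used_pct": round(100 * used / maximum, 1) if maximum else None,
        "thread_count": int(threads.group(1)) if threads else None,
        "load_average": float(load.group(1)) if load else None,
    }
```

Fields that are missing from an entry come back as `None` rather than raising an error, which makes the function tolerant of partial lines.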
You can edit the /opt/apigee/conf/logback.xml file to control the logging mechanism without having to restart a server. The logback.xml file contains the following property, which sets how often the logging mechanism checks the logback.xml file for configuration changes:
<configuration scan="true" scanPeriod="30 seconds" >
If scanPeriod is not specified, the logging mechanism checks for changes every minute by default. If you omit the time units from the scanPeriod attribute, the value is interpreted as milliseconds.
The following table lists the log file locations for Apigee Edge Private Cloud components.
Components | Location |
---|---|
Management Server | |
Router | |
Message Processor | |
Qpid Server | |
Apigee Postgres Server | |
Edge UI | |
ZooKeeper | |
OpenLDAP | |
Cassandra | |
Qpidd | |
PostgreSQL database | |
Enabling debug logs for the Message Processor and Edge UI
To enable debug logs for Message Processor:
- On the Message Processor node, edit /opt/apigee/customer/application/message-processor.properties. If that file does not exist, create it.
- Add the following property to the file:
conf_system_log.level=DEBUG
- Restart the Message Processor:
/opt/apigee/apigee-service/bin/apigee-service edge-message-processor restart
To enable debug logs for Edge UI:
- On the Edge UI node, edit /opt/apigee/customer/application/ui.properties. If that file does not exist, create it.
- Add the following property to the file:
conf_application_logger.application=DEBUG
- Restart the Edge UI:
/opt/apigee/apigee-service/bin/apigee-service edge-ui restart
apigee-monit best practices
When using apigee-monit, Apigee recommends that you:
- Stop monitoring a component before you perform any operation that starts or stops it, such as a backup or an upgrade.
- Monitor apigee-monit by using a tool such as cron. For more information, see Monitor apigee-monit.
Monitoring Tools
Monitoring tools such as Nagios, Collectd, Graphite, Splunk, Sumologic, and Monit can help you monitor your entire enterprise environment and business processes.
Component | Check | Nagios | Collectd | Splunk |
---|---|---|---|---|
System-level checks | CPU utilization | | | |
 | Free/used memory | | | |
 | Disk space usage | | | |
 | Network statistics | | | |
 | Processes | | | |
 | API checks | | | |
 | JMX | | | |
 | Java | | | |
 | Log files | | | |
Critical events | Rate Limit hit | | | |
 | Backend server (Hybris or SharePoint) cannot be reached | | | |
 | FaaS (STS) cannot be reached | | | |
Warning events | SMTP server cannot be reached | | | |
 | SLAs violated | | | |