In a production setup, you typically need to enable monitoring for an Apigee Edge for Private Cloud deployment. Monitoring warns network administrators (or operators) of an error or a failure. Every error generated is reported as an alert in Apigee Edge. For more information on alerts, see Monitoring Best Practices.
For convenience, Apigee components are grouped into two main categories:
- Apigee-specific Java Server Services: These include Management Server, Message Processor, Qpid Server, and Postgres Server.
- Third-party Services: These include NGINX Router, Apache Cassandra, Apache ZooKeeper, OpenLDAP, PostgreSQL database, and Qpid.
The following table provides a quick overview of the parameters that you can monitor in an on-premises deployment of Apigee Edge:
|Component||System Checks||Process-Level Stats||API-Level Checks||Message Flow Checks||Component Specific|
Apigee-specific Java Services
In general, after Apigee Edge has been installed, you can perform the following monitoring tasks to track the performance of an Apigee Edge for Private Cloud installation.
System health checks
It is very important to measure the system health parameters such as CPU utilization, memory utilization and port connectivity at a higher level. You can monitor the following parameters to get the basics of system health.
- CPU Utilization: Specifies the basic statistics (User/System/IO Wait/Idle) about the CPU utilization. For example, total CPU used by the system.
- Free/Used Memory: Specifies the system memory utilization in bytes. For example, physical memory used by the system.
- Disk Space Usage: Specifies the file system information based on the current disk usage. For example, hard disk space used by the system.
- Load Average: Specifies the number of processes waiting to run.
- Network Statistics: Specifies the network packets and/or bytes transmitted and received, along with transmission errors, for a specified component.
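As a rough illustration of the system checks listed above, the following Python sketch reads the load average and disk usage using only the standard library. The function name `system_snapshot` and the returned key names are hypothetical and chosen for this example; they are not part of Apigee Edge. It assumes a Unix-like host, because `os.getloadavg()` is not available on Windows.

```python
import os
import shutil


def system_snapshot(path="/"):
    """Collect a few basic system health numbers (hypothetical helper)."""
    # 1-, 5-, and 15-minute load averages; Unix-like systems only.
    load1, load5, load15 = os.getloadavg()
    # Total/used/free bytes for the file system that contains `path`.
    disk = shutil.disk_usage(path)
    used_pct = round(100.0 * disk.used / disk.total, 1) if disk.total else 0.0
    return {
        "load_1m": load1,
        "load_5m": load5,
        "load_15m": load15,
        "disk_total_bytes": disk.total,
        "disk_used_bytes": disk.used,
        "disk_used_pct": used_pct,
    }


if __name__ == "__main__":
    for name, value in system_snapshot().items():
        print(f"{name}: {value}")
```

A monitoring agent could call a function like this on a schedule and raise an alert when a value crosses a threshold that suits your environment.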
At the process level, you can view important information about all the processes that are running. For example, this includes the memory and CPU usage of a process or application. For processes like qpidd, postgres postmaster, java and so on, you can monitor the following:
- Process identification: Identify a particular Apigee process. For example, you can monitor for the existence of an Apigee server java process.
- Thread statistics: View the underlying threading patterns that a process uses. For example, you can monitor the peak thread count and the thread count for all the processes.
- Memory utilization: View the memory usage for all the Apigee processes. For example, you can monitor parameters such as the heap and non-heap memory used by a process.
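The process identification check above could be sketched as follows. This is a minimal example, not an Apigee tool: `find_processes` and `list_processes` are hypothetical helpers, and the sketch assumes a `ps` command that accepts `-eo pid,args`, as on most Linux systems. Parsing is kept separate from running `ps` so that the matching logic can be tried on sample text.

```python
import subprocess


def find_processes(ps_output, needle):
    """Return (pid, command) pairs whose command line contains `needle`."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.strip().split(None, 1)  # split into PID and the rest
        if len(parts) == 2 and parts[0].isdigit() and needle in parts[1]:
            matches.append((int(parts[0]), parts[1]))
    return matches


def list_processes():
    """Run `ps` and return its raw text output (assumes a Unix-like host)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    for pid, command in find_processes(list_processes(), "java"):
        print(pid, command)
```

An empty result for a process name such as `qpidd` or `java` would indicate that the corresponding process is not running on that host.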
At the API level, you can monitor whether a server is up and running for frequently used API calls proxied by Apigee. For example, you can perform an API check on the Management Server, Router, and Message Processor by invoking the following
Where host is the IP address of the Apigee Edge component. The port number is specific to each Edge component. For example:
- Management Server: 8080
- Router: 8081
- Message Processor: 8082
See the individual sections below for information on running this command for each component.
This call returns either "true" or "false". For best results, you can also issue API calls directly on the backend (with which Apigee software interacts) in order to quickly determine whether an error exists within the Apigee software environment or on the backend.
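The API check described above can be automated with a short script. The following Python sketch is an illustration under stated assumptions: the helper names (`build_url`, `parse_up_body`, `is_up`) are hypothetical, and the request path is deliberately left as a parameter that you supply from the documentation for each component. It treats a body of "true" as up and anything else, including a connection failure, as down.

```python
import urllib.error
import urllib.request


def build_url(host, port, path):
    """Build the check URL; `path` is supplied by the caller (not defined here)."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def parse_up_body(body):
    """Map the response body to a boolean; only "true" counts as up."""
    return body.strip().lower() == "true"


def is_up(host, port, path, timeout=5.0):
    """Return True if the component answers "true", False otherwise."""
    try:
        with urllib.request.urlopen(build_url(host, port, path), timeout=timeout) as resp:
            return parse_up_body(resp.read().decode("utf-8", errors="replace"))
    except (urllib.error.URLError, OSError):
        # Unreachable host, refused connection, timeout, or HTTP error.
        return False
```

For example, `is_up("10.0.0.5", 8080, check_path)` would probe a Management Server at a placeholder address, where `check_path` is the path taken from that component's section.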
Message flow checks
You can collect data from Routers and Message Processors about message flow patterns and statistics. This allows you to monitor the following:
- Number of active clients
- Number of responses (10X, 20X, 30X, 40X and 50X)
- Connect failures
This data lets you build dashboards for the API message flow. For more information, see How to Monitor.
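To show how response counts might be prepared for such a dashboard, the following Python sketch groups HTTP status codes into classes. It is a generic illustration: the function name `count_by_class` is hypothetical, and how you obtain the list of status codes from the Routers and Message Processors is outside the scope of this example.

```python
from collections import Counter


def count_by_class(status_codes):
    """Group HTTP status codes into 1xx..5xx buckets (hypothetical helper)."""
    counts = Counter()
    for code in status_codes:
        if 100 <= code <= 599:
            counts[f"{code // 100}xx"] += 1
        else:
            counts["other"] += 1  # anything outside the standard ranges
    return dict(counts)


if __name__ == "__main__":
    print(count_by_class([200, 201, 302, 404, 503, 200]))
```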
Router health check of the Message Processor
The Router implements a health check mechanism to determine which of the Message Processors are working as expected. If a Message Processor is detected as down or slow, the Router can automatically take the Message Processor out of rotation. If that happens, the Router writes a "Mark Down" message to the Router log file at
You can monitor the Router log file for these messages. For example, if the Router takes a Message Processor out of rotation, it writes a message to the log in the form:
2014-05-06 15:51:52,159 org: env: RPCClientClientProtocolChildGroup-RPC-0 INFO CLUSTER - ServerState.setState() : State of 2a8a0e0c-3619-416f-b037-8a42e7ad4577 is now DISCONNECTED. handle = MP_IP at 1399409512159
2014-04-17 12:54:48,512 org: env: nioEventLoopGroup-2-2 INFO HEARTBEAT - HBTracker.gotResponse() : No HeartBeat detected from /MP_IP:PORT Mark Down
Where MP_IP:PORT is the IP address and port number of the Message Processor.
If the Router later performs a health check and determines that the Message Processor is functioning properly, the Router automatically puts the Message Processor back into rotation. The Router also writes a "Mark Up" message to the log in the form:
2014-05-06 16:07:29,054 org: env: RPCClientClientProtocolChildGroup-RPC-0 INFO CLUSTER - ServerState.setState() : State of 2a8a0e0c-3619-416f-b037-8a42e7ad4577 is now CONNECTED. handle = IP at 1399410449054
2014-04-17 12:55:06,064 org: env: nioEventLoopGroup-4-1 INFO HEARTBEAT - HBTracker.updateHB() : HeartBeat detected from IP:PORT Mark Up
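Watching the Router log for "Mark Down" and "Mark Up" messages can be scripted. The following Python sketch extracts those events from log lines shaped like the HEARTBEAT examples above. It assumes one log entry per line and that the message format matches those examples; the names `MARK_RE` and `parse_mark_events` are hypothetical, and the regular expression may need adjusting for other Edge versions.

```python
import re

# Matches HEARTBEAT lines that end in "Mark Up" or "Mark Down".
MARK_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"HeartBeat detected from (?P<peer>\S+) Mark (?P<state>Up|Down)\s*$"
)


def parse_mark_events(lines):
    """Return (timestamp, peer, state) tuples for Mark Up / Mark Down lines."""
    events = []
    for line in lines:
        match = MARK_RE.match(line.strip())
        if match:
            # Drop the leading "/" that appears before the address in some lines.
            peer = match.group("peer").lstrip("/")
            events.append((match.group("ts"), peer, match.group("state")))
    return events
```

Passing an open log file to `parse_mark_events` yields one tuple per state change, which an alerting script could use to flag any Message Processor whose most recent state is "Down".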