Edge for Private Cloud v4.18.05
This document describes the monitoring techniques of components supported by an on-premise deployment of Apigee Edge.
Overview
Edge supports several ways for getting details about services as well as checking their statuses. The following table lists the types of checks you can perform on each eligible service:
Service | JMX:* Memory Usage | Mgmt API: Service Check | Mgmt API: User/Org/Deployment Status | Mgmt API: axstatus | Database check | apigee-service Status |
---|---|---|---|---|---|---|
Management Server | ||||||
Message Processor | ||||||
Postgres | ||||||
Qpid | ||||||
Router | ||||||
More Info | More Info | More Info | More Info | More Info | More Info | |
* Before you can use JMX, you must enable it, as described in Enable JMX. |
JMX and Management API monitoring ports
Each component supports JMX and Management API monitoring calls on different ports. The following table lists the JMX and Management API ports for each type of server:
Component | JMX Port | Management API Port |
---|---|---|
Management Server | 1099 | 8080 |
Router | 1100 | 8081 |
Message Processor | 1101 | 8082 |
Qpid | 1102 | 8083 |
Postgres | 1103 | 8084 |
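As an illustration only, the port table above can be captured in a small lookup so that monitoring scripts do not hard-code port numbers. The dictionary name and the `mgmt_api_url` helper are hypothetical; the component names are the `apigee-service` names used later in this document.

```python
# Ports copied from the table above. MONITORING_PORTS and mgmt_api_url
# are illustrative names, not part of Edge.
MONITORING_PORTS = {
    "edge-management-server": {"jmx": 1099, "mgmt_api": 8080},
    "edge-router": {"jmx": 1100, "mgmt_api": 8081},
    "edge-message-processor": {"jmx": 1101, "mgmt_api": 8082},
    "edge-qpid-server": {"jmx": 1102, "mgmt_api": 8083},
    "edge-postgres-server": {"jmx": 1103, "mgmt_api": 8084},
}


def mgmt_api_url(host, component, path="/v1/servers/self/up"):
    """Build a Management API URL for the given component on the given host."""
    port = MONITORING_PORTS[component]["mgmt_api"]
    return f"http://{host}:{port}{path}"


if __name__ == "__main__":
    for name in MONITORING_PORTS:
        print(name, mgmt_api_url("localhost", name))
```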
Use JMX
The monitoring processes for the Management Server, Message Processor, Qpid, and Postgres all use JMX. However, JMX is enabled by default only for Cassandra, and disabled by default for all other Edge components. You must therefore enable JMX individually for each component before you can monitor it.
JMX authentication is not enabled by default. You can enable JMX authentication for all components except Cassandra.
Enable JMX
JMX is enabled by default only for Cassandra, and disabled by default for all other Edge components. This section describes how to enable JMX for the other Edge components.
To enable JMX:
- Edit the component's configuration file. This file is located at /opt/apigee/edge-component_name/bin/start. In production environments, these configuration files will be on different machines. Choose from the following file locations on each server:
- Management Server:
/opt/apigee/edge-management-server/bin/start
- Message Processor:
/opt/apigee/edge-message-processor/bin/start
- Postgres:
/opt/apigee/edge-postgres-server/bin/start
- Qpid:
/opt/apigee/edge-qpid-server/bin/start
- Router:
/opt/apigee/edge-router/bin/start
For example, the Management Server's configuration file on its server is at /opt/apigee/edge-management-server/bin/start.
- Add the following com.sun.management.jmxremote options to the exec line that starts the component:
-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=port_number \
-Dcom.sun.management.jmxremote.local.only=false \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false
Where port_number is the JMX port for the service. To get your service's JMX port number, see JMX and Management API monitoring ports.
For example, to enable JMX on the Management Server, add the following to the Management Server's configuration file:
exec $JAVA -classpath "$classpath" -Xms$min_mem -Xmx$max_mem $xx_opts \
-Djava.security.auth.login.config=$conf_path/jaas.config \
-Dinstallation.dir=$install_dir $sys_props -Dconf.dir=$conf_path \
-Ddata.dir=$data_dir \
-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=1099 \
-Dcom.sun.management.jmxremote.local.only=false \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
$* $debug_options com.apigee.kernel.MicroKernel
This example specifies port 1099 for the Management Server. As stated previously, each service has its own port number.
The edited line in the configuration file looks like the following:
exec $JAVA -classpath "$classpath" -Xms$min_mem -Xmx$max_mem $xx_opts -Djava.security.auth.login.config=$conf_path/jaas.config -Dinstallation.dir=$install_dir $sys_props -Dconf.dir=$conf_path -Ddata.dir=$data_dir -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false $* $debug_options com.apigee.kernel.MicroKernel
- Save the configuration file.
- Restart the component with the restart command. For example, to restart the Management Server, execute the following command:
/opt/apigee/apigee-service/bin/apigee-service edge-management-server restart
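Because the same five options are added for every component and only the port changes, a small helper can generate them and reduce copy-and-paste mistakes. This is a minimal sketch; `jmx_options` is a hypothetical helper name, and the flags are exactly the ones listed in the steps above.

```python
def jmx_options(port):
    """Return the jmxremote -D flags from the steps above for one JMX port."""
    return [
        "-Dcom.sun.management.jmxremote",
        f"-Dcom.sun.management.jmxremote.port={port}",
        "-Dcom.sun.management.jmxremote.local.only=false",
        "-Dcom.sun.management.jmxremote.authenticate=false",
        "-Dcom.sun.management.jmxremote.ssl=false",
    ]


if __name__ == "__main__":
    # Print the flags for the Message Processor (JMX port 1101), one per
    # line with shell line continuations, ready to paste into the exec line.
    print(" \\\n".join(jmx_options(1101)))
```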
Authentication for JMX is not enabled by default. You can enable JMX authentication for all components except Cassandra, as described in Enable JMX authentication.
Enable JMX authentication
JMX authentication is not enabled by default. You can enable JMX authentication for all components except Cassandra.
To enable JMX authentication, execute the following change_jmx_auth action on all nodes:
/opt/apigee/apigee-service/bin/apigee-service component change_jmx_auth [options|-f config_file]
Where:
- component is one of the following:
edge-management-server
edge-message-processor
edge-postgres-server
edge-qpid-server
edge-router
- options specifies the following:
-u username
-p password
-e [y|n]
(enable or disable)
- config_file specifies the location of a configuration file in which you define the following:
JMX_USERNAME=username
JMX_ENABLED=y|n
JMX_PASSWORD=password
(if not set or not passed in with -p, you are prompted)
You can use either the command line options or the configuration file to define the username, password, and enable/disable state. Do not specify both a set of options and a configuration file.
The following example enables JMX authentication for the Management Server using the command line options:
/opt/apigee/apigee-service/bin/apigee-service edge-management-server change_jmx_auth -u foo -p bar -e y
The following example uses a configuration file rather than command line options:
/opt/apigee/apigee-service/bin/apigee-service edge-management-server change_jmx_auth -f /tmp/my-config-file
If you are running Edge on multiple nodes, run the command on all nodes, specifying the same username and password.
To disable JMX authentication on the command line, use the "-e n" option, as the following example shows:
/opt/apigee/apigee-service/bin/apigee-service edge-management-server change_jmx_auth -e n
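When the same change has to be applied on many nodes, it can help to build the command programmatically. The sketch below only assembles the argument list from the syntax described above and enforces the "options or configuration file, not both" rule; `change_jmx_auth_args` is a hypothetical helper, and actually running the command (for example with `subprocess.run`) is left to the caller.

```python
APIGEE_SERVICE = "/opt/apigee/apigee-service/bin/apigee-service"


def change_jmx_auth_args(component, enable=None, username=None,
                         password=None, config_file=None):
    """Build the argument list for the change_jmx_auth action.

    Pass either config_file or the individual options, never both.
    """
    args = [APIGEE_SERVICE, component, "change_jmx_auth"]
    if config_file is not None:
        if enable is not None or username is not None or password is not None:
            raise ValueError("use either options or a configuration file, not both")
        return args + ["-f", config_file]
    if username is not None:
        args += ["-u", username]
    if password is not None:
        args += ["-p", password]
    if enable is not None:
        args += ["-e", "y" if enable else "n"]
    return args


if __name__ == "__main__":
    print(" ".join(change_jmx_auth_args(
        "edge-management-server", enable=True, username="foo", password="bar")))
```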
Monitor with JConsole
Use JConsole (a JMX compliant tool) to manage and monitor health check and process statistics. With JConsole, you can consume JMX statistics exposed by your servers and display them in a graphical interface. For more information, see Using JConsole.
JConsole uses the following service URL to monitor the JMX attributes (MBeans) offered via JMX:
service:jmx:rmi:///jndi/rmi://IP_address:port_number/jmxrmi
Where:
- IP_address is the IP address of the server you want to monitor.
- port_number is the JMX port number of the server you want to monitor.
For example, to monitor the Management Server, use a service URL like the following (assuming the server's IP address is 216.3.128.12):
service:jmx:rmi:///jndi/rmi://216.3.128.12:1099/jmxrmi
Note that this example specifies port 1099, which is the Management Server JMX port. For other ports, see JMX and Management API monitoring ports.
The following table shows the generic JMX statistics:
JMX MBeans | JMX Attributes |
---|---|
Memory | HeapMemoryUsage |
 | NonHeapMemoryUsage |
 | Usage |
Monitor with the Management API
Edge includes several APIs that you can use to perform service checks on your servers as well as check your users, organizations, and deployments. This section describes these APIs.
Perform service checks
The Management API provides several endpoints for monitoring and diagnosing issues with your services. These endpoints include:
Endpoint | Description |
---|---|
/servers/self/up | Checks to see if a service is running. This API call does not require you to authenticate. If the service is running, this endpoint returns the following response: <ServerField> <Up>true</Up> </ServerField> If the service is not running, you will get a response similar to the following (depending on which service it is and how you checked it): curl: Failed connect to localhost:port_number; Connection refused |
/servers/self | Returns information about the service. This API call requires you to authenticate with your Apigee admin credentials. |
To use these endpoints, invoke a utility such as curl with commands that use the following syntax:
curl http://host:port_number/v1/servers/self/up -H "Accept: [application/json|application/xml]"
curl http://host:port_number/v1/servers/self -u username:password -H "Accept: [application/json|application/xml]"
Where:
- host is the IP address of the server you want to check. If you are logged into the server, you can use "localhost"; otherwise, specify the IP address of the server as well as the username and password.
- port_number is the Management API port for the server you want to check. This is a different port for each type of component. For example, the Management Server's Management API port is 8080. For a list of Management API port numbers to use, see JMX and Management API monitoring ports.
To change the format of the response, you can specify the Accept header as "application/json" or "application/xml".
The following example gets the status of the Router on localhost (port 8081):
curl http://localhost:8081/v1/servers/self/up -H "Accept: application/xml"
The following example gets information about the Message Processor at 216.3.128.12 (port 8082):
curl http://216.3.128.12:8082/v1/servers/self -u sysAdminEmail:password -H "Accept: application/xml"
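The unauthenticated /servers/self/up check can also be scripted without curl. The following is a minimal sketch using only the Python standard library; it assumes the XML response has the shape shown in the endpoint table (`<ServerField><Up>true</Up></ServerField>`) and treats any connection or parsing failure as "not up". The function names are illustrative.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def parse_up_response(xml_text):
    """Return True when the <Up> element of the response is 'true'."""
    root = ET.fromstring(xml_text)
    up = root.findtext("Up")
    return up is not None and up.strip().lower() == "true"


def is_service_up(host, port, timeout=5.0):
    """Call /v1/servers/self/up; report False on any connection or parse error."""
    url = f"http://{host}:{port}/v1/servers/self/up"
    request = urllib.request.Request(url, headers={"Accept": "application/xml"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return parse_up_response(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ET.ParseError):
        return False


if __name__ == "__main__":
    # Router on localhost, Management API port 8081.
    print("router up:", is_service_up("localhost", 8081))
```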
Monitor user, organization, and deployment status
You can use the Management API to monitor user, organization, and deployment status of your proxies on Management Servers and Message Processors by issuing the following commands:
curl http://host:port_number/v1/users -u sysAdminEmail:password
curl http://host:port_number/v1/organizations -u sysAdminEmail:password
curl http://host:port_number/v1/organizations/orgname/deployments -u sysAdminEmail:password
Where port_number is either 8080 for the Management Server or 8082 for the Message Processor.
This call requires you to authenticate with your system administration username and password.
The server should return a "deployed" status for all calls. If these fail, do the following:
- Check the server logs for any errors. The logs are located at:
- Management Server:
/opt/apigee/var/log/edge-management-server
- Message Processor:
/opt/apigee/var/log/edge-message-processor
- Make a call against the server to check whether it is functioning properly.
- Remove the server from the ELB and then restart it:
/opt/apigee/apigee-service/bin/apigee-service service_name restart
Where service_name is:
edge-management-server
edge-message-processor
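For scripted monitoring, the three status calls above can be prepared as authenticated requests. This sketch builds `urllib.request.Request` objects that carry the same HTTP Basic credentials that `curl -u` sends; no network traffic occurs until a request is passed to `urllib.request.urlopen`. The function name and the sample organization and credentials are placeholders.

```python
import base64
import urllib.request


def status_requests(host, port, org, email, password):
    """Build the users, organizations, and deployments requests with Basic auth."""
    token = base64.b64encode(f"{email}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}
    paths = [
        "/v1/users",
        "/v1/organizations",
        f"/v1/organizations/{org}/deployments",
    ]
    return [
        urllib.request.Request(f"http://{host}:{port}{path}", headers=headers)
        for path in paths
    ]


if __name__ == "__main__":
    # Placeholder credentials; port 8080 is the Management Server.
    for req in status_requests("localhost", 8080, "acme", "sysAdminEmail", "password"):
        print(req.full_url)
```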
Check status with the apigee-service command
You can troubleshoot your Edge services by using the apigee-service command when you are logged into the server running the service.
To check the status of a service with apigee-service:
- Log in to the server and run the following command:
/opt/apigee/apigee-service/bin/apigee-service service_name status
Where service_name is one of the following:
- Management Server:
edge-management-server
- Message Processor:
edge-message-processor
- Postgres:
edge-postgres-server
- Qpid:
edge-qpid-server
- Router:
edge-router
For example:
/opt/apigee/apigee-service/bin/apigee-service edge-message-processor status
- If the service is not running, start the service:
/opt/apigee/apigee-service/bin/apigee-service service_name start
- After starting the service, check that it is functioning, either by using the apigee-service status command you used previously or by using the Management API described in Monitor with the Management API. For example:
curl -v http://localhost:port_number/v1/servers/self/up
Where port_number is the Management API port for the service.
This example assumes you are logged into the server and can use "localhost" as the hostname. To check the status remotely with the Management API, you must specify the IP address of the server and include the system administrator username and password in your API call.
Postgres monitoring
Postgres supports several utilities that you can use to check its status. These utilities are described in the sections that follow.
Check organizations and environments on Postgres
You can check for organization and environment names that are onboarded on the Postgres Server by issuing the following curl command:
curl -v http://postgres_IP:8084/v1/servers/self/organizations
The system should display the organization and environment name.
Verify analytics status
You can verify the status of the Postgres and Qpid analytics servers by issuing the following curl command:
curl -u userEmail:password http://host:port_number/v1/organizations/orgname/environments/envname/provisioning/axstatus
The system should display a success status for all analytics servers, as the following example shows:
{
  "environments" : [ {
    "components" : [ {
      "message" : "success at Thu Feb 28 10:27:38 CET 2013",
      "name" : "pg",
      "status" : "SUCCESS",
      "uuid" : "[c678d16c-7990-4a5a-ae19-a99f925fcb93]"
    }, {
      "message" : "success at Thu Feb 28 10:29:03 CET 2013",
      "name" : "qs",
      "status" : "SUCCESS",
      "uuid" : "[ee9f0db7-a9d3-4d21-96c5-1a15b0bf0adf]"
    } ],
    "message" : "",
    "name" : "prod"
  } ],
  "organization" : "acme",
  "status" : "SUCCESS"
}
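To alert on this response automatically, a script can walk the JSON and report every component whose status is not SUCCESS. This is a sketch that assumes the response structure matches the example in this section; `failed_analytics_components` is an illustrative name and the sample payload is abbreviated.

```python
import json


def failed_analytics_components(axstatus_json):
    """Return (environment, component, status) for each non-SUCCESS component."""
    doc = json.loads(axstatus_json)
    failures = []
    for env in doc.get("environments", []):
        for comp in env.get("components", []):
            if comp.get("status") != "SUCCESS":
                failures.append((env.get("name"), comp.get("name"), comp.get("status")))
    return failures


# Abbreviated sample based on the response format shown in this section.
SAMPLE = """
{ "environments" : [ { "components" : [
    { "name" : "pg", "status" : "SUCCESS" },
    { "name" : "qs", "status" : "SUCCESS" } ],
  "name" : "prod" } ],
  "organization" : "acme", "status" : "SUCCESS" }
"""

if __name__ == "__main__":
    problems = failed_analytics_components(SAMPLE)
    print("all analytics servers healthy" if not problems else problems)
```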
PostgreSQL database
This section describes techniques that you can use specifically for monitoring the Postgres database.
Use the check_postgres.pl script
To monitor the PostgreSQL database, you can use a standard monitoring script, check_postgres.pl. For more information, see http://bucardo.org/wiki/Check_postgres.
Before you run the script:
- You must install the check_postgres.pl script on each Postgres node.
- Ensure that you have installed perl-Time-HiRes.x86_64, a Perl module that implements high resolution alarm, sleep, gettimeofday, and interval timers. For example, you can install it by using the following command:
yum install perl-Time-HiRes.x86_64
- CentOS 7: Before using check_postgres.pl on CentOS v7, install the perl-Data-Dumper.x86_64 RPM.
check_postgres.pl output
The default output of the API calls using check_postgres.pl is Nagios compatible. After you install the script, do the following checks:
- Check the database size:
check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -include=apigee -action database_size --warning='800 GB' --critical='900 GB'
- Check the number of incoming connections to the database and compare it with the maximum allowed connections:
check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -action backends
- Check whether the database is running and available:
check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -action connection
- Check the disk space:
check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -action disk_space --warning='80%' --critical='90%'
- Check the number of organizations and environments onboarded in a Postgres node:
check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -action=custom_query --query="select count(*) as result from pg_tables where schemaname='analytics' and tablename like '%fact'" --warning='80' --critical='90' --valtype=integer
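Nagios-compatible plugins conventionally report their result through the process exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). Assuming check_postgres.pl follows that convention in its default output mode, a wrapper like the following sketch can turn any of the checks above into a status string. The function names are illustrative, and the command in the usage example is the connection check from the list above.

```python
import subprocess

# Conventional Nagios plugin exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def nagios_state(exit_code):
    """Map a plugin exit code to a Nagios state name."""
    return NAGIOS_STATES.get(exit_code, "UNKNOWN")


def run_check(argv):
    """Run a check command and return (state, first line of its output)."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    lines = completed.stdout.strip().splitlines()
    return nagios_state(completed.returncode), (lines[0] if lines else "")


if __name__ == "__main__":
    state, message = run_check([
        "check_postgres.pl", "-H", "10.176.218.202", "-db", "apigee",
        "-u", "apigee", "-dbpass", "postgres", "-action", "connection",
    ])
    print(state, message)
```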
Run database checks
You can verify that the proper tables are created in the PostgreSQL database. Log in to the PostgreSQL database using the following command:
psql -h /opt/apigee/var/run/apigee-postgresql/ -U apigee -d apigee
Then run:
\d analytics."org.env.fact"
Check health status of postgres process
You can perform API checks on the Postgres machine by invoking the following curl command:
curl -v http://postgres_IP:8084/v1/servers/self/health
This command returns the ACTIVE status when the Postgres process is active. If the Postgres process is not up and running, it returns the INACTIVE status.
Postgres resources
For additional information about monitoring the Postgres service, see the following:
- http://www.postgresql.org/docs/9.0/static/monitoring.html
- http://www.postgresql.org/docs/9.0/static/diskusage.html
- http://bucardo.org/check_postgres/check_postgres.pl.html
Apache Cassandra
Use JConsole: Monitor task statistics
Use JConsole and the following service URL to monitor the JMX attributes (MBeans) offered via JMX:
service:jmx:rmi:///jndi/rmi://IP_address:7199/jmxrmi
Where IP_address is the IP of the Cassandra server.
JMX is enabled by default for Cassandra and remote JMX access to Cassandra does not require a password.
Cassandra JMX statistics
JMX MBeans | JMX Attributes |
---|---|
ColumnFamilies/apprepo/environments, ColumnFamilies/apprepo/organizations, ColumnFamilies/apprepo/apiproxy_revisions, ColumnFamilies/apprepo/apiproxies, ColumnFamilies/audit/audits, ColumnFamilies/audit/audits_ref | PendingTasks, MemtableColumnsCount, MemtableDataSize, ReadCount, RecentReadLatencyMicros, TotalReadLatencyMicros, WriteCount, RecentWriteLatencyMicros, TotalWriteLatencyMicros, TotalDiskSpaceUsed, LiveDiskSpaceUsed, LiveSSTableCount, BloomFilterFalsePositives, RecentBloomFilterFalseRatio, BloomFilterFalseRatio |
Use nodetool to manage cluster nodes
The nodetool utility is a command line interface for Cassandra that manages cluster nodes. The utility can be found at /opt/apigee/apigee-cassandra/bin.
The following calls can be made on all Cassandra cluster nodes:
- General ring info (also possible for a single Cassandra node): Look for "Up" and "Normal" for all nodes.
nodetool -h localhost ring
The output of the above command looks as shown below:
Datacenter: dc-1
==========
Address Rack Status State Load Owns Token
192.168.124.201 ra1 Up Normal 1.67 MB 33,33% 0
192.168.124.202 ra1 Up Normal 1.68 MB 33,33% 5671...5242
192.168.124.203 ra1 Up Normal 1.67 MB 33,33% 1134...0484
- General info about nodes (call per node)
nodetool -h localhost info
The output of the above command looks like the following:
ID : e2e42793-4242-4e82-bcf0-oicu812
Gossip active : true
Thrift active : true
Native Transport active: true
Load : 273.71 KB
Generation No : 1234567890
Uptime (seconds) : 687194
Heap Memory (MB) : 314.62 / 3680.00
Off Heap Memory (MB) : 0.14
Data Center : dc-1
Rack : ra-1
Exceptions : 0
Key Cache : entries 150, size 13.52 KB, capacity 100 MB, 1520781 hits, 1520923 requests, 1.000 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 50 MB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Token : 0
- Status of the thrift server (serving client API)
nodetool -h localhost statusthrift
The output of the above command looks like the following:
running
- Status of data streaming operations: Observe traffic for Cassandra nodes:
nodetool -h localhost netstats
The output of the above command looks like the following:
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 151612
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed Dropped
Commands n/a 0 0 0
Responses n/a 0 0 n/a
For more info on nodetool, see About the nodetool utility.
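The "Up" and "Normal" check on the `nodetool ring` output can be automated. The sketch below assumes the column layout shown in the sample output above (address, rack, status, state, ...); other Cassandra versions may print different columns, so treat the field positions as an assumption. `unhealthy_nodes` is an illustrative name.

```python
import re

# Matches a dotted IPv4 address at the start of a node row.
ADDRESS = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def unhealthy_nodes(ring_output):
    """Return (address, status, state) for every node that is not Up/Normal."""
    bad = []
    for line in ring_output.splitlines():
        fields = line.split()
        # Node rows start with an address; header and separator rows do not.
        if len(fields) >= 4 and ADDRESS.match(fields[0]):
            address, status, state = fields[0], fields[2], fields[3]
            if (status, state) != ("Up", "Normal"):
                bad.append((address, status, state))
    return bad


# Sample in the layout shown above, with one node marked Down for illustration.
SAMPLE_RING = """Datacenter: dc-1
==========
Address Rack Status State Load Owns Token
192.168.124.201 ra1 Up Normal 1.67 MB 33,33% 0
192.168.124.202 ra1 Down Normal 1.68 MB 33,33% 5671...5242
192.168.124.203 ra1 Up Normal 1.67 MB 33,33% 1134...0484
"""

if __name__ == "__main__":
    print(unhealthy_nodes(SAMPLE_RING))
```

In practice the text would come from running `nodetool -h localhost ring` and capturing its standard output.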
Cassandra monitoring (UI)
Refer to the DataStax OpsCenter URL: http://www.datastax.com/products/opscenter.
Cassandra resource
Refer to the following URL: http://www.datastax.com/docs/1.0/operations/monitoring.
Apache ZooKeeper
Check ZooKeeper status
- Ensure the ZooKeeper process is running. ZooKeeper writes a PID file to /opt/apigee/var/run/apigee-zookeeper/apigee-zookeeper.pid.
- Test ZooKeeper ports to ensure that you can establish a TCP connection to ports 2181 and 3888 on every ZooKeeper server.
- Ensure that you can read values from the ZooKeeper database. Connect using a ZooKeeper client library (or /opt/apigee/apigee-zookeeper/bin/zkCli.sh) and read a value from the database.
- Check the status:
/opt/apigee/apigee-service/bin/apigee-service apigee-zookeeper status
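The TCP port test from the list above can be scripted with a plain socket connection. This is a minimal sketch; the helper names and the default timeout are assumptions, and the ports are the two named above (2181 and 3888).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_ports(host, ports=(2181, 3888)):
    """Return the subset of ports on host that refuse or time out."""
    return [port for port in ports if not can_connect(host, port)]


if __name__ == "__main__":
    # Replace with the address of each ZooKeeper server in turn.
    print("unreachable:", unreachable_ports("localhost"))
```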
Use ZooKeeper four-letter words
ZooKeeper can be monitored via a small set of commands (four-letter words) that are sent to port 2181 using netcat (nc) or telnet.
For more info on ZooKeeper commands, see: Apache ZooKeeper command reference.
For example:
- srvr: Lists full details for the server.
- stat: Lists brief details for the server and connected clients.
The following commands can be issued to the ZooKeeper port:
- Run the four-letter command ruok to test whether the server is running in a non-error state. A successful response returns "imok".
echo ruok | nc host 2181
Returns:
imok
- Run the four-letter command, stat, to list server performance and connected clients statistics:
echo stat | nc host 2181
Returns:
Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
Clients:
/0:0:0:0:0:0:0:1:33467[0](queued=0,recved=1,sent=0)
/192.168.124.201:42388[1](queued=0,recved=8433,sent=8433)
/192.168.124.202:42185[1](queued=0,recved=1339,sent=1347)
/192.168.124.204:39296[1](queued=0,recved=7688,sent=7692)
Latency min/avg/max: 0/0/128
Received: 26144
Sent: 26160
Connections: 4
Outstanding: 0
Zxid: 0x2000002c2
Mode: follower
Node count: 283
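- If netcat (nc) is not available, you can use Python as an alternative. Create a file named zookeeper.py that contains the following:
import time, socket, sys
# Usage: python zookeeper.py host four_letter_word
c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
c.connect((sys.argv[1], 2181))
# Sockets carry bytes in Python 3, so encode the command before sending.
c.sendall(sys.argv[2].encode("ascii"))
time.sleep(0.1)
print(c.recv(512).decode("ascii", errors="replace"))
Now run the following Python commands: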
python zookeeper.py 192.168.124.201 ruok
python zookeeper.py 192.168.124.201 stat
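Once the `stat` response is captured, its "key: value" lines can be turned into a dictionary for alerting, for example on Mode or Outstanding. The sketch below assumes the output format shown in the sample above and skips the per-client lines; `parse_stat` is an illustrative name and the sample is abbreviated.

```python
def parse_stat(text):
    """Parse the 'key: value' lines of a ZooKeeper stat response into a dict."""
    info = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Client connection lines start with "/" and are not key/value pairs.
        if not stripped or stripped.startswith("/"):
            continue
        key, sep, value = stripped.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


# Abbreviated sample in the format shown above.
SAMPLE_STAT = """Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
Clients:
 /192.168.124.201:42388[1](queued=0,recved=8433,sent=8433)
Latency min/avg/max: 0/0/128
Connections: 4
Outstanding: 0
Mode: follower
Node count: 283
"""

if __name__ == "__main__":
    stats = parse_stat(SAMPLE_STAT)
    print(stats["Mode"], stats["Connections"])
```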
LDAP level test
You can monitor OpenLDAP to see whether the specific requests are served properly. In other words, check for a specific search that returns the right result.
- Use ldapsearch (yum install openldap-clients) to query the entry of the system admin. This entry is used to authenticate all API calls.
ldapsearch -b "uid=admin,ou=users,ou=global,dc=apigee,dc=com" -x -W -D "cn=manager,dc=apigee,dc=com" -H ldap://localhost:10389 -LLL
You are then prompted for the LDAP admin password:
Enter LDAP Password:
After entering the password, you see a response in the form:
dn: uid=admin,ou=users,ou=global,dc=apigee,dc=com
objectClass: organizationalPerson
objectClass: person
objectClass: inetOrgPerson
objectClass: top
uid: admin
cn: admin
sn: admin
userPassword:: e1NTSEF9bS9xbS9RbVNXSFFtUWVsU1F0c3BGL3BQMkhObFp2eDFKUytmZVE9PQ==
mail: opdk@google.com
- Check whether Management Server is still connected to LDAP with the following command:
curl -u userEMail:password http://localhost:8080/v1/users/ADMIN
Returns:
{
  "emailId" : ADMIN,
  "firstName" : "admin",
  "lastName" : "admin"
}
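To check in a script that the ldapsearch query returned the expected entry, the LDIF output can be parsed into attributes. In LDIF, a value introduced by a double colon (`::`) is base64-encoded, and a line that starts with a single space continues the previous line. The sketch below handles only those two rules, which is enough for the response shown above; `parse_ldif_entry` is an illustrative name.

```python
import base64


def parse_ldif_entry(text):
    """Parse one LDIF entry into {attribute: [values]}, decoding base64 values."""
    # Unfold continuation lines (they begin with exactly one space).
    unfolded = []
    for line in text.splitlines():
        if line.startswith(" ") and unfolded:
            unfolded[-1] += line[1:]
        elif line.strip():
            unfolded.append(line)

    entry = {}
    for line in unfolded:
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        if rest.startswith(":"):
            # "name:: value" marks a base64-encoded value.
            value = base64.b64decode(rest[1:].strip()).decode("utf-8", errors="replace")
        else:
            value = rest.strip()
        entry.setdefault(name, []).append(value)
    return entry


SAMPLE_LDIF = """dn: uid=admin,ou=users,ou=global,dc=apigee,dc=com
objectClass: person
objectClass: top
uid: admin
userPassword:: e1NTSEF9bS9xbS9RbVNXSFFtUWVsU1F0c3BGL3BQMkhObFp2eDFKUytmZVE9PQ=
 =
mail: opdk@google.com
"""

if __name__ == "__main__":
    entry = parse_ldif_entry(SAMPLE_LDIF)
    print(entry["uid"], entry["userPassword"][0][:6])
```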
You can also monitor the OpenLDAP caches, which reduce the number of disk accesses and hence improve the performance of the system. Monitoring and then tuning the cache size in the OpenLDAP server can heavily impact the performance of the directory server. You can view the log files (/opt/apigee/var/log) to obtain information about the cache.