Edge for Private Cloud v. 4.17.09
This document describes techniques for monitoring the components supported by an on-premises deployment of Apigee Edge.
Enabling JMX
JMX is enabled by default for Cassandra, and disabled by default for all other Edge components. You must therefore enable JMX individually for each component.
Each component supports JMX on a different port. The following table lists the JMX port and the file that you modify to enable JMX on that port:
| Component | JMX Port | File | 
|---|---|---|
| Management Server | 1099 | /opt/apigee/edge-management-server/bin/start | 
| Router | 1100 | /opt/apigee/edge-router/bin/start | 
| Message Processor | 1101 | /opt/apigee/edge-message-processor/bin/start | 
| Qpid | 1102 | /opt/apigee/edge-qpid-server/bin/start | 
| Postgres | 1103 | /opt/apigee/edge-postgres-server/bin/start | 
For example, to enable JMX on the Management Server, open /opt/apigee/edge-management-server/bin/start in an editor. You should see the following line used to start the Management Server:
exec $JAVA -classpath "$classpath" -Xms$min_mem -Xmx$max_mem $xx_opts -Djava.security.auth.login.config=$conf_path/jaas.config -Dinstallation.dir=$install_dir $sys_props -Dconf.dir=$conf_path -Ddata.dir=$data_dir $* $debug_options com.apigee.kernel.MicroKernel
Edit this line to add the following:
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
Note that this line specifies the JMX port number as 1099 for the Management Server. Set the port number for each component as defined in the table above. For example:
exec $JAVA -classpath "$classpath" -Xms$min_mem -Xmx$max_mem $xx_opts -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.security.auth.login.config=$conf_path/jaas.config -Dinstallation.dir=$install_dir $sys_props -Dconf.dir=$conf_path -Ddata.dir=$data_dir $* $debug_options com.apigee.kernel.MicroKernel
Save the file and then restart the component. For example, to restart the Management Server:
> /opt/apigee/apigee-service/bin/apigee-service edge-management-server restart
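On a node that runs several components, the edit above can be scripted. The following Python sketch is a hypothetical helper, not an Apigee tool. It assumes the start script contains one `exec $JAVA` line with `$xx_opts` on it, as shown above. It inserts the same JMX options directly after `$xx_opts`, using the port from the table, and leaves the script unchanged if JMX options are already present.

```python
import re

# JMX ports per component, from the table above.
JMX_PORTS = {
    "edge-management-server": 1099,
    "edge-router": 1100,
    "edge-message-processor": 1101,
    "edge-qpid-server": 1102,
    "edge-postgres-server": 1103,
}


def jmx_options(port):
    """The JMX system properties shown above, for the given port."""
    return " ".join([
        "-Dcom.sun.management.jmxremote",
        f"-Dcom.sun.management.jmxremote.port={port}",
        "-Dcom.sun.management.jmxremote.local.only=false",
        "-Dcom.sun.management.jmxremote.authenticate=false",
        "-Dcom.sun.management.jmxremote.ssl=false",
    ])


def add_jmx_options(script, component):
    """Return the start script with JMX options inserted after $xx_opts.

    Hypothetical helper: expects exactly one 'exec $JAVA ... $xx_opts' line.
    """
    if "-Dcom.sun.management.jmxremote" in script:
        return script  # already enabled; leave unchanged
    opts = jmx_options(JMX_PORTS[component])
    # Match from 'exec $JAVA' up to the first '$xx_opts' on the same line.
    new, count = re.subn(r"exec \$JAVA .*?\$xx_opts",
                         lambda m: m.group(0) + " " + opts, script, count=1)
    if count != 1:
        raise ValueError("no 'exec $JAVA ... $xx_opts' line found")
    return new
```

To apply it, read the start script and pass its contents and the component name (for example, edge-management-server) to add_jmx_options. Keep a backup copy, write the result back, and then restart the component as shown above.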
Enabling JMX authentication and setting the JMX password
The monitoring processes for the Management Server, Message Processor, Qpid, and Postgres all use JMX. JMX is enabled by default and remote JMX access does not require a password.
Each component has a change_jmx_auth action that you use to enable or disable JMX authentication and to set the JMX credentials.
To enable JMX authentication, use the following command:
> /opt/apigee/apigee-service/bin/apigee-service comp change_jmx_auth optionsOrConfigFile
where:
- comp is one of edge-management-server, edge-message-processor, edge-qpid-server, or edge-postgres-server.
- Options are:
      - -u: username
      - -p: password
      - -e: y (enable) or n (disable)
- Config file (passed with -f) includes:
      - JMX_USERNAME=username
      - JMX_ENABLED=y/n
      - JMX_PASSWORD=password (if not set or not passed in with -p, you are prompted)
For example, to use options on the command line:
> /opt/apigee/apigee-service/bin/apigee-service edge-management-server change_jmx_auth -u foo -p bar -e y
If you have a config file:
> /opt/apigee/apigee-service/bin/apigee-service edge-management-server change_jmx_auth -f configFile
If you are running Edge on multiple nodes, run this command on all nodes, specifying the same username and password.
To later disable JMX authentication, use the command:
> /opt/apigee/apigee-service/bin/apigee-service edge-management-server change_jmx_auth -e n
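The config file passed with -f can also be generated by a script, for example when you provision many nodes. The Python sketch below is a hypothetical helper, not part of apigee-service. It writes only the three keys listed above, one KEY=value per line. It omits JMX_PASSWORD unless you pass one, so that change_jmx_auth prompts for it. Because the file may hold a password, the helper restricts it to mode 0600.

```python
import os


def write_jmx_auth_config(path, username, enabled=True, password=None):
    """Write a config file for change_jmx_auth -f (hypothetical helper).

    Uses only the JMX_USERNAME, JMX_ENABLED, and JMX_PASSWORD keys.
    """
    lines = [
        f"JMX_USERNAME={username}",
        f"JMX_ENABLED={'y' if enabled else 'n'}",
    ]
    if password is not None:
        # Leave JMX_PASSWORD out to be prompted for it instead.
        lines.append(f"JMX_PASSWORD={password}")
    # Create the file owner-only; chmod also covers a pre-existing file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    os.chmod(path, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(lines) + "\n")
```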
Management Server
Using JConsole to monitor system health check and process information
Use JConsole (a JMX-compliant tool) to manage and monitor health check and process statistics. Using JConsole, you can consume the JMX statistics exposed by the Management Server (or any server) and display them in a graphical interface. For more information on JConsole usage, see http://docs.oracle.com/javase/8/docs/technotes/guides/management/jconsole.html.
Use JConsole and the following service URL to monitor the JMX attributes (MBeans) offered via JMX.
service:jmx:rmi:///jndi/rmi://<ip address>:<port>/jmxrmi
where <ip address> is the IP address of the Management Server (or of the server you are monitoring) and <port> is that server's JMX port. By default, the port is 1099 for the Management Server.
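Only the host and port change between components. As a small illustration, the hypothetical Python function below fills in the service URL template. It uses the JMX ports from the Enabling JMX table, plus port 7199 for Cassandra, which is covered later in this document. The short component keys are made up for the example.

```python
# JMX ports: Edge components from the Enabling JMX table, Cassandra default 7199.
JMX_PORTS = {
    "management-server": 1099,
    "router": 1100,
    "message-processor": 1101,
    "qpid": 1102,
    "postgres": 1103,
    "cassandra": 7199,
}


def jmx_service_url(ip_address, component):
    """Build the JConsole service URL for a component (hypothetical helper)."""
    port = JMX_PORTS[component]
    return f"service:jmx:rmi:///jndi/rmi://{ip_address}:{port}/jmxrmi"
```

JConsole accepts a service URL as a command-line argument, so the result can be passed directly, for example: jconsole service:jmx:rmi:///jndi/rmi://192.168.124.201:1099/jmxrmi.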
The following table shows the generic JMX statistics:
| JMX MBeans | JMX Attributes |
|---|---|
| Memory | HeapMemoryUsage |
| | NonHeapMemoryUsage |
| | Usage |
Note: Each attribute value is displayed as four values: committed, init, max, and used.
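A commonly watched figure is "used" as a percentage of "max". In the Java MemoryUsage API, max is -1 when it is undefined. The Python sketch below therefore falls back to "committed" in that case. The function name and the fallback choice are illustrative, not an Edge feature.

```python
def heap_used_percent(usage):
    """Return used memory as a percentage of max, or of committed if max is undefined.

    `usage` holds the four MemoryUsage fields shown in JConsole
    (committed, init, max, used), all in bytes. Hypothetical helper.
    """
    limit = usage["max"] if usage["max"] > 0 else usage["committed"]
    if limit <= 0:
        raise ValueError("neither max nor committed is usable")
    return 100.0 * usage["used"] / limit
```

Alert thresholds on this value are a local choice; a heap that stays near its maximum after garbage collection usually deserves investigation.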
Using Edge Application API checks
You can perform an API check on the Management Server (or any server) by running the following curl command:
curl http://<host>:8080/v1/servers/self/up -H "Accept: application/json"
where <host> is the IP address of the Management Server. You can specify the Accept type as application/json or application/xml.
This call returns either "true" or "false". A value of "true" means that the node is up and the Java service is running.
If you do not receive an HTTP 200 (OK) response, Edge is unable to respond to requests on port 8080.
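The same check can be scripted, for example from a monitoring cron job. The sketch below uses only the Python standard library. The function names are hypothetical. It assumes the response body is the bare word true or false (possibly JSON-quoted), as described above, and treats any non-200 status or connection error as down.

```python
import http.client
import urllib.request


def body_says_up(status, body):
    """Interpret a /v1/servers/self/up response: HTTP 200 with a body of true."""
    # Assumption: the body is the bare word true/false, possibly JSON-quoted.
    return status == 200 and body.strip().strip('"').lower() == "true"


def check_up(host, port=8080, timeout=5.0):
    """Return True if the Edge component at host:port reports that it is up.

    Hypothetical helper; any connection error or non-200 status counts as down.
    """
    url = f"http://{host}:{port}/v1/servers/self/up"
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return body_says_up(resp.status, body)
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (raised for non-2xx statuses) and socket timeouts
        # are all OSError subclasses.
        return False
```

For example, check_up("192.168.124.201", 8081) checks a Router. Use port 8082 for a Message Processor.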
Troubleshooting
- Log in to the server and run the following command:
 /opt/apigee/apigee-service/bin/apigee-service edge-management-server status
- If the service is not running, start it:
 /opt/apigee/apigee-service/bin/apigee-service edge-management-server start
Using Edge Application – Users, organization and deployment checks
The Management Server plays a vital role in holding all other components together in each on-premises installation. You can check user, organization, and deployment status on the Management Server by issuing the following commands:
curl -u userEmail:password http://localhost:8080/v1/users
curl -u userEmail:password http://localhost:8080/v1/organizations
curl -u userEmail:password http://localhost:8080/v1/organizations/orgname/deployments
The system should display "deployed" status for all calls. If these fail, do the following:
- Check the Management Server logs (at /opt/apigee/var/log/edge-management-server) for any errors.
- Make a call against Management Server to check whether it is functioning properly.
- Remove the server from the ELB and then restart the Management Server.
 /opt/apigee/apigee-service/bin/apigee-service edge-management-server restart
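When an organization has many proxies, reading the deployments output by eye is error-prone. The Python sketch below walks the parsed JSON from the deployments call and reports every state value that is not "deployed". It deliberately avoids assuming a particular response layout. The only assumption, which you should confirm against your own output, is that deployment status appears under keys named "state". The function name is hypothetical.

```python
def undeployed_states(node, path="$"):
    """Yield (location, state) for every "state" field that is not "deployed".

    Hypothetical helper; works on the output of json.loads() and makes no
    assumption about nesting beyond status being stored under "state" keys.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "state":
                if value != "deployed":
                    yield child, value
            else:
                yield from undeployed_states(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from undeployed_states(item, f"{path}[{i}]")
```

An empty result from list(undeployed_states(json.loads(response_text))) means every reported state is "deployed".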
Router
You can perform an API check on the Router (or any server) by running the following curl command:
curl http://<host>:8081/v1/servers/self/up
where <host> is the IP address of the Router.
This call returns either "true" or "false". A value of "true" means that the node is up and the Router service is running.
If you do not receive an HTTP 200 (OK) response, Edge is unable to respond to requests on port 8081.
Troubleshooting
- Log in to the server and run the following command:
 /<inst_root>/apigee/apigee-service/bin/apigee-service edge-router status
- If the service is not running, start it:
 /<inst_root>/apigee/apigee-service/bin/apigee-service edge-router start
- After the restart, check that it is functioning:
 curl -v http://localhost:port/v1/servers/self/up
 
 where port is 8081 for the Router and 8082 for the Message Processor.
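A component can take a while to accept requests after a restart, so a single curl call may fail even though the restart succeeded. Below is a minimal Python sketch of a bounded retry loop. wait_until_up is a hypothetical helper. probe stands for any function that performs the /v1/servers/self/up check and returns True when the component is up. The timeout and interval defaults are arbitrary examples.

```python
import time


def wait_until_up(probe, timeout=120.0, interval=5.0):
    """Call probe() until it returns True or `timeout` seconds have elapsed.

    Hypothetical helper; returns True if the component came up in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

Using time.monotonic() keeps the deadline correct even if the system clock is adjusted while the loop is waiting.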
Using JConsole to monitor system health check and process information
Follow the same procedure as described above for the Management Server.
Note: Ensure that you use port 1100.
Message Processor
Using JConsole to monitor system health check and process information
Follow the same procedure as described above for the Management Server.
Note: Ensure that you use port 1101.
Using Edge Application API checks
Follow the same procedure as described above for the Router.
Note: Ensure that you use port 8082.
Using JMX message flow checks
Follow the same procedure as described above for the Management Server.
Note: Ensure that you use port 1101.
Qpid Server
Using JConsole to monitor system health check and process information
Follow the same procedure as described above for the Management Server.
Note: Ensure that you use port 1102.
Using Edge Application API checks
Follow the same procedure as described above for the Management Server.
Note: Ensure that you use port 8083. The following curl command is also supported for the Qpid Server:
curl http://<qpid_IP>:8083/v1/servers/self
Postgres Server
Using JConsole to monitor system health check and process information
Follow the same procedure as described above for the Management Server.
Note: Ensure that you use port 1103.
Using Edge Application API checks
Follow the same procedure as described above for the Management Server.
Note: Ensure that you use port 8084. The following curl command is also supported for the Postgres Server:
curl http://<postgres_IP>:8084/v1/servers/self
Using Edge Application organization and environment checks
You can check the organization and environment names that are onboarded on the Postgres Server by issuing the following curl command:
curl http://<postgres_IP>:8084/v1/servers/self/organizations
Note: Ensure that you use port 8084.
The system should display the organization and environment name.
Using Edge Application axstatus check
You can verify the status of the analytics servers by issuing the following CURL command.
curl -u userEmail:password http://<host>:<port>/v1/organizations/<orgname>/environments/<envname>/provisioning/axstatus
The system should display a SUCCESS status for all analytics servers. The output of the above curl command is shown below:
{
  "environments" : [ {
    "components" : [ {
      "message" : "success at Thu Feb 28 10:27:38 CET 2013",
      "name" : "pg",
      "status" : "SUCCESS",
      "uuid" : "[c678d16c-7990-4a5a-ae19-a99f925fcb93]"
     }, {
      "message" : "success at Thu Feb 28 10:29:03 CET 2013",
      "name" : "qs",
      "status" : "SUCCESS",
      "uuid" : "[ee9f0db7-a9d3-4d21-96c5-1a15b0bf0adf]"
     } ],
    "message" : "",
    "name" : "prod"
   } ],
  "organization" : "acme",
  "status" : "SUCCESS"
}
PostgreSQL Database
Using the check_postgres.pl script
To monitor the PostgreSQL database, you can use a standard monitoring script, check_postgres.pl, which is available at http://bucardo.org/wiki/Check_postgres.
Note: The check_postgres.pl script must be installed on each Postgres node.
Before you run the script:
- Ensure that you have installed perl-Time-HiRes.x86_64, a Perl module that
    implements high resolution alarm, sleep, gettimeofday, and interval timers. For example, you
    can install it by using the following command:
 yum install perl-Time-HiRes.x86_64
The default output of the check_postgres.pl script is Nagios-compatible. After you install the script, run the following checks:
- Database size – check the database size:
 check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -include=apigee -action database_size --warning='800 GB' --critical='900 GB'
- Incoming connections to the database – checks the number of incoming connections to
    the database and compares it with the maximum allowed connections:
 check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -action backends
- Database availability and performance – checks whether the database is running and
    available:
 check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -action connection
- Disk space – checks the disk space:
 check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -action disk_space --warning='80%' --critical='90%'
- Onboarded organizations/environments – checks the number of organizations and
    environments onboarded on a Postgres node:
 check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -action=custom_query --query="select count(*) as result from pg_tables where schemaname='analytics' and tablename like '%fact'" --warning='80' --critical='90' --valtype=integer
Note: If you need help with the above commands, see http://bucardo.org/check_postgres/check_postgres.pl.html.
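Because the output follows Nagios plugin conventions, a wrapper only needs the exit status: 0 is OK, 1 is WARNING, 2 is CRITICAL, and 3 is UNKNOWN. The Python sketch below runs one of the commands above and maps its exit status to a state. The function name is hypothetical, and the sketch assumes check_postgres.pl is on the PATH of the user running it.

```python
import subprocess

# Standard Nagios plugin exit codes; check_postgres.pl output is Nagios-compatible.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(cmd, timeout=60.0):
    """Run a Nagios-style check command; return (state, first output line).

    Hypothetical helper. A command that cannot be started or that times out
    is reported as UNKNOWN.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "UNKNOWN", str(exc)
    lines = proc.stdout.splitlines()
    return NAGIOS_STATES.get(proc.returncode, "UNKNOWN"), (lines[0] if lines else "")
```

For example, pass the connection check above as an argument list: ["check_postgres.pl", "-H", "10.176.218.202", "-db", "apigee", "-u", "apigee", "-dbpass", "postgres", "-action", "connection"].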
DB Checks
You can verify that the proper tables are created in the PostgreSQL database. Log in to the PostgreSQL database using:
psql -h /opt/apigee/var/run/apigee-postgresql/ -U apigee -d apigee
and then run:
\d analytics."<org>.<env>.fact"
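When you check several organizations and environments, it can help to generate these names. The fact table name contains dots, which is why it is double-quoted in the \d command above. The Python sketch below builds that command and an equivalent query against pg_tables. It follows the <org>.<env>.fact naming pattern shown above and in the custom_query check. The helper names are hypothetical.

```python
def fact_table_name(org, env):
    """Analytics fact table name for an organization and environment."""
    return f"{org}.{env}.fact"


def describe_command(org, env):
    """psql meta-command that describes the fact table (hypothetical helper)."""
    # The name contains dots, so it must be double-quoted; a literal " is doubled.
    name = fact_table_name(org, env).replace('"', '""')
    return f'\\d analytics."{name}"'


def exists_query(org, env):
    """SQL returning 1 if the fact table exists and 0 otherwise (hypothetical helper)."""
    # Single quotes inside a SQL string literal are escaped by doubling them.
    name = fact_table_name(org, env).replace("'", "''")
    return ("select count(*) from pg_tables "
            f"where schemaname = 'analytics' and tablename = '{name}'")
```

You could run the query non-interactively with psql -h /opt/apigee/var/run/apigee-postgresql/ -U apigee -d apigee -tAc followed by the query. The -t and -A options suppress headers and alignment, so the output is just the count.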
Check health status of postgres process
You can perform an API check on the Postgres machine by running the following curl command:
curl http://<postgres_IP>:8084/v1/servers/self/health/
Note: Ensure that you use port 8084.
It returns the ‘ACTIVE’ status when the Postgres process is active. If the Postgres process is not up and running, it returns the ‘INACTIVE’ status.
Postgres Resources
- http://www.postgresql.org/docs/9.0/static/monitoring.html
- http://www.postgresql.org/docs/9.0/static/diskusage.html
- http://bucardo.org/check_postgres/check_postgres.pl.html
Apache Cassandra
Using JConsole – monitor task statistics
Use JConsole and the following service URL to monitor the JMX attributes (MBeans) offered via JMX.
service:jmx:rmi:///jndi/rmi://<ip address>:7199/jmxrmi
where <ip address> is the IP of the Cassandra server.
JMX is enabled by default for Cassandra and remote JMX access to Cassandra does not require a password.
To enable JMX authentication and add a password:
- Edit /opt/apigee/customer/application/cassandra.properties. If the file does not exist, create it.
- Add the following to the file:
 conf_cassandra-env_com.sun.management.jmxremote.authenticate=true
- Save the file.
- Copy the following files from your $JAVA_HOME directory to
    /opt/apigee/data/apigee-cassandra/:
 cp ${JAVA_HOME}/lib/management/jmxremote.password.template $APIGEE_ROOT/data/apigee-cassandra/jmxremote.password
 
 cp ${JAVA_HOME}/lib/management/jmxremote.access $APIGEE_ROOT/data/apigee-cassandra/jmxremote.access
- Edit jmxremote.password and
    add username and password to the file:
 cassandra password
 
 where password is the JMX password.
- Edit jmxremote.access and
    add the following role:
 cassandra readwrite
- Make sure the files are owned by "apigee" and that the file mode is 400:
 > chown apigee:apigee /opt/apigee/data/apigee-cassandra/jmxremote.*
 > chmod 400 /opt/apigee/data/apigee-cassandra/jmxremote.*
- Run configure on
    Cassandra:
 > /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra configure
- Restart Cassandra:
 > /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra restart
- Repeat this process on all other Cassandra nodes.
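The file steps in this procedure (copy, edit, chown, chmod) can be scripted on each node. The Python sketch below is a hypothetical helper, not an Apigee tool. It assumes the JDK layout used in the cp commands above (${JAVA_HOME}/lib/management) and a fresh data directory, because rerunning it over files already set to mode 400 fails for a non-root user. Editing cassandra.properties, running configure, and restarting Cassandra remain manual.

```python
import os
import shutil


def setup_cassandra_jmx_auth(java_home, data_dir, password,
                             user="cassandra", owner="apigee"):
    """Copy the JDK JMX files into data_dir, add one user, and lock them down.

    Hypothetical helper mirroring the manual steps above. Pass owner=None to
    skip the chown (it needs root and an existing apigee user).
    """
    mgmt = os.path.join(java_home, "lib", "management")
    pw_file = os.path.join(data_dir, "jmxremote.password")
    access_file = os.path.join(data_dir, "jmxremote.access")
    shutil.copyfile(os.path.join(mgmt, "jmxremote.password.template"), pw_file)
    shutil.copyfile(os.path.join(mgmt, "jmxremote.access"), access_file)
    # Append "<user> <password>" and "<user> readwrite" as new lines.
    with open(pw_file, "a") as f:
        f.write(f"\n{user} {password}\n")
    with open(access_file, "a") as f:
        f.write(f"\n{user} readwrite\n")
    for path in (pw_file, access_file):
        if owner is not None:
            shutil.chown(path, user=owner, group=owner)
        os.chmod(path, 0o400)  # the JMX agent rejects a password file others can read
    return pw_file, access_file
```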
To later disable authentication:
- Edit /opt/apigee/customer/application/cassandra.properties.
- Remove the following line in the file:
 conf_cassandra-env_com.sun.management.jmxremote.authenticate=true
- Run configure on Cassandra:
 > /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra configure
- Restart Cassandra:
 > /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra restart
- Repeat this process on all other Cassandra nodes.
Cassandra JMX statistics
| JMX MBeans | JMX Attributes |
|---|---|
| ColumnFamilies/apprepo/environments, ColumnFamilies/apprepo/organizations, ColumnFamilies/apprepo/apiproxy_revisions, ColumnFamilies/apprepo/apiproxies, ColumnFamilies/audit/audits, ColumnFamilies/audit/audits_ref | PendingTasks |
| | MemtableColumnsCount |
| | MemtableDataSize |
| | ReadCount |
| | RecentReadLatencyMicros |
| | TotalReadLatencyMicros |
| | WriteCount |
| | RecentWriteLatencyMicros |
| | TotalWriteLatencyMicros |
| | TotalDiskSpaceUsed |
| | LiveDiskSpaceUsed |
| | LiveSSTableCount |
| | BloomFilterFalsePositives |
| | RecentBloomFilterFalseRatio |
| | BloomFilterFalseRatio |
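The Total*LatencyMicros attributes and the ReadCount and WriteCount attributes accumulate from the time the node starts. The average latency over a monitoring interval is therefore the difference in total latency divided by the difference in operation count between two samples. The Python sketch below assumes you have sampled both attributes for the same column family. The function name is hypothetical.

```python
def interval_avg_latency_ms(prev_total_us, prev_count, cur_total_us, cur_count):
    """Average latency in milliseconds between two samples of cumulative counters.

    Returns None if no operations happened in the interval, or if the counters
    went backwards (for example, after the node restarted). Hypothetical helper.
    """
    ops = cur_count - prev_count
    latency_us = cur_total_us - prev_total_us
    if ops <= 0 or latency_us < 0:
        return None
    return latency_us / ops / 1000.0
```

For example, samples of TotalReadLatencyMicros 1,000,000 and 1,500,000 with ReadCount 100 and 200 give 500,000 µs over 100 reads, or 5.0 ms per read.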
Using nodetool utility to manage cluster nodes
The nodetool utility, a command-line interface for Cassandra, is used to manage cluster nodes. The utility is located in /opt/apigee/apigee-cassandra/bin.
For more info on nodetool utility, see http://www.datastax.com/docs/1.0/references/nodetool.
The following calls can be made on all Cassandra cluster nodes:
- General ring info (also possible for a single Cassandra node): look for
    "Up" and "Normal" for all nodes.
 [host]# nodetool -h localhost ring
 
 The output of the above command looks as shown below:
 Address DC Rack Status State Load Owns Token
 192.168.124.201 dc1 ra1 Up Normal 1.67 MB 33,33% 0
 192.168.124.202 dc1 ra1 Up Normal 1.68 MB 33,33% 56713727820156410577229101238628035242
 192.168.124.203 dc1 ra1 Up Normal 1.67 MB 33,33% 113427455640312821154458202477256070484
- General info about nodes (call per node)
 nodetool -h localhost info
 
 The output of the above command looks as shown below:
 Token : 0
 Gossip active : true
 Load : 1.67 MB
 Generation No : 1361968765
 Uptime (seconds) : 78108
 Heap Memory (MB) : 46,80 / 772,00
 Data Center : dc1
 Rack : ra1
 Exceptions : 0
- Status of the thrift server (serving client API)
 [host]# nodetool -h localhost statusthrift
 
 The output of the above command displays status as "running".
- Status of data streaming operations: observe traffic for Cassandra
    nodes
 nodetool -h localhost netstats 192.168.124.203
 
 The output of the above command looks as shown below:
 Mode: NORMAL
 Nothing streaming to /192.168.124.203
 Nothing streaming from /192.168.124.203
 Pool Name Active Pending Completed
 Commands n/a 0 1688
 Responses n/a 0 292277
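To alert on the "Up"/"Normal" condition from a script, you can parse the ring output. The Python sketch below assumes the column layout shown in the sample above (Address, DC, Rack, Status, State, and so on). Other Cassandra versions print different columns, so check your own output first. The function name is hypothetical.

```python
def unhealthy_ring_nodes(ring_output):
    """Return (address, status, state) for nodes that are not Up and Normal.

    Hypothetical helper; assumes the Address/DC/Rack/Status/State column order.
    """
    problems = []
    header_seen = False
    for line in ring_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Address":
            header_seen = True  # node rows follow the header line
            continue
        if header_seen and len(fields) >= 5:
            address, status, state = fields[0], fields[3], fields[4]
            if (status, state) != ("Up", "Normal"):
                problems.append((address, status, state))
    return problems
```

An empty list means every node in the ring reported Up and Normal.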
Cassandra Monitoring (UI)
Refer to the DataStax OpsCenter URL: http://www.datastax.com/products/opscenter.
Cassandra Resource
Refer to the following URL: http://www.datastax.com/docs/1.0/operations/monitoring.
Apache ZooKeeper
Checking ZooKeeper status
- Ensure the ZooKeeper process is running. ZooKeeper writes a PID file to /opt/apigee/var/run/apigee-zookeeper/apigee-zookeeper.pid.
- Test ZooKeeper ports to ensure that you can establish a TCP connection to ports 2181 and 3888 on every ZooKeeper server.
- Ensure that you can read values from the ZooKeeper database. Connect using a ZooKeeper client library (or /opt/apigee/apigee-zookeeper/bin/zkCli.sh) and read a value from the database.
- Check the status:
 > /opt/apigee/apigee-service/bin/apigee-service apigee-zookeeper status
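The first two checks in this list can be automated with the Python standard library. This sketch assumes the PID file holds a single process ID. os.kill(pid, 0) sends no signal and only tests whether the process exists. It raises PermissionError if the process belongs to another user, which still means the process exists. The function names are hypothetical.

```python
import os
import socket

# PID file location from the list above.
PID_FILE = "/opt/apigee/var/run/apigee-zookeeper/apigee-zookeeper.pid"


def process_running(pid_file=PID_FILE):
    """Return True if the PID stored in pid_file belongs to a live process."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or malformed PID file
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, all(port_open(h, p) for h in zk_hosts for p in (2181, 3888)), where zk_hosts is your own list of ZooKeeper server addresses.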
Using ZooKeeper Four Letter Words
ZooKeeper can be monitored with a small set of commands (four-letter words) that you send to port 2181 using netcat (nc) or telnet.
For more info on ZooKeeper commands, see: http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkCommands.
For example:
- srvr: Lists full details for the server.
- stat: Lists brief details for the server and connected clients.
The following commands can be issued to the ZooKeeper port:
- Run the four-letter command ruok to test if server is running in a non-error state. A
    successful response returns "imok".
 echo ruok | nc <host> 2181
 
 Returns:
 imok
- Run the four-letter command, stat to list server performance and connected clients
    statistics.
 echo stat | nc <host> 2181
 
 Returns:
 Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
 Clients:
 /0:0:0:0:0:0:0:1:33467[0](queued=0,recved=1,sent=0)
 /192.168.124.201:42388[1](queued=0,recved=8433,sent=8433)
 /192.168.124.202:42185[1](queued=0,recved=1339,sent=1347)
 /192.168.124.204:39296[1](queued=0,recved=7688,sent=7692)
 Latency min/avg/max: 0/0/128
 Received: 26144
 Sent: 26160
 Connections: 4
 Outstanding: 0
 Zxid: 0x2000002c2
 Mode: follower
 Node count: 283
 Note: It is sometimes important to check whether a ZooKeeper server is in leader, follower, or observer mode, as reported on the Mode line.
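- If netcat (nc) is not available, you can use Python as an alternative. Create a file
    named zookeeper.py that
    contains the following:
 # zookeeper.py: send a four-letter word to ZooKeeper on port 2181 and print the reply
 import socket
 import sys
 host, command = sys.argv[1], sys.argv[2]
 with socket.create_connection((host, 2181), timeout=5) as c:
     c.sendall(command.encode("ascii"))
     # ZooKeeper closes the connection after replying, so read until EOF
     chunks = []
     while data := c.recv(4096):
         chunks.append(data)
 print(b"".join(chunks).decode())
 
 Then run the following commands:
 python3 zookeeper.py 192.168.124.201 ruok
 python3 zookeeper.py 192.168.124.201 stat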
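To check the Mode line from a script, or to record other stat counters, you can parse the response into fields. The Python sketch below turns the "key: value" lines of stat output into a dictionary. It assumes the format of the sample output above and skips the client connection lines, which start with "/". The function name is hypothetical.

```python
def parse_stat(output):
    """Parse ZooKeeper 'stat' output into a dict of its header fields.

    Hypothetical helper; client lines and the bare "Clients:" line are skipped.
    """
    fields = {}
    for line in output.splitlines():
        line = line.strip()
        # Client connection lines start with "/"; blank lines carry no data.
        if not line or line.startswith("/"):
            continue
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value.strip()
    return fields
```

For the sample output above, parse_stat(text)["Mode"] is "follower" and parse_stat(text)["Node count"] is "283".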
OpenLDAP
LDAP Level Test
You can monitor OpenLDAP to see whether specific requests are served properly, that is, check that a specific search returns the correct result.
- Use ldapsearch
    (yum install openldap-clients)
    to query the entry of the system admin. This entry is used to authenticate all API calls.
 ldapsearch -b "uid=admin,ou=users,ou=global,dc=apigee,dc=com" -x -W -D "cn=manager,dc=apigee,dc=com" -H ldap://localhost:10389 -LLL
 
 You are then prompted for the LDAP admin password:
 Enter LDAP Password:
 
 After entering the password, you see a response in the form:
 dn: uid=admin,ou=users,ou=global,dc=apigee,dc=com
 objectClass: organizationalPerson
 objectClass: person
 objectClass: inetOrgPerson
 objectClass: top
 uid: admin
 cn: admin
 sn: admin
 userPassword:: e1NTSEF9bS9xbS9RbVNXSFFtUWVsU1F0c3BGL3BQMkhObFp2eDFKUytmZVE9PQ==
 mail: opdk@google.com
- To check whether the Management Server is still connected to LDAP, issue:
 curl -u <userEMail>:<password> http://localhost:8080/v1/users/<ADMIN>
 
 Returns:
 {
 "emailId" : <ADMIN>,
 "firstName" : "admin",
 "lastName" : "admin"
 }
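If you run the ldapsearch check from a script, compare attributes rather than matching raw text. ldapsearch may fold long values across lines, and it base64-encodes some values (marked by "::"). The Python sketch below is a minimal parser for a single entry. It handles folding and base64 but not other LDIF features such as URL values. The function name is hypothetical.

```python
import base64


def parse_ldif_entry(text):
    """Parse one LDIF entry (as printed by ldapsearch -LLL) into {attr: [values]}.

    Hypothetical helper covering line folding and base64 ("::") values only.
    """
    # Unfold: a line starting with one space continues the previous line.
    unfolded = []
    for line in text.splitlines():
        if line.startswith(" ") and unfolded:
            unfolded[-1] += line[1:]
        elif line.strip():
            unfolded.append(line)
    entry = {}
    for line in unfolded:
        attr, sep, value = line.partition(":")
        if not sep:
            continue
        if value.startswith(":"):
            # "attr:: value" means the value is base64-encoded.
            value = base64.b64decode(value[1:].strip()).decode("utf-8", "replace")
        else:
            value = value.strip()
        entry.setdefault(attr, []).append(value)
    return entry
```

A check can then assert, for example, that entry["uid"] == ["admin"] for the system admin entry.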
You can also monitor the OpenLDAP caches, which help reduce the number of disk accesses and hence improve system performance. Monitoring and then tuning the cache size in the OpenLDAP server can heavily impact the performance of the directory server. You can view the log files (/opt/apigee/var/log) to obtain information about the cache.