How to monitor

Edge for Private Cloud v4.18.05

This document describes the monitoring techniques of components supported by an on-premise deployment of Apigee Edge.

Overview

Edge supports several ways to get details about services and to check their status. The following table lists the types of checks you can perform on each eligible service:

Service	JMX:* Memory Usage	Mgmt API: Service Check	Mgmt API: User/Org/Deployment Status	Mgmt API: axstatus	Database check	`apigee-service` Status
Management Server
Message Processor
Postgres
Qpid
Router
	More Info	More Info	More Info	More Info	More Info	More Info
* Before you can use JMX, you must enable it, as described in Enable JMX.

JMX and Management API monitoring ports

Each component supports JMX and Management API monitoring calls on different ports. The following table lists the JMX and Management API ports for each type of server:

Component	JMX Port	Management API Port
Management Server	1099	8080
Router	1100	8081
Message Processor	1101	8082
Qpid	1102	8083
Postgres	1103	8084

Use JMX

The monitoring processes for the Management Server, Message Processor, Qpid, and Postgres all use JMX. However, JMX is enabled by default only for Cassandra, and disabled by default for all other Edge components. You must therefore enable JMX individually for each component before you can monitor them.

JMX authentication is not enabled by default. You can enable JMX authentication for all components except Cassandra.

Enable JMX

JMX is enabled by default only for Cassandra, and disabled by default for all other Edge components. This section describes how to enable JMX for the other Edge components.

To enable JMX:

Edit the component's configuration file. This file is located at /opt/apigee/edge-component_name/bin/start. In production environments, these configuration files will be on different machines.
Choose from the following file locations on each server:
- Management Server: /opt/apigee/edge-management-server/bin/start
- Message Processor: /opt/apigee/edge-message-processor/bin/start
- Postgres: /opt/apigee/edge-postgres-server/bin/start
- Qpid: /opt/apigee/edge-qpid-server/bin/start
- Router: /opt/apigee/edge-router/bin/start
For example, the Management Server's configuration file on its server is at /opt/apigee/edge-management-server/bin/start.

Add the following com.sun.management.jmxremote options to the exec line that starts the component:

-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=port_number \
  -Dcom.sun.management.jmxremote.local.only=false \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false

Where port_number is the JMX port for the service. To get your service's JMX port number, see JMX and Management API monitoring ports.

For example, to enable JMX on the Management Server, add the following to the Management Server's configuration file:

exec $JAVA -classpath "$classpath" -Xms$min_mem -Xmx$max_mem $xx_opts \
  -Djava.security.auth.login.config=$conf_path/jaas.config \
  -Dinstallation.dir=$install_dir $sys_props -Dconf.dir=$conf_path \
  -Ddata.dir=$data_dir \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=1099 \
  -Dcom.sun.management.jmxremote.local.only=false \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false \
  $* $debug_options com.apigee.kernel.MicroKernel

This example specifies port 1099 for the Management Server. As stated previously, each service has its own port number.

The edited line in the configuration file looks like the following:

exec $JAVA -classpath "$classpath" -Xms$min_mem -Xmx$max_mem $xx_opts -Djava.security.auth.login.config=$conf_path/jaas.config -Dinstallation.dir=$install_dir $sys_props -Dconf.dir=$conf_path -Ddata.dir=$data_dir -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false $* $debug_options com.apigee.kernel.MicroKernel

Save the configuration file.
Restart the component with the restart command.
For example, to restart the Management Server, execute the following command:
```
/opt/apigee/apigee-service/bin/apigee-service edge-management-server restart
```
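The options added in the steps above differ between components only in the port number. As a rough illustration, the following Python sketch assembles the option list for a component from the port table in JMX and Management API monitoring ports. The `JMX_PORTS` dictionary and the `jmx_options` function are hypothetical helpers written for this example; they are not part of Edge.

```python
# Hypothetical helper: builds the JMX options for a component's start script.
# Port numbers are copied from the "JMX and Management API monitoring ports" table.
JMX_PORTS = {
    "edge-management-server": 1099,
    "edge-router": 1100,
    "edge-message-processor": 1101,
    "edge-qpid-server": 1102,
    "edge-postgres-server": 1103,
}


def jmx_options(component):
    """Return the com.sun.management.jmxremote options for a component."""
    port = JMX_PORTS[component]  # raises KeyError for an unknown component
    return [
        "-Dcom.sun.management.jmxremote",
        "-Dcom.sun.management.jmxremote.port=%d" % port,
        "-Dcom.sun.management.jmxremote.local.only=false",
        "-Dcom.sun.management.jmxremote.authenticate=false",
        "-Dcom.sun.management.jmxremote.ssl=false",
    ]


if __name__ == "__main__":
    # Print the options as continuation lines ready to paste into the exec line.
    print(" \\\n  ".join(jmx_options("edge-management-server")))
```

The sketch only produces text; you still edit the start file and restart the component yourself.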

Authentication for JMX is not enabled by default. You can enable JMX authentication for all components except Cassandra, as described in Enable JMX authentication.

Enable JMX authentication

JMX authentication is not enabled by default. You can enable JMX authentication for all components except Cassandra.

To enable JMX authentication, execute the following change_jmx_auth action on all nodes:

/opt/apigee/apigee-service/bin/apigee-service component change_jmx_auth [options|-f config_file]

Where:

component is one of the following:
- edge-management-server
- edge-message-processor
- edge-postgres-server
- edge-qpid-server
- edge-router
options specifies the following:
- -u username
- -p password
- -e [y|n] (enable or disable)
config_file specifies the location of a configuration file in which you define the following:
- JMX_USERNAME=username
- JMX_ENABLED=y|n
- JMX_PASSWORD=password (if not set or not passed in with -p, you are prompted)

You can use either the command line options or the configuration file to define the username, password, and enable/disable state. Do not specify both a set of options and a configuration file.

The following example enables JMX authentication for the Management Server using the command line options:

/opt/apigee/apigee-service/bin/apigee-service edge-management-server \
    change_jmx_auth -u foo -p bar -e y

The following example uses a configuration file rather than command line options:

/opt/apigee/apigee-service/bin/apigee-service edge-management-server \
    change_jmx_auth -f /tmp/my-config-file

If you are running Edge on multiple nodes, run the command on all nodes, specifying the same username and password.

To disable JMX authentication on the command line, use the "-e n" option, as the following example shows:

/opt/apigee/apigee-service/bin/apigee-service edge-management-server \
    change_jmx_auth -e n
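The configuration file passed with -f holds the JMX_USERNAME, JMX_ENABLED, and JMX_PASSWORD settings listed earlier. The following Python sketch shows one way such a file could be generated. The function name `render_jmx_auth_config` is hypothetical, and the one-setting-per-line layout is an assumption based on the KEY=value form shown in this section.

```python
def render_jmx_auth_config(username, enabled, password=None):
    """Return the text of a change_jmx_auth configuration file."""
    lines = [
        "JMX_USERNAME=%s" % username,
        "JMX_ENABLED=%s" % ("y" if enabled else "n"),
    ]
    # JMX_PASSWORD is optional; when it is omitted, the command prompts for it.
    if password is not None:
        lines.append("JMX_PASSWORD=%s" % password)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder credentials matching the command line example (foo / bar).
    print(render_jmx_auth_config("foo", True, "bar"))
```

Leaving the password out of the file keeps it off disk, at the cost of an interactive prompt on each node.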

Monitor with JConsole

Use JConsole (a JMX compliant tool) to manage and monitor health check and process statistics. With JConsole, you can consume JMX statistics exposed by your servers and display them in a graphical interface. For more information, see Using JConsole.

JConsole uses the following service URL to monitor the JMX attributes (MBeans) offered via JMX:

service:jmx:rmi:///jndi/rmi://IP_address:port_number/jmxrmi

Where:

IP_address is the IP address of the server you want to monitor.
port_number is the JMX port number of the server you want to monitor.

For example, to monitor the Management Server, use a service URL like the following (assuming the server's IP address is 216.3.128.12):

service:jmx:rmi:///jndi/rmi://216.3.128.12:1099/jmxrmi

Note that this example specifies port 1099, which is the Management Server JMX port. For other ports, see JMX and Management API monitoring ports.
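If you monitor several servers, it can help to generate the service URL rather than type it. The following Python sketch fills in the URL pattern shown above; the function name `jmx_service_url` is hypothetical.

```python
def jmx_service_url(ip_address, port_number):
    """Build the JMX service URL for a server, following the pattern above."""
    return "service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi" % (ip_address, port_number)


if __name__ == "__main__":
    # Management Server example from this section (JMX port 1099).
    print(jmx_service_url("216.3.128.12", 1099))
```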

The following table shows the generic JMX statistics:

JMX MBeans	JMX Attributes
Memory	HeapMemoryUsage
	NonHeapMemoryUsage
	Usage
NOTE: Each attribute is reported as four values: committed, init, max, and used.

Monitor with the Management API

Edge includes several APIs that you can use to perform service checks on your servers as well as check your users, organizations, and deployments. This section describes these APIs.

Perform service checks

The Management API provides several endpoints for monitoring and diagnosing issues with your services. These endpoints include:

/servers/self/up

Checks to see if a service is running. This API call does not require you to authenticate.

If the service is running, this endpoint returns the following response:

<ServerField>
  <Up>true</Up>
</ServerField>

If the service is not running, you will get a response similar to the following (depending on which service it is and how you checked it):

curl: Failed connect to localhost:port_number; Connection refused

/servers/self

Returns information about the service, including:

Configuration properties
Start time and up time
Build, RPM, and UUID information
Internal and external hostname and IP address
Region and pod
<isUp> property, indicating whether the service is running

This API call requires you to authenticate with your Apigee admin credentials.

To use these endpoints, invoke a utility such as curl with commands that use the following syntax:

curl http://host:port_number/v1/servers/self/up \
  -H "Accept: [application/json|application/xml]"
curl http://host:port_number/v1/servers/self -u username:password \
  -H "Accept: [application/json|application/xml]"

Where:

host is the IP address of the server you want to check. If you are logged into the server, you can use "localhost"; otherwise, specify the IP address of the server as well as the username and password.
port_number is the Management API port for the server you want to check. This port is different for each type of component. For example, the Management Server's Management API port is 8080. For a list of Management API port numbers to use, see JMX and Management API monitoring ports.

To change the format of the response, you can specify the Accept header as "application/json" or "application/xml".

The following example gets the status of the Router on localhost (port 8081):

curl http://localhost:8081/v1/servers/self/up -H "Accept: application/xml"

The following example gets information about the Message Processor at 216.3.128.12 (port 8082):

curl http://216.3.128.12:8082/v1/servers/self -u sysAdminEmail:password \
  -H "Accept: application/xml"
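A monitoring script can make the same /servers/self/up call and interpret the result. The following Python 3 sketch is one possible approach: it treats a refused or timed-out connection as "not running" and otherwise looks for the `<Up>true</Up>` element shown earlier. The function names `is_up` and `check_service` are hypothetical, and the sketch assumes the XML response format.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen


def is_up(response_body):
    """Return True if a /servers/self/up XML response reports <Up>true</Up>."""
    root = ET.fromstring(response_body)
    up = root.find("Up")
    return up is not None and (up.text or "").strip().lower() == "true"


def check_service(host, port_number, timeout=5):
    """Call /v1/servers/self/up and report whether the service is up.

    A refused or timed-out connection is treated as "not running".
    """
    url = "http://%s:%d/v1/servers/self/up" % (host, port_number)
    request = Request(url, headers={"Accept": "application/xml"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return is_up(response.read().decode("utf-8"))
    except OSError:
        # URLError and socket timeouts are subclasses of OSError.
        return False
```

For example, `check_service("localhost", 8081)` would check the Router on the local machine.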

Monitor user, organization, and deployment status

You can use the Management API to monitor user, organization, and deployment status of your proxies on Management Servers and Message Processors by issuing the following commands:

curl http://host:port_number/v1/users -u sysAdminEmail:password
curl http://host:port_number/v1/organizations -u sysAdminEmail:password
curl http://host:port_number/v1/organizations/orgname/deployments -u sysAdminEmail:password

Where port_number is either 8080 for the Management Server or 8082 for the Message Processor.

These calls require you to authenticate with your system administrator username and password.

The server should return a "deployed" status for all calls. If these fail, do the following:

Check the server logs for any errors. The logs are located at:
- Management Server: /opt/apigee/var/log/edge-management-server
- Message Processor: /opt/apigee/var/log/edge-message-processor
Make a call against the server to check whether it is functioning properly.
Remove the server from the ELB and then restart it:
```
/opt/apigee/apigee-service/bin/apigee-service service_name restart
```
Where service_name is:
- edge-management-server
- edge-message-processor

Check status with the `apigee-service` command

You can troubleshoot your Edge services by using the apigee-service command when you are logged into the server running the service.

To check the status of a service with apigee-service:

Log in to the server and run the following command:
```
/opt/apigee/apigee-service/bin/apigee-service service_name status
```
Where service_name is one of the following:
- Management Server: edge-management-server
- Message Processor: edge-message-processor
- Postgres: edge-postgres-server
- Qpid: edge-qpid-server
- Router: edge-router
For example:
```
/opt/apigee/apigee-service/bin/apigee-service edge-message-processor status
```

If the service is not running, start the service:

/opt/apigee/apigee-service/bin/apigee-service service_name start

After restarting the service, check that it is functioning, either by using the apigee-service status command you used previously or by using the Management API described in Monitor with the Management API.
For example:
```
curl -v http://localhost:port_number/v1/servers/self/up
```
Where port_number is the Management API port for the service.

This example assumes you are logged into the server and can use "localhost" as the hostname. To check the status remotely with the Management API, you must specify the IP address of the server and include the system administrator username and password in your API call.

Postgres monitoring

Postgres supports several utilities that you can use to check its status. These utilities are described in the sections that follow.

Check organizations and environments on Postgres

You can check for organization and environment names that are onboarded on the Postgres Server by issuing the following curl command:

curl -v http://postgres_IP:8084/v1/servers/self/organizations

The system should display the organization and environment names.

Verify analytics status

You can verify the status of the Postgres and Qpid analytics servers by issuing the following curl command:

curl -u userEmail:password http://host:port_number/v1/organizations/orgname/environments/envname/provisioning/axstatus

The system should display a success status for all analytics servers, as the following example shows:

{
  "environments" : [ {
    "components" : [ {
      "message" : "success at Thu Feb 28 10:27:38 CET 2013",
      "name" : "pg",
      "status" : "SUCCESS",
      "uuid" : "[c678d16c-7990-4a5a-ae19-a99f925fcb93]"
     }, {
      "message" : "success at Thu Feb 28 10:29:03 CET 2013",
      "name" : "qs",
      "status" : "SUCCESS",
      "uuid" : "[ee9f0db7-a9d3-4d21-96c5-1a15b0bf0adf]"
     } ],
    "message" : "",
    "name" : "prod"
   } ],
  "organization" : "acme",
  "status" : "SUCCESS"
}
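When you automate this check, you typically want to know which analytics component, if any, did not report success. The following Python sketch walks a response with the structure shown above and lists every component whose status is not SUCCESS. The function name `failed_components` is hypothetical, and the sketch assumes the response is the JSON form shown in the example.

```python
import json


def failed_components(axstatus_body):
    """Return (environment, component) pairs whose status is not SUCCESS."""
    data = json.loads(axstatus_body)
    failures = []
    for env in data.get("environments", []):
        for component in env.get("components", []):
            if component.get("status") != "SUCCESS":
                failures.append((env.get("name"), component.get("name")))
    return failures
```

An empty list means that every listed component (for example, "pg" and "qs") reported SUCCESS.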

PostgreSQL database

This section describes techniques that you can use specifically for monitoring the Postgres database.

Use the `check_postgres.pl` script

To monitor the PostgreSQL database, you can use a standard monitoring script, check_postgres.pl. For more information, see http://bucardo.org/wiki/Check_postgres.

Before you run the script:

You must install the check_postgres.pl script on each Postgres node.
Ensure that you have installed perl-Time-HiRes.x86_64, a Perl module that implements high resolution alarm, sleep, gettimeofday, and interval timers. For example, you can install it by using the following command:
```
yum install perl-Time-HiRes.x86_64
```
CentOS 7: Before using check_postgres.pl on CentOS v7, install the perl-Data-Dumper.x86_64 RPM.

check_postgres.pl output

The default output of the API calls using check_postgres.pl is Nagios compatible. After you install the script, do the following checks:

Check the database size:

check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -include=apigee -action database_size --warning='800 GB' --critical='900 GB'

Check the number of incoming connections to the database and compare it with the maximum allowed connections:
```
check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -action backends
```

Check whether the database is running and available:

check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -action connection

Check the disk space:

check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -action disk_space --warning='80%' --critical='90%'

Check the number of organizations and environments onboarded in a Postgres node:

check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -action=custom_query --query="select count(*) as result from pg_tables where schemaname='analytics' and tablename like '%fact'" --warning='80' --critical='90' --valtype=integer
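Because the output is Nagios compatible, a wrapper script can rely on the standard Nagios plugin return codes (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) to classify each check. The following Python 3 sketch shows the idea. The function names `nagios_state` and `run_check` are hypothetical, and the sketch assumes that check_postgres.pl is on the PATH.

```python
import subprocess

# Standard Nagios plugin return codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def nagios_state(return_code):
    """Translate a plugin return code into a Nagios state name."""
    return NAGIOS_STATES.get(return_code, "UNKNOWN")


def run_check(args):
    """Run check_postgres.pl with the given arguments; return (state, output)."""
    result = subprocess.run(
        ["check_postgres.pl"] + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return nagios_state(result.returncode), result.stdout.strip()
```

For example, `run_check(["-H", "10.176.218.202", "-db", "apigee", "-u", "apigee", "-dbpass", "postgres", "-action", "connection"])` would run the availability check shown above.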

Run database checks

You can verify that the proper tables are created in the PostgreSQL database. Log in to the PostgreSQL database using the following command:

psql -h /opt/apigee/var/run/apigee-postgresql/ -U apigee -d apigee

Then run:

\d analytics."org.env.fact"

Check health status of postgres process

You can perform API checks on the Postgres machine by invoking the following curl command:

curl -v http://postgres_IP:8084/v1/servers/self/health

This command returns the ACTIVE status when the Postgres process is active. If the Postgres process is not up and running, it returns the INACTIVE status.

Postgres resources

For additional information about monitoring the Postgres service, see the following:

Apache Cassandra

Use JConsole: Monitor task statistics

Use JConsole and the following service URL to monitor the JMX attributes (MBeans) offered via JMX:

service:jmx:rmi:///jndi/rmi://IP_address:7199/jmxrmi

Where IP_address is the IP of the Cassandra server.

JMX is enabled by default for Cassandra, and remote JMX access to Cassandra does not require a password.

Cassandra JMX statistics

JMX MBeans	JMX Attributes
ColumnFamilies/apprepo/environments ColumnFamilies/apprepo/organizations ColumnFamilies/apprepo/apiproxy_revisions ColumnFamilies/apprepo/apiproxies ColumnFamilies/audit/audits ColumnFamilies/audit/audits_ref	PendingTasks
	MemtableColumnsCount
	MemtableDataSize
	ReadCount
	RecentReadLatencyMicros
	TotalReadLatencyMicros
	WriteCount
	RecentWriteLatencyMicros
	TotalWriteLatencyMicros
	TotalDiskSpaceUsed
	LiveDiskSpaceUsed
	LiveSSTableCount
	BloomFilterFalsePositives
	RecentBloomFilterFalseRatio
	BloomFilterFalseRatio

Use nodetool to manage cluster nodes

The nodetool utility is a command line interface for Cassandra that manages cluster nodes. The utility can be found at /opt/apigee/apigee-cassandra/bin.

The following calls can be made on all Cassandra cluster nodes:

General ring info (also possible for a single Cassandra node): Look for the "Up" status and the "Normal" state for all nodes.

nodetool -h localhost ring

The output of the above command looks as shown below:

Datacenter: dc-1
==========
Address            Rack     Status State   Load    Owns    Token
192.168.124.201    ra1      Up     Normal  1.67 MB 33,33%  0
192.168.124.202    ra1      Up     Normal  1.68 MB 33,33%  5671...5242
192.168.124.203    ra1      Up     Normal  1.67 MB 33,33%  1134...0484

General info about nodes (call per node)

nodetool -h localhost info

The output of the above command looks like the following:

ID                     : e2e42793-4242-4e82-bcf0-oicu812
Gossip active          : true
Thrift active          : true
Native Transport active: true
Load                   : 273.71 KB
Generation No          : 1234567890
Uptime (seconds)       : 687194
Heap Memory (MB)       : 314.62 / 3680.00
Off Heap Memory (MB)   : 0.14
Data Center            : dc-1
Rack                   : ra-1
Exceptions             : 0
Key Cache              : entries 150, size 13.52 KB, capacity 100 MB, 1520781 hits, 1520923 requests, 1.000 recent hit rate, 14400 save period in seconds
Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache          : entries 0, size 0 bytes, capacity 50 MB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Token                  : 0

Status of the thrift server (serving client API)
```
nodetool -h localhost statusthrift
```
The output of the above command looks like the following:
```
running
```

Status of data streaming operations: Observe traffic for Cassandra nodes:

nodetool -h localhost netstats

The output of the above command looks like the following:

Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 151612
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed   Dropped
Commands                        n/a         0              0         0
Responses                       n/a         0              0       n/a

For more info on nodetool, see About the nodetool utility.
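The "Up" and "Normal" check on the ring output can be scripted. The following Python sketch scans text in the layout of the nodetool ring example above and returns the addresses of any nodes that are not both Up and Normal. The function name `down_or_abnormal_nodes` is hypothetical, and the sketch assumes IPv4 addresses and the column order shown in that example (Address, Rack, Status, State).

```python
def down_or_abnormal_nodes(ring_output):
    """Return addresses of nodes that are not both Up and Normal."""
    problems = []
    for line in ring_output.splitlines():
        fields = line.split()
        # Node rows start with an IPv4 address; skip headers and separators.
        if len(fields) < 4 or fields[0].count(".") != 3:
            continue
        address, _rack, status, state = fields[:4]
        if status != "Up" or state != "Normal":
            problems.append(address)
    return problems
```

An empty list means that every node in the output is Up and Normal.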

Cassandra monitoring (UI)

Refer to the DataStax OpsCenter URL: http://www.datastax.com/products/opscenter.

Cassandra resource

Refer to the following URL: http://www.datastax.com/docs/1.0/operations/monitoring.

Apache ZooKeeper

Check ZooKeeper status

Ensure the ZooKeeper process is running. ZooKeeper writes a PID file to /opt/apigee/var/run/apigee-zookeeper/apigee-zookeeper.pid.
Test ZooKeeper ports to ensure that you can establish a TCP connection to ports 2181 and 3888 on every ZooKeeper server.
Ensure that you can read values from the ZooKeeper database. Connect using a ZooKeeper client library (or /opt/apigee/apigee-zookeeper/bin/zkCli.sh) and read a value from the database.

Check the status:

/opt/apigee/apigee-service/bin/apigee-service apigee-zookeeper status
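The TCP connection test on ports 2181 and 3888 can be run from any machine that has Python 3. The following sketch is one way to do it; the function names `port_is_open` and `check_zookeeper_ports` are hypothetical, and the three-second timeout is an arbitrary choice.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        connection = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
    connection.close()
    return True


def check_zookeeper_ports(host, ports=(2181, 3888)):
    """Map each ZooKeeper port to whether it accepts TCP connections."""
    return {port: port_is_open(host, port) for port in ports}
```

Run `check_zookeeper_ports` against every ZooKeeper server. A successful connection shows only that the port accepts connections, so you still need the read test described above.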

Use ZooKeeper four-letter words

ZooKeeper can be monitored via a small set of commands (four-letter words) that are sent to port 2181 using netcat (nc) or telnet.

For more info on ZooKeeper commands, see: Apache ZooKeeper command reference.

For example:

srvr: Lists full details for the server.
stat: Lists brief details for the server and connected clients.

The following commands can be issued to the ZooKeeper port:

Run the four-letter command ruok to test whether the server is running in a non-error state. A successful response returns "imok".
```
echo ruok | nc host 2181
```
Returns:
```
imok
```

Run the four-letter command, stat, to list server performance and connected clients statistics:

echo stat | nc host 2181

Returns:

Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
Clients:
/0:0:0:0:0:0:0:1:33467[0](queued=0,recved=1,sent=0)
/192.168.124.201:42388[1](queued=0,recved=8433,sent=8433)
/192.168.124.202:42185[1](queued=0,recved=1339,sent=1347)
/192.168.124.204:39296[1](queued=0,recved=7688,sent=7692)
Latency min/avg/max: 0/0/128
Received: 26144
Sent: 26160
Connections: 4
Outstanding: 0
Zxid: 0x2000002c2
Mode: follower
Node count: 283

If netcat (nc) is not available, you can use Python as an alternative. Create a file named zookeeper.py that contains the following:

import socket
import sys
import time

# Usage: python zookeeper.py host four_letter_word
c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
c.connect((sys.argv[1], 2181))
c.sendall(sys.argv[2].encode())  # send the command, for example ruok
time.sleep(0.1)  # give the server a moment to respond
print(c.recv(512).decode())
c.close()

Now run the following commands:

python zookeeper.py 192.168.124.201 ruok
python zookeeper.py 192.168.124.201 stat
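To alert on values such as Mode or Outstanding, the stat response can be turned into a dictionary. The following Python sketch parses text in the format of the stat example shown earlier. The function name `parse_stat` is hypothetical. The sketch skips the per-client lines and assumes that every other line has the form "key: value".

```python
def parse_stat(stat_output):
    """Parse the "key: value" lines of a stat response into a dictionary."""
    info = {}
    for line in stat_output.splitlines():
        # Client connection lines start with "/" and are skipped here.
        if line.startswith("/") or ": " not in line:
            continue
        key, value = line.split(": ", 1)
        info[key.strip()] = value.strip()
    return info
```

All values are returned as strings, so convert fields such as Connections with `int()` before comparing them to a threshold.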

LDAP level test

You can monitor OpenLDAP to see whether the specific requests are served properly. In other words, check for a specific search that returns the right result.

Use ldapsearch (yum install openldap-clients) to query the entry of the system admin. This entry is used to authenticate all API calls.

ldapsearch -b "uid=admin,ou=users,ou=global,dc=apigee,dc=com" -x -W -D "cn=manager,dc=apigee,dc=com" -H ldap://localhost:10389 -LLL

You are then prompted for the LDAP admin password:

Enter LDAP Password:

After entering the password, you see a response in the form:

dn: uid=admin,ou=users,ou=global,dc=apigee,dc=com
objectClass: organizationalPerson
objectClass: person
objectClass: inetOrgPerson
objectClass: top
uid: admin
cn: admin
sn: admin
userPassword:: e1NTSEF9bS9xbS9RbVNXSFFtUWVsU1F0c3BGL3BQMkhObFp2eDFKUytmZVE9PQ=
 =
mail: opdk@google.com

Check whether Management Server is still connected to LDAP with the following command:

curl -u userEMail:password http://localhost:8080/v1/users/ADMIN

Returns:

{
  "emailId" : "ADMIN",
  "firstName" : "admin",
  "lastName" : "admin"
}

You can also monitor the OpenLDAP caches, which reduce the number of disk accesses and so improve the performance of the system. Monitoring and then tuning the cache size in the OpenLDAP server can heavily impact the performance of the directory server. You can view the log files (/opt/apigee/var/log) to obtain information about the cache.
