How to Monitor

Edge for Private Cloud v. 4.17.01

This document describes the monitoring techniques of components supported by an on-premise deployment of Apigee Edge.

Enabling JMX

JMX is enabled by default for Cassandra, and disabled by default for all other Edge components. You must therefore enable JMX individually for each component.

Each component supports JMX on a different port. The following table lists the JMX port and the file that you modify to enable JMX on that port:

Component            JMX Port    File
Management Server    1099        /opt/apigee/edge-management-server/bin/start
Message Processor    1101        /opt/apigee/edge-message-processor/bin/start
Qpid                 1102        /opt/apigee/edge-qpid-server/bin/start
Postgres             1103        /opt/apigee/edge-postgres-server/bin/start

For example, to enable JMX on the Management Server, open /opt/apigee/edge-management-server/bin/start in an editor. You should see the following line used to start the Management Server:

exec $JAVA -classpath "$classpath" -Xms$min_mem -Xmx$max_mem $xx_opts -Djava.security.auth.login.config=$conf_path/jaas.config 
-Dinstallation.dir=$install_dir $sys_props -Dconf.dir=$conf_path 
-Ddata.dir=$data_dir $* $debug_options com.apigee.kernel.MicroKernel

Edit this line to add the following:

-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=1099 
-Dcom.sun.management.jmxremote.local.only=false  
-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false 

Note that this line specifies the JMX port number as 1099 for the Management Server. Set the port number for each component as defined in the table above. For example:

exec $JAVA -classpath "$classpath" -Xms$min_mem -Xmx$max_mem $xx_opts 
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=1099 
-Dcom.sun.management.jmxremote.local.only=false  
-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false 
-Djava.security.auth.login.config=$conf_path/jaas.config 
-Dinstallation.dir=$install_dir $sys_props -Dconf.dir=$conf_path -Ddata.dir=$data_dir $* $debug_options com.apigee.kernel.MicroKernel

Save the file and then restart the component. For example, to restart the Management Server:

> /opt/apigee/apigee-service/bin/apigee-service edge-management-server restart
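
Because the options differ between components only in the port number, they can be generated from the table above. The following Python sketch is illustrative only: the names JMX_PORTS and jmx_flags are hypothetical helpers introduced here, not part of Edge.

```python
# Ports taken from the table above; both names are hypothetical helpers.
JMX_PORTS = {
    "edge-management-server": 1099,
    "edge-message-processor": 1101,
    "edge-qpid-server": 1102,
    "edge-postgres-server": 1103,
}


def jmx_flags(component):
    """Return the JVM options that enable unauthenticated remote JMX."""
    port = JMX_PORTS[component]  # raises KeyError for an unknown component
    return [
        "-Dcom.sun.management.jmxremote",
        "-Dcom.sun.management.jmxremote.port=%d" % port,
        "-Dcom.sun.management.jmxremote.local.only=false",
        "-Dcom.sun.management.jmxremote.authenticate=false",
        "-Dcom.sun.management.jmxremote.ssl=false",
    ]
```

For example, " ".join(jmx_flags("edge-qpid-server")) yields the options to paste into the Qpid start script.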

Enabling JMX authentication and setting the JMX password

The monitoring process for the Management Server, Message Processor, Qpid, and Postgres all use JMX. After you enable JMX as described above, remote JMX access does not require a password.

Each component has a change_jmx_auth action that you use to enable or disable authentication and to set the JMX credentials.

To enable JMX authentication, use the following command:

> /opt/apigee/apigee-service/bin/apigee-service comp change_jmx_auth optionsOrConfigFile

where:

  • comp is either edge-management-server, edge-message-processor, edge-qpid-server, or edge-postgres-server.
  • Options are:
    • -u: username
    • -p: password
    • -e: y (enable) or n (disable)
  • Config file includes:
    • JMX_USERNAME=username
    • JMX_ENABLED=y/n
    • JMX_PASSWORD=password (if not set or not passed in with -p, you are prompted)

For example, to use options on the command line:

> /opt/apigee/apigee-service/bin/apigee-service edge-management-server change_jmx_auth -u foo -p bar -e y

If you have a config file:

> /opt/apigee/apigee-service/bin/apigee-service edge-management-server change_jmx_auth -f configFile

If you are running Edge on multiple nodes, run this command on all nodes, specifying the same username and password.

To later disable JMX authentication, use the command:

> /opt/apigee/apigee-service/bin/apigee-service edge-management-server change_jmx_auth -e n
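
If you manage several nodes, the config file passed with -f can be generated rather than typed by hand. The sketch below assumes Python 3 and the KEY=value layout listed above; the function name write_jmx_auth_config and the file path you pass in are hypothetical.

```python
import os


def write_jmx_auth_config(path, username, password=None, enabled=True):
    """Write a change_jmx_auth config file and return the lines written.

    The key names come from the list above; the function name is an
    assumption made for this example.
    """
    lines = [
        "JMX_USERNAME=%s" % username,
        "JMX_ENABLED=%s" % ("y" if enabled else "n"),
    ]
    if password is not None:
        # When the password is omitted, change_jmx_auth prompts for it.
        lines.append("JMX_PASSWORD=%s" % password)
    # Request mode 0600 because the file may hold a password. The mode
    # only takes effect when the file is newly created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines
```

You could then pass the generated file to change_jmx_auth with the -f option on each node.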

Management Server

Using JConsole to monitor system health check and process information

Use JConsole (a JMX compliant tool) to manage and monitor health check and process statistics. Using JConsole, you can consume JMX statistics exposed by Management Server (or any server) and display them in a graphical interface. For more information on JConsole usage, see http://docs.oracle.com/javase/8/docs/technotes/guides/management/jconsole.html.

Use JConsole and the following service URL to monitor the JMX attributes (MBeans) offered via JMX.

service:jmx:rmi:///jndi/rmi://<ip address>:<port>/jmxrmi

where <ip address> is the IP address of Management Server (or respective server). By default the port is 1099 for the Management Server.
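
When you monitor several components, it can help to build the service URL programmatically. This is a minimal Python sketch; the function name jmx_service_url is hypothetical and the default port is the Management Server port given above.

```python
def jmx_service_url(host, port=1099):
    """Build the JMX service URL for the given host and JMX port."""
    return "service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi" % (host, port)
```

For the Message Processor you would call jmx_service_url(host, 1101), and so on for the other ports listed under "Enabling JMX".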

The following table shows the generic JMX statistics:

JMX MBeans    JMX Attributes
Memory        HeapMemoryUsage
              NonHeapMemoryUsage
              Usage

Note: Each attribute is displayed as four values: committed, init, max, and used.

Using Edge Application API checks

You can perform an API check on the Management Server (or any server) by invoking the following cURL command:

curl http://<host>:8080/v1/servers/self/up

where <host> is the IP address of the Management Server.

This call returns "true" or "false". If true, the node is up and the Java service is running.

If you do not receive an HTTP 200 (OK) response, Edge is unable to respond to requests on port 8080.
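
The same check can be scripted for use in a monitoring job. The following is a minimal sketch, assuming Python 3; the function names is_up and check_up are hypothetical, and any connection error or non-200 response is treated as "down".

```python
from urllib.request import urlopen


def is_up(status_code, body):
    """Interpret the response of /v1/servers/self/up."""
    return status_code == 200 and body.strip().lower() == "true"


def check_up(host, port=8080, timeout=5):
    """Return True if the server answers HTTP 200 with the body "true"."""
    url = "http://%s:%d/v1/servers/self/up" % (host, port)
    try:
        response = urlopen(url, timeout=timeout)
        body = response.read().decode("utf-8")
        return is_up(response.getcode(), body)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or undecodable body.
        return False
```

The port is a parameter so that the same function can be pointed at the other components described later in this document.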

Troubleshooting

  1. Log in to the server and run the following command:
    /opt/apigee/apigee-service/bin/apigee-service edge-management-server status
  2. If the service is not running, start the service:
    /opt/apigee/apigee-service/bin/apigee-service edge-management-server start

Using Edge Application – Users, organization and deployment checks

The Management Server plays a vital role in holding all other components together in each on-premises installation. You can check user, organization, and deployment status on the Management Server by issuing the following commands:

curl -u userEmail:password http://localhost:8080/v1/users
curl -u userEmail:password http://localhost:8080/v1/organizations
curl -u userEmail:password http://localhost:8080/v1/organizations/orgname/deployments

The system should display "deployed" status for all calls. If these fail, do the following:

  1. Check the Management Server logs (at /opt/apigee/var/log/edge-management-server) for any errors.
  2. Make a call against Management Server to check whether it is functioning properly.
  3. Remove the server from the ELB and then restart the Management Server.
    /opt/apigee/apigee-service/bin/apigee-service edge-management-server restart
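
If you prefer to script the three calls above instead of running curl by hand, the requests can be prepared as follows. This is a sketch assuming Python 3; build_check_requests and CHECK_PATHS are hypothetical names, and the credentials are sent with HTTP Basic authentication, which is what curl -u does.

```python
import base64
from urllib.request import Request

# The three paths used by the curl commands above.
CHECK_PATHS = [
    "/v1/users",
    "/v1/organizations",
    "/v1/organizations/{org}/deployments",
]


def build_check_requests(user_email, password, org, base="http://localhost:8080"):
    """Return one Request per check, each carrying Basic credentials."""
    raw = ("%s:%s" % (user_email, password)).encode("utf-8")
    header = "Basic " + base64.b64encode(raw).decode("ascii")
    requests = []
    for path in CHECK_PATHS:
        request = Request(base + path.format(org=org))
        request.add_header("Authorization", header)
        requests.append(request)
    return requests
```

Each returned object can be passed to urllib.request.urlopen; an HTTP error raised there indicates that the corresponding check failed.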

Router

You can perform an API check on the Router (or any server) by invoking the following cURL command:

curl http://<host>:8081/v1/servers/self/up

where <host> is the IP address of the Router.

This call returns "true" or "false". If true, the node is up and the Router service is running.

If you do not receive an HTTP 200 (OK) response, Edge is unable to respond to requests on port 8081.

Troubleshooting

  1. Log in to the server and run the following command:
    /<inst_root>/apigee/apigee-service/bin/apigee-service edge-router status
  2. If the service is not running, start the service:
    /<inst_root>/apigee/apigee-service/bin/apigee-service edge-router start
  3. After the restart, check that it is functioning:
    curl -v http://localhost:port/v1/servers/self/up

    where port is 8081 for the Router and 8082 for the Message Processor.

Message Processor

Using JConsole to monitor system health check and process information

Follow the same procedure as described above for the Management Server.

Note: Ensure that you use port 1101.

Using Edge Application API checks

Follow the same procedure as described above for the Router.

Note: Ensure that you use port 8082.

Using JMX message flow checks

Follow the same procedure as described above for the Management Server.

Note: Ensure that you use port 1101.

Qpid Server

Using JConsole to monitor system health check and process information

Follow the same procedure as described above for the Management Server.

Note: Ensure that you use port 1102.

Using Edge Application API checks

Follow the same procedure as described above for the Management Server.

Note: Ensure that you use port 8083. The following cURL command is also supported for the Qpid Server:

curl http://<qpid_IP>:8083/v1/servers/self

Postgres Server

Using JConsole to monitor system health check and process information

Follow the same procedure as described above for the Management Server.

Note: Ensure that you use port 1103.

Using Edge Application API checks

Follow the same procedure as described above for the Management Server.

Note: Ensure that you use port 8084. The following cURL command is also supported for the Postgres Server:

curl http://<postgres_IP>:8084/v1/servers/self

Using Edge Application organization and environment checks

You can check the names of the organizations and environments that are onboarded on the Postgres Server by issuing the following cURL command:

curl http://<postgres_IP>:8084/v1/servers/self/organizations

Note: Ensure that you use port 8084.

The system should display the organization and environment names.

Using Edge Application axstatus check

You can verify the status of the analytics servers by issuing the following CURL command.

curl -u userEmail:password http://<host>:<port>/v1/organizations/<orgname>/environments/<envname>/provisioning/axstatus

The system should display SUCCESS status for all analytics servers. The output of the above cURL command is shown below:

{
  "environments" : [ {
    "components" : [ {
      "message" : "success at Thu Feb 28 10:27:38 CET 2013",
      "name" : "pg",
      "status" : "SUCCESS",
      "uuid" : "[c678d16c-7990-4a5a-ae19-a99f925fcb93]"
     }, {
      "message" : "success at Thu Feb 28 10:29:03 CET 2013",
      "name" : "qs",
      "status" : "SUCCESS",
      "uuid" : "[ee9f0db7-a9d3-4d21-96c5-1a15b0bf0adf]"
     } ],
    "message" : "",
    "name" : "prod"
   } ],
  "organization" : "acme",
  "status" : "SUCCESS"
}
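
To act on this output automatically, you can parse the JSON and list every component whose status is not SUCCESS. The sketch below assumes Python 3 and the response layout shown above; the function name failed_components is hypothetical, and any status other than SUCCESS is treated as a failure.

```python
import json


def failed_components(axstatus_json):
    """Return (environment, component, status) for every non-SUCCESS entry."""
    report = json.loads(axstatus_json)
    failures = []
    for env in report.get("environments", []):
        for component in env.get("components", []):
            if component.get("status") != "SUCCESS":
                failures.append(
                    (env.get("name"), component.get("name"), component.get("status"))
                )
    return failures
```

An empty list means that all analytics servers in the response reported SUCCESS.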

PostgreSQL Database

Using the check_postgres.pl script

To monitor the PostgreSQL database, you can use a standard monitoring script, check_postgres.pl, which is available at http://bucardo.org/wiki/Check_postgres.

Note: The check_postgres.pl script must be installed on each Postgres node.

Before you run the script:

  1. Ensure that you have installed perl-Time-HiRes.x86_64, a Perl module that implements high resolution alarm, sleep, gettimeofday, and interval timers. For example, you can install it by using the following command:
    yum install perl-Time-HiRes.x86_64

By default, the output of the check_postgres.pl script is Nagios compatible. After you install the script, perform the following checks:

  1. Database size – check the database size:
    check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -include=apigee -action database_size --warning='800 GB' --critical='900 GB'
  2. Incoming connections to the database – checks the number of incoming connections to the database and compares it with the maximum allowed connections:
    check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -action backends
  3. Database availability and performance – checks if database is running and available:
    check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -action connection
  4. Disk space – checks the disk space:
    check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -action disk_space --warning='80%' --critical='90%'
  5. Onboarded organizations/environments – checks the number of organizations and environments onboarded in a Postgres node:
    check_postgres.pl -H 10.176.218.202 -db apigee -u apigee -dbpass postgres -action=custom_query --query="select count(*) as result from pg_tables where schemaname='analytics' and tablename like '%fact'" --warning='80' --critical='90' --valtype=integer

Note: For help with the above commands, see http://bucardo.org/check_postgres/check_postgres.pl.html.
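
Because the output is Nagios compatible, the script's exit code follows the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The following sketch, assuming Python 3, wraps one check and maps the exit code to a state name. The names nagios_state and run_check are hypothetical, and the database name, user, and password are the example values from the commands above.

```python
import subprocess

# Standard Nagios plugin exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def nagios_state(return_code):
    """Map a plugin exit code to its Nagios state name."""
    return NAGIOS_STATES.get(return_code, "UNKNOWN")


def run_check(action, host, extra_args=(), script="check_postgres.pl"):
    """Run one check_postgres.pl action and return (state, first output line)."""
    # Connection values are the example values used in this document.
    command = [script, "-H", host, "-db", "apigee", "-u", "apigee",
               "-dbpass", "postgres", "-action", action] + list(extra_args)
    process = subprocess.Popen(command, stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT,
                               universal_newlines=True)
    output, _ = process.communicate()
    lines = output.splitlines()
    return nagios_state(process.returncode), (lines[0] if lines else "")
```

For example, run_check("disk_space", "10.176.218.202", ["--warning=80%", "--critical=90%"]) would run the disk space check from step 4.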

DB Checks

You can verify that the proper tables are created in the PostgreSQL database. Log in to the PostgreSQL database using:

psql  -h /opt/apigee/var/run/apigee-postgresql/  -U apigee -d apigee

and then run:

\d analytics."<org>.<env>.fact"

Check health status of postgres process

You can perform an API check on the Postgres machine by invoking the following cURL command:

curl http://<postgres_IP>:8084/v1/servers/self/health/

Note: Ensure that you use port 8084.

It returns the ‘ACTIVE’ status when the postgres process is active. If the postgres process is not up and running, it returns the ‘INACTIVE’ status.

Postgres Resources

Apache Cassandra

Using JConsole – monitor task statistics

Use JConsole and the following service URL to monitor the JMX attributes (MBeans) offered via JMX.

service:jmx:rmi:///jndi/rmi://<ip address>:7199/jmxrmi

where <ip address> is the IP of the Cassandra server.

JMX is enabled by default for Cassandra and remote JMX access to Cassandra does not require a password.

To enable JMX authentication and add a password:

  1. Edit /opt/apigee/customer/application/cassandra.properties. If the file does not exist, create it.
  2. Add the following to the file:
    conf_cassandra-env_com.sun.management.jmxremote.authenticate=true
  3. Save the file.
  4. Copy the following files from your $JAVA_HOME directory to /opt/apigee/data/apigee-cassandra/:
    cp ${JAVA_HOME}/lib/management/jmxremote.password.template $APIGEE_ROOT/data/apigee-cassandra/jmxremote.password

    cp ${JAVA_HOME}/lib/management/jmxremote.access $APIGEE_ROOT/data/apigee-cassandra/jmxremote.access
  5. Edit jmxremote.password and add username and password to the file:
    cassandra password

    where password is the JMX password.
  6. Edit jmxremote.access and add the following role:
    cassandra readwrite
  7. Make sure the files are owned by "apigee" and that the file mode is 400:
    > chown apigee:apigee /opt/apigee/data/apigee-cassandra/jmxremote.*
    > chmod 400 /opt/apigee/data/apigee-cassandra/jmxremote.*
  8. Run configure on Cassandra:
    > /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra configure
  9. Restart Cassandra:
    > /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra restart

To later disable authentication:

  1. Edit /opt/apigee/customer/application/cassandra.properties.
  2. Remove the following line in the file:
    conf_cassandra-env_com.sun.management.jmxremote.authenticate=true
  3. Run configure on Cassandra:
    > /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra configure
  4. Restart Cassandra:
    > /opt/apigee/apigee-service/bin/apigee-service apigee-cassandra restart

Cassandra JMX statistics

JMX MBeans:

  • ColumnFamilies/apprepo/environments
  • ColumnFamilies/apprepo/organizations
  • ColumnFamilies/apprepo/apiproxy_revisions
  • ColumnFamilies/apprepo/apiproxies
  • ColumnFamilies/audit/audits
  • ColumnFamilies/audit/audits_ref

JMX Attributes:

  • PendingTasks
  • MemtableColumnsCount
  • MemtableDataSize
  • ReadCount
  • RecentReadLatencyMicros
  • TotalReadLatencyMicros
  • WriteCount
  • RecentWriteLatencyMicros
  • TotalWriteLatencyMicros
  • TotalDiskSpaceUsed
  • LiveDiskSpaceUsed
  • LiveSSTableCount
  • BloomFilterFalsePositives
  • RecentBloomFilterFalseRatio
  • BloomFilterFalseRatio

Using nodetool utility to manage cluster nodes

The nodetool utility, which is a command line interface for Cassandra, is used to manage cluster nodes. The utility can be found at /opt/apigee/apigee-cassandra/bin.

For more information on the nodetool utility, see http://www.datastax.com/docs/1.0/references/nodetool.

The following calls can be made on all Cassandra cluster nodes:

  1. General ring info (also possible for a single Cassandra node): Look for "Up" and "Normal" for all nodes.
    [host]# nodetool -h localhost ring

    The output of the above command looks as shown below:
    Address DC Rack Status State Load Owns Token
    192.168.124.201 dc1 ra1 Up Normal 1.67 MB 33,33% 0
    192.168.124.202 dc1 ra1 Up Normal 1.68 MB 33,33% 56713727820156410577229101238628035242
    192.168.124.203 dc1 ra1 Up Normal 1.67 MB 33,33% 113427455640312821154458202477256070484
  2. General info about nodes (call per node)
    nodetool -h localhost info

    The output of the above command looks as shown below:
    Token : 0
    Gossip active : true
    Load : 1.67 MB
    Generation No : 1361968765
    Uptime (seconds) : 78108
    Heap Memory (MB) : 46,80 / 772,00
    Data Center : dc1
    Rack : ra1
    Exceptions : 0
  3. Status of the thrift server (serving client API)
    [host]# nodetool -h localhost statusthrift

    The output of the above command displays status as "running".
  4. Status of data streaming operations: Observe traffic for cassandra nodes
    nodetool -h localhost netstats 192.168.124.203

    The output of the above command looks as shown below:
    Mode: NORMAL
    Nothing streaming to /192.168.124.203
    Nothing streaming from /192.168.124.203
    Pool Name Active Pending Completed
    Commands n/a 0 1688
    Responses n/a 0 292277
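
The "Up" and "Normal" check from step 1 can be automated by parsing the ring output. The sketch below assumes Python 3 and the column order shown in the sample output above (Address, DC, Rack, Status, State); the function name unhealthy_nodes is hypothetical, and other nodetool versions may print a different layout.

```python
def unhealthy_nodes(ring_output):
    """Return (address, status, state) for nodes that are not Up/Normal."""
    problems = []
    for line in ring_output.splitlines():
        fields = line.split()
        # Data rows start with an IPv4 address; skip the header and blanks.
        if len(fields) < 5 or fields[0].count(".") != 3:
            continue
        address, status, state = fields[0], fields[3], fields[4]
        if (status, state) != ("Up", "Normal"):
            problems.append((address, status, state))
    return problems
```

An empty list means that every node listed in the output is Up and Normal.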

Cassandra Monitoring (UI)

Refer to the DataStax OpsCenter URL: http://www.datastax.com/products/opscenter.

Cassandra Resource

Refer to the following URL: http://www.datastax.com/docs/1.0/operations/monitoring.

Apache ZooKeeper

Checking ZooKeeper status

  1. Ensure the ZooKeeper process is running. ZooKeeper writes a PID file to /opt/apigee/var/run/apigee-zookeeper/apigee-zookeeper.pid.
  2. Test ZooKeeper ports to ensure that you can establish a TCP connection to ports 2181 and 3888 on every ZooKeeper server.
  3. Ensure that you can read values from the ZooKeeper database. Connect using a ZooKeeper client library (or /opt/apigee/apigee-zookeeper/bin/zkCli.sh) and read a value from the database.
  4. Check the status:
    > /opt/apigee/apigee-service/bin/apigee-service apigee-zookeeper status
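
The port test in step 2 can be scripted across all ZooKeeper servers. This is a minimal sketch, assuming Python 3; the function names port_open and closed_zookeeper_ports are hypothetical, and the ports are the two named above.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        connection = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False
    connection.close()
    return True


def closed_zookeeper_ports(hosts, ports=(2181, 3888)):
    """Return the (host, port) pairs to which no connection could be made."""
    return [(h, p) for h in hosts for p in ports if not port_open(h, p)]
```

An empty result from closed_zookeeper_ports means that both ports accepted a connection on every host in the list.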

Using ZooKeeper Four Letter Words

ZooKeeper can be monitored via a small set of commands (four-letter words) that are sent to port 2181 using netcat (nc) or telnet.

For more info on ZooKeeper commands, see: http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkCommands.

For example:

  • srvr: Lists full details for the server.
  • stat: Lists brief details for the server and connected clients.

The following commands can be issued to the ZooKeeper port:

  1. Run the four-letter command ruok to test if server is running in a non-error state. A successful response returns "imok".
    echo ruok | nc <host> 2181

    Returns:
    imok
  2. Run the four-letter command stat to list server performance and connected client statistics.
    echo stat | nc <host> 2181

    Returns:
    Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
    Clients:
    /0:0:0:0:0:0:0:1:33467[0](queued=0,recved=1,sent=0)
    /192.168.124.201:42388[1](queued=0,recved=8433,sent=8433)
    /192.168.124.202:42185[1](queued=0,recved=1339,sent=1347)
    /192.168.124.204:39296[1](queued=0,recved=7688,sent=7692)
    Latency min/avg/max: 0/0/128
    Received: 26144
    Sent: 26160
    Connections: 4
    Outstanding: 0
    Zxid: 0x2000002c2
    Mode: follower
    Node count: 283

    Note: It is sometimes important to see whether a ZooKeeper is in Mode: leader, follower or observer.
  3. If netcat (nc) is not available, you can use Python as an alternative. Create a file named zookeeper.py that contains the following:
    import socket, sys, time
    # Usage: python zookeeper.py <host> <four-letter command>
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.connect((sys.argv[1], 2181))
    # Encode and decode explicitly so the script runs on Python 2 and 3.
    c.send(sys.argv[2].encode())
    time.sleep(0.1)
    print(c.recv(512).decode())

    Now run the following commands:
    python zookeeper.py 192.168.124.201 ruok
    python zookeeper.py 192.168.124.201 stat
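
For monitoring it is often more useful to extract individual fields, such as Mode, from the stat reply than to print it. The following sketch assumes Python 3 and the reply format shown in step 2; the function names four_letter_word and parse_stat are hypothetical. It reads until ZooKeeper closes the connection, so replies longer than 512 bytes are not truncated.

```python
import socket


def four_letter_word(host, command, port=2181, timeout=3.0):
    """Send a four-letter command to ZooKeeper and return the full reply."""
    connection = socket.create_connection((host, port), timeout=timeout)
    try:
        connection.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = connection.recv(4096)
            if not data:  # ZooKeeper closes the connection after replying
                break
            chunks.append(data)
    finally:
        connection.close()
    return b"".join(chunks).decode("utf-8")


def parse_stat(reply):
    """Extract the "Key: value" lines of a stat reply into a dictionary."""
    fields = {}
    for line in reply.splitlines():
        line = line.strip()
        key, separator, value = line.partition(": ")
        # Client lines start with "/" and the "Clients:" header has no value.
        if separator and not line.startswith("/"):
            fields[key] = value.strip()
    return fields
```

For example, parse_stat(four_letter_word("192.168.124.201", "stat")).get("Mode") returns the mode reported by that server.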

OpenLDAP

LDAP Level Test

You can monitor OpenLDAP to see whether specific requests are served properly. In other words, check for a specific search that returns the right result.

  1. Use ldapsearch (yum install openldap-clients) to query the entry of the system admin. This entry is used to authenticate all API calls.
    ldapsearch -b "uid=admin,ou=users,ou=global,dc=apigee,dc=com" -x -W -D "cn=manager,dc=apigee,dc=com" -H ldap://localhost:10389 -LLL

    You are then prompted for the LDAP admin password:
    Enter LDAP Password:

    After entering the password, you see a response in the form:
    dn: uid=admin,ou=users,ou=global,dc=apigee,dc=com
    objectClass: organizationalPerson
    objectClass: person
    objectClass: inetOrgPerson
    objectClass: top
    uid: admin
    cn: admin
    sn: admin
    userPassword:: e1NTSEF9bS9xbS9RbVNXSFFtUWVsU1F0c3BGL3BQMkhObFp2eDFKUytmZVE9PQ==
    mail: opdk@google.com
  2. Check whether the Management Server is still connected to LDAP by issuing:
    curl -u <userEMail>:<password> http://localhost:8080/v1/users/<ADMIN>

    Returns:
    {
    "emailId" : <ADMIN>,
    "firstName" : "admin",
    "lastName" : "admin"
    }

You can also monitor the OpenLDAP caches, which reduce the number of disk accesses and hence improve the performance of the system. Monitoring and then tuning the cache size in the OpenLDAP server can heavily impact the performance of the directory server. You can view the log files (/opt/apigee/var/log) to obtain information about the cache.