Self healing with apigee-monit

Apigee Edge for Private Cloud includes apigee-monit, a tool based on the open source monit utility. apigee-monit periodically polls Edge services; if a service is unavailable, then apigee-monit attempts to restart it.

To use apigee-monit, you must install it manually. It is not part of the standard installation.

By default, apigee-monit checks the status of Edge services every 60 seconds.

Quick start

This section shows you how to quickly get up and running with apigee-monit.

If you are using Amazon Linux, first install the monit package from the Fedora build system, as the following command shows. Otherwise, skip this step.

sudo yum install -y https://kojipkgs.fedoraproject.org/packages/monit/5.25.1/1.el6/x86_64/monit-5.25.1-1.el6.x86_64.rpm
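If you provision nodes with a script, you can run this step only on Amazon Linux. The following sketch assumes that Amazon Linux reports ID=amzn in /etc/os-release. The functions is_amazon_linux and install_monit_if_amazon_linux are hypothetical helpers, not part of apigee-monit.

```shell
#!/usr/bin/env bash
# Fedora-built monit package for Amazon Linux (URL from the command above).
MONIT_RPM_URL="https://kojipkgs.fedoraproject.org/packages/monit/5.25.1/1.el6/x86_64/monit-5.25.1-1.el6.x86_64.rpm"

# Succeeds if the os-release file (default: /etc/os-release) reports ID=amzn.
is_amazon_linux() {
  local os_release="${1:-/etc/os-release}" id
  [ -r "$os_release" ] || return 1
  # Take the value of the ID= line and strip any surrounding quotes.
  id=$(sed -n 's/^ID=//p' "$os_release" | tr -d "\"'")
  [ "$id" = "amzn" ]
}

# Install the monit package only when running on Amazon Linux.
install_monit_if_amazon_linux() {
  if is_amazon_linux; then
    sudo yum install -y "$MONIT_RPM_URL"
  fi
}
```

Call install_monit_if_amazon_linux before the apigee-monit install command. On other distributions it does nothing.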

To install apigee-monit and perform the most common tasks, use the following commands:

  Install apigee-monit
/opt/apigee/apigee-service/bin/apigee-service apigee-monit install
/opt/apigee/apigee-service/bin/apigee-service apigee-monit configure
/opt/apigee/apigee-service/bin/apigee-service apigee-monit start

This installs apigee-monit and starts monitoring all components on the node by default.

  Stop monitoring components
/opt/apigee/apigee-service/bin/apigee-service apigee-monit unmonitor -c component_name
/opt/apigee/apigee-service/bin/apigee-service apigee-monit unmonitor -c all
  Start monitoring components
/opt/apigee/apigee-service/bin/apigee-service apigee-monit monitor -c component_name
/opt/apigee/apigee-service/bin/apigee-service apigee-monit monitor -c all
  Get summary status information
/opt/apigee/apigee-service/bin/apigee-service apigee-monit report
/opt/apigee/apigee-service/bin/apigee-service apigee-monit summary
  Look at the apigee-monit log files
cat /opt/apigee/var/log/apigee-monit/apigee-monit.log

Each of these topics, along with others, is described in detail in the sections that follow.

About apigee-monit

apigee-monit helps ensure that all components on a node stay up and running. It does this by providing a variety of services, including:

  • Restarting failed services
  • Displaying summary information
  • Logging monitoring status
  • Sending notifications
  • Monitoring non-Edge services

Apigee recommends that you monitor apigee-monit to ensure that it is running. For more information, see Monitor apigee-monit.

apigee-monit architecture

During your Apigee Edge for Private Cloud installation and configuration, you can optionally install a separate instance of apigee-monit on each node in your cluster. These separate apigee-monit instances operate independently of one another: they do not communicate the status of their components to the other nodes, nor do they communicate failures of the monitoring utility itself to any central service.

The following image shows the apigee-monit architecture in a 5-node cluster:

Figure 1: A separate instance of apigee-monit runs in isolation on each node in a cluster

Supported platforms

apigee-monit supports the following platforms for your Private Cloud cluster. (The supported OS for apigee-monit depends on the version of Private Cloud.)

Operating System                Private Cloud version
                                v4.50.00              v4.51.00                        v4.52.00
CentOS                          7.5, 7.6, 7.7, 7.8    7.5, 7.6, 7.7, 7.8              7.5, 7.6, 7.7, 7.8
RedHat Enterprise Linux (RHEL)  7.5, 7.6, 7.7, 7.8    7.5, 7.6, 7.7, 7.8, 7.9, 8.0    7.5, 7.6, 7.7, 7.8, 7.9, 8.0
Oracle Linux                    7.5, 7.6, 7.7, 7.8    7.5, 7.6, 7.7, 7.8              7.5, 7.6, 7.7, 7.8
Note: Although not officially supported, you can install and use apigee-monit on CentOS, RHEL, or Oracle Linux version 6.9 with Apigee Edge for Private Cloud version 4.19.01.

Component configurations

apigee-monit uses component configurations to determine which components to monitor, which aspects of the component to check, and what action to take in the event of a failure.

By default, apigee-monit monitors all Edge components on a node using their pre-defined component configurations. To view the default settings, you can look at the apigee-monit component configuration files. You cannot change the default component configurations.

apigee-monit checks different aspects of a component, depending on which component it is checking. The following table lists, for each component, what apigee-monit checks and where the component configuration is located. Note that some components are defined in a single shared configuration file, while others have their own configuration files.

Management Server, Message Processor, Postgres Server, Qpid Server, and Router

  Configuration location:
    Management Server    /opt/apigee/edge-management-server/monit/default.conf
    Message Processor    /opt/apigee/edge-message-processor/monit/default.conf
    Postgres Server      /opt/apigee/edge-postgres-server/monit/default.conf
    Qpid Server          /opt/apigee/edge-qpid-server/monit/default.conf
    Router               /opt/apigee/edge-router/monit/default.conf

  What is monitored:
    • Specified port(s) are open and accepting requests
    • Specified protocol(s) are supported
    • Status of the response

  In addition, for these components apigee-monit:
    • Requires multiple failures within a given number of cycles before taking action
    • Sets a custom request path

Cassandra, Edge UI, OpenLDAP, Postgres, Qpid, and ZooKeeper

  Configuration location: /opt/apigee/data/apigee-monit/monit.conf

  What is monitored:
    • Service is running

The following example shows the default component configuration for the edge-router component:

check host edge-router with address localhost
  restart program = "/opt/apigee/apigee-service/bin/apigee-service edge-router monitrestart"
  if failed host 10.1.1.0 port 8081 and protocol http
    and request "/v1/servers/self/uuid"
    with timeout 15 seconds
    for 2 times within 3 cycles
  then restart

  if failed port 15999 and protocol http
    and request "/v1/servers/self"
    and status < 600
    with timeout 15 seconds
    for 2 times within 3 cycles
  then restart
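The "for 2 times within 3 cycles" clause means that a check must fail in at least 2 of the last 3 polling cycles before apigee-monit restarts the component, so a single transient failure does not trigger a restart. The following sketch is a simplified model of that rule, not monit's implementation. The function name failures_within is hypothetical, and the model does not account for what happens after a restart.

```shell
#!/usr/bin/env bash
# Simplified model of monit's "for TIMES times within CYCLES cycles" rule.
# stdin: one check result per polling cycle (0 = passed, 1 = failed).
# stdout: the 1-based cycle numbers at which the action would fire.
failures_within() {
  local times="$1" cycles="$2"
  awk -v times="$times" -v cycles="$cycles" '
    {
      # Ring buffer holding only the results of the last `cycles` cycles.
      window[NR % cycles] = $1
      failed = 0
      for (slot in window) failed += window[slot]
      if (failed >= times) print NR
    }
  '
}
```

For example, printf '0\n1\n0\n1\n1\n' | failures_within 2 3 prints 4 and 5. At cycle 4, the last three results contain two failures, while the single failure at cycle 2 is not enough on its own.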

The following example shows the default configuration for the Classic UI (edge-ui) component:

check process edge-ui
 with pidfile /opt/apigee/var/run/edge-ui/edge-ui.pid
 start program = "/opt/apigee/apigee-service/bin/apigee-service edge-ui start" with timeout 55 seconds
 stop program = "/opt/apigee/apigee-service/bin/apigee-service edge-ui stop"

This applies to the Classic UI, not the new Edge UI whose component name is edge-management-ui.

You cannot change the default component configurations for any Apigee Edge for Private Cloud component. You can, however, add your own component configurations for external services, such as your target endpoint or the httpd service. For more information, see Non-Apigee component configurations.

By default, apigee-monit monitors all components on the node on which it is running. You can enable or disable monitoring for all components or for individual components. For more information, see Stop and start monitoring components.

Install apigee-monit

apigee-monit is not installed by default; you can install it manually after upgrading or installing version 4.19.01 or later of Apigee Edge for Private Cloud.

This section describes how to install apigee-monit on the supported platforms.

For information on uninstalling apigee-monit, see Uninstall apigee-monit.

Install apigee-monit on a supported platform

This section describes how to install apigee-monit on a supported platform.

To install apigee-monit on a supported platform:

  1. Install apigee-monit with the following command:
    /opt/apigee/apigee-service/bin/apigee-service apigee-monit install
  2. Configure apigee-monit with the following command:
    /opt/apigee/apigee-service/bin/apigee-service apigee-monit configure
  3. Start apigee-monit with the following command:
    /opt/apigee/apigee-service/bin/apigee-service apigee-monit start
  4. Repeat this procedure on each node in your cluster.
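If you script this procedure, the following sketch runs steps 1 through 3 in order on the current node and stops at the first step that fails. The function name install_apigee_monit and the APIGEE_SERVICE override variable are assumptions for illustration. You still need to run the script on each node in your cluster.

```shell
#!/usr/bin/env bash
# Path to apigee-service; override APIGEE_SERVICE to point somewhere else.
APIGEE_SERVICE="${APIGEE_SERVICE:-/opt/apigee/apigee-service/bin/apigee-service}"

# Run the install, configure, and start steps in order, stopping at the
# first step that fails.
install_apigee_monit() {
  local action
  for action in install configure start; do
    "$APIGEE_SERVICE" apigee-monit "$action" || {
      echo "apigee-monit $action failed" >&2
      return 1
    }
  done
}
```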

Stop and start monitoring components

When a service stops for any reason, apigee-monit attempts to restart the service.

This can cause a problem if you want to purposefully stop a component. For example, you might want to stop a component when you need to back it up or upgrade it. If apigee-monit restarts the service during the backup or upgrade, your maintenance procedure can be disrupted, possibly causing it to fail.

The following sections show the options for stopping the monitoring of components.

Stop a component and unmonitor it

To stop a component and unmonitor it, execute the following command:

/opt/apigee/apigee-service/bin/apigee-service apigee-monit stop-component -c component_name
component_name can be one of the following:
  • apigee-cassandra (Cassandra)
  • apigee-openldap (OpenLDAP)
  • apigee-postgresql (PostgreSQL database)
  • apigee-qpidd (Qpidd)
  • apigee-sso (Edge SSO)
  • apigee-zookeeper (ZooKeeper)
  • edge-management-server (Management Server)
  • edge-management-ui (new Edge UI)
  • edge-message-processor (Message Processor)
  • edge-postgres-server (Postgres Server)
  • edge-qpid-server (Qpid Server)
  • edge-router (Edge Router)
  • edge-ui (Classic UI)

Note that "all" is not a valid option for stop-component. You can stop and unmonitor only one component at a time with stop-component.

To re-start the component and resume monitoring, execute the following command:

/opt/apigee/apigee-service/bin/apigee-service apigee-monit start-component -c component_name

Note that "all" is not a valid option for start-component.
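For maintenance tasks such as backups, the following sketch pairs stop-component with start-component. It restarts and re-monitors the component even if the maintenance command fails. It also rejects names that are not in the list above, including "all". The functions is_monit_component and with_component_stopped, the APIGEE_SERVICE override, and the backup script path in the usage example are hypothetical.

```shell
#!/usr/bin/env bash
# Path to apigee-service; override APIGEE_SERVICE to point somewhere else.
APIGEE_SERVICE="${APIGEE_SERVICE:-/opt/apigee/apigee-service/bin/apigee-service}"

# Succeeds only for names accepted by stop-component/start-component.
# "all" is deliberately not in this list.
is_monit_component() {
  case "$1" in
    apigee-cassandra | apigee-openldap | apigee-postgresql | apigee-qpidd | \
    apigee-sso | apigee-zookeeper | edge-management-server | edge-management-ui | \
    edge-message-processor | edge-postgres-server | edge-qpid-server | \
    edge-router | edge-ui)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Stop COMPONENT, run the remaining arguments as a command, then restart the
# component and resume monitoring even if that command fails.
with_component_stopped() {
  local component="$1" status
  shift
  if ! is_monit_component "$component"; then
    echo "not a valid component name: $component" >&2
    return 2
  fi
  # If the stop fails, return its status without running the command.
  "$APIGEE_SERVICE" apigee-monit stop-component -c "$component" || return
  "$@"
  status=$?
  "$APIGEE_SERVICE" apigee-monit start-component -c "$component"
  return "$status"
}
```

For example, with_component_stopped apigee-postgresql /path/to/your-backup-script.sh stops PostgreSQL, runs the backup script, and then starts and monitors PostgreSQL again. The function returns the backup script's exit status.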

For instructions on how to stop and unmonitor all components, see Stop all components and unmonitor them.

Unmonitor a component (but don't stop it)

To unmonitor a component without stopping it, execute the following command:

/opt/apigee/apigee-service/bin/apigee-service apigee-monit unmonitor -c component_name
component_name can be one of the following:
  • apigee-cassandra (Cassandra)
  • apigee-openldap (OpenLDAP)
  • apigee-postgresql (PostgreSQL database)
  • apigee-qpidd (Qpidd)
  • apigee-sso (Edge SSO)
  • apigee-zookeeper (ZooKeeper)
  • edge-management-server (Management Server)
  • edge-management-ui (new Edge UI)
  • edge-message-processor (Message Processor)
  • edge-postgres-server (Postgres Server)
  • edge-qpid-server (Qpid Server)
  • edge-router (Edge Router)
  • edge-ui (Classic UI)

To resume monitoring the component, execute the following command:

/opt/apigee/apigee-service/bin/apigee-service apigee-monit monitor -c component_name

Unmonitor all components (but don't stop them)

To unmonitor all components without stopping them, execute the following command:

/opt/apigee/apigee-service/bin/apigee-service apigee-monit unmonitor -c all

To resume monitoring all components, execute the following command:

/opt/apigee/apigee-service/bin/apigee-service apigee-monit monitor -c all

Stop all components and unmonitor them

To stop all components and unmonitor them, execute the following commands:

/opt/apigee/apigee-service/bin/apigee-service apigee-monit unmonitor -c all
/opt/apigee/apigee-service/bin/apigee-all stop

To re-start all components and resume monitoring, execute the following commands:

/opt/apigee/apigee-service/bin/apigee-all start
/opt/apigee/apigee-service/bin/apigee-service apigee-monit monitor -c all
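The order of these commands matters. You unmonitor before stopping so that apigee-monit does not restart components as they go down, and you start the components before resuming monitoring. The following sketch wraps both sequences. The function names and the APIGEE_SERVICE and APIGEE_ALL override variables are hypothetical.

```shell
#!/usr/bin/env bash
# Paths to the Apigee tools; override the variables to point somewhere else.
APIGEE_SERVICE="${APIGEE_SERVICE:-/opt/apigee/apigee-service/bin/apigee-service}"
APIGEE_ALL="${APIGEE_ALL:-/opt/apigee/apigee-service/bin/apigee-all}"

# Unmonitor first so apigee-monit does not restart components as they stop.
stop_all_unmonitored() {
  "$APIGEE_SERVICE" apigee-monit unmonitor -c all && "$APIGEE_ALL" stop
}

# Start the components first, then resume monitoring after the start
# command completes successfully.
start_all_monitored() {
  "$APIGEE_ALL" start && "$APIGEE_SERVICE" apigee-monit monitor -c all
}
```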

To stop monitoring all components, you can also disable apigee-monit, as described in Stop, start, and disable apigee-monit.

Stop, start, and disable apigee-monit

As with any service, you can stop and start apigee-monit using the apigee-service command. In addition, apigee-monit supports the unmonitor command, which lets you temporarily stop monitoring components.

Stop apigee-monit

To stop apigee-monit, use the following command:

/opt/apigee/apigee-service/bin/apigee-service apigee-monit stop

Start apigee-monit

To start apigee-monit, use the following command:

/opt/apigee/apigee-service/bin/apigee-service apigee-monit start

Disable apigee-monit

You can suspend monitoring of all components on the node by using the following command:

/opt/apigee/apigee-service/bin/apigee-service apigee-monit unmonitor -c all

Alternatively, you can permanently disable apigee-monit by uninstalling it from the node, as described in Uninstall apigee-monit.

Uninstall apigee-monit

To uninstall apigee-monit:

  1. If you set up a cron job to monitor apigee-monit, remove the cron job before uninstalling apigee-monit:
    sudo rm /etc/cron.d/apigee-monit.cron
  2. Stop apigee-monit with the following command:
    /opt/apigee/apigee-service/bin/apigee-service apigee-monit stop
  3. Uninstall apigee-monit with the following command:
    /opt/apigee/apigee-service/bin/apigee-service apigee-monit uninstall
  4. Repeat this procedure on each node in your cluster.

Monitor a newly installed component

If you install a new component on a node that is running apigee-monit, you can begin monitoring it by executing apigee-monit's restart command. This generates a new monit.conf file that includes the new component in its component configurations.

The following example restarts apigee-monit:

/opt/apigee/apigee-service/bin/apigee-service apigee-monit restart

Customize apigee-monit

You can customize various apigee-monit settings, including:

  1. Default apigee-monit control settings
  2. Global configuration settings
  3. Non-Apigee component configurations

Default apigee-monit control settings

You can customize the default apigee-monit control settings such as the frequency of status checks and the locations of the apigee-monit files. You do this by editing a properties file using the code with config technique. Properties files will persist even after you upgrade Apigee Edge for Private Cloud.

The following table describes the default apigee-monit control settings that you can customize:

Property Description
conf_monit_httpd_port The httpd daemon's port. apigee-monit uses httpd for its dashboard app and to enable reports/summaries. The default value is 2812.
conf_monit_httpd_allow Constraints on requests to the httpd daemon. apigee-monit uses httpd to run its dashboard app and enable reports/summaries. This value must point to the localhost (the host that httpd is running on).

To require that requests include a username and password, use the following syntax:

conf_monit_httpd_allow=allow username:"password"\nallow 127.0.0.1

When adding a username and password, insert a "\n" between each constraint. Do not insert actual newlines or carriage returns in the value.

conf_monit_monit_datadir The directory in which event details are stored.
conf_monit_monit_delay_time The amount of time that apigee-monit waits after it is first loaded into memory before it runs. This affects only apigee-monit's first process check.
conf_monit_monit_logdir The location of the apigee-monit log file.
conf_monit_monit_retry_time The frequency at which apigee-monit attempts to check each process; the default is 60 seconds.
conf_monit_monit_rundir The location of the PID and state files, which apigee-monit uses for checking processes.

To customize the default apigee-monit control settings:

  1. Edit the following file:
    /opt/apigee/customer/application/monit.properties

    If the file does not exist, create it and set the owner to the "apigee" user:

    chown apigee:apigee /opt/apigee/customer/application/monit.properties

    Note that if the file already exists, there may be additional configuration properties defined in it beyond what is listed in the table above. You should not modify properties other than those listed above.

  2. Set or replace property values with your new values.

    For example, to change the location of the log file to /tmp, add or edit the following property:

    conf_monit_monit_logdir=/tmp/apigee-monit.log
  3. Save your changes to the monit.properties file.
  4. Re-configure apigee-monit with the following command:
    /opt/apigee/apigee-service/bin/apigee-service apigee-monit configure
  5. Reload apigee-monit with the following command:
    /opt/apigee/apigee-service/bin/apigee-service apigee-monit reload

    If apigee-monit does not reload, check the log file for errors as described in Access apigee-monit log files.

  6. Repeat this procedure for each node in your cluster.
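If you manage monit.properties from a script, the following sketch adds a property or replaces its existing value in place. It writes values verbatim, so the literal "\n" that conf_monit_httpd_allow requires is preserved. The function set_monit_property and the MONIT_PROPERTIES override are hypothetical. If the script creates the file, you still need to change its owner to the "apigee" user.

```shell
#!/usr/bin/env bash
# Properties file to edit; override MONIT_PROPERTIES to point somewhere else.
MONIT_PROPERTIES="${MONIT_PROPERTIES:-/opt/apigee/customer/application/monit.properties}"

# Set KEY=VALUE: replace the first existing KEY= line (dropping any later
# duplicates) or append the line if KEY is not present yet.
set_monit_property() {
  local key="$1" value="$2" tmp
  tmp=$(mktemp) || return
  if [ -f "$MONIT_PROPERTIES" ]; then
    # Pass key and value via ENVIRON: awk -v would turn "\n" into a newline.
    K="$key" V="$value" awk '
      index($0, ENVIRON["K"] "=") == 1 {
        if (!done) print ENVIRON["K"] "=" ENVIRON["V"]
        done = 1
        next
      }
      { print }
      END { if (!done) print ENVIRON["K"] "=" ENVIRON["V"] }
    ' "$MONIT_PROPERTIES" > "$tmp"
  else
    printf '%s=%s\n' "$key" "$value" > "$tmp"
  fi
  # Copy the contents back so an existing file keeps its owner and mode.
  cat "$tmp" > "$MONIT_PROPERTIES"
  rm -f "$tmp"
}
```

For example, set_monit_property conf_monit_monit_logdir /tmp/apigee-monit.log makes the same change as step 2. You still run the configure and reload commands afterward.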

Global configuration settings

You can define global configuration settings for apigee-monit; for example, you can add email notifications for alerts. You do this by creating a configuration file in the /opt/apigee/data/apigee-monit directory and then reloading apigee-monit.

To define global configuration settings for apigee-monit:

  1. Create a new component configuration file in the following location:
    /opt/apigee/data/apigee-monit/filename.conf

    Where filename can be any valid file name, except "monit".

  2. Change the owner of the new configuration file to the "apigee" user, as the following example shows:
    chown apigee:apigee /opt/apigee/data/apigee-monit/my-mail-config.conf
  3. Add your global configuration settings to the new file. The following example configures a mail server and sets the alert recipients:
    SET MAILSERVER smtp.gmail.com PORT 465
      USERNAME "example-admin@gmail.com" PASSWORD "PASSWORD"
      USING SSL, WITH TIMEOUT 15 SECONDS
    
    SET MAIL-FORMAT {
      from: edge-alerts@example.com
      subject: Monit Alert -- Service: $SERVICE $EVENT on $HOST
    }
    SET ALERT fred@example.com
    SET ALERT nancy@example.com

    For a complete list of global configuration options, see the monit documentation.

  4. Save your changes to the component configuration file.
  5. Reload apigee-monit with the following command:
    /opt/apigee/apigee-service/bin/apigee-service apigee-monit reload

    If apigee-monit does not reload, check the log file for errors as described in Access apigee-monit log files.

  6. Repeat this procedure for each node in your cluster.
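The following sketch writes a configuration file from standard input into /opt/apigee/data/apigee-monit and refuses the reserved name "monit". When run as root, it also hands the file to the "apigee" user. The function write_monit_conf and the MONIT_CONF_DIR override are hypothetical helpers. You still need to reload apigee-monit afterward.

```shell
#!/usr/bin/env bash
# Directory for extra configuration files; override MONIT_CONF_DIR if needed.
MONIT_CONF_DIR="${MONIT_CONF_DIR:-/opt/apigee/data/apigee-monit}"

# Write stdin to $MONIT_CONF_DIR/<name>.conf, refusing the reserved name
# "monit" and names containing a slash.
write_monit_conf() {
  local name="$1" target
  case "$name" in
    "" | monit | */*)
      echo "invalid configuration file name: '$name'" >&2
      return 1 ;;
  esac
  target="$MONIT_CONF_DIR/$name.conf"
  cat > "$target" || return
  # Change the owner only when running as root and the apigee user exists.
  if [ "$(id -u)" -eq 0 ] && id apigee >/dev/null 2>&1; then
    chown apigee:apigee "$target"
  fi
}
```

For example, if you save the mail settings above in mail-settings.txt, running write_monit_conf my-mail-config < mail-settings.txt as root creates /opt/apigee/data/apigee-monit/my-mail-config.conf.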

Non-Apigee component configurations

You can add your own configurations to apigee-monit so that it will check services that are not part of Apigee Edge for Private Cloud. For example, you can use apigee-monit to check that your APIs are running by sending requests to your target endpoint.

To add a non-Apigee component configuration:

  1. Create a new component configuration file in the following location:
    /opt/apigee/data/apigee-monit/filename.conf

    Where filename can be any valid file name, except "monit".

    You can create as many component configuration files as necessary. For example, you can create a separate configuration file for each non-Apigee component that you want to monitor on the node.

  2. Change the owner of the new configuration file to the "apigee" user, as the following example shows:
    chown apigee:apigee /opt/apigee/data/apigee-monit/my-config.conf
  3. Add your custom configurations to the new file. The following example checks the target endpoint on the local server:
    CHECK HOST localhost_validate_test WITH ADDRESS localhost
      IF FAILED
        PORT 15999
        PROTOCOL http
        REQUEST "/validate__test"
        CONTENT = "Server Ready"
        FOR 2 times WITHIN 3 cycles
      THEN alert

    For a complete list of possible configuration settings, see the monit documentation.

  4. Save your changes to the configuration file.
  5. Reload apigee-monit with the following command:
    /opt/apigee/apigee-service/bin/apigee-service apigee-monit reload

    If apigee-monit does not reload, check the log file for errors as described in Access apigee-monit log files.

  6. Repeat this procedure for each node in your cluster.

Note that this is for non-Edge components only. You cannot customize the component configurations for Edge components.
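When you check several endpoints, generating the stanzas can reduce copy-and-paste errors. The following sketch prints a CHECK HOST stanza in the same shape as the example above. The helper http_check_stanza is hypothetical, and you supply the check name, port, request path, and expected content as parameters. Note that monit matches the CONTENT value as a regular expression.

```shell
#!/usr/bin/env bash
# Print a CHECK HOST stanza that alerts when an HTTP endpoint on localhost
# does not return the expected content 2 times within 3 cycles.
# Arguments: check name, port, request path, expected content.
http_check_stanza() {
  local name="$1" port="$2" path="$3" content="$4"
  cat <<EOF
CHECK HOST $name WITH ADDRESS localhost
  IF FAILED
    PORT $port
    PROTOCOL http
    REQUEST "$path"
    CONTENT = "$content"
    FOR 2 times WITHIN 3 cycles
  THEN alert
EOF
}
```

For example, http_check_stanza orders_api 8080 /orders/health "OK" > /opt/apigee/data/apigee-monit/orders-api.conf writes a configuration for a hypothetical orders API. Then change the file's owner to the "apigee" user and reload apigee-monit as described in the steps above.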

Access apigee-monit log files

apigee-monit logs all activity, including events, restarts, configuration changes, and alerts in a log file.

The default location of the log file is:

/opt/apigee/var/log/apigee-monit/apigee-monit.log

You can change the default location by customizing the apigee-monit control settings.

Log file entries have the following form:

'edge-message-processor' trying to restart
[UTC Dec 14 16:20:42] info     : 'edge-message-processor' trying to restart
'edge-message-processor' restart: '/opt/apigee/apigee-service/bin/apigee-service edge-message-processor monitrestart'

You cannot customize the format of the apigee-monit log file entries.
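To see which components apigee-monit has been restarting, you can count the "trying to restart" entries in the log. The following sketch assumes that log entries begin with a bracketed timestamp, as in the example above, and counts only those lines. The function count_restarts is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print "<service> <count>" for each service that apigee-monit tried to
# restart, reading the given log file (default: the standard log location).
count_restarts() {
  local log="${1:-/opt/apigee/var/log/apigee-monit/apigee-monit.log}"
  # Only entries that start with a bracketed timestamp are counted.
  sed -n "s/^\[[^]]*] .*'\([^']*\)' trying to restart.*/\1/p" "$log" \
    | sort | uniq -c | awk '{ print $2, $1 }'
}
```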

View aggregated status with apigee-monit

apigee-monit includes the following commands that give you aggregated status information about the components on a node:

Command Usage
report
/opt/apigee/apigee-service/bin/apigee-service apigee-monit report
summary
/opt/apigee/apigee-service/bin/apigee-service apigee-monit summary

Each of these commands is explained in more detail in the sections that follow.

report

The report command gives you a rolled-up summary of how many components are up, down, currently being initialized, or are currently unmonitored on a node. The following example invokes the report command:

/opt/apigee/apigee-service/bin/apigee-service apigee-monit report

The following example shows report output on an AIO (all-in-one) configuration:

/opt/apigee/apigee-service/bin/apigee-service apigee-monit report
up:            11 (100.0%)
down:           0 (0.0%)
initialising:   0 (0.0%)
unmonitored:    1 (8.3%)
total:         12 services

In this example, 11 of the 12 services are reported by apigee-monit as being up. One service is not currently being monitored.

You may get a Connection refused error when you first execute the report command. In this case, wait for the duration of the conf_monit_monit_delay_time property, and then try again.
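To use report in scripts or alerts, you can extract the down count from its output. The following sketch assumes the output format shown above. The functions report_down_count and check_node_down and the APIGEE_SERVICE override are hypothetical. check_node_down returns 2 when it cannot read a down count, for example after a Connection refused error during the startup delay.

```shell
#!/usr/bin/env bash
# Path to apigee-service; override APIGEE_SERVICE to point somewhere else.
APIGEE_SERVICE="${APIGEE_SERVICE:-/opt/apigee/apigee-service/bin/apigee-service}"

# Print the "down" count from report output read on stdin.
report_down_count() {
  awk '$1 == "down:" { print $2 }'
}

# Return 0 if no component is down, 1 if any is down, and 2 if the report
# could not be read.
check_node_down() {
  local down
  down=$("$APIGEE_SERVICE" apigee-monit report | report_down_count)
  if [ -z "$down" ]; then
    echo "could not read apigee-monit report" >&2
    return 2
  fi
  [ "$down" -eq 0 ]
}
```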

summary

The summary command lists each component and provides its status. The following example invokes the summary command:

/opt/apigee/apigee-service/bin/apigee-service apigee-monit summary

The following example shows summary output on an AIO (all-in-one) configuration:

/opt/apigee/apigee-service/bin/apigee-service apigee-monit summary
Monit 5.25.1 uptime: 4h 20m
 Service Name                     Status                      Type
 host_name                        OK                          System
 apigee-zookeeper                 OK                          Process
 apigee-cassandra                 OK                          Process
 apigee-openldap                  OK                          Process
 apigee-qpidd                     OK                          Process
 apigee-postgresql                OK                          Process
 edge-ui                          OK                          Process
 edge-qpid-server                 OK                          Remote Host
 edge-postgres-server             OK                          Remote Host
 edge-management-server           OK                          Remote Host
 edge-router                      OK                          Remote Host
 edge-message-processor           OK                          Remote Host

If you get a Connection refused error when you first execute the summary command, try waiting the duration of the conf_monit_monit_delay_time property, and then try again.
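The following sketch prints the name of each service whose status in the summary output is not OK, which you can feed into your own alerting. It assumes the layout shown above, with the status starting in the second column. Services with any other status, including unmonitored ones, are reported. The function not_ok_services is a hypothetical helper.

```shell
#!/usr/bin/env bash
# List services whose status in "apigee-monit summary" output (stdin) is not
# OK. Skips the "Monit ... uptime" line, the header row, and blank lines.
not_ok_services() {
  awk '
    /^Monit / || $1 == "Service" || NF == 0 { next }
    $2 != "OK" { print $1 }
  '
}
```

For example: /opt/apigee/apigee-service/bin/apigee-service apigee-monit summary | not_ok_services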

Monitor apigee-monit

It is best practice to regularly check that apigee-monit is running on each node.

To check that apigee-monit is running, use the following command:

/opt/apigee/apigee-service/bin/apigee-service apigee-monit monitor_monit

Apigee recommends that you issue this command periodically on each node that is running apigee-monit. One way to do this is with a utility such as cron that executes scheduled tasks at pre-defined intervals.

To use cron to monitor apigee-monit:

  1. Add cron support by copying the apigee-monit.cron file to the /etc/cron.d directory, as the following example shows:
    cp /opt/apigee/apigee-monit/cron/apigee-monit.cron /etc/cron.d/
  2. Open the apigee-monit.cron file to edit it.

    The apigee-monit.cron file defines the cron job to execute as well as the frequency at which to execute that job. The following example shows the default values:

    # Cron entry to check if monit process is running. If not start it
    */2 * * * * root /opt/apigee/apigee-service/bin/apigee-service apigee-monit monitor_monit

    This file uses the following syntax. The first five fields define when cron runs the job, the sixth field names the user that runs it, and the rest of the line is the command to execute:

    min hour day_of_month month day_of_week user command

    For example, the default execution time is */2 * * * *, which instructs cron to check the apigee-monit process every 2 minutes.

    You cannot execute a cron job more frequently than once per minute.

    For more information on using cron, see your server OS's documentation or man pages.

  3. Change the cron settings to match your organization's policies. For example, to change the execution frequency to every 5 minutes, set the job definition to the following:
    */5 * * * * root /opt/apigee/apigee-service/bin/apigee-service apigee-monit monitor_monit
  4. Save the apigee-monit.cron file.
  5. Repeat this procedure for each node in your cluster.

If cron does not begin watching apigee-monit, check that:

  • There is a blank line after the cron job definition.
  • There is only one cron job defined in the file. (Commented lines do not count.)
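You can automate both checks. The following sketch interprets the blank-line requirement as requiring the file to end with a newline character. It counts the lines that are neither blank nor comments. The function check_cron_file is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Verify that the cron file defines exactly one job and ends with a newline.
check_cron_file() {
  local file="${1:-/etc/cron.d/apigee-monit.cron}" jobs
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  # Count lines that are neither blank nor comments.
  jobs=$(grep -cv -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$file")
  if [ "$jobs" -ne 1 ]; then
    echo "expected exactly 1 job in $file, found $jobs" >&2
    return 1
  fi
  # cron can ignore a last line that is not terminated by a newline.
  if [ -n "$(tail -c 1 "$file")" ]; then
    echo "$file does not end with a newline" >&2
    return 1
  fi
}
```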

If you want to stop or temporarily disable apigee-monit, you must also disable this cron job; otherwise, cron restarts apigee-monit.

To disable the cron job, do one of the following:

  • Delete the /etc/cron.d/apigee-monit.cron file:
    sudo rm /etc/cron.d/apigee-monit.cron

    You will have to re-copy it if you later want to re-enable cron to watch apigee-monit.

    OR

  • Edit the /etc/cron.d/apigee-monit.cron file and comment out the job definition by adding a "#" to the beginning of the line; for example:
    # */2 * * * * root /opt/apigee/apigee-service/bin/apigee-service apigee-monit monitor_monit
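If you toggle the cron job often, the following sketch comments out the job line (disable) or restores it (enable) in place. When enabling, it uncomments only lines whose text starts like a cron schedule (a digit or "*"), so the descriptive comment stays commented. The function toggle_cron_job is a hypothetical helper and must run with permission to write to /etc/cron.d.

```shell
#!/usr/bin/env bash
# Usage: toggle_cron_job enable|disable [file]
# disable: prefix "# " to job lines (lines that are not blank or comments).
# enable: strip "#" from comments whose text starts like a schedule field.
toggle_cron_job() {
  local mode="$1" file="${2:-/etc/cron.d/apigee-monit.cron}" script tmp status
  case "$mode" in
    disable) script='s/^\([^#[:space:]]\)/# \1/' ;;
    enable) script='s/^#[[:space:]]*\([0-9*]\)/\1/' ;;
    *)
      echo "usage: toggle_cron_job enable|disable [file]" >&2
      return 2 ;;
  esac
  tmp=$(mktemp) || return
  # Copy back with cat so the file keeps its owner and permissions.
  sed "$script" "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```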