Analytics data stuck in Qpidd dead letter queue

Symptom

Analytics data is missing in the Edge UI due to the Qpidd server not transferring analytics messages to PostgreSQL. In Edge, the edge-qpid-server component corresponds to the Qpidd server.

Qpidd maintains two queues for each analytics group:

  • ax-q-axgroup001-consumer-group-001

    This queue holds analytics messages pushed from the Message Processors and Routers. The edge-qpid-server pulls messages from this queue, parses them, and inserts them into PostgreSQL. Messages are removed from the queue once they are processed successfully.

  • ax-q-axgroup001-consumer-group-001-dl

    This queue is the dead letter queue. It is the destination for messages that edge-qpid-server failed to process and therefore no longer wishes to receive. It is typically populated when the maximum delivery count is exceeded, or when PostgreSQL rejects the insertion of new data because of a runtime error.

Error Message

The root cause can be any of several runtime errors in the edge-qpid-server component. Typically, when edge-qpid-server receives a runtime error from PostgreSQL, it creates the dead letter queue if it does not already exist, sends the message batch there, and logs the following warning:

yyyy-MM-dd HH:mm:ss,SSS ax-q-axgroup001-consumer-group-001-persistpool-thread-6 WARN c.a.a.m.MessageConsumer - MessageConsumer.process() : Sending message batch to the DLQ.
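
To see how often this warning occurs, you can count matching lines in an edge-qpid-server log file. The following is a minimal sketch, not an Apigee-provided tool: the function name count_dlq_warnings is hypothetical, and it assumes the warning text appears in the log exactly as shown above.

```shell
# Count how many batches edge-qpid-server reported sending to the dead letter queue.
# $1: path to an edge-qpid-server log file.
count_dlq_warnings() {
  # -F matches the text literally; -c prints the number of matching lines.
  # grep exits with status 1 when nothing matches, so that case is treated as success;
  # any other failure (for example an unreadable file) is still reported.
  grep -c -F 'Sending message batch to the DLQ' "$1" || [ "$?" -eq 1 ]
}
```

For example, assuming the warning is written to the edge-qpid-server system log listed under Must Gather Diagnostic Information: count_dlq_warnings /opt/apigee/var/log/edge-qpid-server/logs/system.log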

Possible Causes

Cause: Messages stuck in dead letter queue of qpidd
Description: edge-qpid-server could not understand messages that it read from the Qpidd broker or was unable to persist messages to PostgreSQL.
Troubleshooting Instructions Applicable For: Edge Private Cloud Users

Common Diagnosis Steps

Run the following command to view Qpidd queue stats:

qpid-stat -q

The output lists the queues registered with the broker. If the queue whose name ends in "-dl" shows a nonzero value in the msg column, messages are stuck in the dead letter queue.

Queues
  queue                                     dur  autoDel  excl  msg   msgIn  msgOut  bytes  bytesIn  bytesOut   cons  bind
  ========================================================================================================================
  ax-q-axgroup-001-consumer-group-001       Y                   0     185    185     0       13.8m   13.8m      6      2
  ax-q-axgroup-001-consumer-group-001-dl    Y                   0     70     70      0        3.9m    3.9m      0      2
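
The check described above can be automated by filtering the qpid-stat output. The following is a minimal sketch under stated assumptions: the function name dlq_backlog is hypothetical, and it assumes the column layout shown in the sample output, where the last eight columns are msg, msgIn, msgOut, bytes, bytesIn, bytesOut, cons, and bind.

```shell
# Print "<queue> <msg>" for every dead letter queue that still holds messages.
# Reads the text produced by `qpid-stat -q` on standard input.
dlq_backlog() {
  awk '
    # A data row has the queue name plus at least the 8 trailing columns
    # msg, msgIn, msgOut, bytes, bytesIn, bytesOut, cons, bind.
    # Counting from the right skips the dur/autoDel/excl flags, which may be blank.
    $1 ~ /-dl$/ && NF >= 9 && $(NF-7) + 0 > 0 { print $1, $(NF-7) }
  '
}
```

Usage: qpid-stat -q | dlq_backlog. No output means that no queue ending in "-dl" currently holds messages.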

Cause: Messages stuck in dead letter queue of qpidd

Diagnosis

This condition can happen under the following scenarios:

  1. PostgreSQL was down during a previous upgrade.
  2. A temporary outage of PostgreSQL due to network issues.
  3. The edge-qpid-server attempted to send a message to PostgreSQL but PostgreSQL returned a runtime error.

Resolution

  1. Note the names of the queues from the Common Diagnosis Steps. For example:

    • ax-q-axgroup-001-consumer-group-001
    • ax-q-axgroup-001-consumer-group-001-dl
  2. Run the qpid-tool command to enter an interactive qpid: prompt:

    qpid-tool

    This command returns the following:

    Management Tool for QPID
    qpid:
  3. Run list broker to obtain a list of active brokers:

    list broker

    This command returns the following:

    Object Summary:
    ID   Created   Destroyed  Index
    =======================================
    125  21:00:00  -          amqp-broker

    Where the ID column specifies the ID of the broker.

  4. Note the ID of the broker. In the example it is 125.

  5. Run the following command to move the messages from the dead letter queue back to the actual queue:

    call 125 queueMoveMessages ax-q-axgroup-001-consumer-group-001-dl ax-q-axgroup-001-consumer-group-001 100000 {}

    This command returns the following:

    OK (0) - {}

    If there is no output, then there was nothing to do, meaning there were no messages to move. If you do not see OK (0), contact Apigee Support.

  6. Quit the qpid-tool terminal.

    quit
  7. Wait 5 minutes and then run the Common Diagnosis Steps again. Verify that the messages on the actual queue are being processed and that the msg count of the dead letter queue remains at 0.
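
Because the command in step 5 repeats the queue name and must use the correct broker ID, it can help to generate it instead of typing it. The following is a minimal sketch: the function name dlq_move_command is hypothetical, and it only prints the command for you to paste at the qpid: prompt. It does not run qpid-tool itself.

```shell
# Print the qpid-tool command that moves messages from <queue>-dl back to <queue>.
# $1: broker ID from `list broker`; $2: name of the actual queue; $3: optional message limit.
dlq_move_command() {
  broker_id=$1
  queue=$2
  limit=${3:-100000}   # defaults to the limit used in the example in step 5
  printf 'call %s queueMoveMessages %s-dl %s %s {}\n' "$broker_id" "$queue" "$queue" "$limit"
}
```

For example, dlq_move_command 125 ax-q-axgroup-001-consumer-group-001 prints the same command shown in step 5.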

If the problem still persists, go to the next section.

Must Gather Diagnostic Information

If the problem persists even after following the above instructions, gather the following diagnostic information, then contact Apigee Support and share it with them:

  • Qpidd logs: /opt/apigee/var/log/apigee-qpidd/apigee-qpidd.log
  • PostgreSQL logs: /opt/apigee/var/log/apigee-postgresql/apigee-postgresql.log
  • Edge-qpid-server logs: /opt/apigee/var/log/edge-qpid-server/logs/system.log
  • Edge-postgres-server logs: /opt/apigee/var/log/edge-postgres-server/logs/system.log
  • Qpidd queue stats:

    qpid-stat -q
  • Analytics group returned by the following curl command:

    curl -u sysadminEmail:password http://mgmt:8080/v1/analytics/groups/ax