503 Service Unavailable

Symptom

The client application receives an HTTP response status 503 with the message Service Unavailable following an API proxy call.

Error messages

You can see one of the following error messages:

HTTP/1.1 503 Service Unavailable
HTTP/1.1 503 Service Unavailable: Back-end server is at capacity

You can also see one of the following error messages in the HTTP response:

Service unavailable

{
   "fault": {
      "faultstring": "The Service is temporarily unavailable", 
      "detail": {
           "errorcode": "messaging.adaptors.http.flow.ServiceUnavailable"
       }
    }
}

No active targets

{
   "fault": {
      "faultstring": "The Service is temporarily unavailable",
      "detail": {
           "errorcode": "messaging.adaptors.http.flow.NoActiveTargets"
       }
    }
}
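
When a client needs to tell the two fault payloads above apart, the errorcode field is the distinguishing value, because both share the same faultstring. The following is a minimal sketch, assuming the response body has already been read into a string; the function name extract_errorcode and the KNOWN_503_CODES set are illustrative and not part of any Apigee API.

```python
import json

# Error codes copied from the two fault payloads shown above.
KNOWN_503_CODES = {
    "messaging.adaptors.http.flow.ServiceUnavailable",
    "messaging.adaptors.http.flow.NoActiveTargets",
}


def extract_errorcode(body):
    """Return fault.detail.errorcode from a response body, or None if absent."""
    try:
        payload = json.loads(body)
    except ValueError:
        return None  # body was not JSON (for example, a plain-text 503 message)
    if not isinstance(payload, dict):
        return None
    fault = payload.get("fault")
    if not isinstance(fault, dict):
        return None
    detail = fault.get("detail")
    if not isinstance(detail, dict):
        return None
    return detail.get("errorcode")


sample = """
{
   "fault": {
      "faultstring": "The Service is temporarily unavailable",
      "detail": {
           "errorcode": "messaging.adaptors.http.flow.NoActiveTargets"
       }
    }
}
"""

code = extract_errorcode(sample)
print(code, code in KNOWN_503_CODES)
```

A body that is not JSON, such as the plain HTTP/1.1 503 Service Unavailable message, yields None, which a caller can treat as "no fault payload available".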

Possible causes

The HTTP status code 503 means that the server is currently unavailable. On Apigee Edge, this problem can occur on either the incoming (northbound) or the outgoing (southbound) connection. Most often, the error occurs because a server is too busy or is down for some reason, such as temporary maintenance. It can also occur if the TLS/SSL handshake fails between the client and the server.

Possible causes for the 503 Service Unavailable response are:

Cause | Description | Who can perform the troubleshooting steps
Overloaded Server | The server is overloaded and cannot handle any new incoming client requests. | Edge Private and Public Cloud users
Connection Errors due to Incorrect DNS Resolution | The DNS resolution of the target server resulted in bad IP addresses that lead to connection errors. | Edge Private and Public Cloud users
Connection Errors | Network or connectivity issues prevent the client from connecting to the server. | Edge Private Cloud users only
SSL Handshake Failures | The TLS/SSL handshake failed between the client and server. (Troubleshooting for this class of problem is covered in a separate topic.) | Edge Private and Public Cloud users

Overloaded server

The following error can occur when the server is overloaded or cannot handle any more requests:

HTTP/1.1 503 Service Unavailable: Back-end server is at capacity

Diagnosis

To diagnose this issue, first determine whether the error occurs on the incoming (northbound) or outgoing (southbound) connection. To learn how to make this determination, see Determining the source of the problem.

If the error is on the incoming (northbound) connection:

  • Private Cloud users: Check if the Average Load/CPU/Memory usage is high on the Edge Router.
  • Public Cloud users: You do not have access to the Edge Router. Contact Apigee Support for assistance.

If the error is on the outgoing (southbound) connection:

  • All users: Check if the Average Load/CPU/Memory usage is high on the backend server.
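
On a Linux host that you can log in to (an Edge Router in Private Cloud, or your own backend server), one rough way to judge whether the load is high is to compare the load average with the number of CPUs. The sketch below is an illustration only: the threshold of 1.0 runnable tasks per CPU is an assumed rule of thumb, not an Apigee recommendation, and the function names are hypothetical.

```python
import os


def load_per_cpu(load_avg, cpu_count):
    """Normalize a load average by the CPU count (treating 0 CPUs as 1)."""
    return load_avg / max(cpu_count, 1)


def looks_overloaded(load_avg, cpu_count, threshold=1.0):
    """True when the per-CPU load exceeds the assumed threshold."""
    return load_per_cpu(load_avg, cpu_count) > threshold


if __name__ == "__main__":
    # os.getloadavg() is available on Unix-like systems only.
    one_min, five_min, fifteen_min = os.getloadavg()
    cpus = os.cpu_count() or 1
    for label, value in (("1m", one_min), ("5m", five_min), ("15m", fifteen_min)):
        print(label, round(load_per_cpu(value, cpus), 2), looks_overloaded(value, cpus))
```

A sustained high value over the 5-minute and 15-minute windows is usually more meaningful than a brief spike in the 1-minute window. Memory usage is not covered by this sketch.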

Resolution

If the Edge Router is overloaded:

  • Private Cloud users: Restart the Edge Router and then monitor its usage to see if the problem is resolved. If the problem persists, contact Apigee Support for assistance.
  • Public Cloud users: You do not have access to the Edge Router. Contact Apigee Support for assistance.

If the backend service is overloaded:

  • All users: Restart the appropriate backend server and then monitor it to see if the problem is resolved.
  • All users: If the problem persists, check if you need to increase the capacity of the appropriate backend server(s) and/or fix any issue with the backend server(s).

Connection Errors due to Incorrect DNS Resolution

Diagnosis

  1. If you are able to capture the UI trace for the failing request, then determine the message ID (X-Apigee.Message-ID) of the request as shown in the following figure.
    [Figure: UI trace]
  2. If you are unable to capture the UI trace for the failing request, then analyze the Nginx access logs and identify the message ID(s) of the specific request.
  3. Search for the specific request message id(s) in the Message Processor log (/opt/apigee/var/log/edge-message-processor/logs/system.log). You may observe the following errors:

    An onConnectTimeout error indicates that the Message Processor was unable to connect to the backend server within the preset connection timeout period (Default: 3 seconds).
    2019-08-14 09:11:49,314 org:myorg env:prod api:Employees rev:1 messageid:mo-96cf6757a-9401-21-1 NIOThread@0 ERROR HTTP.CLIENT - HTTPClient$Context.onTimeout() : ClientChannel[Connected:]@164162 useCount=1 bytesRead=0 bytesWritten=0 age=3001ms lastIO=3001ms .onConnectTimeout connectAddress=www.abc.com/11.11.11.11  resolvedAddress=www.abc.com/22.22.22.22
    
    2019-08-14 09:11:49,333 org:myorg env:prod api:Employees rev:1 messageid:mo-96cf6757a-9401-21-1 NIOThread@0 ERROR ADAPTORS.HTTP.FLOW - RequestWriteListener.onTimeout() : RequestWriteListener.onTimeout(HTTPRequest@6b393600)
    
  4. Note down the resolved IP address in the onConnectTimeout error and check if the IP address is valid for your backend server. If the IP address is valid, then move to Connection Errors.
  5. If the IP address is invalid, then the problem is most likely caused by an issue with DNS resolution.
  6. Repeat steps 3 and 4 for a few more failing API requests and verify whether you see the same invalid IP address or other invalid IP addresses.
  7. Search the Message Processor log (/opt/apigee/var/log/edge-message-processor/logs/system.log) for messages containing the keyword "DNS Refresh". Check whether bad or invalid IP addresses are occasionally being added to the DNS cache on the Message Processor.
    2019-08-14 09:11:49,314 org:myorg env:prod api:Employees rev:1 messageid:mo-96cf6757a-9401-21-1 NIOThread@0 INFO c.a.p.h.d.DNSCachedAddress - DNSCachedAddress.reportDifferences() : DNS Refresh for host: apitarget-uat.schemeweb.co.uk:4436. Added 2 IPs [www.abc.com/22.22.22.22, www.abc.com/33.33.33.33] Removed 1 IPs [www.abc.com/11.11.11.11]
    
  8. This issue can happen if there are any issues with the authoritative DNS servers or the name servers configured in /etc/resolv.conf.

    Typically, one or more authoritative DNS servers are configured to perform DNS resolution. If there are no authoritative DNS servers, then DNS resolution falls back to the configuration in /etc/resolv.conf. For example, if /etc/resolv.conf is configured to use specific name servers, then those name servers are used to perform DNS resolution.
  9. If there are any issues with authoritative DNS servers or name servers specified in /etc/resolv.conf, then the backend server host names will be resolved to bad/invalid IP address(es). The bad/invalid IP address(es) will then be stored in the DNS cache of the Message Processor.
    1. If the issue with authoritative DNS servers or name servers specified in /etc/resolv.conf is persistent, then the bad/invalid IP address(es) will continue to remain in the DNS cache of the Message Processor. As long as the bad IP addresses are stored in the DNS cache of the Message Processor, the requests for all those APIs using the specific backend server will fail with 503 error.
    2. If the issue with authoritative DNS servers or name servers specified in /etc/resolv.conf is intermittent, then good and bad IP addresses will be stored intermittently in the DNS cache. In this case, you will see 503 errors intermittently for all those APIs using the specific backend server.
  10. In summary, a persistent DNS issue produces continuous 503 errors, and an intermittent DNS issue produces intermittent 503 errors: requests fail whenever the backend server host name resolves to a bad IP address and succeed whenever it resolves to a good IP address.
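
The log checks in the steps above can be scripted. The following is a minimal sketch, assuming the log lines have the same shape as the two samples in this section (other Edge versions may format them differently) and that you already know which IP addresses are valid for your backend. The helper names are hypothetical, and treating 11.11.11.11 as the valid address is purely illustrative.

```python
import re

# Patterns derived from the sample log lines in this section (assumption:
# your system.log uses the same wording and field order).
TIMEOUT_RE = re.compile(
    r"messageid:(\S+).*?\.onConnectTimeout\s+connectAddress=(\S+)\s+resolvedAddress=(\S+)"
)
REFRESH_RE = re.compile(
    r"DNS Refresh for host: (\S+)\. Added \d+ IPs \[(.*?)\] Removed \d+ IPs \[(.*?)\]"
)


def ip_of(address):
    """Turn 'www.abc.com/22.22.22.22' into '22.22.22.22'."""
    return address.rsplit("/", 1)[-1]


def split_ips(text):
    """Turn 'host/ip, host/ip' into a list of IP strings."""
    return [ip_of(item.strip()) for item in text.split(",") if item.strip()]


def bad_timeouts(lines, valid_ips):
    """Yield (message_id, resolved_ip) for connect timeouts to unexpected IPs."""
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match and ip_of(match.group(3)) not in valid_ips:
            yield match.group(1), ip_of(match.group(3))


def dns_refreshes(lines):
    """Yield (host, added_ips, removed_ips) for each 'DNS Refresh' message."""
    for line in lines:
        match = REFRESH_RE.search(line)
        if match:
            yield match.group(1), split_ips(match.group(2)), split_ips(match.group(3))


SAMPLE_LINES = [
    "2019-08-14 09:11:49,314 org:myorg env:prod api:Employees rev:1 "
    "messageid:mo-96cf6757a-9401-21-1 NIOThread@0 ERROR HTTP.CLIENT - "
    "HTTPClient$Context.onTimeout() : ClientChannel[Connected:]@164162 useCount=1 "
    "bytesRead=0 bytesWritten=0 age=3001ms lastIO=3001ms .onConnectTimeout "
    "connectAddress=www.abc.com/11.11.11.11  resolvedAddress=www.abc.com/22.22.22.22",
    "2019-08-14 09:11:49,314 org:myorg env:prod api:Employees rev:1 "
    "messageid:mo-96cf6757a-9401-21-1 NIOThread@0 INFO c.a.p.h.d.DNSCachedAddress - "
    "DNSCachedAddress.reportDifferences() : DNS Refresh for host: "
    "apitarget-uat.schemeweb.co.uk:4436. Added 2 IPs [www.abc.com/22.22.22.22, "
    "www.abc.com/33.33.33.33] Removed 1 IPs [www.abc.com/11.11.11.11]",
]

VALID_IPS = {"11.11.11.11"}  # illustrative: replace with your backend's real IPs

print(list(bad_timeouts(SAMPLE_LINES, VALID_IPS)))
print(list(dns_refreshes(SAMPLE_LINES)))
```

In practice you would pass an open file handle for system.log instead of SAMPLE_LINES. Refresh events that add addresses outside your valid set point to the DNS problem described in steps 8 and 9.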

Resolution

Please work with your operating system administrator and fix the issues with the DNS servers.

  1. If there’s an issue with your authoritative DNS servers or name servers specified in /etc/resolv.conf, then fix the issue with the appropriate server to address this issue.
  2. If there’s any issue with the configuration in /etc/resolv.conf on the systems having Message Processors, then fix the configuration issue.
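
After the DNS servers or /etc/resolv.conf have been fixed, it can help to confirm from each Message Processor host that the backend host name now resolves only to addresses you expect. The sketch below uses the system resolver through Python's socket module; the function names and the expected-address set are illustrative assumptions. The Message Processor keeps its own DNS cache, so this check shows what the operating system resolves, not what is currently cached.

```python
import socket


def resolve_ipv4(hostname, port=443):
    """Return the set of IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def unexpected_ips(resolved, expected):
    """Return the resolved addresses that are not in the expected set."""
    return set(resolved) - set(expected)


if __name__ == "__main__":
    # Hypothetical values: substitute your backend host name and its valid IPs.
    backend = "localhost"
    expected = {"127.0.0.1"}
    resolved = resolve_ipv4(backend)
    print(sorted(resolved), sorted(unexpected_ips(resolved, expected)))
```

Running the check several times over a few minutes is one way to catch the intermittent case, where good and bad addresses alternate.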

Connection errors

A connection error happens when an Apigee Edge Message Processor attempts to connect to a backend server and one of these problems occurs:

  • The Message Processor is unable to connect within the preset connection timeout period. (Default: 3 seconds)
  • The backend server refuses the connection.

Diagnosis

  1. Check the Message Processor log (/opt/apigee/var/log/edge-message-processor/logs/system.log) for any of the following errors:
    1. An onConnectTimeout error indicates that the Message Processor was unable to connect to the backend server within the preset connection timeout period.
      2016-06-23 09:11:49,314 org:myorg env:prod api:Employees rev:1 messageid:mo-96cf6757a-9401-21-1 NIOThread@2 ERROR HTTP.CLIENT - HTTPClient$Context.onTimeout() : ClientChannel[C:]@10 useCount=1 bytesRead=0 bytesWritten=0 age=3001ms lastIO=3001ms .onConnectTimeout connectAddress=www.abc.com/11.11.11.11:80 resolvedAddress=www.abc.com/11.11.11.11 
      2016-06-23 09:11:49,333 org:myorg env:prod api:Employees rev:1 messageid:mo-96cf6757a-9401-21-1 NIOThread@2 ERROR ADAPTORS.HTTP.FLOW - RequestWriteListener.onTimeout() : RequestWriteListener.onTimeout(HTTPRequest@6b393600)
      
    2. A java.net.ConnectException: Connection refused error indicates the connection was refused by the backend server.
      14:40:16.531 +0530      
      2016-06-17 09:10:16,531 org:myorg env:prod api:www.abc.com rev:1 rrt07eadn-22739-40983870-15 NIOThread@2 ERROR HTTP.CLIENT - HTTPClient$Context.onConnectFailure() : connect to www.abc.com:11.11.11.11:443 failed with exception {} 
      java.net.ConnectException: Connection refused 
      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.7.0_75] 
      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) ~[na:1.7.0_75] 
      at com.apigee.nio.ClientChannel.finishConnect(ClientChannel.java:121) ~[nio-1.0.0.jar:na] 
      at com.apigee.nio.handlers.NIOThread.run(NIOThread.java:108) ~[nio-1.0.0.jar:na]
      
  2. Check if you are able to connect to the specific backend server directly from each of the Message Processors using the telnet command:
    1. If the backend server resolves to a single IP address, then use the following command:
      telnet BackendServer-IPaddress 443
      
    2. If the backend server resolves to multiple IP addresses, then use the hostname of the backend server in the telnet command as shown below:
      telnet BackendServer-HostName 443
      
  3. If you are able to connect to the backend server, then you might see a message like Connected to backend-server. If you are unable to connect to the backend server, this might be because the Message Processors' IP addresses are not whitelisted on the specific backend server.
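
If telnet is not installed on the Message Processor hosts, the same connectivity check can be approximated with a short script. This is a sketch, not an Apigee tool: the function name can_connect is hypothetical, and the 3-second timeout is chosen to mirror the default connection timeout mentioned above.

```python
import socket


def can_connect(host, port=443, timeout=3.0):
    """Try a TCP connection and return 'connected', 'refused', 'timeout', or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        return "refused"  # comparable to java.net.ConnectException: Connection refused
    except socket.timeout:
        return "timeout"  # comparable to the onConnectTimeout error
    except OSError as exc:
        return "error: {}".format(exc)  # for example, a name resolution failure


if __name__ == "__main__":
    # Hypothetical target: substitute your backend host name or IP address.
    print(can_connect("127.0.0.1", 443))
```

A result of refused or timeout from a Message Processor host, combined with a successful connection from another network, is consistent with the backend not allowing traffic from the Message Processors' IP addresses.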

Resolution

Whitelist the Message Processors' IP addresses on the specific backend server to allow traffic from the Edge Message Processors to your backend server. For example, on Linux, you could use iptables to allow traffic from the Message Processors' IP addresses on the backend server.

If the problem persists, work with your Network administrator to determine and fix the issue. If you need any further assistance from Apigee, contact Apigee Support.

SSL Handshake Failures

An entire troubleshooting playbook is devoted to TLS/SSL handshake errors. See SSL Handshake Failures.

Determining the source of the problem

Certain types of errors can occur either on the incoming (northbound) or outgoing (southbound) connection. An incoming (northbound) error occurs between the client application and Edge. An outgoing (southbound) error occurs between Edge and the backend target server. To diagnose these kinds of problems, your first job is to figure out whether the error occurs on the northbound or southbound connection.

Understanding northbound and southbound connections

In Edge, you can encounter a 503 Service Unavailable error on either the incoming or outgoing connection:

  • Incoming (or northbound) connection - The connection between the client application and the Edge Router. The Router is the component of Apigee Edge that handles incoming requests made to the system.
  • Outgoing (or southbound) connection - The connection between the Edge Message Processor and the backend server. The Message Processor is a component of Apigee Edge that proxies API requests to backend target servers.

If you are an Edge Public Cloud user, you are probably unaware of internal components such as the Router or the Message Processor. These internal components are not visible or accessible to Public Cloud users. Where possible, we provide alternative ways to investigate the problem that do not require direct access to these components.

The following figure illustrates northbound and southbound connections for Apigee Edge.

Determining where the 503 Service Unavailable error occurred

Use one of the following procedures to determine if the 503 Service Unavailable error occurred at the northbound or southbound connection.

Procedure 1: Using UI Trace (For all users)

This procedure can be performed by Public or Private Cloud users:

  1. If the issue is still active, enable the UI trace for the affected API.
  2. If the UI trace for the failing API request shows that the 503 Service Unavailable error occurs during the target request flow or is sent by the backend server, then the issue is southbound (that is, between the Message Processor and the backend server).
  3. If you don't get the trace for the specific API call, then the issue is northbound, between the client application and the Router.

Procedure 2: Using API Monitoring (For Apigee Public Cloud users only)

If you are a Private Cloud user, skip this procedure.

API Monitoring enables you to isolate problem areas quickly to diagnose error, performance, and latency issues and their source, such as developer apps, API proxies, backend targets, or the API platform.

Step through a sample scenario that demonstrates how to troubleshoot 5xx issues with your APIs using API Monitoring. For example, you may want to set up an alert to be notified when the number of messaging.adaptors.http.flow.ServiceUnavailable faults exceeds a particular threshold.

Procedure 3: Using Nginx Access Logs (For Apigee Private Cloud users only)

If you are a Public Cloud user, skip this procedure.

If the issue has happened in the past or if the issue is intermittent and you are unable to capture the trace, then perform the following steps:

  1. Check the Nginx access logs (/opt/apigee/var/log/edge-router/nginx/org-env.port_access_log).
  2. Search for any 503 errors for the specific API proxy.
  3. If you can identify any 503 Errors for the specific API at the specific time, then the issue occurred at the southbound connection (between the Message Processor and the backend server).
  4. If not, then the issue occurred at the northbound connection (between the client application and the Router).
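
The decision in the steps above can be sketched as a small script. The access log format is not described in this document, so the sketch only assumes that each entry is a single line of whitespace-separated fields that includes the request path and the status code. The sample lines and the /v1/employees base path are invented for illustration and do not reflect the real Nginx access log layout.

```python
def count_503(lines, proxy_path, status_token="503"):
    """Count lines that mention proxy_path and have status_token as a whole field.

    Caveat: a whole-field match can also hit another numeric field that happens
    to equal 503 (for example, a byte count), so review the matches by hand.
    """
    hits = 0
    for line in lines:
        fields = line.split()
        if status_token in fields and any(proxy_path in field for field in fields):
            hits += 1
    return hits


def locate(hits):
    """Apply the rule from the steps above: 503s in the access log mean southbound."""
    return "southbound" if hits > 0 else "northbound"


# Invented sample lines (hypothetical format, for illustration only).
SAMPLE_ACCESS_LOG = [
    "2019-08-14T09:11:49 10.0.0.5 GET /v1/employees 503 mo-96cf6757a-9401-21-1",
    "2019-08-14T09:11:50 10.0.0.5 GET /v1/employees 200 mo-96cf6757a-9401-21-2",
    "2019-08-14T09:11:51 10.0.0.6 GET /v1/orders 503 mo-96cf6757a-9401-21-3",
]

hits = count_503(SAMPLE_ACCESS_LOG, "/v1/employees")
print(hits, locate(hits))
```

Restrict the input to the time window in which clients reported the errors; otherwise unrelated 503 entries can lead to the wrong conclusion.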