You're viewing Apigee Edge documentation. Go to the Apigee X documentation.
Videos
See the following videos for more information on 503 errors:
Video | Description |
---|---|
Troubleshoot and Resolve 503 Service Unavailable Error due to DNS issue | Learn about the following:
|
Troubleshoot and Resolve 503 Service Unavailable Error due to Network issue | Troubleshooting and resolving a real-time 503 Service Unavailable Error caused by Network issue in Apigee Edge |
Symptom
The client application receives an HTTP response status 503 with the message Service Unavailable following an API proxy call.
Error messages
You can see the following error message:
HTTP/1.1 503 Service Unavailable
You can also see the following error message in the HTTP response:
Service unavailable
{ "fault": { "faultstring": "The Service is temporarily unavailable", "detail": { "errorcode": "messaging.adaptors.http.flow.ServiceUnavailable" } } }
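A client or test harness can recognize this specific fault by checking both the status code and the errorcode field of the JSON body. The following Python sketch assumes the response body has the shape shown above. The function name is hypothetical and is not part of any Apigee API.

```python
import json

FAULT_CODE = "messaging.adaptors.http.flow.ServiceUnavailable"


def is_service_unavailable_fault(status_code, body):
    """Return True if the response is a 503 carrying the ServiceUnavailable fault code."""
    if status_code != 503:
        return False
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # The body is not JSON (for example, a plain-text 503 message).
        return False
    if not isinstance(payload, dict):
        return False
    fault = payload.get("fault")
    if not isinstance(fault, dict):
        return False
    detail = fault.get("detail")
    if not isinstance(detail, dict):
        return False
    return detail.get("errorcode") == FAULT_CODE
```

A 503 that does not carry this fault code was probably produced elsewhere and is outside the scope of this playbook.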
Possible causes
The HTTP response 503 Service Unavailable with the error code messaging.adaptors.http.flow.ServiceUnavailable
occurs if the Apigee Edge Message Processor experiences errors due to connection timeouts, an incorrect
host name, or SSL handshake failures while communicating with the backend server.
Possible causes for the 503 Service Unavailable response are:
Cause | Description | Who can perform the troubleshooting steps |
---|---|---|
Connection errors due to incorrect DNS resolution | The DNS resolution of the target server host name returned bad IP addresses, which led to connection errors. | Edge Private Cloud users |
Connection errors | Network or connectivity issues prevent the client from connecting to the server. | Edge Private Cloud users |
Incorrect target server host name | The target server host specified is incorrect or has unwanted characters (such as space). | Edge Public and Private Cloud users |
SSL handshake failures | The TLS/SSL handshake failed between the client and server. (Troubleshooting for this class of problem is covered in a separate topic.) | Edge Public and Private Cloud users |
Common diagnosis steps
Determine the Message ID of the failing request
Trace tool
To determine the message ID of the failing request using the Trace Tool:
- If the issue is still active, enable the trace session for the affected API.
- Make the API call and reproduce the issue - 503 Service Unavailable with error code
messaging.adaptors.http.flow.ServiceUnavailable.
- Select one of the failing requests.
- Navigate to the AX phase, and determine the message ID (X-Apigee.Message-ID) of the request by scrolling down in the Phase Details section as shown in the following figure.
NGINX access logs
To determine the message ID of the failing request using the NGINX access logs:
You can also refer to NGINX Access logs to determine the message ID for the 503 errors. This is particularly useful if the issue has occurred in the past or if the issue is intermittent and you are unable to capture the trace in the UI. Use the following steps to determine this information from NGINX access logs:
- Check the NGINX access logs (/opt/apigee/var/log/edge-router/nginx/<org>~<env>.<port#>_access_log).
- Search if there are any 503 Errors for the specific API proxy during a specific duration (if the problem happened in the past) or if there are any requests still failing with 503.
- If there are any 503 Errors with X-Apigee-fault-code messaging.adaptors.http.flow.ServiceUnavailable, note the message ID for one or more such requests as shown in the following example:
Sample Entry showing the 503 Error
Connection errors due to incorrect DNS resolution
Diagnosis
- Determine the message ID of the failing request.
- Search for the specific request message ID in the Message Processor log (/opt/apigee/var/log/edge-message-processor/logs/system.log). You may observe the following errors:
  An onConnectTimeout error indicates that the Message Processor was unable to connect to the backend server within the preset connection timeout period (Default: 3 seconds).
  2019-08-14 09:11:49,314 org:myorg env:prod api:Employees rev:1 messageid:mo-96cf6757a-9401-21-1 NIOThread@0 ERROR HTTP.CLIENT - HTTPClient$Context.onTimeout() : ClientChannel[Connected:]@164162 useCount=1 bytesRead=0 bytesWritten=0 age=3001ms lastIO=3001ms .onConnectTimeout connectAddress=www.abc.com/11.11.11.11 resolvedAddress=www.abc.com/22.22.22.22
  2019-08-14 09:11:49,333 org:myorg env:prod api:Employees rev:1 messageid:mo-96cf6757a-9401-21-1 NIOThread@0 ERROR ADAPTORS.HTTP.FLOW - RequestWriteListener.onTimeout() : RequestWriteListener.onTimeout(HTTPRequest@6b393600)
- Note the resolved IP address in the onConnectTimeout error and check if the IP address is valid for your backend server. If the IP address is valid, then go to Connection Errors.
- If the IP address is invalid, then the failure is most likely caused by issues with DNS resolution.
- Repeat step 3 and step 4 for a few more failing API requests and verify if you are seeing the same or any other invalid IP addresses.
- Search through the Message Processor log (/opt/apigee/var/log/edge-message-processor/logs/system.log) for messages with the keyword DNS Refresh. Check if bad or invalid IP addresses are being added to the DNS cache on the Message Processor once in a while.
  2019-08-14 09:11:49,314 org:myorg env:prod api:Employees rev:1 messageid:mo-96cf6757a-9401-21-1 NIOThread@0 INFO c.a.p.h.d.DNSCachedAddress - DNSCachedAddress.reportDifferences() : DNS Refresh for host: apitarget-uat.schemeweb.co.uk:4436. Added 2 IPs [www.abc.com/22.22.22.22, www.abc.com/33.33.33.33] Removed 1 IPs [www.abc.com/11.11.11.11]
- This issue can happen if there are any issues with the authoritative DNS servers or the name servers configured in /etc/resolv.conf.
  Typically, there could be one or more authoritative DNS servers configured to perform DNS resolution. If there are no authoritative DNS servers, then it would fall back to the configuration set up in /etc/resolv.conf and perform DNS resolution as appropriate. For example: If /etc/resolv.conf is configured to use specific name servers, then those name servers will be used to perform DNS resolution.
- If there are any issues with authoritative DNS servers or name servers specified in /etc/resolv.conf, then the backend server host names will be resolved to bad/invalid IP addresses. The bad/invalid IP addresses will then be stored in the DNS cache of the Message Processor.
  - If the issue with authoritative DNS servers or name servers specified in /etc/resolv.conf is persistent, then the bad/invalid IP addresses will continue to remain in the DNS cache of the Message Processor. As long as the bad IP addresses are stored in the DNS cache of the Message Processor, the requests for all those APIs using the specific backend server will fail with 503 error.
  - If the issue with authoritative DNS servers or name servers specified in /etc/resolv.conf is intermittent, then good and bad IP addresses will be stored intermittently in the DNS cache. In this case, you will see 503 errors intermittently for all those APIs using the specific backend server.
- If the issue with DNS servers is persistent, then you will see continuous failures. If the issue with DNS servers is intermittent, then you will see intermittent failures. That is, whenever the backend server host name gets resolved to bad IP addresses, then you observe 503 errors. And when the backend server host names are resolved to good IP addresses, then you will observe successful responses.
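The log checks in the steps above can be partly automated. The following Python sketch scans Message Processor log lines for onConnectTimeout entries and DNS Refresh entries, and reports any IP address that is not in a known-good set. It assumes log lines shaped like the samples above. The function names and the valid_ips set (which you would supply for your own backend) are hypothetical.

```python
import re

MESSAGE_ID_RE = re.compile(r"messageid:(\S+)")
RESOLVED_RE = re.compile(r"resolvedAddress=(\S+?)/(\d{1,3}(?:\.\d{1,3}){3})")
DNS_REFRESH_RE = re.compile(r"DNS Refresh for host: (\S+?)\. Added \d+ IPs \[(.*?)\]")


def find_bad_resolutions(log_lines, valid_ips):
    """Yield (message_id, host, ip) for onConnectTimeout entries resolved outside valid_ips."""
    for line in log_lines:
        if "onConnectTimeout" not in line:
            continue
        resolved = RESOLVED_RE.search(line)
        if not resolved:
            continue
        host, ip = resolved.groups()
        if ip not in valid_ips:
            message_id = MESSAGE_ID_RE.search(line)
            yield (message_id.group(1) if message_id else None, host, ip)


def find_bad_dns_refreshes(log_lines, valid_ips):
    """Yield (host, bad_ips) for DNS Refresh entries that added addresses outside valid_ips."""
    for line in log_lines:
        match = DNS_REFRESH_RE.search(line)
        if not match:
            continue
        host, added = match.groups()
        # Each entry looks like "www.abc.com/22.22.22.22"; keep only the IP part.
        added_ips = [entry.rsplit("/", 1)[-1].strip() for entry in added.split(",") if entry.strip()]
        bad_ips = [ip for ip in added_ips if ip not in valid_ips]
        if bad_ips:
            yield host, bad_ips
```

Running both functions over system.log for the affected time window gives a quick view of whether bad addresses appear continuously or only intermittently.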
Resolution
Please work with your operating system administrator and fix the issues with the DNS servers.
- If there’s an issue with your authoritative DNS servers or name servers specified in /etc/resolv.conf, then fix the issue with the appropriate server to address this issue.
- If there’s any issue with the configuration in /etc/resolv.conf on the systems having Message Processors, then fix the configuration issue.
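After the DNS servers or /etc/resolv.conf have been fixed, you may want to confirm from the Message Processor host that the backend host name now resolves to the expected addresses. The following Python sketch uses the operating system resolver for this. It is only an approximation: it does not read the Message Processor's own DNS cache, and the function names and the expected_ips set are hypothetical.

```python
import socket


def resolve_ipv4(host):
    """Resolve host with the system resolver and return its IPv4 addresses, sorted."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def unexpected_addresses(host, expected_ips, resolver=resolve_ipv4):
    """Return resolved addresses that are not in expected_ips (an empty list means all match)."""
    return [ip for ip in resolver(host) if ip not in expected_ips]
```

The resolver parameter exists so the comparison can be exercised without network access. In normal use you would call unexpected_addresses(host, expected_ips) and let it use the system resolver.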
Connection errors
A connection error happens when an Apigee Edge Message Processor attempts to connect to a backend server and one of these problems occurs:
- The Message Processor is unable to connect within the preset connection timeout period. (Default: 3 seconds)
- The backend server refuses the connection.
Diagnosis
- Determine the message ID of the failing request.
- Search for the specific request message ID in the Message Processor log (/opt/apigee/var/log/edge-message-processor/logs/system.log). You may observe the following errors:
  - An onConnectTimeout error indicates that the Message Processor was unable to connect to the backend server within the preset connection timeout period.
    2016-06-23 09:11:49,314 org:myorg env:prod api:Employees rev:1 messageid:mo-96cf6757a-9401-21-1 NIOThread@2 ERROR HTTP.CLIENT - HTTPClient$Context.onTimeout() : ClientChannel[C:]@10 useCount=1 bytesRead=0 bytesWritten=0 age=3001ms lastIO=3001ms .onConnectTimeout connectAddress=www.abc.com/11.11.11.11:80 resolvedAddress=www.abc.com/11.11.11.11
    2016-06-23 09:11:49,333 org:myorg env:prod api:Employees rev:1 messageid:mo-96cf6757a-9401-21-1 NIOThread@2 ERROR ADAPTORS.HTTP.FLOW - RequestWriteListener.onTimeout() : RequestWriteListener.onTimeout(HTTPRequest@6b393600)
  - A java.net.ConnectException: Connection refused error indicates the connection was refused by the backend server.
    14:40:16.531 +0530 2016-06-17 09:10:16,531 org:myorg env:prod api:www.abc.com rev:1 rrt07eadn-22739-40983870-15 NIOThread@2 ERROR HTTP.CLIENT - HTTPClient$Context.onConnectFailure() : connect to www.abc.com:11.11.11.11:443 failed with exception {}
    java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.7.0_75]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) ~[na:1.7.0_75]
    at com.apigee.nio.ClientChannel.finishConnect(ClientChannel.java:121) ~[nio-1.0.0.jar:na]
    at com.apigee.nio.handlers.NIOThread.run(NIOThread.java:108) ~[nio-1.0.0.jar:na]
- Check if you are able to connect to the specific backend server directly from each of the Message Processors using the telnet command:
  - If the backend server resolves to a single IP address, then use the following command:
    telnet BackendServer-IPaddress 443
  - If the backend server resolves to multiple IP addresses, then use the hostname of the backend server in the telnet command as shown below:
    telnet BackendServer-HostName 443
- If you are able to connect to the backend server, then you might see a message like Connected to backend-server. If you are unable to connect to the backend server, this might be because the Message Processors' IP addresses are not allowlisted on the specific backend server.
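If telnet is not installed on the Message Processor hosts, a TCP connection attempt from a script gives similar information. The following Python sketch is one way to do that and distinguishes the two failure modes described in this section (timeout and connection refused). The function name is hypothetical. The 3-second default mirrors the connection timeout mentioned above but is otherwise an arbitrary choice.

```python
import socket


def probe_backend(host, port, timeout=3.0):
    """Attempt a TCP connection and report 'connected', 'timeout', 'refused', or another error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        # No answer within the timeout, for example because a firewall drops the packets.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered but nothing accepts connections on this port.
        return "refused"
    except OSError as exc:
        # Covers name resolution failures and unreachable networks.
        return f"error: {exc}"
```

Like telnet, this only verifies that a TCP connection can be opened. It does not perform a TLS handshake or send an HTTP request.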
Resolution
Allow the Message Processors' IP addresses on the specific backend server so that traffic from the Edge Message Processors can reach it. For example, on Linux, you could use iptables on the backend server to allow traffic from the Message Processors' IP addresses.
If the problem persists, work with your Network administrator to determine and fix the issue. If you need any further assistance from Apigee, contact Apigee Support.
Incorrect target server host name
Diagnosis
If the host name specified in the target server is incorrect, then you can get a 503 Service Unavailable response with the error code
messaging.adaptors.http.flow.ServiceUnavailable.
Trace tool
To diagnose using the Trace tool:
- If the issue is still active, enable the trace session for the affected API.
- Make the API call and reproduce the issue - 503 Service Unavailable with error code
messaging.adaptors.http.flow.ServiceUnavailable.
- Select one of the failing requests.
- Navigate through various phases of the trace and locate where the failure occurred.
- Select the FlowInfo which has the error. You may find more information in the error.cause field which can tell you the cause for failure as shown in the following example:
Sample request showing error.cause in the trace
- If you notice that error.cause shows Host not reachable, then the likely cause for the error is one of the following:
  - The host name specified in the target server/target endpoint configuration is incorrect or has unwanted space or special characters.
    For example, there’s an unwanted space in the host name as shown below:
    "demo-target.apigee.net "
  - The host name overwritten by the target.url variable in the API Proxy using AssignMessage or JavaScript policy is incorrect or has a space or any other unwanted special characters.
- Check the target endpoint configuration and/or the target server definition to see if the target server host name is incorrect or has any unwanted space or special characters.
- If the target server host is dynamically created, then check the appropriate policy (AssignMessage/JavaScript policy, for example) used to create it. Check to see if the target server host name is incorrect or has any unwanted space or special characters.
- Once you’ve determined the target server host name, run the nslookup/dig command on the host name to see if it can be resolved.
  For example, running the nslookup command on the host name with an unwanted space returns the following output:
  nslookup "demo-target.apigee.net "
  Server: 49.205.75.2
  Address: 49.205.75.2#53
  ** server can't find demo-target.apigee.net\032: NXDOMAIN
- If the Operating system command nslookup also fails to resolve the host name, then the cause of this issue is the incorrect host name used for the target server.
  Go to Resolution.
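Stray spaces and special characters are easy to miss by eye, so a small check can help before you run nslookup. The following Python sketch flags surrounding whitespace and labels that contain anything other than letters, digits, and hyphens. The function name is hypothetical, and the label rule is the conventional host name syntax rather than anything specific to Apigee.

```python
import re

# A label is 1 to 63 characters: letters, digits, and hyphens, not starting or ending with a hyphen.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def hostname_problems(host):
    """Return a list of human-readable problems found in host (empty if none)."""
    problems = []
    stripped = host.strip()
    if host != stripped:
        problems.append("leading or trailing whitespace")
    if not stripped:
        problems.append("empty host name")
        return problems
    if len(stripped) > 253:
        problems.append("longer than 253 characters")
    for label in stripped.split("."):
        if not LABEL_RE.fullmatch(label):
            problems.append(f"invalid label: {label!r}")
    return problems
```

For the example above, hostname_problems("demo-target.apigee.net ") reports the trailing space, while the same name without the space produces an empty list.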
Message processor logs
To diagnose using message processor logs:
- Determine the message ID of the failing request.
- Search for the message ID in the Message Processor log (/opt/apigee/var/log/edge-message-processor/logs/system.log).
- If you see the following warning/error messages, the Message Processor could not resolve the host name. Since the message will be snoozed, you may not see this warning message for all the message IDs/requests.
  org:myorg env:prod api:TestTargetServer rev:2 messageid:<messageid> NIOThread@0 WARN S.HTTPCLIENTSERVICE - DNSCache$2.failed() : Failed to resolve hostname www.somehost.com . Reason mocktarget.apigee.net : Name or service not known. This log message will snooze for 2 hours
- This will be followed by a warning message, where the Message Processor removes the address from the DNS cache, as the target server host could not be reached.
org:myorg env:prod api:TestTargetServer rev:2 messageid:<messageid> NIOThread@0 WARN c.a.p.h.d.DNSCachedAddress - DNSCachedAddress.addressNotReachable() : The last address has been removed from Address list null refreshing
- You may then see a message where the Message Processor fails with the exception “Host not reachable”. Sometimes it shows the host name as part of the error message:
org:myorg env:prod api:TestTargetServer rev:2 messageid:<messageid> NIOThread@0 ERROR HTTP.CLIENT - HTTPClient$Context.onConnectFailure() : connect to demo-target.apigee.net failed with exception {} java.lang.RuntimeException: Host not reachable at com.apigee.protocol.http.HTTPClient$Context.initConnect(HTTPClient.java:704) at com.apigee.protocol.http.HTTPClient$Context.send(HTTPClient.java:675) at com.apigee.messaging.adaptors.http.flow.data.TargetRequestSender.sendRequest(TargetRequestSender.java:234) …<snipped>
- Sometimes it may show it as null as the host name cannot be resolved or reachable as shown below:
org:myorg env:prod api:TestTargetServer rev:2 messageid:<messageid> NIOThread@0 ERROR HTTP.CLIENT - HTTPClient$Context.onConnectFailure() : connect to null failed with exception {} java.lang.RuntimeException: Host not reachable at com.apigee.protocol.http.HTTPClient$Context.initConnect(HTTPClient.java:704) at com.apigee.protocol.http.HTTPClient$Context.send(HTTPClient.java:675) at com.apigee.messaging.adaptors.http.flow.data.TargetRequestSender.sendRequest(TargetRequestSender.java:234) …<snipped>
- The Host not reachable error usually occurs in one of the following cases:
  - The host name specified in the target server/target endpoint configuration is incorrect or has unwanted space or special characters.
    For example, there’s an unwanted space in the host name "demo-target.apigee.net " in the following error message:
    NIOThread@0 ERROR HTTP.CLIENT - HTTPClient$Context.onConnectFailure() : connect to demo-target.apigee.net failed with exception
  - The host name overwritten by the target.url variable in the API Proxy using AssignMessage or JavaScript policy is incorrect or has a space or any other unwanted special characters.
- Determine the target server host name to which the Message Processor is trying to communicate by using one of the following:
  - Examine the error message containing Host not reachable carefully.
  - If the error message shows the host name, then copy the host name including any spaces or any special characters.
  - If the error message shows null for the host name as seen in the following error message:
    org:myorg env:prod api:TestTargetServer rev:2 messageid:<messageid> NIOThread@0 ERROR HTTP.CLIENT - HTTPClient$Context.onConnectFailure() : connect to null failed with exception {}
    - Determine the host name by checking the target server definition used in the failing API Proxy.
    - If the target server host is dynamically created, then check the appropriate policy (for example, AssignMessage/JavaScript policy) used to create it.
- Once you’ve determined the target server host name, run the nslookup/dig command on the host name and check to see if it can be resolved.
  For example, run the nslookup command on the host name that has a space:
  nslookup "demo-target.apigee.net "
  Server: 49.205.75.2
  Address: 49.205.75.2#53
  ** server can't find demo-target.apigee.net\032: NXDOMAIN
- If the Operating system command nslookup also fails to resolve the host name, then the cause of this issue is the incorrect host name used for the target server.
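Copying the host name out of the log by hand tends to drop the very whitespace you are looking for. The following Python sketch extracts the host exactly as logged from an onConnectFailure entry, assuming the line format shown in the samples above. The function names are hypothetical.

```python
import re

CONNECT_FAILURE_RE = re.compile(
    r"onConnectFailure\(\) : connect to (.*?) failed with exception"
)


def failed_connect_host(log_line):
    """Return the host exactly as logged (spaces preserved), or None if the line does not match."""
    match = CONNECT_FAILURE_RE.search(log_line)
    return match.group(1) if match else None


def needs_config_lookup(host):
    """True when the log shows null, so the host must be found in the proxy configuration."""
    return host == "null"


def has_surrounding_whitespace(host):
    """True when the logged host carries leading or trailing whitespace."""
    return host != host.strip()
```

If needs_config_lookup returns True, the log cannot tell you the host name, and you need to check the target server definition or the policy that sets target.url, as described in the steps above.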
Resolution
- Ensure that the target server host name specified in the target endpoint configuration or in the target server definition is correct and does not have any unwanted space or special characters.
- If you use an AssignMessage or JavaScript policy to dynamically generate the target server host name, then review the policy definition and the code to ensure that the target server host name is generated correctly.
SSL handshake failures
An entire troubleshooting playbook is devoted to TLS/SSL handshake errors. See SSL Handshake Failures.
Determining the source of the problem
Certain types of errors can occur either on the incoming (northbound) or outgoing (southbound) connection. An incoming (northbound) error occurs between the client application and Edge. An outgoing (southbound) error occurs between Edge and the backend target server. To diagnose these kinds of problems, your first job is to figure out whether the error occurs on the northbound or southbound connection.
Understanding northbound and southbound connections
In Edge, you can encounter a 503 Service Unavailable error on either the incoming or outgoing connection:
- Incoming (or northbound) connection - The connection between the client application and the Edge Router. The Router is the component of Apigee Edge that handles incoming requests made to the system.
- Outgoing (or southbound) connection - The connection between the Edge Message Processor and the backend server. The Message Processor is a component of Apigee Edge that proxies API requests to backend target servers.
If you are an Edge Public Cloud user, you are probably unaware of internal components such as the Router or the Message Processor. These internal components are not visible or accessible to Public Cloud users. Where possible, we provide alternative ways to investigate the problem that do not require direct access to these components.
The following figure illustrates northbound and southbound connections for Apigee Edge.
Determining where the 503 Service Unavailable error occurred
Use one of the following procedures to determine if the 503 Service Unavailable error occurred at the northbound or southbound connection.
UI trace
To determine where the error occurred using UI Trace:
- If the issue is still active, enable the UI trace for the affected API.
- If the UI trace for the failing API request shows that the 503 Service Unavailable error occurs during the target request flow or is sent by the backend server, then the issue is southbound (that is, between the Message Processor and the backend server).
- If you don't get the trace for the specific API call, then the issue is northbound, between the client application and the Router.
API monitoring
API Monitoring enables you to isolate problem areas quickly to diagnose error, performance, and latency issues and their source, such as developer apps, API proxies, backend targets, or the API platform.
Step through a sample scenario that demonstrates how to troubleshoot 5xx issues with your APIs using API Monitoring.
For example, you may want to set up an alert to be notified when the number of messaging.adaptors.http.flow.ServiceUnavailable
faults exceeds a particular threshold.
NGINX access logs
To determine where the error occurred using the NGINX access logs:
If the issue has happened in the past or if the issue is intermittent and you are unable to capture the trace, then perform the following steps:
- Check the NGINX access logs (/opt/apigee/var/log/edge-router/nginx/<org>~<env>.<port#>_access_log).
- Search if there are any 503 Errors for the specific API proxy.
- If you can identify any 503 Errors for the specific API at the specific time, then the issue occurred at the southbound connection (between the Message Processor and the backend server).
- If not, then the issue occurred at the northbound connection (between the client application and the Router).
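The decision in the steps above can be expressed as a short script. The following Python sketch filters access log lines by the fault code named earlier in this topic and by a marker that identifies the API proxy, then applies the southbound/northbound rule. The access log line format is not shown in this topic, so the sketch relies only on substring matching. The function names and the proxy_marker parameter (for example, the proxy base path) are hypothetical.

```python
FAULT_CODE = "messaging.adaptors.http.flow.ServiceUnavailable"


def matching_503_entries(access_log_lines, proxy_marker):
    """Return access log lines that mention both the proxy marker and the fault code."""
    return [
        line
        for line in access_log_lines
        if proxy_marker in line and FAULT_CODE in line
    ]


def locate_503(access_log_lines, proxy_marker):
    """Apply the rule above: entries found means southbound, otherwise northbound."""
    if matching_503_entries(access_log_lines, proxy_marker):
        return "southbound"
    return "northbound"
```

Restrict the input to log lines from the time window in which the client saw the 503 errors. Otherwise an older, unrelated failure could be mistaken for the current one.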