502 Bad Gateway

Symptom

The client application receives an HTTP status code of 502 with the message "Bad Gateway" in response to API calls.

The HTTP status code 502 means that the client is not receiving a valid response from the backend servers that should actually fulfill the request.

Error Messages

Client application gets the following response code:

HTTP/1.1 502 Bad Gateway

In addition, you may observe the following error message:

<html>
<head>
<title>Error</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>An error occurred.</h1>
<p>Sorry, the page you are looking for is currently unavailable.<br/>
Please try again later.</p>
</body>
</html>

If the error comes from the backend server, you may see something like the following. The exact error message depends entirely on the backend's implementation.

<html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
</body>
</html>

Possible Causes

Here are a few possible causes of a 502 Bad Gateway error for APIs going through Apigee Edge:

| Cause | Description | Troubleshooting instructions applicable for |
|---|---|---|
| No MPs available in the pool | This error is observed if all the MPs in the pool are unavailable, that is, they are either down or busy and hence not responding. | Edge Private Cloud users |
| Incorrect SSL configuration between Routers and MPs | This error is observed if the client’s CA signed root certificate is missing in the truststore of Edge's Router. | Edge Private Cloud users |
| Error from the backend server | This error will be observed if the backend server fails and sends this response. | Edge Public and Private Cloud users |

Cause: No MPs available in the pool

This error will occur if the Router finds that all the Message Processors in a given region/data center are unavailable (for example, if they are all down).

Apigee Edge is configured so that incoming API traffic (requests) in a given region/data center is always routed from the Routers to the Message Processors (MPs) in the same region/data center. Apigee Edge components may be set up in a single region/data center or in more than one. Each region/data center has two or more Routers and Message Processors configured.

Diagnosis

  1. If there is more than one region/data center, determine the region(s)/data center(s) in which the API requests are failing with the 502 Bad Gateway error. You can find this either by identifying the region in which users are observing 502 errors or by checking the Nginx Access logs in the /opt/apigee/var/log/edge-router/nginx/ directory on each of the Routers belonging to the different regions.
  2. Check the Nginx Error logs (/opt/apigee/var/log/edge-router/nginx/ORG-Env._error_log). You will see the following error:
    2019/06/24 15:26:00 [error] 4796#4796: *56357443 no live upstreams while connecting to upstream, client: <Router_IP_address>, server: <HostAlias>, request: "PUT <BasePath> HTTP/1.1", upstream: "http://<ListOfMP-IP_R-MP-Port>/<BasePath>", host: "<HostAlias>"
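
The log check in step 2 can be repeated across several Router error logs with a small helper. This is only a sketch: the function name `count_no_live_upstreams` is hypothetical, and the `*_error_log` glob in the usage comment assumes the error logs follow the file naming shown above.

```shell
# Count "no live upstreams" errors in each Router error log passed as an argument.
# Usage (assumed glob): count_no_live_upstreams /opt/apigee/var/log/edge-router/nginx/*_error_log
count_no_live_upstreams() {
    for log_file in "$@"; do
        # Skip arguments that are not regular files (for example, an unmatched glob)
        [ -f "$log_file" ] || continue
        # grep -c prints the number of matching lines; it exits non-zero on 0 matches
        hits=$(grep -c 'no live upstreams while connecting to upstream' "$log_file" || true)
        printf '%s\t%s\n' "$hits" "$log_file"
    done
}
```

A non-zero count on the Routers of one region, and zero elsewhere, would point to the region whose Message Processors need attention.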
    

Scenario 1: All the Message Processors are down

  1. Check if the Message Processors in the specific region/data center are up and running.
  2. If all the Message Processors are down, restart them.

Resolution

Restart all the Message Processors using the following command:

/opt/apigee/apigee-service/bin/apigee-service edge-message-processor restart
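
The check-then-restart flow for this scenario might be sketched as follows, to be run on each Message Processor host. The function name `restart_mp_if_down` and the `APIGEE_SERVICE` override are hypothetical, and the sketch assumes that the `status` action of apigee-service exits with a non-zero code when the component is not running; verify that on your installation before relying on it.

```shell
# Restart the Message Processor only when its status check fails.
# ASSUMPTION: "apigee-service edge-message-processor status" exits non-zero
# when the Message Processor is down.
# APIGEE_SERVICE may be set to point at a different apigee-service location.
restart_mp_if_down() {
    svc=${APIGEE_SERVICE:-/opt/apigee/apigee-service/bin/apigee-service}
    if "$svc" edge-message-processor status >/dev/null 2>&1; then
        echo "edge-message-processor is running"
    else
        echo "edge-message-processor is not running; restarting"
        "$svc" edge-message-processor restart
    fi
}
```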

Scenario 2: All the Message Processors are busy processing ongoing requests

This error will occur if the Routers find that all the Message Processors in a given region/data center are unavailable because they are busy processing ongoing requests.

  1. Check if the Message Processors in the specific region/data center are up and running.
  2. If all the Message Processors are up and active, check whether the Message Processor(s) is experiencing high CPU usage. If so, generate three thread dumps, 30 seconds apart, using the following command:
    <JAVA_HOME>/bin/jstack -l <pid> > <filename>
    
  3. If the Message Processor(s) is experiencing high memory usage then generate a heap dump using the following command:
    sudo -u apigee <JAVA_HOME>/bin/jmap -dump:live,format=b,file=<filename> <pid>
    
  4. Restart the Message Processor using the following command. This should bring CPU and memory usage back down:
    /opt/apigee/apigee-service/bin/apigee-service edge-message-processor restart
    
  5. Monitor the API calls to confirm if the problem still exists.
  6. Contact Apigee Support and provide the thread dumps, heap dump, and Message Processor logs (/opt/apigee/var/log/edge-message-processor/logs/system.log) to help investigate the cause of the high CPU/memory usage.
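
Taking the three thread dumps from step 2 by hand is easy to mistime, so the loop can be scripted. The following is a minimal sketch, not an Apigee-provided tool: the function name `capture_thread_dumps`, the `JSTACK` override, and the `threaddump_<pid>_<n>.txt` file naming are all hypothetical. jstack generally has to run as the user that owns the Java process.

```shell
# Take a series of thread dumps of one Java process, a fixed interval apart.
# Usage: capture_thread_dumps <pid> [count] [interval_seconds] [output_dir]
# Defaults: 3 dumps, 30 seconds apart, written to the current directory.
capture_thread_dumps() {
    pid=$1
    count=${2:-3}
    interval=${3:-30}
    out_dir=${4:-.}
    # Use JSTACK if set, else $JAVA_HOME/bin/jstack, else jstack from PATH
    jstack_bin=${JSTACK:-${JAVA_HOME:+$JAVA_HOME/bin/}jstack}
    i=1
    while [ "$i" -le "$count" ]; do
        "$jstack_bin" -l "$pid" > "$out_dir/threaddump_${pid}_$i.txt" || return 1
        # Wait between dumps, but not after the last one
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

For example, `capture_thread_dumps <pid>` run from a scratch directory would leave three numbered dump files there, ready to attach to a Support case.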

Cause: Incorrect SSL configuration between Routers and MPs

Diagnosis

  1. Check the Nginx Access logs (/opt/apigee/var/log/edge-router/nginx/ORG-Env._access_log). You will see a 502 response as shown below:
        2019-07-23T12:13:42+03:00	sc-10-254-226-23	10.X.X.X:53634	10.X.X.X:8998	0.000	-	-	502	502	189	344	GET <path> curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.27.1 zlib/1.2.3 libidn/1.18 libssh2/1.4.2	<host alias>	mp-10-254-226-23-23706-8552529-1	10.129.107.101	-	-	-1	-	-	dc-2	gateway-2	green	-	gateway-2	dc-2	op	pilot	http	-
    
  2. Check the Nginx Error logs (/opt/apigee/var/log/edge-router/nginx/ORG-Env._error_log). You will see errors like this:
    	2019/07/30 17:02:24 [error] 7691#7691: *11753633 peer closed connection in SSL handshake while SSL handshaking to upstream, client: X.X.X.X, server: <HostAlias>, request: "GET /no-target HTTP/1.1", upstream: "https://X.X.X.X:8998/no-target", host: "<HostAlias>"
    
  3. This shows that the SSL handshake between the Router and the Message Processor fails.
  4. In the log entries from steps 1 and 2, note that the port used to communicate with the Message Processor is 8998, which is a non-secure port, while the protocol is SSL (https). The secure port is usually 8443. Using a non-secure port for secure communication causes the SSL handshake to fail.
  5. Typically, this happens if you missed a step or set an incorrect value while configuring SSL between the Router and the Message Processor. Refer to the steps outlined in Configuring TLS between a Router and a Message Processor.
    For example, this error can occur if:
    1. The port is specified as 8998 instead of 8443 in /opt/apigee/customer/application/message-processor.properties, as shown below:
              conf/message-processor-communication.properties+local.http.port=8998
      
    2. The Router config files under the directory /opt/nginx/conf.d/* were not deleted and the Router was not restarted during the SSL configuration. In this scenario, the port of the Message Processors remains 8998 in the config files.
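
The port check from the first example above can be scripted against the Message Processor properties file. This is a sketch under stated assumptions: the function names `mp_local_http_port` and `check_mp_port` are hypothetical, the file is assumed to hold plain `key=value` lines without spaces around `=`, and 8443 is used as the expected value only because it is the usual secure port mentioned in step 4.

```shell
# Print the value configured for local.http.port in a properties file.
# Usage: mp_local_http_port /opt/apigee/customer/application/message-processor.properties
mp_local_http_port() {
    key='conf/message-processor-communication.properties+local.http.port'
    # If the key appears more than once, the last assignment is reported
    awk -F= -v key="$key" '$1 == key { value = $2 } END { if (value != "") print value }' "$1"
}

# Succeed (exit 0) only when the configured port matches the expected one.
# Usage: check_mp_port <properties_file> [expected_port]   (default 8443)
check_mp_port() {
    actual=$(mp_local_http_port "$1")
    expected=${2:-8443}
    if [ "$actual" = "$expected" ]; then
        echo "OK: local.http.port=$actual"
    else
        echo "MISMATCH: local.http.port=${actual:-unset}, expected $expected"
        return 1
    fi
}
```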

Resolution

  1. Ensure that all the steps provided in Configuring TLS between a Router and a Message Processor are followed properly.
  2. If the problem persists, go to Gather Diagnostic Information.

Cause: Error from the backend server

Diagnosis

  1. If the error occurs every time, capture a UI trace for the failing requests. Select a failing request and navigate through the phases in the trace. If the “502 Bad Gateway” response comes from the backend server itself, then a failure has most likely occurred on the backend server.
    Trace showing 502 Bad Gateway coming from the backend server
  2. If the issue is intermittent and you are unable to capture the trace,
    1. If you are a Public Cloud user, then you could use API Monitoring and check the details about the 502 errors.
      1. If you observe the Fault Code is messaging.adaptors.http.flow.ErrorResponseCode and the Fault Source is target, then the error is caused by the backend server.
    2. If you are a Private Cloud user, then you could analyze the Nginx Access logs
      (/opt/apigee/var/log/edge-router/nginx/ORG-Env._access_log).
      You will see the entry for the failing request as follows:
      2017-02-24T14:42:12+00:00	rt-01	192.8.155.2:18118	192.168.84.166:8998	10.225	-	-	502	502	440	0	GET /adv-eadlg-test/documents?type=doctype HTTP/1.1	rt-02efawae234-1234	Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36	myorg-dev.apigee.net	 rt-02efawae234-1234	6	-	false	target	messaging.adaptors.http.flow.ErrorResponseCode	null/null	-	/organizations/myorg/environments/dev/apiproxies/api123
      
      1. If you observe the Fault Code is messaging.adaptors.http.flow.ErrorResponseCode and the Fault Source is target, then the error is caused by the backend server.
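
For Private Cloud users, the access-log check above can be automated with awk. The sketch below assumes only what the sample entry shows, namely that fields are tab-separated; it does not rely on fixed column positions and instead scans every field for the status, Fault Source, and Fault Code values. The function name `backend_502s` and the `*_access_log` glob are hypothetical.

```shell
# Print access-log entries that returned 502 with Fault Source "target" and
# Fault Code messaging.adaptors.http.flow.ErrorResponseCode.
# Usage (assumed glob): backend_502s /opt/apigee/var/log/edge-router/nginx/*_access_log
backend_502s() {
    awk -F'\t' '
        {
            has_502 = 0; has_target = 0; has_fault = 0
            # Scan all tab-separated fields instead of assuming column numbers
            for (i = 1; i <= NF; i++) {
                if ($i == "502") has_502 = 1
                if ($i == "target") has_target = 1
                if ($i == "messaging.adaptors.http.flow.ErrorResponseCode") has_fault = 1
            }
            if (has_502 && has_target && has_fault) print
        }
    ' "$@"
}
```

Any line this prints is a 502 that the backend itself returned; 502 entries that are not printed need one of the other causes investigated.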

Resolution

  1. Work with your backend server team to fix this issue in the backend.

Gather Diagnostic Information

  1. Nginx Access logs
    (/opt/apigee/var/log/edge-router/nginx/ORG-Env._access_log)
    and Error logs
    (/opt/apigee/var/log/edge-router/nginx/ORG-Env._error_log).
  2. Message Processor logs
    (/opt/apigee/var/log/edge-message-processor/logs/system.log).
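
If it helps when sharing these files with Apigee Support, they can be bundled into a single archive. This is only an illustrative sketch: the function name `bundle_diagnostics` and the archive name in the usage comment are hypothetical, and the log globs assume the file naming shown above.

```shell
# Bundle the listed log files into one gzip-compressed tar archive.
# Paths that do not exist are skipped so one missing log does not abort the run.
# Usage (assumed globs):
#   bundle_diagnostics diag.tar.gz \
#       /opt/apigee/var/log/edge-router/nginx/*_access_log \
#       /opt/apigee/var/log/edge-router/nginx/*_error_log \
#       /opt/apigee/var/log/edge-message-processor/logs/system.log
bundle_diagnostics() {
    archive=$1
    shift
    # Rebuild the argument list, keeping only paths that exist
    for log_path in "$@"; do
        shift
        if [ -e "$log_path" ]; then
            set -- "$@" "$log_path"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "no log files found" >&2
        return 1
    fi
    tar -czf "$archive" "$@"
}
```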