502 Bad Gateway - Socket hang up

You're viewing Apigee Edge documentation.

Symptom

The client application receives an HTTP 502 Bad Gateway response with the error code ECONNRESET for API calls made through Edge Microgateway.

Error message

The client will see the following response code:

HTTP/1.1 502 Bad Gateway

The response will include the following error message:

{"message":"socket hang up","code":"ECONNRESET"}

Possible causes

Cause: Incorrectly configured keep-alive timeout
  Description: Keep-alive timeouts are configured incorrectly between Edge Microgateway and the target server.
  Troubleshooting instructions applicable for: Edge Public and Private Cloud users

Cause: Target server prematurely closes connection
  Description: The target server prematurely closes the connection while Edge Microgateway is sending the request payload.
  Troubleshooting instructions applicable for: Edge Public and Private Cloud users

Common diagnosis steps

  1. Check the Edge Microgateway logs:
    /var/tmp/edgemicro-`hostname`-*.log
    
  2. Search for 502 errors with the code ECONNRESET within a specific time window (if the problem happened in the past), or check whether any requests are still failing with 502. Such an error entry looks like this:
    2021-06-23T03:52:24.110Z [error][0:8000][3][myorg][test]
    [emg_badtarget/flakey/hangup][][][6b089a00-d3d6-11eb-95aa-911f1ee6c684]
    [microgateway-core][][GET][502][socket hang up][ECONNRESET][]
    
  3. If the logging level is set to warn or info, the log also contains a [warn] entry whose second bracketed field holds the target server's hostname and port. In this example it is X.X.X.X:8080; you can use this value later to capture a tcpdump.
    2021-06-23T03:52:24.109Z
    [warn][X.X.X.X:8080][3][myorg][test][emg_badtarget/flakey/hangup]
    [][][6b089a00-d3d6-11eb-95aa-911f1ee6c684][plugins-middleware]
    [targetRequest error][GET][][socket hang up][ECONNRESET][395]
    
  4. The error code [socket hang up][ECONNRESET] indicates that the target server closed the connection with Edge Microgateway. Search the logs for this string to determine how often the error occurs.
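Steps 2 to 4 can be scripted. The following Python sketch counts [socket hang up][ECONNRESET] error entries per minute and tallies the target host:port values found in the matching [warn] entries. It assumes that each log entry occupies a single line in the file (the samples above are wrapped for display) and that the bracketed fields appear in the order shown in those samples; this layout is inferred from the examples, not from a published log format. The function name summarize_hangups is hypothetical.

```python
import glob
import re
import socket
from collections import Counter

# Marker that identifies the socket hang up errors discussed above.
HANGUP_MARKER = "[socket hang up][ECONNRESET]"

# Assumed entry layout, inferred from the sample lines: an ISO-8601 UTC
# timestamp, then [level][second field]. The minute is captured for grouping.
ENTRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}\.\d+Z\s*\[(\w+)\]\[([^\]]*)\]"
)


def summarize_hangups(lines):
    """Return (error entries per minute, [warn] entries per target host:port)."""
    errors_per_minute = Counter()
    targets = Counter()
    for line in lines:
        if HANGUP_MARKER not in line:
            continue
        match = ENTRY_RE.match(line)
        if match is None:
            continue  # Line does not follow the assumed layout.
        minute, level, second_field = match.groups()
        if level == "error":
            errors_per_minute[minute] += 1
        elif level == "warn":
            # In [warn] entries, the second field is the target host:port.
            targets[second_field] += 1
    return errors_per_minute, targets


if __name__ == "__main__":
    # Default location described above: /var/tmp/edgemicro-`hostname`-*.log
    pattern = f"/var/tmp/edgemicro-{socket.gethostname()}-*.log"
    lines = []
    for path in glob.glob(pattern):
        with open(path, encoding="utf-8", errors="replace") as log_file:
            lines.extend(log_file)
    per_minute, targets = summarize_hangups(lines)
    for minute, count in sorted(per_minute.items()):
        print(f"{minute}Z  {count} ECONNRESET error(s)")
    for target, count in targets.most_common():
        print(f"target {target}: {count} [warn] entries")
```

Run it on the Edge Microgateway host; if logging > dir points to another folder, adjust the glob pattern accordingly.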

Cause: Incorrectly configured keep-alive timeout

Diagnosis

  1. Follow the Common diagnosis steps and verify whether you got the [socket hang up][ECONNRESET] error.
  2. If so, investigate further with tcpdump as explained below:

Using tcpdump

  1. Capture a tcpdump between Edge Microgateway and the backend server on the Edge Microgateway host operating system with the following command:
    tcpdump -i any -s 0 host TARGET_SERVER_HOSTNAME -w FILENAME.pcap
    
  2. Analyze the captured tcpdump:

    [Figure: sample tcpdump output]

    In the sample tcpdump above, note the following:

    1. In packet 250288, the client sends a POST request.
    2. In packet 250371, the server responds with 200 OK.
    3. In packet 250559, the client sends an ACK.
    4. In packet 250560, the server sends the Continuation message.
    5. In packet 250561, the client sends an ACK.
    6. In packet 262436, the server sends a FIN, ACK to the client initiating the closure of the connection. Note this is roughly five seconds after the previous packet (250561).
    7. In packet 262441, the client sends another POST request. However, this request fails because the server has already initiated closure of the connection, and the server responds with a RST.

    In this example, the same connection was reused successfully at least once. On the final request, however, the server initiated closure of the connection after five seconds of idle time, at almost the same moment the client sent a new request. This suggests that the backend server's keep-alive timeout is most likely shorter than or equal to the value set on the client (Edge Microgateway). To validate this, see Compare keep-alive timeouts.
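Finding this pattern by hand in a large capture is tedious. As a hedged sketch, the Python below reads the text printed by tcpdump -nn -tt -r FILENAME.pcap (numeric addresses, epoch timestamps) and, for every FIN sent by the target server, reports how long the connection had been idle and whether the client still sent data afterwards that the server answered with a RST, which is the sequence seen in packets 250561 to 262441. It assumes IPv4 and tcpdump's usual "Flags [...]" line format; the function and class names, and the server address in the usage line, are hypothetical.

```python
import re
import sys
from dataclasses import dataclass

# One packet line as printed by `tcpdump -nn -tt -r FILENAME.pcap`, e.g.
# 1624420344.051000 IP 10.0.0.5.8080 > 10.0.0.2.53412: Flags [F.], seq 200, ack 120, win 509, length 0
# The lazy ".*?" tolerates an interface/direction prefix before "IP" (seen with -i any captures).
LINE_RE = re.compile(
    r"^(?P<ts>\d+\.\d+) .*?\bIP (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[(?P<flags>[^\]]+)\].*?length (?P<length>\d+)"
)


@dataclass
class FinEvent:
    connection: frozenset            # the two endpoints, as "address.port" strings
    idle_seconds: float              # gap between the FIN and the previous packet on the connection
    request_after_fin: bool = False  # client sent payload after the server's FIN
    reset: bool = False              # server answered with a RST afterwards


def find_keepalive_races(lines, server):
    """Report every FIN sent by `server` ("address.port") and what followed it."""
    events = []
    last_seen = {}  # connection -> timestamp of its previous packet
    open_fin = {}   # connection -> FinEvent still collecting follow-up packets
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # Not a TCP/IPv4 packet line.
        ts, src, dst = float(m["ts"]), m["src"], m["dst"]
        flags, length = m["flags"], int(m["length"])
        conn = frozenset((src, dst))
        if src == server and "F" in flags:
            event = FinEvent(conn, ts - last_seen.get(conn, ts))
            events.append(event)
            open_fin[conn] = event
        elif conn in open_fin:
            event = open_fin[conn]
            if dst == server and length > 0:
                event.request_after_fin = True
            if src == server and "R" in flags:
                event.reset = True
        last_seen[conn] = ts
    return events


if __name__ == "__main__" and len(sys.argv) == 2:
    for e in find_keepalive_races(sys.stdin, sys.argv[1]):
        hint = "  <-- likely keep-alive race" if e.request_after_fin and e.reset else ""
        print(f"{sorted(e.connection)} FIN after {e.idle_seconds:.3f}s idle{hint}")
```

For example: tcpdump -nn -tt -r FILENAME.pcap | python3 keepalive_races.py X.X.X.X.8080 (tcpdump prints the port after the address, separated by a dot). An idle time that repeats across connections, such as the five seconds here, is a hint of the backend's keep-alive timeout.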

Compare keep-alive timeouts

  1. Edge Microgateway does not have a specific keep-alive timeout property for its connections to the target server. That timeout is determined by the operating system on which Edge Microgateway runs, for example Windows, Linux, or a Docker container.
  2. The value may have been customized on the operating system, so check with your system administrator. By default, Linux operating systems use a keep-alive timeout of two hours.
  3. Next, check the keep-alive timeout configured on your backend server. For example, suppose your backend server is configured with a value of 10 seconds.
  4. If the operating system's keep-alive timeout is higher than the backend server's, as in the example above, that is the cause of the 502 errors.
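On Linux, the two-hour default mentioned in step 2 matches the kernel parameter net.ipv4.tcp_keepalive_time, whose default is 7200 seconds and which can be read from /proc/sys/net/ipv4/tcp_keepalive_time. Assuming that is the setting in effect on your host (confirm with your system administrator), the comparison in step 4 can be sketched in Python as follows. The backend value must be looked up on your backend server; the function names are hypothetical.

```python
from pathlib import Path

# Linux only: kernel TCP keep-alive setting, in seconds (default 7200 = two hours).
KEEPALIVE_PATH = Path("/proc/sys/net/ipv4/tcp_keepalive_time")


def os_keepalive_seconds(path=KEEPALIVE_PATH):
    """Read the operating system keep-alive timeout, or None if it is unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def keepalive_misconfigured(os_seconds, backend_seconds):
    """True when the OS timeout is not lower than the backend's, as in step 4."""
    return os_seconds >= backend_seconds


if __name__ == "__main__":
    backend_seconds = 10  # Example value from step 3; replace with your backend's setting.
    os_seconds = os_keepalive_seconds()
    if os_seconds is None:
        print("Could not read the OS keep-alive timeout on this system.")
    elif keepalive_misconfigured(os_seconds, backend_seconds):
        print(f"OS keep-alive {os_seconds}s >= backend {backend_seconds}s: likely cause of the 502 errors")
    else:
        print(f"OS keep-alive {os_seconds}s < backend {backend_seconds}s")
```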

Resolution

Ensure that the keep-alive timeout on the operating system where Edge Microgateway runs is always lower than the keep-alive timeout on the backend server.

  1. Determine the value set for the keep-alive timeout on the backend server.
  2. Using the steps applicable to your operating system, configure its keep-alive timeout to a value lower than the one set on the backend server.
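For step 2 on Linux, and again assuming net.ipv4.tcp_keepalive_time is the setting that applies on your host, the sketch below picks a value a safety margin below the backend timeout and prints the matching sysctl command. The one-second margin and the function names are illustrative assumptions, not Apigee recommendations. A value set with sysctl -w does not survive a reboot unless it is also persisted, for example in a file under /etc/sysctl.d/.

```python
import shlex


def recommended_keepalive(backend_seconds, margin_seconds=1):
    """Return an OS keep-alive timeout strictly lower than the backend's.

    The margin is an arbitrary illustrative choice; the result is never below 1.
    """
    if backend_seconds <= 1:
        raise ValueError("backend timeout must be greater than 1 second")
    if margin_seconds < 1:
        raise ValueError("margin must be at least 1 second")
    return max(1, backend_seconds - margin_seconds)


def sysctl_command(seconds):
    """Build the (non-persistent) Linux command that applies the value."""
    return shlex.join(["sysctl", "-w", f"net.ipv4.tcp_keepalive_time={seconds}"])


if __name__ == "__main__":
    value = recommended_keepalive(10)  # Backend timeout of 10 s, as in the example above.
    print(sysctl_command(value))  # Run as root only after confirming with your administrator.
```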

Best Practice

It is strongly advised that downstream components always have a lower keep-alive timeout than the upstream servers they connect to. This avoids race conditions like the one above and the resulting 502 errors. In other words, each downstream hop's timeout should be lower than that of the hop upstream of it. For Edge Microgateway, follow these guidelines:

  1. The keep-alive timeout on the client application or load balancer should be less than the Edge Microgateway keep-alive timeout.

    To configure the keep-alive timeout on Edge Microgateway, add the keep_alive_timeout value to your ~/.edgemicro/org-env-config.yaml file:

    edgemicro:
      keep_alive_timeout: 65000
    
  2. The Edge Microgateway operating system keep-alive timeout should be less than the target server keep-alive timeout.
  3. Apply the same rule to any other hops in front of or behind Edge Microgateway. Closing the connection should always be the responsibility of the downstream client, not the upstream server.
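These guidelines reduce to one check per connection: the side that opens it (downstream) must time out before the side that accepts it (upstream). The Python sketch below applies that check to a list of connections. The hop names and values are examples only, and the assumption that keep_alive_timeout is expressed in milliseconds (so 65000 means 65 seconds) should be verified against the Edge Microgateway configuration reference for your version. Convert all values to one unit before comparing.

```python
from typing import List, NamedTuple


class Connection(NamedTuple):
    downstream: str            # side that opens the connection (the client)
    downstream_timeout: float  # its keep-alive timeout, in seconds
    upstream: str              # side that accepts the connection (the server)
    upstream_timeout: float    # its keep-alive timeout, in seconds


def keepalive_violations(connections: List[Connection]) -> List[Connection]:
    """Return the connections where the downstream side does not time out first."""
    return [c for c in connections if c.downstream_timeout >= c.upstream_timeout]


if __name__ == "__main__":
    hops = [
        # Example values only; keep_alive_timeout is assumed to be in milliseconds.
        Connection("load balancer", 60, "Edge Microgateway", 65000 / 1000),
        # Linux default OS keep-alive (two hours) against a 10-second backend.
        Connection("Edge Microgateway host OS", 7200, "target server", 10),
    ]
    for c in keepalive_violations(hops):
        print(f"{c.downstream} ({c.downstream_timeout}s) should time out before "
              f"{c.upstream} ({c.upstream_timeout}s)")
```

Modeling each connection separately, rather than one chain of numbers, avoids comparing values that never face each other, such as the inbound keep_alive_timeout and the outbound operating system timeout.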

Cause: Target server prematurely closes connection

Diagnosis

  1. Follow the Common diagnosis steps and verify whether you got the [socket hang up][ECONNRESET] error.
  2. If so, investigate further with tcpdump as explained below.

    The error message [targetRequest error][GET][][socket hang up][ECONNRESET] in the above example indicates that this error occurred while Edge Microgateway was sending the request to the backend (target) server. That is, Edge Microgateway sent the API request to the backend server and was waiting for the response. However, the backend server terminated the connection abruptly before Edge Microgateway received a response.

  3. Check your backend server logs for errors or other information that could explain why the backend server terminated the connection abruptly. If you find any, go to Resolution and fix the issue on your backend server.
  4. If the backend server logs show nothing relevant, capture a tcpdump on the Edge Microgateway host:
    tcpdump -i any -s 0 host TARGET_SERVER_HOSTNAME -w FILENAME.pcap
    
  5. Analyze the captured tcpdump:

    [Figure: sample tcpdump output]

    In the sample tcpdump above, note the following:

    1. In packet 4, Edge Microgateway sent a GET request to the target server.
    2. In packet 5, the target server responded with ACK to acknowledge the request.
    3. However, in packet 6, instead of returning a response payload, the target server sent a FIN, ACK, initiating closure of the connection.
    4. From packet 7 onwards, both sides close the connection. Because the connection was closed before a response was sent, Edge Microgateway returns an HTTP 502 error to the client.
    5. The timestamp of packet 8, 2021-06-23T03:52:24.110Z, matches the time at which the error was logged in the Edge Microgateway logs. You can often use the timestamps in the log files and in the tcpdump to correlate errors with specific packets.
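The correlation in step 5 can be sketched in Python as well. The helper converts a log timestamp (UTC, in the format shown in the samples above) to epoch seconds and returns the lines of tcpdump -nn -tt -r FILENAME.pcap output whose leading timestamp falls within a small window of it. The 50 ms default window and the function names are assumptions; widen the window if nothing matches.

```python
import sys
from datetime import datetime


def log_time_to_epoch(stamp):
    """Convert a log timestamp such as 2021-06-23T03:52:24.110Z to epoch seconds."""
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so map it to +00:00.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00")).timestamp()


def packets_near(tcpdump_lines, stamp, window_seconds=0.05):
    """Return `tcpdump -tt` lines whose leading epoch timestamp is within the window."""
    target = log_time_to_epoch(stamp)
    matches = []
    for line in tcpdump_lines:
        first = line.split(" ", 1)[0]
        try:
            ts = float(first)
        except ValueError:
            continue  # Not a packet line.
        if abs(ts - target) <= window_seconds:
            matches.append(line.rstrip("\n"))
    return matches


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: tcpdump -nn -tt -r FILENAME.pcap | python3 near.py 2021-06-23T03:52:24.110Z
    for line in packets_near(sys.stdin, sys.argv[1]):
        print(line)
```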

Resolution

Fix the issue on the backend server.

If the issue persists and you need help troubleshooting the 502 Bad Gateway error, or you suspect an issue within Edge Microgateway, go to Must gather diagnostic information.

Must gather diagnostic information

If the problem persists even after following the above instructions, gather the following diagnostic information and then contact Apigee Edge Support:

  • Log files: The default folder is /var/tmp, but it can be overridden in the main config.yaml file (logging > dir parameter). Before providing the log files to Apigee Support, it is recommended to change logging > level to info.
  • Configuration file: The main configuration of Edge Microgateway resides in YAML files in the default Edge Microgateway folder, $HOME/.edgemicro. This folder contains a default configuration file, default.yaml, and one file per environment, ORG-ENV-config.yaml. Upload the ORG-ENV-config.yaml file for the affected org and environment in full.
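The files above can be gathered into one archive with a short script. This Python sketch assumes the default log folder (/var/tmp) and configuration folder ($HOME/.edgemicro) described above, and that log file names start with edgemicro- as in the Common diagnosis steps; ORG and ENV are placeholders you pass on the command line, and the function name and archive name are hypothetical.

```python
import glob
import os
import sys
import tarfile


def bundle_diagnostics(log_dir, config_path, archive_path):
    """Write a .tar.gz containing Edge Microgateway logs and one config file.

    Returns the list of archive member names that were added.
    """
    files = sorted(glob.glob(os.path.join(log_dir, "edgemicro-*.log")))
    files.append(config_path)
    added = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in files:
            if os.path.isfile(path):  # Skip a missing config file instead of failing.
                name = os.path.basename(path)
                archive.add(path, arcname=name)
                added.append(name)
    return added


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 bundle.py ORG ENV   (set logging > level to info beforehand)
    org, env = sys.argv[1], sys.argv[2]
    config = os.path.expanduser(f"~/.edgemicro/{org}-{env}-config.yaml")
    print(bundle_diagnostics("/var/tmp", config, "emg-diagnostics.tar.gz"))
```

If logging > dir is overridden in config.yaml, pass that folder instead of /var/tmp.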