504 Gateway Timeout

Symptom

The client application receives an HTTP status code of 504 with the message "Gateway Timeout" in response to API calls.

The HTTP status code 504 Gateway Timeout indicates that the client did not receive a timely response from the Edge Gateway or backend server during the execution of an API call.

Error Messages

Client application gets the following response code:

HTTP/1.1 504 Gateway Timeout

In some cases, the following error message may also be observed:

{
   "fault": {
      "faultstring": "Gateway Timeout", 
      "detail": {
           "errorcode": "messaging.adaptors.http.flow.GatewayTimeout"
       }
    }
}
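A client can tell this fault apart from other error bodies by reading the errorcode field. The following is a minimal Python sketch, assuming the response body has the shape shown above; the function name fault_errorcode is illustrative and not part of any Apigee SDK:

```python
import json

GATEWAY_TIMEOUT_CODE = "messaging.adaptors.http.flow.GatewayTimeout"


def fault_errorcode(body):
    """Return fault.detail.errorcode from a response body, or None if absent.

    The fault body is only observed "in some cases", so a missing or
    non-JSON body is treated as "no error code" rather than as a failure.
    """
    try:
        return json.loads(body)["fault"]["detail"]["errorcode"]
    except (ValueError, KeyError, TypeError):
        return None


def is_gateway_timeout_fault(body):
    """True when the body carries the GatewayTimeout error code."""
    return fault_errorcode(body) == GATEWAY_TIMEOUT_CODE
```

Checking the error code in addition to the status line helps separate a 504 raised by Edge from a 504 that some other intermediary might have produced.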

What causes gateway timeouts?

The typical path for an API request through the Edge platform is Client -> Router -> Message Processor -> Backend Server, as shown in the figure below:

The client application, Routers, and Message Processors within the Edge platform are each set up with suitable timeout values. Based on these values, each component expects a response within a certain period of time for every API request. If a component doesn't get the response within its specified period of time, then a 504 Gateway Timeout error is returned.

The following table provides more details about when timeouts may occur in Edge:

Timeout Occurrence Details
Timeout occurs on Message Processor
  • Backend server does not respond to the Message Processor within a specified timeout period on the Message Processor.
  • Message Processor times out and sends the response status as 504 Gateway Timeout to the Router.
Timeout occurs on Router
  • Message Processor does not respond to the router within the specified timeout period on the Router.
  • Router times out and sends the response status as 504 Gateway Timeout to the client application.
Timeout occurs on client application
  • Router does not respond to the client application within the specified timeout period on the client application.
  • The client application times out and sends the response status as 504 Gateway Timeout to the end user.
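The three cases above reduce to one rule: whichever component has the shortest timeout gives up first. The following Python sketch illustrates that rule under a simplifying assumption (it ignores time spent inside Edge itself and any per-proxy override); the function and parameter names are hypothetical:

```python
def first_timeout(response_time_s, client_s, router_s, mp_s):
    """Return the component that times out first, or None if the
    response arrives before any timeout expires.

    response_time_s: how long the backend takes to answer (seconds).
    client_s, router_s, mp_s: timeout configured on each component.
    """
    limits = {
        "client": client_s,
        "router": router_s,
        "message_processor": mp_s,
    }
    # The component with the smallest limit is the first to give up.
    component, limit = min(limits.items(), key=lambda item: item[1])
    return component if response_time_s > limit else None
```

For example, with a client timeout of 70 seconds, a Router timeout of 57 seconds, and a Message Processor timeout of 55 seconds, a backend that needs 60 seconds causes the Message Processor to time out first.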

Possible Causes

In Edge, the typical causes for 504 Gateway Timeout error are:

Cause | Details | Steps Given For
Slow Backend Server | The backend server that is processing the API request is too slow due to high load or poor performance. | Public and Private Cloud users
Slow API Proxy processing by Edge | Edge takes a long time to process the API request due to high load or poor performance.

Slow Backend Server

If the backend server is very slow and/or taking a long time to process the API request, then you will get a 504 Gateway Timeout error. As explained in the section above, the timeout can occur under one of the following scenarios:

  1. Message Processor times out before backend server responds.
  2. Router times out before Message Processor/backend server responds.
  3. Client application times out before Router/Message Processor/backend server responds.

The following sections describe how to diagnose and resolve the issue under each of these scenarios.

Scenario #1 - Message Processor times out before Backend Server responds

Diagnosis

You can use the following procedures to diagnose if the 504 Gateway Timeout error has occurred because of the slow backend server.

Procedure #1 Using Trace

If the issue is still active (504 Errors are still happening), then follow the below steps:

  1. Trace the affected API in the Edge UI. Either wait for the error to occur or, if you have the API call, make some API calls to reproduce the 504 Gateway Timeout error.
  2. Once the error has occurred, examine the specific request which shows the response code as 504.
  3. Check the elapsed time at each phase and make a note of the phase where most time is spent.
  4. If you observe the "Error" with the longest elapsed time immediately after one of the following phases, then it indicates that the backend server is slow or taking a long time to process the request:
    • "Request sent to target server"
    • Service Callout policy

The following provides a sample Trace showing that the backend server did not respond even after 55 seconds resulting in a 504 Gateway Timeout Error:

In the above trace, the Message Processor times out after 55002 ms as the backend server does not respond.

Procedure #2 Using Message Processor Logs

  1. Check the Message Processor's log (/opt/apigee/var/log/edge-message-processor/logs/system.log)
  2. If you find "Gateway Timeout" and "onTimeoutRead" errors for the specific API proxy request at the specific time, then it indicates that the Message Processor has timed out.

    Sample Message Processor log showing Gateway Timeout Error

    2015-09-29 20:16:54,340 org:myorg env:staging api:profiles rev:13 NIOThread@1
    ERROR ADAPTORS.HTTP.FLOW - AbstractResponseListener.onException() :
    AbstractResponseListener.onError(HTTPResponse@4d898cf1, Gateway
    Timeout) 
    2015-09-29 20:16:57,361 org:myorg env:staging api:profileNewsletters rev:8
    NIOThread@0 ERROR HTTP.CLIENT - HTTPClient$Context$3.onTimeout() :
    SSLClientChannel[C:XX.XX.XX.XX:443 Remote
    host:192.168.38.54:38302]@120171 useCount=2 bytesRead=0
    bytesWritten=824 age=55458ms lastIO=55000ms .onTimeoutRead
    

    In the above Message Processor log, you notice that the backend server denoted with the IP address XX.XX.XX.XX did not respond even after 55 seconds (lastIO=55000ms). As a result, the Message Processor timed out and sent 504 Gateway Timeout Error.
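Searching a large system.log by hand is tedious, so the check above can be scripted. The following Python sketch assumes that every log entry starts with a timestamp in the format shown in the sample and that an entry may wrap across several lines; the function name find_read_timeouts is illustrative:

```python
import re

# Every entry begins with a timestamp such as "2015-09-29 20:16:57,361".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
READ_TIMEOUT = re.compile(r"lastIO=(\d+)ms\s*\.onTimeoutRead")
API_NAME = re.compile(r"\bapi:(\S+)")


def find_read_timeouts(log_text):
    """Return (timestamp, api, last_io_ms) for each onTimeoutRead entry."""
    starts = [m.start() for m in TIMESTAMP.finditer(log_text)]
    results = []
    # Slice the text into entries, each running up to the next timestamp.
    for begin, end in zip(starts, starts[1:] + [len(log_text)]):
        entry = log_text[begin:end]
        timeout = READ_TIMEOUT.search(entry)
        if timeout:
            api = API_NAME.search(entry)
            results.append(
                (entry[:23], api.group(1) if api else None, int(timeout.group(1)))
            )
    return results
```

A lastIO value close to the configured timeout (55000 ms in the sample) suggests that the Message Processor waited the full period without receiving any data from the backend.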

    Check This: How is timeout controlled on Message Processor?

    • Message Processors are usually set with a default timeout value of 55 seconds via the property "HTTPTransport.io.timeout.millis". This timeout value applies to all the API Proxies that belong to an organization served by this Message Processor.
      • If the backend server does not respond within 55 seconds, then the Message Processor times out and sends 504 Gateway Timeout error to the client.
    • The timeout value specified in the Message Processor can be overridden by the property "io.timeout.millis" specified within the API Proxy. This timeout value is applicable to a specific API Proxy in which the above mentioned property is specified. For example, if the io.timeout.millis is set to 10 seconds within the API Proxy, then the timeout value of 10 seconds will be used for this specific API Proxy.
      • If the backend server does not respond within 10 seconds for the specific API Proxy, then the Message Processor times out and sends 504 Gateway Timeout error to the client.
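The override rule described above can be written down in a few lines. This is only an illustration of the precedence, assuming the 55-second default mentioned in the text; the function name is hypothetical:

```python
# Default HTTPTransport.io.timeout.millis, per the text above (assumption:
# the Message Processor has not been reconfigured).
DEFAULT_MP_TIMEOUT_MS = 55_000


def effective_target_timeout_ms(proxy_io_timeout_ms=None,
                                mp_timeout_ms=DEFAULT_MP_TIMEOUT_MS):
    """Timeout the Message Processor applies to one API Proxy.

    A proxy-level io.timeout.millis, when set, takes precedence over the
    Message Processor-wide value.
    """
    if proxy_io_timeout_ms is None:
        return mp_timeout_ms
    return proxy_io_timeout_ms
```

With no proxy-level property the result is 55000 ms; with io.timeout.millis set to 10000 in the API Proxy the result is 10000 ms for that proxy only.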

Resolution

  1. Check why the backend server is taking more than 55 seconds and see if it can be fixed/optimized to respond faster.
  2. If it is not possible to fix/optimize the backend server, or it is known that the backend server takes longer than the configured timeout, then increase the timeout value on the Router and Message Processor to a suitable value, as described in the section "Increase the timeout value on Router and Message Processor". Only Private Cloud users can perform this step. If you are on Public Cloud, contact Apigee Support for assistance.

Scenario #2 - Router times out before Message Processor/Backend Server responds

You might get 504 Gateway Timeout Errors if the router times out before the Message Processor/backend server responds. This can happen under one of the following circumstances:

  • The timeout value set on the Router is shorter than the timeout value set on the Message Processor. For example, the timeout on the Router is 50 seconds, while the timeout on the Message Processor is 55 seconds:
    Timeout on Router | Timeout on Message Processor
    50 seconds | 55 seconds
  • The timeout value on the Message Processor is overridden with a higher timeout value using the "io.timeout.millis" property set within the target endpoint configuration of the API Proxy:

    For example, if the following timeout values are set:

    Timeout on Router | Timeout on Message Processor | Timeout within API Proxy
    57 seconds | 55 seconds | 120 seconds

    Here, io.timeout.millis is set to 120 seconds within the target endpoint configuration of the API Proxy:

    <HTTPTargetConnection>
         <Properties>
              <Property name="io.timeout.millis">120000</Property>
          </Properties>
          <URL>http://www.apigee.com</URL>
    </HTTPTargetConnection>
    

    Then, the Message Processor will not time out after 55 seconds, even though its timeout value (55 seconds) is less than the timeout value on the Router (57 seconds). This is because the timeout value of 55 seconds on the Message Processor is overridden by the value of 120 seconds that is set within the API Proxy. So the timeout value of the Message Processor for this specific API Proxy will be 120 seconds.

    Since the Router has a lower timeout value (57 seconds) than the 120 seconds set within the API Proxy, the Router will time out if the backend server does not respond within 57 seconds.
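To check a proxy for this situation, the io.timeout.millis property can be read from the target endpoint XML and compared with the Router timeout. The following Python sketch assumes the XML has the shape shown above; the function names are illustrative, and the Router timeout is passed in by hand because it lives in a separate file:

```python
import xml.etree.ElementTree as ET


def proxy_io_timeout_ms(target_connection_xml):
    """Return io.timeout.millis from an <HTTPTargetConnection> snippet,
    or None if the property is not set."""
    root = ET.fromstring(target_connection_xml)
    for prop in root.iter("Property"):
        if prop.get("name") == "io.timeout.millis":
            text = (prop.text or "").strip()
            return int(text) if text else None
    return None


def router_times_out_first(router_timeout_s, mp_timeout_ms, target_connection_xml):
    """True when the Router would give up before the Message Processor."""
    override = proxy_io_timeout_ms(target_connection_xml)
    # The proxy-level value, when present, replaces the Message Processor value.
    effective_ms = mp_timeout_ms if override is None else override
    return router_timeout_s * 1000 < effective_ms
```

With the values from the example (Router 57 seconds, Message Processor 55000 ms, proxy override 120000 ms) the function returns True, which matches the behavior described above.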

Diagnosis

  1. Check the Nginx access log (/opt/apigee/var/log/edge-router/nginx/<org>~<env>.<port#>_access_log)
  2. If the router times out before the Message Processor, then you will see the status of 504 on the Nginx access logs for the specific API request and the message id from the Message Processor will be set as "-". This is because the Router didn't get any response from the Message Processor within the timeout period set on the router.

    Sample Nginx Log Entry showing 504 due to Router timing out

  3. In the above example, notice that the status on Nginx is 504, the message id from the Message Processor is "-", and the total time elapsed is 57.001 seconds. This is because the Router timed out after 57.001 seconds without getting any response from the Message Processor.
  4. In this case, you will see "Broken Pipe" Exceptions in the Message Processor logs (/opt/apigee/var/log/edge-message-processor/logs/system.log).
    2017-06-09 00:00:25,886 org:myorg env:test api:myapi-v1 rev:23 messageid:rrt-mp01-18869-23151-1  NIOThread@1 INFO  HTTP.SERVICE - ExceptionHandler.handleException() : Exception java.io.IOException: Broken pipe occurred while writing to channel ClientOutputChannel(ClientChannel[A:XX.XX.XX.XX:8998 Remote host:YY.YY.YY.YY:51400]@23751 useCount=1 bytesRead=0 bytesWritten=486 age=330465ms  lastIO=0ms )
    2017-06-09 00:00:25,887  org:myorg env:test api:myapi-v1 rev:23 messageid:rrt-mp01-18869-23151-1  NIOThread@1 INFO  HTTP.SERVICE - ExceptionHandler.handleException() : Exception trace:
    java.io.IOException: Broken pipe
            at com.apigee.nio.channels.ClientOutputChannel.writePending(ClientOutputChannel.java:51) ~[nio-1.0.0.jar:na]
            at com.apigee.nio.channels.OutputChannel.onWrite(OutputChannel.java:116) ~[nio-1.0.0.jar:na]
            at com.apigee.nio.channels.OutputChannel.write(OutputChannel.java:81) ~[nio-1.0.0.jar:na]
             … <snipped>
    

This error is displayed because once the router times out, it closes the connection with the Message Processor. When the Message Processor completes its processing, it attempts to write the response to the router. Since the connection to the router is already closed, you get the Broken Pipe exception on the Message Processor.

This exception is expected to be seen under the circumstances explained above. So the actual cause for the 504 Gateway Timeout error is still the backend server taking longer time to respond and you need to address that issue.
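To correlate these exceptions with the failed requests, the message ids can be pulled out of the log. The following Python sketch assumes each log entry is on a single line in the format of the sample above; the function name is illustrative:

```python
import re

# Matches the first line of the exception, which carries the message id.
BROKEN_PIPE = re.compile(r"messageid:(\S+).*?java\.io\.IOException: Broken pipe")


def broken_pipe_message_ids(log_lines):
    """Return the distinct message ids that hit a Broken pipe exception,
    in the order they first appear."""
    ids = []
    for line in log_lines:
        match = BROKEN_PIPE.search(line)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids
```

The stack trace lines that follow each exception do not carry a message id, so they are skipped.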

Resolution

  1. If it's a custom backend server, then
    1. Check why the backend server is taking a long time to respond and see if it can be fixed/optimized to respond faster.
    2. If it is not possible to fix/optimize the backend server or it is a known fact that the backend server takes a long time, then Increase the timeout value on Router and Message Processor.

      Idea: Set the timeout value on the different components in the following order:

      Timeout on Client > Timeout on Router > Timeout on Message Processor > Timeout within API Proxy

  2. If it's a NodeJS backend server, then:
    1. Check if the NodeJS code makes calls to any other backend server(s) and whether those calls take a long time to return a response. Check why the backend server(s) is taking a long time and fix the problem as appropriate.
    2. Check if the Message Processor(s) is experiencing high CPU or memory usage:
      1. If any Message Processor is experiencing high CPU usage, then generate three thread dumps at 30-second intervals using the following command:
        <JAVA_HOME>/bin/jstack -l <pid> > <filename>

      2. If any Message Processor is experiencing high memory usage, then generate a heap dump using the following command:
        sudo -u apigee <JAVA_HOME>/bin/jmap -dump:live,format=b,file=<filename> <pid>

      3. Restart the Message Processor using the command below. This should bring down the CPU and memory usage:
        /opt/apigee/apigee-service/bin/apigee-service edge-message-processor restart

      4. Monitor the API calls to confirm whether the problem still exists.
      5. Contact Apigee Support and provide the thread dumps, heap dump, and Message Processor logs (/opt/apigee/var/log/edge-message-processor/logs/system.log) to help investigate the cause of the high CPU/memory usage.
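Collecting the three thread dumps can be automated so that the spacing between them is consistent. The following Python sketch wraps the jstack command shown above; the output file naming, the out_dir parameter, and the injectable run and sleep arguments are assumptions made for illustration:

```python
import os
import subprocess
import time


def collect_thread_dumps(pid, jstack_path, out_dir=".", count=3, interval_s=30,
                         run=subprocess.run, sleep=time.sleep):
    """Run "<jstack_path> -l <pid>" count times, interval_s seconds apart.

    Returns the list of files written. run and sleep can be replaced
    (for example in a dry run) without touching the rest of the logic.
    """
    files = []
    for i in range(count):
        # Hypothetical naming scheme: threaddump_<pid>_<n>.txt
        filename = os.path.join(out_dir, f"threaddump_{pid}_{i + 1}.txt")
        with open(filename, "w") as out:
            run([jstack_path, "-l", str(pid)], stdout=out, check=True)
        files.append(filename)
        if i < count - 1:
            sleep(interval_s)  # wait between dumps, not after the last one
    return files
```

Here jstack_path stands for <JAVA_HOME>/bin/jstack and pid for the process id of the Message Processor. Take the dumps before restarting the Message Processor, because the restart discards the state they are meant to capture.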

Check This: How is timeout controlled for NodeJS backend servers on Message Processor

  • The NodeJS backend server runs within the JVM process of the Message Processor. The timeout value for NodeJS backend servers is controlled via the property "http.request.timeout.seconds" in the nodejs.properties file. This property is set to 0 by default, i.e., the timeout is disabled by default for all the API Proxies that belong to an organization served by this Message Processor. So even if a NodeJS backend server takes a long time, the Message Processor will not time out.
  • However, if the NodeJS backend server takes a long time and the API request takes more than 57 seconds, then the Router will time out and send a 504 Gateway Timeout error to the client.

Scenario #3 - Client Application times out before Router/Message Processor/Backend Server responds

You might get 504 Gateway Timeout Errors if the client application times out before the backend server responds. This situation can happen if:

  1. The timeout value set on the client application is lower than the timeout value set on the router and Message Processor:

    For example, if the following timeout values are set:

    Timeout on Client | Timeout on Router | Timeout on Message Processor
    50 seconds | 57 seconds | 55 seconds

    In this case, the total time available to get a response for an API request through Edge is <= 50 seconds. This includes the time taken to make an API request, the request being processed by Edge (Router, Message Processor), the request being sent to the backend server (if applicable), backend processing the request and sending the response, Edge processing the response and finally sending it back to the client.

    If the Router does not respond to the client within 50 seconds, then the client will time out and close the connection with the Router. The client will get the response code of 504.

    This will cause the Nginx to set a status code of 499 indicating the client closed the connection.

Diagnosis

  1. If the client application times out before it gets a response from the router, then it will close the connection with the router. In this situation, you will see a status code of 499 in the Nginx access logs for the specific API request.

    Sample Nginx Log Entry showing status code 499

  2. In the above example, note the status of 499 on Nginx and the total elapsed time of 50.001 seconds. This indicates that the client timed out after 50.001 seconds.
  3. In this case, you will see "Broken Pipe" Exceptions in the Message Processor logs (/opt/apigee/var/log/edge-message-processor/logs/system.log).
    2017-06-09 00:00:25,886 org:myorg env:test api:myapi-v1 rev:23 messageid:rrt-1-11193-11467656-1  NIOThread@1 INFO  HTTP.SERVICE - ExceptionHandler.handleException() : Exception java.io.IOException: Broken pipe occurred while writing to channel ClientOutputChannel(ClientChannel[A:XX.XX.XX.XX:8998 Remote host:YY.YY.YY.YY:51400]@23751 useCount=1 bytesRead=0 bytesWritten=486 age=330465ms  lastIO=0ms )
    2017-06-09 00:00:25,887  org:myorg env:test api:myapi-v1 rev:23 messageid:rrt-1-11193-11467656-1  NIOThread@1 INFO  HTTP.SERVICE - ExceptionHandler.handleException() : Exception trace:
    java.io.IOException: Broken pipe
            at com.apigee.nio.channels.ClientOutputChannel.writePending(ClientOutputChannel.java:51) ~[nio-1.0.0.jar:na]
            at com.apigee.nio.channels.OutputChannel.onWrite(OutputChannel.java:116) ~[nio-1.0.0.jar:na]
            at com.apigee.nio.channels.OutputChannel.write(OutputChannel.java:81) ~[nio-1.0.0.jar:na]
             … <snipped>
    
    
  4. After the client times out and closes its connection, the Router in turn closes the connection with the Message Processor. When the Message Processor completes its processing, it attempts to write the response to the Router. Since the connection to the Router is already closed, you get the Broken Pipe exception on the Message Processor.
  5. This exception is expected under the circumstances explained above. So the actual cause for the 504 Gateway Timeout error is still that the backend server takes a long time to respond and you need to address that issue.
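The Nginx status code and the message id together indicate which component gave up. The following Python sketch summarizes the rules from Scenarios #1 to #3, assuming the status code and the Message Processor message id have already been extracted from the access log (the log format itself is not reproduced here); the function name and return strings are illustrative:

```python
def classify_access_log_entry(status, upstream_message_id):
    """Map an Nginx access log status to the likely timeout scenario."""
    if status == 499:
        # Nginx records 499 when the client closes the connection first.
        return "client timed out before the Router responded"
    if status == 504 and upstream_message_id in ("-", "", None):
        # No message id means no response arrived from the Message Processor.
        return "Router timed out waiting for the Message Processor"
    if status == 504:
        return "Message Processor timed out waiting for the backend server"
    return "not a timeout"
```

In all three cases the underlying cause is usually the same slow backend; the classification only shows which timeout fired first.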

Resolution

  1. If it's your custom backend server then:
    1. Check the backend server to determine why it is taking more than 57 seconds and see if it can be fixed/optimized to respond faster.
    2. If it is not possible to fix/optimize the backend server or if you know that the backend server will take a long time, then increase the timeout value on router and Message Processor.

      Idea: Set the timeout value on the different components in the following order:

      Timeout on Client > Timeout on Router > Timeout on Message Processor > Timeout within API Proxy

  2. If it's a NodeJS backend, then:
    1. Check if the NodeJS code makes calls to any other backend server(s) and whether those calls take a long time to return. Check why those backend server(s) are taking a long time.
    2. Check if the Message Processor(s) is experiencing high CPU or memory usage:
      1. If a Message Processor is experiencing high CPU usage, then generate three thread dumps every 30 seconds using the following command:
        <JAVA_HOME>/bin/jstack -l <pid> > <filename>
        
      2. If a Message Processor is experiencing high memory usage, then generate a heap dump using the following command:
        sudo -u apigee <JAVA_HOME>/bin/jmap -dump:live,format=b,file=<filename> <pid>
        
      3. Restart the Message Processor using the below command. This should bring down the CPU and Memory:
        /opt/apigee/apigee-service/bin/apigee-service edge-message-processor restart
        
      4. Monitor the API calls to confirm if the problem still exists.
      5. Contact Apigee Support and provide the thread dumps, heap dump, and Message Processor logs (/opt/apigee/var/log/edge-message-processor/logs/system.log) to help them investigate the cause of the high CPU/memory usage.

Increase the timeout value on Router and Message Processor

Choose the timeout values to be set on the Router and Message Processor carefully depending on your requirements. Don't set arbitrarily large timeout values. If you need assistance, contact Apigee Support.

Router

  1. Create the /opt/apigee/customer/application/router.properties file on the Router machine, if it does not already exist.
  2. Add the following line to this file:
    conf_load_balancing_load.balancing.driver.proxy.read.timeout=<time in seconds>
    

    For example, if you want to set the timeout value of 120 seconds, then set it as follows:

    conf_load_balancing_load.balancing.driver.proxy.read.timeout=120
    
  3. Ensure this file is owned by apigee:
    chown apigee:apigee /opt/apigee/customer/application/router.properties
    
  4. Restart the router:
    /opt/apigee/apigee-service/bin/apigee-service edge-router restart
    
  5. If you have more than one router, repeat the above steps on all the routers.

Message Processor

  1. Create /opt/apigee/customer/application/message-processor.properties file on the Message Processor machine, if it does not already exist.
  2. Add the following line to this file:
    conf_http_HTTPTransport.io.timeout.millis=<time in milliseconds>
    

    For example, if you want to set the timeout value of 120 seconds, then set it as follows:

    conf_http_HTTPTransport.io.timeout.millis=120000
    
  3. Ensure this file is owned by apigee:
    chown apigee:apigee /opt/apigee/customer/application/message-processor.properties 
    
  4. Restart the Message Processor:
    /opt/apigee/apigee-service/bin/apigee-service edge-message-processor restart
      
    
  5. If you have more than one Message Processor, repeat the above steps on all the Message Processors.
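When the same change has to be applied on several machines, editing the properties files by script avoids duplicate or conflicting lines. The following Python sketch sets a key in the text of a properties file, replacing an existing line for that key if there is one; the function name upsert_property is illustrative, and ownership and the restart still have to be handled as described in the steps above:

```python
ROUTER_KEY = "conf_load_balancing_load.balancing.driver.proxy.read.timeout"  # seconds
MP_KEY = "conf_http_HTTPTransport.io.timeout.millis"  # milliseconds


def upsert_property(properties_text, key, value):
    """Return properties_text with "key=value" set exactly once."""
    new_line = f"{key}={value}"
    lines = properties_text.splitlines()
    for i, line in enumerate(lines):
        # Compare only the part before the first "=".
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        # Key not present yet: append it.
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

Note the different units: the Router property takes seconds (for example 120), while the Message Processor property takes milliseconds (for example 120000).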

Idea: Set the timeout value on the different components in the following order:

Timeout on Client > Timeout on Router > Timeout on Message Processor > Timeout within API Proxy
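Before rolling out new values, the ordering above can be checked mechanically. The following Python sketch assumes all four timeouts have been converted to seconds; the function name and message wording are illustrative:

```python
def check_timeout_order(client_s, router_s, mp_s, proxy_s):
    """Return a list of ordering violations; an empty list means the
    values satisfy Client > Router > Message Processor > API Proxy."""
    chain = [
        ("client", client_s),
        ("router", router_s),
        ("message_processor", mp_s),
        ("api_proxy", proxy_s),
    ]
    problems = []
    # Compare each component with the next one downstream.
    for (outer, outer_s), (inner, inner_s) in zip(chain, chain[1:]):
        if outer_s <= inner_s:
            problems.append(
                f"{outer} ({outer_s}s) should be greater than {inner} ({inner_s}s)"
            )
    return problems
```

For the values used in Scenario #2 (Router 57 seconds, Message Processor 55 seconds, API Proxy 120 seconds) the check reports that the Message Processor timeout is not greater than the API Proxy timeout.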

Slow API Request Processing by Edge

If Edge is very slow and/or taking a long time to process the API request, then you will get a 504 Gateway Timeout error.

Diagnosis

  1. Trace the affected API in Edge UI.
  2. Either wait for the error to occur or, if you have the API call, make some API calls to reproduce the 504 Gateway Timeout error.
  3. Note, in this case, you may see a successful response in the Trace.
    1. The Router/client times out as the Message Processor does not respond back within the specified timeout period on the Router/client (whichever has the lowest time out period). However, the Message Processor continues to process the request and may complete successfully.
    2. In addition, the HTTPTransport.io.timeout.millis value set on the Message Processor triggers only if the Message Processor communicates with a HTTP/HTTPS backend server. In other words, this timeout will not get triggered when any policy (other than Service Callout policy) within API Proxy is taking a long time.
  4. After the error has occurred, examine the specific request that has the longest elapsed time.
  5. Check the elapsed time at each phase and make a note of the phase where the most time is spent.
  6. If you observe the longest elapsed time in any of the policies other than the Service Callout policy, then that indicates that Edge is taking a long time to process the request.
  7. Here's a sample UI trace showing very high elapsed time on JavaScript Policy:

  8. In the above example, you notice that the JavaScript policy takes an abnormally long amount of time of ~ 245 seconds.
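The comparison in the steps above amounts to finding the phase with the longest elapsed time and checking whether it is a call to the backend. The following Python sketch works on a hand-built list of phases; the tuple layout (name, is_backend_call, elapsed_ms) is a hypothetical structure for illustration and not the format of an Edge trace export:

```python
def diagnose_trace(phases):
    """Return (where, phase_name, elapsed_ms) for the slowest phase.

    phases: list of (name, is_backend_call, elapsed_ms) tuples, where
    is_backend_call is True for the target request and Service Callout
    policies, and False for every other policy.
    """
    name, is_backend_call, elapsed_ms = max(phases, key=lambda p: p[2])
    where = "backend" if is_backend_call else "edge"
    return where, name, elapsed_ms
```

A result of "edge" points to slow processing inside the API Proxy, as in the JavaScript policy example above; a result of "backend" points back to the Slow Backend Server section.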

Resolution

  1. Check whether the policy that took a long time contains any custom code that might require a long time to process. If there is any such code, then see if you can fix/optimize the identified code.
  2. If there is no custom code that might cause high processing time, then check if the Message Processor(s) is experiencing high CPU or memory usage:
    1. If any Message Processor is experiencing high CPU usage, then generate three thread dumps every 30 seconds using the following command:
      <JAVA_HOME>/bin/jstack -l <pid> > <filename>
            
      
    2. If any Message Processor is experiencing high memory usage, then generate a heap dump using the following command:
      sudo -u apigee <JAVA_HOME>/bin/jmap -dump:live,format=b,file=<filename> <pid>
      
    3. Restart the Message Processor using the below command. This should bring down the CPU and Memory.
      /opt/apigee/apigee-service/bin/apigee-service edge-message-processor restart
      
    4. Monitor the API calls and confirm if the problem still exists.
    5. Contact Apigee Support and provide the thread dumps, heap dump, and Message Processor logs (/opt/apigee/var/log/edge-message-processor/logs/system.log) to help them investigate the cause of the high CPU/memory usage.

Diagnose issues using API Monitoring

Note: The steps in this section can be performed by Public Cloud users only.

API Monitoring enables you to isolate problem areas quickly to diagnose error, performance, and latency issues and their source, such as developer apps, API proxies, backend targets, or the API platform.

Step through a sample scenario that demonstrates how to troubleshoot 5xx issues with your APIs using API Monitoring. For example, you may want to set up an alert to be notified when the number of 504 status codes exceeds a particular threshold.