Timeout error

Symptom

Deployment of API proxy revisions through the Edge UI or management API fails with a timeout error.

Error Messages

Click to change deployment status. 
The revision is deployed and traffic can flow, but flow may be impaired. 
Error: Call timed out; either server is down or server is not reachable

Possible Causes

The typical causes for this issue are:

Cause | Details | For
Network Connectivity Issue | Communication failure between Management Server and Message Processor due to network connectivity issues or firewall rules. | Private Cloud Users only
Large API Proxy Bundle | Message Processor may take a long time to activate if the API proxy bundle is large in size, leading to RPC timeouts. | Private and Public Cloud Users

Network Connectivity Issue

Note: Only Edge Private Cloud users can perform the following steps. If you are on Edge Public Cloud, contact Apigee Support.

Diagnosis

  1. Get the deployment status for the specific API that shows the error by using the following management API call:
    curl -v http://<management-server-IPaddress>:<port#>/organizations/<orgname>/environments/<envname>/apis/<apiname>/deployments -u <username>
    

    Sample output showing the error:

    {
      "error": "Call timed out; either server is down or server is not reachable",
      "status": "error",
      "type": [
        "message-processor"
      ],
      "uUID": "ebbc1078-cbde-4a00-a7db-66a3c1b2b748"
    },
    {
      "status": "deployed",
      "type": [
        "message-processor"
      ],
      "uUID": "204e2b7e-52f7-46d9-b458-20f9bfb51e6d"
    },
    {
      "status": "deployed",
      "type": [
        "router"
      ],
      "uUID": "967e63c6-ee95-47c0-9608-f4a32638fb1e"
    },
    {
      "status": "deployed",
      "type": [
        "router"
      ],
      "state": "error"
    }
    

    The sample output above shows that the error occurred on the Message Processor with the UUID "ebbc1078-cbde-4a00-a7db-66a3c1b2b748".

  2. Based on the deployment status output for your API proxy, log in to each Message Processor whose UUID showed the error and perform the following steps:
    1. Check if the Message Processor is listening on the port 4528:
      netstat -an | grep LISTEN | grep 4528
      

      If the Message Processor is not listening on port 4528, then restart the Message Processor:

      /opt/apigee/apigee-service/bin/apigee-service edge-message-processor restart
      
    2. Re-check the deployment status of the API proxy by using the management API call shown in step #1 above. If there are no errors, the issue is resolved.
  3. If the problem persists, test the connectivity from Management Server to the Message Processor on port 4528 using the following steps:
    1. If telnet is available, then use telnet:
      telnet <MessageProcessor_IP> 4528
      
    2. If telnet is not available, use netcat to check the connectivity as follows:
      nc -vz <MessageProcessor_IP> 4528
      
    3. If you get the response "Connection Refused" or "Connection timed out", then engage your network operations team.
  4. Test the connectivity from the Message Processor to the Management Server on port 4526 using the following steps:
    1. If telnet is available, then use telnet:
      telnet <management-server-IP> 4526
      
    2. If telnet is not available, use netcat to check the connectivity as follows:
      nc -vz <management-server-IP> 4526 
      
    3. If you get the response "Connection Refused" or "Connection timed out", engage your network operations team.
  5. Work with your network operations team and do the following:
    1. Ensure the RPC protocol is allowed on both the Management Server and the Message Processor.
    2. Remove any firewall restrictions or security rules set up between the Management Servers and Message Processors, so that Message Processors can connect to port 4526 on the Management Server and the Management Server can connect to port 4528 on the Message Processors.
  6. Re-check the deployment status (refer to step #1 above). If you don't see any errors, the issue is resolved.
  7. If the issue persists, check whether there is a network issue on the Message Processor. If there is, restarting the specific Message Processor that shows the timeout error (as per the deployment status output) may fix the issue:
    /opt/apigee/apigee-service/bin/apigee-service edge-message-processor restart
    
  8. If the problem still persists, check the Management Server log at /opt/apigee/var/log/edge-management-server/logs/system.log.

    Sample "Call timed out" error from the Management Server log:

    2016-05-17 09:29:56,448 org:myorg env:prod qtp281969267-360792 ERROR DISTRIBUTION - RemoteServicesConfigEventHandler.configureServers() : exception for server with uuid e1381db7-d83b-4752-ae04-2de33f07e555 : cause = RPC Error 504: Call timed out communication error = true 
            com.apigee.rpc.RPCException: Call timed out 
            at com.apigee.rpc.impl.AbstractCallerImpl.handleTimeout(AbstractCallerImpl.java:64) ~[rpc-1.0.0.jar:na] 
            at com.apigee.rpc.impl.RPCMachineImpl$OutgoingCall.handleTimeout(RPCMachineImpl.java:483) ~[rpc-1.0.0.jar:na] 
            at com.apigee.rpc.impl.RPCMachineImpl$OutgoingCall.access$000(RPCMachineImpl.java:402) ~[rpc-1.0.0.jar:na] 
            at com.apigee.rpc.impl.RPCMachineImpl$OutgoingCall$1.run(RPCMachineImpl.java:437) ~[rpc-1.0.0.jar:na] 
            at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:532) ~[netty-all-4.0.0.CR1.jar:na] 
            at io.netty.util.HashedWheelTimer$Worker.notifyExpiredTimeouts(HashedWheelTimer.java:430) ~[netty-all-4.0.0.CR1.jar:na] 
            at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:371) ~[netty-all-4.0.0.CR1.jar:na] 
            at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_79] 
            
    

    If you see an error similar to the example above, increase the RPC timeout on the Management Server so that it has more time to reach the Message Processor during a network slowdown.
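
The first checks in the diagnosis above (finding the failing servers, then probing the RPC ports) can be sketched as two small shell helpers. This is only an illustration: the function names `failed_uuids` and `check_port` are hypothetical, the parser assumes pretty-printed deployment status output with one key per line as in the sample, and the port probe assumes that `bash` (with its `/dev/tcp` redirection feature) and the coreutils `timeout` command are available on the server.

```shell
# Hypothetical helper: print the uUID of every server entry whose "status" is "error".
# Reads deployment status output on stdin; assumes one key per line.
failed_uuids() {
  awk '
    index($0, "{") { err = 0; uuid = "" }                 # a new entry starts
    /"status"[[:space:]]*:[[:space:]]*"error"/ { err = 1 } # entry reports an error
    /"uUID"/ { split($0, parts, "\""); uuid = parts[4] }   # value is the 4th quote-delimited field
    index($0, "}") { if (err && uuid != "") print uuid; err = 0; uuid = "" }
  '
}

# Hypothetical helper: succeed if a TCP connection to host:port opens within 5 seconds.
# Uses the bash /dev/tcp redirection, so it works where telnet and nc are not installed.
check_port() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

For example, the output of the management API call from step 1 could be piped into `failed_uuids`, and `check_port <MessageProcessor_IP> 4528 && echo reachable` could be run on the Management Server in place of the telnet or netcat test.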

Resolution

Note: Only Edge Private Cloud users can perform the following steps. If you are on Edge Public Cloud, contact Apigee Support.

  1. Perform the following steps to increase the RPC timeout:
    1. Create the file /opt/apigee/customer/application/management-server.properties on the Management Server machine, if it does not already exist.
    2. Add the following line into this file:
      conf_cluster_rpc.connect.timeout=<time in seconds>
      

      The default RPC timeout value is 10 seconds, and it is recommended to increase it to 40 seconds. Set it as follows:

      conf_cluster_rpc.connect.timeout=40
      
    3. Ensure this file is owned by apigee:
      chown apigee:apigee /opt/apigee/customer/application/management-server.properties
      
    4. Restart the Management Server:
      /opt/apigee/apigee-service/bin/apigee-service edge-management-server restart
      
    5. If you have more than one Management Server, repeat the above steps on all the Management Servers.
    6. Deploy the API proxy in the Edge UI or by using the Edge management API call. If the API proxy deploys without any errors, the issue is resolved.
  2. If the problem persists, collect tcpdump captures from the Management Server and the Message Processor. Start tcpdump on each of the servers, and then initiate the deployment of the API proxy from the UI or by using the management API:
    1. Run the following tcpdump command on the Management Server:
      tcpdump -i any -s 0 host <message-processor-IP address> -w <File name>
      
    2. Run the following tcpdump command on the Message Processor:
      tcpdump -i any -s 0 host <management-server-IP address> -w <File name>
      
    3. Contact Apigee Support to get assistance on analyzing the tcpdumps and to troubleshoot the problem further.
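
Step 1 above (creating the properties file and adding the timeout line) can be scripted so that running it again does not leave duplicate entries. The following is a minimal sketch, not an Apigee-provided tool: the function name `set_rpc_timeout` is hypothetical, and it assumes the property appears at the start of a line exactly as shown in the steps.

```shell
# Hypothetical helper: set conf_cluster_rpc.connect.timeout in a properties file.
# Usage: set_rpc_timeout <properties-file> <seconds>
set_rpc_timeout() {
  file=$1
  seconds=$2
  key='conf_cluster_rpc.connect.timeout'
  key_re='conf_cluster_rpc\.connect\.timeout'   # same key with dots escaped for grep

  # Create the file if it does not already exist.
  [ -e "$file" ] || : > "$file"

  # Copy every line except an existing timeout entry, then append the new value.
  tmp=$(mktemp) || return 1
  grep -v "^${key_re}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$seconds" >> "$tmp"

  # Write back through the original path so an existing file keeps its owner and mode.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

It could be called as `set_rpc_timeout /opt/apigee/customer/application/management-server.properties 40`. The ownership change and the Management Server restart from the steps above are still required afterwards.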

Large API Proxy Bundle

Diagnosis

  1. Check the size of the API proxy bundle for which the deployment error is being observed.
  2. If the bundle is 10 MB or larger, the Message Processor very likely needs more time to activate the API proxy.
  3. If the API proxy bundle is larger than 15 MB, proceed to "API Proxy Bundle larger than 15MB".
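
The size check above can be sketched as a small shell function. The name `classify_bundle` and its output labels are hypothetical, and the sketch assumes that 1 MB means 1024 * 1024 bytes, which the text above does not specify.

```shell
# Hypothetical helper: classify an API proxy bundle by the thresholds in the steps above.
# Usage: classify_bundle <path-to-bundle.zip>
classify_bundle() {
  # wc -c prints the size in bytes; the arithmetic expansion strips any padding.
  bytes=$(($(wc -c < "$1")))
  mb=$((1024 * 1024))

  if [ "$bytes" -gt $((15 * mb)) ]; then
    echo "over-15MB"      # larger than 15 MB: see the separate section
  elif [ "$bytes" -ge $((10 * mb)) ]; then
    echo "large"          # 10 MB or more: activation may exceed the RPC timeout
  else
    echo "normal"
  fi
}
```

A result of `large` suggests that the RPC timeout is worth increasing as described in the resolution for this cause.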

Resolution

Note: Only Edge Private Cloud users can perform the following steps. If you are on Edge Public Cloud, contact Apigee Support.

Increase the RPC timeout on the Management Server so that the Message Processor has enough time to activate large API proxy bundles. Perform the following steps to increase the RPC timeout value:

  1. Create the /opt/apigee/customer/application/management-server.properties file on the Management Server machine, if it does not already exist.
  2. Add the following line to this file:
    conf_cluster_rpc.connect.timeout=<time in seconds>
    

    The default RPC timeout value is 10 seconds, and it is recommended to increase it to 40 seconds. Set it as follows:

    conf_cluster_rpc.connect.timeout=40
    
  3. Ensure this file is owned by apigee:
    chown apigee:apigee /opt/apigee/customer/application/management-server.properties
    
  4. Restart the Management Server:
    /opt/apigee/apigee-service/bin/apigee-service edge-management-server restart
    
  5. If you have more than one Management Server, repeat the above steps on all the Management Servers.
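
When several Management Servers are involved, it can help to confirm which timeout each properties file actually configures before restarting. The following is a sketch under assumptions: the function name `get_rpc_timeout` is hypothetical, it falls back to the default of 10 mentioned above when the property is absent, and it assumes that the last occurrence wins if the property is listed more than once.

```shell
# Hypothetical helper: print the RPC timeout configured in a properties file,
# or the default of 10 when the file or the property is missing.
# Usage: get_rpc_timeout <properties-file>
get_rpc_timeout() {
  # Print only the value part of matching lines and keep the last one.
  value=$(sed -n 's/^conf_cluster_rpc\.connect\.timeout=//p' "$1" 2>/dev/null | tail -n 1)
  echo "${value:-10}"
}
```

For example, `get_rpc_timeout /opt/apigee/customer/application/management-server.properties` should print 40 on each Management Server once the steps above have been applied.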

If the problem persists, contact Apigee Support for further assistance.