TLS/SSL Handshake Failures

Symptom

A TLS/SSL handshake failure occurs when a client and server cannot establish communication using the TLS/SSL protocol. When this error occurs in Apigee Edge, the client application receives an HTTP status 503 with the message Service Unavailable. You see this error following any API call where a TLS/SSL handshake failure occurs.

Error Messages

HTTP/1.1 503 Service Unavailable

You can also see this error message when a TLS/SSL handshake failure occurs:

Received fatal alert: handshake_failure

Possible causes

TLS (Transport Layer Security, whose predecessor is SSL) is the standard security technology for establishing an encrypted link between a web server and a web client, such as a browser or an app. A handshake is a process that enables the TLS/SSL client and server to establish a set of secret keys with which they can communicate. During this process, the client and server:

  1. Agree on the version of the protocol to use.
  2. Select the cryptographic algorithm to be used.
  3. Authenticate each other by exchanging and validating digital certificates.

If the TLS/SSL handshake succeeds, then the TLS/SSL client and server transfer data to each other securely. If the handshake fails, the connection is terminated and the client receives a 503 Service Unavailable error.
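The first two handshake steps can be illustrated with a minimal Python sketch. It is a simplified model for building intuition, not how any TLS library is implemented: the function name `negotiate`, the `HandshakeFailure` exception, and the sample version and cipher names are all hypothetical, and the sketch assumes the client's cipher preference order wins (real servers are often configured to apply their own order).

```python
# Simplified model of the first two handshake steps: agreeing on a protocol
# version and on a cipher suite. All names and values are illustrative.

# Protocol versions this sketch knows about, ordered from oldest to newest.
VERSION_ORDER = ["TLSv1.0", "TLSv1.1", "TLSv1.2", "TLSv1.3"]


class HandshakeFailure(Exception):
    """Raised when client and server have no protocol or cipher in common."""


def negotiate(client_versions, server_versions, client_ciphers, server_ciphers):
    """Return (version, cipher) or raise HandshakeFailure."""
    common_versions = set(client_versions) & set(server_versions)
    if not common_versions:
        raise HandshakeFailure("protocol mismatch")
    # Pick the newest version that both sides support.
    version = max(common_versions, key=VERSION_ORDER.index)

    # Pick the first cipher in the client's list that the server also supports
    # (assumption: client preference order is honored).
    for cipher in client_ciphers:
        if cipher in server_ciphers:
            return version, cipher
    raise HandshakeFailure("cipher suite mismatch")
```

In this model, a client that offers only TLSv1.2 and a server that supports only TLSv1.0 have no version in common, so `negotiate` raises `HandshakeFailure("protocol mismatch")`, which corresponds to the first cause listed in the next table.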

Possible causes for TLS/SSL handshake failures are:

Cause | Description | Who can perform the troubleshooting steps
Protocol mismatch | The protocol used by the client is not supported by the server. | Private and Public Cloud users
Cipher Suite mismatch | The cipher suite used by the client is not supported by the server. | Private and Public Cloud users
Incorrect Certificate | The hostname in the URL used by the client does not match the hostname in the certificate stored at the server end. | Private and Public Cloud users
Incorrect Certificate | An incomplete or invalid certificate chain is stored at the client or server end. | Private and Public Cloud users
Incorrect Certificate | An incorrect or expired certificate is sent by the client to the server or from the server to the client. | Private and Public Cloud users
SNI Enabled Server | The backend server is Server Name Indication (SNI) enabled; however, the client cannot communicate with the SNI servers. | Private Cloud users only

Protocol Mismatch

A TLS/SSL handshake failure occurs if the protocol used by the client is not supported by the server either at the incoming (northbound) or outgoing (southbound) connection. See also Understanding northbound and southbound connections.

Diagnosis

  1. Determine whether the error occurred at the northbound or southbound connection. For further guidance on making this determination, see Determining the source of the problem.
  2. Run the tcpdump utility to gather further information:
    • If you are a Private Cloud user, then you can collect the tcpdump data at the relevant client or server. A client can be the client app (for incoming, or northbound connections) or the Message Processor (for outgoing, or southbound connections). A server can be the Edge Router (for incoming, or northbound connections) or the backend server (for outgoing, or southbound connections) based on your determination from Step 1.
    • If you are a Public Cloud user, then you can collect the tcpdump data only on the client app (for incoming, or northbound connections) or the backend server (for outgoing, or southbound connections), because you do not have access to the Edge Router or Message Processor.
    tcpdump -i any -s 0 host <IP address> -w <File name>
    
    See tcpdump data for more information on using the tcpdump command.
  3. Analyze the tcpdump data using the Wireshark tool or a similar tool.
  4. Here's a sample analysis of the tcpdump using Wireshark:
    • In this example, the TLS/SSL handshake failure occurred between the Message Processor and the backend server (the outgoing, or southbound connection).
    • Message #4 in the tcpdump output below shows that the Message Processor (Source) sent a "Client Hello" message to the backend server (Destination).

    • If you select the Client Hello message, it shows that the Message Processor is using the TLSv1.2 protocol, as shown below:

    • Message #5 shows that the backend server acknowledges the "Client Hello" message from the Message Processor.
    • The backend server immediately sends Fatal Alert: Close Notify to the Message Processor (message #6). This means the TLS/SSL handshake failed and the connection will be closed.
    • Looking further into message #6 shows that the cause of the TLS/SSL handshake failure is that the backend server supports only the TLSv1.0 protocol, as shown below:

    • Because there is a mismatch between the protocol used by the Message Processor and the backend server, the backend server sent the message: Fatal Alert Message: Close Notify.
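As a complement to the packet capture, you can probe which protocol versions a server accepts from any machine that can reach it. The following Python sketch uses the standard `ssl` module; the helper names `pinned_context` and `server_accepts` are hypothetical. It is a diagnostic aid under stated assumptions: certificate verification is deliberately disabled so that only the protocol version decides the outcome, and a `False` result can also mean a network problem or that the local OpenSSL build refuses the requested version.

```python
import socket
import ssl


def pinned_context(version):
    """Build a client context that offers exactly one TLS version."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Diagnosis only: skip certificate checks so that certificate problems
    # do not mask the protocol result. Never do this for real traffic.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def server_accepts(host, port, version, timeout=5.0):
    """Return True if a handshake with the given TLS version completes."""
    ctx = pinned_context(version)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname sends the SNI extension, so a missing server
            # name is not mistaken for a protocol mismatch.
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except (ssl.SSLError, OSError):
        # Covers handshake alerts, resets, timeouts and refused connections.
        return False
```

For example, `server_accepts("backend.example.com", 443, ssl.TLSVersion.TLSv1_2)` (with a placeholder hostname) returning `False` while the same call with `ssl.TLSVersion.TLSv1` returns `True` would be consistent with the protocol mismatch shown in the capture above.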

Resolution

The Message Processor runs on Java 8 and uses the TLSv1.2 protocol by default. If the backend server does not support the TLSv1.2 protocol, then you can take one of the following steps to resolve this issue:

  1. Upgrade your backend server to support the TLSv1.2 protocol. This is the recommended solution because TLSv1.2 is more secure.
  2. If you are unable to upgrade your backend server immediately, then you can force the Message Processor to use the TLSv1.0 protocol to communicate with the backend server by following these steps:
    1. If you did not specify a target server in the proxy's TargetEndpoint definition, then set the Protocol element to TLSv1.0 as shown below:
      <TargetEndpoint name="default">
       …
       <HTTPTargetConnection>
         <SSLInfo>
             <Enabled>true</Enabled>
             <Protocols>
                 <Protocol>TLSv1.0</Protocol>
             </Protocols>
         </SSLInfo>
         <URL>https://myservice.com</URL>
       </HTTPTargetConnection>
       …
      </TargetEndpoint>
      
    2. If you configured a target server for your proxy, then use this management API to set the protocol to TLSv1.0 in the specific target server configuration.

Cipher Mismatch

You can see a TLS/SSL handshake failure if the cipher suite algorithm used by the client is not supported by the server either at the incoming (northbound) or outgoing (southbound) connection in Apigee Edge. See also Understanding northbound and southbound connections.

Diagnosis

  1. Determine whether the error occurred at the northbound or southbound connection. For further guidance on making this determination, see Determining the source of the problem.
  2. Run the tcpdump utility to gather further information:
    • If you are a Private Cloud user, then you can collect the tcpdump data at the relevant client or server. A client can be the client app (for incoming, or northbound connections) or the Message Processor (for outgoing, or southbound connections). A server can be the Edge Router (for incoming, or northbound connections) or the backend server (for outgoing, or southbound connections) based on your determination from Step 1.
    • If you are a Public Cloud user, then you can collect the tcpdump data only on the client app (for incoming, or northbound connections) or the backend server (for outgoing, or southbound connections), because you do not have access to the Edge Router or Message Processor.
    tcpdump -i any -s 0 host <IP address> -w <File name>
    
    See tcpdump data for more information on using the tcpdump command.
  3. Analyze the tcpdump data using the Wireshark tool or a similar tool.
  4. Here's a sample analysis of the tcpdump output using Wireshark:
    • In this example, the TLS/SSL handshake failure occurred between the client application and the Edge Router (northbound connection). The tcpdump output was collected on the Edge Router.
    • Message #4 in the tcpdump output below shows that the client application (source) sent a "Client Hello" message to the Edge Router (destination).

    • Selecting the Client Hello message shows that the client application is using the TLSv1.2 protocol.

    • Message #5 shows that the Edge Router acknowledges the "Client Hello" message from the client application.
    • The Edge Router immediately sends a Fatal Alert: Handshake Failure to the client application (message #6). This means the TLS/SSL handshake failed and the connection will be closed.
    • Looking further into message #6 shows the following information:
      • The Edge Router supports the TLSv1.2 protocol. This means that the protocol matches between the client application and the Edge Router.
      • However, the Edge Router still sends the Fatal Alert: Handshake Failure to the client application, as shown in the screenshot below:

    • The error could be the result of one of the following issues:
      • The client application is not using the cipher suite algorithms supported by the Edge Router.
      • The Edge Router is SNI-enabled, but the client application is not sending the server name.
    • Message #4 in the tcpdump output lists the cipher suite algorithms supported by the client application, as shown below:

    • The cipher suite algorithms supported by the Edge Router are listed in the /opt/nginx/conf.d/0-default.conf file. In this example, the Edge Router supports only the High Encryption cipher suite algorithms.
    • The client application does not use any of the High Encryption cipher suite algorithms. This mismatch is the cause of the TLS/SSL handshake failure.
    • Because the Edge Router is SNI-enabled, scroll down to message #4 in the tcpdump output and confirm that the client application is sending the server name correctly, as shown in the figure below:


    • If this name is valid, you can infer that the TLS/SSL handshake failure occurred because the cipher suite algorithms used by the client application are not supported by the Edge Router.

Resolution

You must ensure that the client uses the cipher suite algorithms that are supported by the server. To solve the issue described in the previous Diagnosis section, download and install the Java Cryptography Extension (JCE) package and include it in the Java installation to support High Encryption cipher suite algorithms.
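Checking for a shared cipher suite comes down to intersecting two lists. The sketch below is a hedged illustration in Python: `client_cipher_names` lists what a local Python/OpenSSL client would offer (it says nothing about a Java client), and `shared_ciphers` is a hypothetical helper that compares that list with a server list you supply yourself. Note that OpenSSL cipher names (for example `ECDHE-RSA-AES128-GCM-SHA256`) differ from the IANA names that Wireshark displays, so both lists must use the same naming scheme before you compare them.

```python
import ssl


def client_cipher_names(ctx=None):
    """Names of the cipher suites this Python/OpenSSL client would offer."""
    ctx = ctx or ssl.create_default_context()
    return [cipher["name"] for cipher in ctx.get_ciphers()]


def shared_ciphers(client_names, server_names):
    """Cipher suites present in both lists, kept in client order."""
    server = set(server_names)
    return [name for name in client_names if name in server]
```

An empty result from `shared_ciphers` corresponds to the mismatch diagnosed above: the client offers nothing that the server is configured to accept.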

Incorrect Certificate

A TLS/SSL handshake failure occurs if you have incorrect certificates in the keystore/truststore, either at the incoming (northbound) or outgoing (southbound) connection in Apigee Edge. See also Understanding northbound and southbound connections.

If the problem is northbound, then you may see different error messages depending on the underlying cause.

The following sections list example error messages and the steps to diagnose and resolve this issue.

Error messages

You might see different error messages depending on the cause of the TLS/SSL handshake failure. Here's a sample error message that you might see when you call an API proxy:

* SSL certificate problem: Invalid certificate chain
* Closing connection 0
curl: (60) SSL certificate problem: Invalid certificate chain
More details here: http://curl.haxx.se/docs/sslcerts.html

Possible causes

The typical causes for this issue are:

Cause | Description | Who can perform the troubleshooting steps
Hostname Mismatch | The hostname used in the URL and the hostname in the certificate stored in the keystore of the Router do not match. For example, a mismatch occurs if the hostname used in the URL is myorg.domain.com while the certificate has the hostname in its CN as CN=something.domain.com. | Edge Private and Public Cloud users
Incomplete or incorrect certificate chain | The certificate chain is not complete or not correct. | Edge Private and Public Cloud users
Expired or unknown certificate sent by the server or client | An expired or unknown certificate is sent by the server or client either at the northbound or at the southbound connection. | Edge Private and Public Cloud users

Hostname Mismatch

Diagnosis

  1. Note the hostname used in the URL returned by the following Edge management API call:
    curl -v https://myorg.domain.com/v1/getinfo
    For example:
    curl -v https://api.enterprise.apigee.com/v1/getinfo
  2. Get the CN used in the certificate stored in the specific keystore. You can use the following Edge management APIs to get the details of the certificate:
    1. Get the certificate name in the keystore:

      If you are a Private Cloud user, use the Management API as follows:
      curl -v https://management-server-ip:port/v1/organizations/org-name/environments/env-name/keystores/keystore-name/certs
      If you are a Public Cloud user, use the Management API as follows:
      curl -v https://api.enterprise.apigee.com/v1/organizations/org-name/environments/env-name/keystores/keystore-name/certs
      
    2. Get the details of the certificate in the keystore using the Edge management API.

      If you are a Private Cloud user:
      curl -v https://management-server-ip:port/v1/organizations/org-name/environments/env-name/keystores/keystore-name/certs/cert-name
      
      If you are a Public Cloud user:
      curl -v https://api.enterprise.apigee.com/v1/organizations/org-name/environments/env-name/keystores/keystore-name/certs/cert-name
      

      Sample certificate:

      "certInfo": [
          {
            "basicConstraints": "CA:FALSE",
            "expiryDate": 1456258950000,
            "isValid": "No",
            "issuer": "SERIALNUMBER=07969287, CN=Go Daddy Secure Certification Authority, OU=http://certificates.godaddy.com/repository, O=\"GoDaddy.com, Inc.\", L=Scottsdale, ST=Arizona, C=US",
            "publicKey": "RSA Public Key, 2048 bits",
            "serialNumber": "07:bc:a7:39:03:f1:56",
            "sigAlgName": "SHA1withRSA",
            "subject": "CN=something.domain.com, OU=Domain Control Validated, O=something.domain.com",
            "validFrom": 1358287055000,
            "version": 3
          },
      

      The subject name in the primary certificate has the CN as something.domain.com.

      Because the hostname used in the API request URL (see step 1 above) and the subject name in the certificate don't match, the TLS/SSL handshake fails.
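The comparison in the steps above can be sketched in Python. The helpers `subject_cn` and `hostname_matches` are hypothetical names, and the matching rule is deliberately simplified: a leading `*` covers exactly one left-most label, and only the CN is inspected, whereas many clients check the subject alternative names first.

```python
import re


def subject_cn(subject):
    """Extract the CN value from a subject string like the sample above."""
    match = re.search(r"CN=([^,]+)", subject)
    return match.group(1).strip() if match else None


def hostname_matches(hostname, pattern):
    """Compare a hostname with a certificate name, allowing a leading '*'.

    Simplified rule: the wildcard covers exactly one left-most label.
    """
    host_labels = hostname.lower().split(".")
    pattern_labels = pattern.lower().split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels
```

With the sample certificate, `subject_cn` yields `something.domain.com`, and `hostname_matches("myorg.domain.com", "something.domain.com")` is `False`, which is the mismatch described above. A wildcard name such as `*.domain.com` would match `myorg.domain.com` under this rule.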

Resolution

This issue can be resolved in one of the following two ways:

  • Obtain a certificate (if you don't have one already) whose subject CN contains a wildcard that covers your hostname, then upload the new complete certificate chain to the keystore. For example:
    "subject": "CN=*.domain.com, OU=Domain Control Validated, O=*.domain.com",
  • Obtain a certificate (if you don't have one already) with an existing subject CN, but use your-org.your-domain as a subject alternative name, then upload the complete certificate chain to the keystore.

References

Keystores and Truststores

Incomplete or incorrect certificate chain

Diagnosis

  1. Get the CN used in the certificate stored in the specific keystore. You can use the following Edge management APIs to get the details of the certificate:
    1. Get the certificate name in the keystore:

      If you are a Private Cloud user:
      curl -v https://management-server-ip:port/v1/organizations/org-name/environments/env-name/keystores/keystore-name/certs
      
      If you are a Public Cloud user:
      curl -v https://api.enterprise.apigee.com/v1/organizations/org-name/environments/env-name/keystores/keystore-name/certs
      
    2. Get the details of the certificate in the keystore:

      If you are a Private Cloud user:
      curl -v https://management-server-ip:port/v1/organizations/org-name/environments/env-name/keystores/keystore-name/certs/cert-name
      
      If you are a Public Cloud user:
      curl -v https://api.enterprise.apigee.com/v1/organizations/org-name/environments/env-name/keystores/keystore-name/certs/cert-name
      
    3. Validate the certificate and its chain and verify that it adheres to the guidelines provided in the article How certificate chains work to ensure it's a valid and complete certificate chain. If the certificate chain stored in the keystore is either incomplete or invalid, then you see the TLS/SSL handshake failure.
    4. The following graphic shows a sample certificate with an invalid certificate chain, where the intermediate and root certificates do not match:
       [Figure: Sample intermediate and root certificate where issuer and subject do not match]
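The issuer and subject check can be approximated with a short Python sketch. It assumes you have copied the `subject` and `issuer` strings of each certificate (for example, from the management API output) into a list ordered leaf first and root last; that list structure and the function name `chain_problems` are assumptions made for illustration. The sketch compares names only and does not verify signatures, so it does not replace a full chain verification.

```python
def chain_problems(chain):
    """Return a list of human-readable problems found in a certificate chain.

    `chain` is ordered leaf first, root last. Each entry is a dict with
    "subject" and "issuer" strings (a hypothetical structure for this sketch).
    """
    if not chain:
        return ["chain is empty"]
    problems = []
    # Each certificate must be issued by the next certificate in the chain.
    for cert, parent in zip(chain, chain[1:]):
        if cert["issuer"] != parent["subject"]:
            problems.append(
                f"issuer of {cert['subject']!r} does not match "
                f"subject of next certificate {parent['subject']!r}"
            )
    # The last certificate is expected to be a self-signed root.
    root = chain[-1]
    if root["issuer"] != root["subject"]:
        problems.append(f"last certificate {root['subject']!r} is not self-signed")
    return problems
```

An empty list means that the names line up; any entry points at the link in the chain where the issuer and subject differ.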


Resolution

  1. Obtain a certificate (if you don't have one already) that includes a complete and valid certificate chain.
  2. Run the following openssl command to verify that the certificate chain is correct and complete:
    openssl verify -CAfile root-cert -untrusted intermediate-cert main-cert
  3. Upload the validated certificate chain to the keystore.

Expired or unknown certificate sent by the server or client

If the server or client sends an incorrect or expired certificate at either the northbound or the southbound connection, then the other end rejects the certificate, leading to a TLS/SSL handshake failure.

Diagnosis

  1. Determine whether the error occurred at the northbound or southbound connection. For further guidance on making this determination, see Determining the source of the problem.
  2. Run the tcpdump utility to gather further information:
    • If you are a Private Cloud user, then you can collect the tcpdump data at the relevant client or server. A client can be the client app (for incoming, or northbound connections) or the Message Processor (for outgoing, or southbound connections). A server can be the Edge Router (for incoming, or northbound connections) or the backend server (for outgoing, or southbound connections) based on your determination from Step 1.
    • If you are a Public Cloud user, then you can collect the tcpdump data only on the client app (for incoming, or northbound connections) or the backend server (for outgoing, or southbound connections), because you do not have access to the Edge Router or Message Processor.
    tcpdump -i any -s 0 host <IP address> -w <File name>
    
    See tcpdump data for more information on using the tcpdump command.
  3. Analyze the tcpdump data using Wireshark or a similar tool.
  4. From the tcpdump output, determine the host (client or server) that is rejecting the certificate during the verification step.
  5. You can retrieve the certificate sent from the other end from the tcpdump output, provided the data is not encrypted. You can then check whether this certificate matches the certificate available in the truststore.
  6. Review the sample tcpdump for the SSL communication between the Message Processor and the backend server.

    Sample tcpdump showing Certificate Unknown error


    1. The Message Processor (client) sends "Client Hello" to the backend server (server) in message #59.
    2. The backend server sends "Server Hello" to the Message Processor in message #61.
    3. They mutually validate the protocol and cipher suite algorithms used.
    4. The backend server sends the Certificate and Server Hello Done message to the Message Processor in message #68.
    5. The Message Processor sends the Fatal Alert "Description: Certificate Unknown" in message #70.
    6. Looking further into message #70, there are no additional details other than the alert message, as shown below:


    7. Review message #68 to get the details about the certificate sent by the backend server, as shown in the following graphic:

    8. The backend server's certificate and its complete chain are all available underneath the "Certificates" section, as shown in the above figure.
  7. If the certificate is found to be unknown either by the Router (northbound) or the Message Processor (southbound) as in the example illustrated above, then follow these steps:
    1. Get the certificate and its chain that is stored in the specific truststore. (Refer to the virtual host configuration for the Router and target endpoint configuration for the Message Processor). You can use the following APIs to get the details of the certificate:
      1. Get the certificate name in the truststore:
        curl -v https://management-server-ip:port/v1/organizations/org-name/environments/env-name/keystores/truststore-name/certs
      2. Get the details of the certificate in the truststore:
        curl -v https://management-server-ip:port/v1/organizations/org-name/environments/env-name/keystores/truststore-name/certs/cert-name
    2. Check if the certificate stored in the truststore of the Router (northbound) or Message Processor (southbound) matches the certificate that is stored in the keystore of the client application (northbound) or target server (southbound), or the one that is obtained from the tcpdump output. If there's a mismatch, then that's the cause of the TLS/SSL handshake failure.
  8. If the certificate is found to be unknown either by the client application (northbound) or the target server (southbound), then follow these steps:
    1. Get the complete certificate chain used in the certificate stored in the specific keystore. (Refer to the virtual host configuration for the Router and target endpoint configuration for the Message Processor.) You can use the following APIs to get the details of the certificate:
      1. Get the certificate name in the keystore:
        curl -v https://management-server-ip:port/v1/organizations/org-name/environments/env-name/keystores/keystore-name/certs
      2. Get the details of the certificate in the keystore:
        curl -v https://management-server-ip:port/v1/organizations/org-name/environments/env-name/keystores/keystore-name/certs/cert-name
        
    2. Check if the certificate stored in the keystore of the Router (northbound) or Message Processor (southbound) matches the certificate stored in the truststore of the client application (northbound) or target server (southbound), or the one that is obtained from the tcpdump output. If there's a mismatch, then that's the cause of the TLS/SSL handshake failure.
  9. If the certificate sent by a server or client has expired, then the receiving client or server rejects the certificate and you see the following alert message in the tcpdump output:

    Alert (Level: Fatal, Description: Certificate expired)

  10. Confirm that the certificate in the keystore of the appropriate host has expired.
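The expiry check in the last step can be done from the certificate details returned by the management API. The Python sketch below assumes that the `expiryDate` field holds milliseconds since the Unix epoch, which is an inference from the sample values shown earlier in this document rather than a documented guarantee; the function names are hypothetical.

```python
from datetime import datetime, timezone


def expiry_time(cert_info):
    """Convert expiryDate (assumed: milliseconds since epoch) to UTC."""
    return datetime.fromtimestamp(cert_info["expiryDate"] / 1000, tz=timezone.utc)


def is_expired(cert_info, now=None):
    """Return True if the certificate's expiry time is not in the future."""
    now = now or datetime.now(timezone.utc)
    return expiry_time(cert_info) <= now
```

For the sample certificate with `"expiryDate": 1456258950000`, `expiry_time` returns a date of 2016-02-23 (UTC), which agrees with the `"isValid": "No"` field in the same sample.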

Resolution

To resolve the issue identified in the example above, upload the backend server's valid certificate to the truststore on the Message Processor.
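Before uploading, you may want to confirm that the certificate in the truststore really differs from the one the other end presents. A certificate is identified by its issuer together with its serial number, so two records can be compared on those fields. The Python sketch below assumes records shaped like the `certInfo` sample shown earlier (with `issuer` and `serialNumber` keys); the helper names are hypothetical, and the issuer comparison is an exact string match, so both records must come from tools that format distinguished names the same way.

```python
def normalize_serial(serial):
    """Reduce a serial number to lowercase hex digits without separators."""
    text = serial.lower().replace(":", "").replace(" ", "")
    if text.startswith("0x"):
        text = text[2:]
    # Drop leading zeros so "07:bc" and "7bc" compare equal.
    return text.lstrip("0") or "0"


def same_certificate(a, b):
    """Treat two records as the same certificate if issuer and serial match."""
    return (
        a["issuer"] == b["issuer"]
        and normalize_serial(a["serialNumber"]) == normalize_serial(b["serialNumber"])
    )
```

If `same_certificate` returns `False` for the truststore entry and the certificate captured in the tcpdump output, that is consistent with the Certificate Unknown alert in the example.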

The following table summarizes the steps to resolve the issue depending on the cause of the problem.

Cause | Description | Resolution
Expired Certificate | NorthBound: the certificate stored in the keystore of the Router is expired, or the certificate stored in the keystore of the client application is expired (2-way SSL). | Upload a new certificate and its complete chain to the keystore on the appropriate host.
Expired Certificate | SouthBound: the certificate stored in the keystore of the target server is expired, or the certificate stored in the keystore of the Message Processor is expired (2-way SSL). | Upload a new certificate and its complete chain to the keystore on the appropriate host.
Unknown Certificate | NorthBound: the certificate stored in the truststore of the client application does not match the Router's certificate, or the certificate stored in the truststore of the Router does not match the client application's certificate (2-way SSL). | Upload the valid certificate to the truststore on the appropriate host.
Unknown Certificate | SouthBound: the certificate stored in the truststore of the target server does not match the Message Processor's certificate, or the certificate stored in the truststore of the Message Processor does not match the target server's certificate (2-way SSL). | Upload the valid certificate to the truststore on the appropriate host.

SNI Enabled Server

The TLS/SSL handshake failure can occur when the client is communicating with a Server Name Indication (SNI) Enabled Server, but the client is not SNI enabled. This could happen either at the northbound or the southbound connection in Edge.

First, you need to identify the hostname and port number of the server being used and check if it is SNI enabled or not.

Identification of SNI enabled server

  1. Execute the openssl command and try to connect to the relevant server hostname (Edge Router or backend server) without passing the server name, as shown below:
    openssl s_client -connect hostname:port
    
    You may get the certificates, or you may see a handshake failure in the openssl output, as shown below:
    CONNECTED(00000003)
    9362:error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure:/BuildRoot/Library/Caches/com.apple.xbs/Sources/OpenSSL098/OpenSSL098-64.50.6/src/ssl/s23_clnt.c:593
    
  2. Execute the openssl command and try to connect to the relevant server hostname (Edge Router or backend server) by passing the server name, as shown below:
    openssl s_client -connect hostname:port -servername hostname
    
  3. If you get a handshake failure in step 1, or get different certificates in step 1 and step 2, then the specified server is SNI enabled.

Once you've identified that the server is SNI enabled, you can follow the steps below to check if the TLS/SSL handshake failure is caused by the client not being able to communicate with the SNI server.
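The same identification check can be scripted. The Python sketch below mirrors the two openssl calls using the standard `ssl` module: `fetch_leaf_cert` connects once without and once with a server name, and `looks_sni_enabled` applies the rule from step 3. Both function names are hypothetical, certificate verification is disabled because this is diagnosis only, and the extra guard for "handshake fails even with a server name" is an addition that is not part of the steps above.

```python
import socket
import ssl


def fetch_leaf_cert(host, port, server_name=None, timeout=5.0):
    """Return the server's leaf certificate as DER bytes, or None if the
    TLS handshake fails. Pass server_name=None to omit the SNI extension."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False  # diagnosis only: no verification
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            with ctx.wrap_socket(sock, server_hostname=server_name) as tls:
                return tls.getpeercert(binary_form=True)
        except (ssl.SSLError, ConnectionError):
            return None


def looks_sni_enabled(cert_without_sni, cert_with_sni):
    """Apply the rule from step 3 to the two results."""
    if cert_with_sni is None:
        # Added guard: the handshake fails even with a server name,
        # so this check cannot tell whether SNI is the problem.
        return False
    return cert_without_sni is None or cert_without_sni != cert_with_sni
```

A possible usage, with a placeholder hostname: call `fetch_leaf_cert("backend.example.com", 443)` and `fetch_leaf_cert("backend.example.com", 443, server_name="backend.example.com")`, then pass both results to `looks_sni_enabled`.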

Diagnosis

  1. Determine whether the error occurred at the northbound or southbound connection. For further guidance on making this determination, see Determining the source of the problem.
  2. Run the tcpdump utility to gather further information:
    • If you are a Private Cloud user, then you can collect the tcpdump data at the relevant client or server. A client can be the client app (for incoming, or northbound connections) or the Message Processor (for outgoing, or southbound connections). A server can be the Edge Router (for incoming, or northbound connections) or the backend server (for outgoing, or southbound connections) based on your determination from Step 1.
    • If you are a Public Cloud user, then you can collect the tcpdump data only on the client app (for incoming, or northbound connections) or the backend server (for outgoing, or southbound connections), because you do not have access to the Edge Router or Message Processor.
    tcpdump -i any -s 0 host <IP address> -w <File name>
    
    See tcpdump data for more information on using the tcpdump command.
  3. Analyze the tcpdump output using Wireshark or a similar tool.
  4. Here's a sample analysis of the tcpdump output using Wireshark:
    1. In this example, the TLS/SSL handshake failure occurred between the Edge Message Processor and backend server (southbound connection).
    2. Message #3 in the tcpdump output below shows that the Message Processor (source) sent a "Client Hello" message to the backend server (destination).

    3. Selecting the "Client Hello" message shows that the Message Processor is using the TLSv1.2 protocol.

    4. Message #4 shows that the backend server acknowledges the "Client Hello" message from the Message Processor.
    5. The backend server immediately sends a Fatal Alert: Handshake Failure to the Message Processor (message #5). This means the TLS/SSL handshake failed and the connection will be closed.
    6. Review message #5 to discover the following information:
      • The backend server does support TLSv1.2 protocol. This means that the protocol matched between the Message Processor and the backend server.
      • However, the backend server still sends the Fatal Alert: Handshake Failure to the Message Processor as shown in the figure below:

    7. This error might occur for one of the following reasons:
      • The Message Processor is not using the cipher suite algorithms supported by the backend server.
      • The backend server is SNI enabled, but the Message Processor (the client in this example) is not sending the server name.
    8. Review the message #3 (Client Hello) in the tcpdump output in more detail. Note that the Extension: server_name is missing, as shown below:

    9. This confirms that the Message Processor did not send the server_name to the SNI-enabled backend server.
    10. This is the cause for the TLS/SSL handshake failure and the reason that the backend server sends the Fatal Alert: Handshake Failure to the Message Processor.
  5. Verify that the jsse.enableSNIExtension property in system.properties is set to false on the Message Processor to confirm that the Message Processor is not enabled to communicate with the SNI-enabled server.

Resolution

Enable the Message Processor(s) to communicate with SNI enabled servers by performing the following steps:

  1. Create the /opt/apigee/customer/application/message-processor.properties file (if it does not exist already).
  2. Add the following line to this file: conf_system_jsse.enableSNIExtension=true
  3. Change the owner of this file to apigee:apigee:
    chown apigee:apigee /opt/apigee/customer/application/message-processor.properties
  4. Restart the Message Processor.
    /opt/apigee/apigee-service/bin/apigee-service message-processor restart
  5. If you have more than one Message Processor, repeat the steps #1 through #4 on all the Message Processors.
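When you repeat these steps on several Message Processors, a quick check of the properties file can catch a typo or a missed host. The Python sketch below parses the file content as simple `key=value` lines; the function name `sni_extension_enabled` is hypothetical, and the parsing rules (last assignment wins, lines starting with `#` are comments) are assumptions about the file format rather than documented behavior.

```python
def sni_extension_enabled(properties_text):
    """Return True if conf_system_jsse.enableSNIExtension is set to true.

    Parses simple key=value lines. The last assignment wins, and lines
    starting with '#' are treated as comments (assumed format).
    """
    value = None
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        if key.strip() == "conf_system_jsse.enableSNIExtension":
            value = raw.strip().lower()
    return value == "true"
```

To use it, read /opt/apigee/customer/application/message-processor.properties on each Message Processor and pass the text to the function. The check confirms only the file content; the setting takes effect after the restart in step 4.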

If you are unable to determine the cause of the TLS/SSL handshake failure and fix the issue, or you need any further assistance, contact Apigee Support. Share the complete details about the issue along with the tcpdump output.