Saturday, September 18, 2021

TLS Handshake Failing After Changing the Client IP Address

We are troubleshooting a difficult TLS handshake failure on a client-server connection over port 443. A client PC on a customer's network connects over the internet to a cloud-hosted server. We have good access to the cloud solution's tech support and somewhat indirect access to the customer's firewall vendor, which manages the infrastructure that gets our PC out to the internet. Previously this PC, via an agent service, connected to the cloud solution over port 443 without problems. The customer then required that we change the client PC's local IP, decrementing the last octet (from .251 to .250). For reasons I can't go into, changing the IP back or to anything else is not an option at this point. The agent no longer connects, and here is what we know:

  • The failure with .250 occurs after the client sends the Client Hello: the server responds with an ACK, but the Server Hello never follows. After a 15+ second timeout the server sends an RST, ACK to close the socket. I believe this is because the server was next expecting a Client Key Exchange from the client following its Server Hello; since the Server Hello never reaches the client, the client of course never sends one. When we temporarily go back to .251 (or try another free IP in the subnet), the handshake completes flawlessly every time.
  • We tested this handshake against our server with OpenSSL, and it behaved the same way as it does with the agent. We also tested with OpenSSL against Google's 8.8.8.8:443 and saw exactly the same behavior: .250 failed as described above, while .251 worked (and .252 worked too).
  • When it fails on .250, the protocol shown in Wireshark is TLSv1; when it works on .251, the protocol is TLSv1.2 (we are not doing anything differently except changing the local IP). This may be a quirk/feature of Wireshark: the record layer and handshake protocol structures in the packets look similar in both cases, yet the "Protocol" column displays TLSv1 and TLSv1.2 respectively. So this may be nothing. It may simply be that the actual TLS version is declared in the Server Hello, which never arrives for .250, so Wireshark shows TLSv1.
  • The customer's firewall/networking vendor has confirmed that they are not doing any SSL inspection and that the rules are the same for all IPs in this subnet. We have asked about this multiple times by now. This is out of our direct control and, IMO, the most suspect area at this point, but they are growing tired of our prodding.
  • If I traceroute to our server from .250, none of the hops after the default gateway reply via ICMP. If I traceroute to our server from .251, every hop replies via ICMP, from the default gateway all the way to the cloud server. Again, it hardly seems like these two IPs have the same rules; the IP is the only thing that changes between the two tracert tests.
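
On the TLSv1 vs. TLSv1.2 point in the list above, here is a small sketch of why the label by itself probably proves nothing. A Client Hello carries two version fields: one in the record-layer header and one inside the handshake message, and clients commonly set the record-layer one to TLS 1.0 (0x0301) for compatibility even when they offer TLS 1.2. The sample bytes below are a hypothetical, truncated Client Hello made up for illustration, not data from our capture, and the function name is my own.

```python
import struct

# Names for the two-byte TLS version codes that can appear on the wire.
TLS_VERSIONS = {
    0x0300: "SSLv3",
    0x0301: "TLSv1.0",
    0x0302: "TLSv1.1",
    0x0303: "TLSv1.2",
}


def client_hello_versions(record: bytes) -> tuple:
    """Return (record_layer_version, client_hello_version) for a raw TLS record."""
    # Record header: content type (1 byte), version (2 bytes), length (2 bytes).
    content_type, record_version, _length = struct.unpack("!BHH", record[:5])
    if content_type != 22:  # 22 = handshake
        raise ValueError("not a TLS handshake record")
    # Handshake header: message type (1 byte) and length (3 bytes),
    # followed by the version the client offers (2 bytes).
    if record[5] != 1:  # 1 = Client Hello
        raise ValueError("not a Client Hello")
    (hello_version,) = struct.unpack("!H", record[9:11])

    def name(code):
        return TLS_VERSIONS.get(code, hex(code))

    return name(record_version), name(hello_version)


# Hypothetical first 11 bytes of a Client Hello (illustration only).
sample = bytes.fromhex("160301002f0100002b0303")
print(client_hello_versions(sample))  # → ('TLSv1.0', 'TLSv1.2')
```

If the Client Hello bytes in both of our captures decode to the same pair of versions, the different labels would come only from Wireshark having (or not having) a Server Hello to read the negotiated version from.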

Would love some insight/encouragement on whether to focus our efforts on the firewall vendor, or help identifying any other avenues of attack in our troubleshooting/isolation.
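
One isolation step we can script ourselves in the meantime is a probe that binds to a specific local IP, opens the TCP connection, and then attempts the TLS handshake, reporting which stage failed. This is only a rough sketch using Python's standard `socket` and `ssl` modules. The `probe` function name, the 20-second timeout, and the placeholder addresses in the comment are my own assumptions, and it assumes the chosen local IP is actually configured on the PC's interface at the time of the test. Certificate checks are turned off on purpose, since we only care whether the handshake completes.

```python
import socket
import ssl


def probe(local_ip, host, port=443, timeout=20.0):
    """Connect from local_ip, attempt a TLS handshake, and report the stage reached."""
    try:
        # source_address pins the outgoing connection to the given local IP.
        sock = socket.create_connection(
            (host, port), timeout=timeout, source_address=(local_ip, 0)
        )
    except OSError as exc:
        return f"tcp connect failed: {exc!r}"

    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # diagnostic only: skip certificate checks
    ctx.verify_mode = ssl.CERT_NONE
    try:
        # wrap_socket performs the handshake on the already-connected socket.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return f"handshake ok: {tls.version()}"
    except (socket.timeout, ConnectionResetError) as exc:
        # Matches our symptom: Client Hello goes out, no Server Hello comes back.
        return f"handshake stalled: {exc!r}"
    except OSError as exc:  # includes ssl.SSLError (alerts, protocol errors)
        return f"handshake failed: {exc!r}"


# Example with placeholder addresses (not our real ones):
# for ip in ("192.0.2.250", "192.0.2.251"):
#     print(ip, probe(ip, "server.example", 443))
```

Running this for .250 and .251 against both our server and an unrelated host would give the firewall vendor a repeatable comparison in which the source IP is the only variable.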


