Monday, August 19, 2019

Help Understanding HTTP(S) Request Latency Through an NLB to an On-Premises Server vs. Directly Through a Firewall

I have the following scenarios (we are located in a data center in Chicago):

  1. AWS Network Load Balancer in US-EAST-1 listening on 443, forwarding traffic over port 8443 to an on-premises Nginx reverse proxy listening on 8443 (and 443). We have an AWS Direct Connect connection as well.

From my understanding (verified using packet captures), traffic goes: client (Azure) -> aws-us-east-1-nlb:443 -> internal-reverse-proxy:8443 -> upstream server. The response takes the same path in reverse.

  2. Firewall in our data center with static NAT enabled for our internal reverse proxy server on port 443.

Here, traffic goes: client (Azure) -> firewall NAT:443 -> internal-reverse-proxy:443 -> upstream server. The response takes the same path in reverse. (A tcpdump sketch for verifying both paths is below.)
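
For reference, this is roughly how the two paths can be confirmed with a capture on the reverse proxy itself; the interface name below is an assumption, so substitute whatever interface your proxy actually uses.

  # Scenario 1: traffic arriving via the AWS NLB should reach the proxy on 8443.
  sudo tcpdump -nn -i eth0 'tcp port 8443'

  # Scenario 2: traffic NAT'd by the firewall should reach the proxy on 443.
  sudo tcpdump -nn -i eth0 'tcp port 443'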

I am using https://github.com/apigee/apib as my http(s) benchmarking tool, and the command I am running is:

apib -d 10 -c 50 -t application/json -H "Host: REDACTED" -H "Authorization: REDACTED" -x POST --csv-output --name "REDACTED" https://REDACTED 
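
For repeatability the runs can be scripted along these lines. This is just a sketch: the two endpoint URLs are placeholders for the redacted hostnames above, and it assumes apib's --csv-output result row goes to stdout.

  # Sketch only: the two URLs below are placeholders for the redacted endpoints.
  NLB_URL="https://nlb.example.com"       # hypothetical NLB DNS name
  FIREWALL_URL="https://fw.example.com"   # hypothetical firewall/NAT'd hostname

  for url in "$FIREWALL_URL" "$NLB_URL"; do
    for run in 1 2 3; do
      # Same flags as the command above; CSV rows are appended to one file,
      # assuming --csv-output writes its result line to stdout.
      apib -d 10 -c 50 -t application/json \
        -H "Host: REDACTED" -H "Authorization: REDACTED" \
        -x POST --csv-output --name "run-$run" "$url" >> results.csv
    done
  done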

Every single time I test, no matter the location of the client, AWS always seems to win, and I just cannot understand how or why.

I used an Azure free-tier account to spin up an Ubuntu 18.04 server in US-SOUTH-CENTRAL location (Dallas, TX), and here were my results:

(Throughput is in requests/second, duration is in seconds, and all latency columns are in milliseconds.)

Client | Server | Throughput | Avg. Latency | Threads | Connections | Duration | Completed | Successful | Errors | Sockets | Min. Latency | Max. Latency | 50% Latency | 90% Latency | 98% Latency | 99% Latency | Latency Std Dev
DallasTX | FirewallDirect | 166.039 | 594.611 | 2 | 100 | 30.029 | 4986 | 4986 | 0 | 100 | 174.797 | 921.127 | 592.297 | 692.909 | 794.512 | 855.213 | 85.196
DallasTX | FirewallDirect | 185.297 | 533.783 | 2 | 100 | 30.022 | 5563 | 5563 | 0 | 100 | 192.812 | 997.528 | 512.197 | 672.711 | 779.978 | 808.828 | 101.795
DallasTX | FirewallDirect | 203.451 | 487.400 | 2 | 100 | 30.027 | 6109 | 6109 | 0 | 100 | 173.641 | 1052.452 | 477.83 | 620.199 | 709.733 | 747.05 | 99.43
DallasTX | AWS-US-EAST-1-NLB | 194.637 | 508.742 | 2 | 100 | 30.025 | 5844 | 5844 | 0 | 100 | 272.026 | 832.814 | 492.759 | 650.240 | 716.850 | 732.909 | 95.55
DallasTX | AWS-US-EAST-1-NLB | 189.954 | 519.970 | 2 | 100 | 30.023 | 5703 | 5703 | 0 | 100 | 265.800 | 965.367 | 504.867 | 665.015 | 754.291 | 818.146 | 100.296
DallasTX | AWS-US-EAST-1-NLB | 183.496 | 541.257 | 2 | 100 | 30.022 | 5509 | 5509 | 0 | 100 | 310.584 | 933.735 | 522.065 | 673.488 | 794.933 | 835.227 | 94.65
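
As a quick sanity check on the Avg. Latency column (values taken straight from the table above):

  # Mean of the Avg. Latency values (ms) per target, from the table above.
  echo "scale=3; (594.611 + 533.783 + 487.400) / 3" | bc   # FirewallDirect    -> ~538.598 ms
  echo "scale=3; (508.742 + 519.970 + 541.257) / 3" | bc   # AWS-US-EAST-1-NLB -> ~523.323 ms

So across these three runs the NLB path actually averages roughly 15 ms less than going through the firewall directly.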

Can someone help me understand if I am missing something when testing the above? I just do not see how AWS can have request/response latency that is, in some cases, better than making requests to our server directly through our firewall in Chicago when sourcing from a client in Dallas, TX. Is there something I am not understanding or failing to account for in this scenario? I know the firewall now has to process the NAT traffic, but that should be fairly trivial, and we have a pretty powerful firewall fronting our services.

Traffic using AWS would have to go: 1. Dallas, TX -> 2. Northern Virginia (US-EAST-1) -> 3. over our Direct Connect -> 4. to Chicago -> 5. get processed internally -> 6. response sent back out to the AWS NLB and on to the client.

Traffic using NAT on our Firewall would go: 1. Dallas, TX -> 2. Chicago firewall -> 3. NAT'd to the internal proxy and processed -> 4. response sent back out to the client.

The latency from Dallas to US-EAST-1 is about 29 ms (taken from https://www.dotcom-tools.com/internet-backbone-latency.aspx), and the latency from Chicago to US-EAST-1 is about 42 ms (taken from https://www.cloudping.info/; I am located in Chicago), giving a total of about 71 ms of travel time for the AWS path.

The latency from Dallas to Chicago is close to 39ms (pinging from my VM in Azure to our firewall).

Even allowing for the time to process the data, I should see results that are about 30-40 milliseconds quicker when using our firewall directly for NAT instead of AWS in US-EAST-1 for a client located in Dallas (roughly 71 ms vs. 39 ms of travel time, a difference of about 32 ms per round trip).
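
As a back-of-envelope check using the RTT figures above:

  # Extra round-trip travel time expected on the AWS path vs. going direct (ms):
  # (Dallas -> US-EAST-1) + (US-EAST-1 -> Chicago) - (Dallas -> Chicago)
  echo "29 + 42 - 39" | bc   # ~32 ms more per round trip via AWS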

Additionally, even testing from Azure NORTH-CENTRAL I get the exact same result: the latency is the same whether I go through US-EAST-1 or straight to our firewall (for NAT). Ping between Azure NORTH-CENTRAL and our firewall is just 2 milliseconds, so I would expect requests through our firewall to average about 70 milliseconds quicker, but the average request time is the same as through AWS.
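
For what it's worth, a per-phase breakdown of a single request with curl's --write-out timings might show where the time is actually going. A rough sketch only: the '{}' request body is a placeholder, and the URL and headers stay redacted as above.

  # Sketch: time one POST per endpoint and split out connect/TLS/first-byte/total.
  # The '{}' body is a placeholder; real payload, Host, and Authorization are redacted.
  curl -s -o /dev/null -X POST \
    -H "Host: REDACTED" -H "Authorization: REDACTED" -H "Content-Type: application/json" \
    -d '{}' \
    -w 'connect=%{time_connect}s tls=%{time_appconnect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n' \
    https://REDACTED

Comparing connect vs. total across the two paths should show whether any difference sits in connection setup or in the request/response itself.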

Can anyone help point out something I am missing or not taking into account in these tests? I just don't see how my testing shows AWS to be comparable to, or even better than, hitting our firewall directly for NAT.

Thanks


