Wednesday, April 4, 2018

Move from L2 to L3 adjacency - OSPF design sanity check?

We currently have an environment where our primary and secondary data centers are connected via 2x10Gb L2 links (stretched VLANs), and I am working through a plan to move workloads in our secondary data center to a separate address space. The sites are roughly 10 miles apart, and we typically see around 1 ms of latency between them.

We also use OSPF both within the data center (area 0) and between the data center and our various office locations (stub areas in a hub-and-spoke configuration). Our provider hands off two VLANs to us (one in the 100 range, one in the 200 range). Both have the same OSPF link cost, and both L3 interfaces live on the core router in our primary data center.
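
"Same link cost" is worth a second look on 10Gb links. The sketch below is a minimal model, not a vendor implementation. It assumes Cisco IOS-style auto-cost, where cost = reference bandwidth / interface bandwidth, truncated, with a floor of 1 and a 100 Mbps default reference. The VLAN labels and bandwidth figures are hypothetical. It shows why 1Gb and 10Gb links can both end up at cost 1, and that two paths tied at the lowest cost are both eligible for OSPF equal-cost multipath.

```python
# Sketch of Cisco IOS-style OSPF auto-cost (an assumption; other vendors differ):
# cost = reference bandwidth // interface bandwidth, never lower than 1.
# The bandwidth figures and VLAN labels below are illustrative only.


def ospf_cost(interface_bw_mbps: int, reference_bw_mbps: int = 100) -> int:
    """Return the OSPF cost an interface would be assigned by auto-cost."""
    return max(1, reference_bw_mbps // interface_bw_mbps)


def lowest_cost_paths(paths: dict) -> list:
    """Return every path tied for the lowest cost (OSPF ECMP candidates)."""
    best = min(paths.values())
    return sorted(name for name, cost in paths.items() if cost == best)


if __name__ == "__main__":
    # Default 100 Mbps reference: 1G and 10G links both end up at cost 1.
    print(ospf_cost(1_000), ospf_cost(10_000))  # 1 1
    # A 100 Gbps reference bandwidth separates them.
    print(ospf_cost(1_000, 100_000), ospf_cost(10_000, 100_000))  # 100 10
    # Two hypothetical handoff VLANs with the same cost are both eligible.
    print(lowest_cost_paths({"vlan-1xx": 10, "vlan-2xx": 10}))  # ['vlan-1xx', 'vlan-2xx']
```

On IOS the reference is raised with `auto-cost reference-bandwidth` under the OSPF process. It should be set to the same value on every router in the domain, or cost comparisons stop being meaningful.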

Our current design looks like this: https://imgur.com/a/nAJiK

We have identified the following issues with the current design:

  • Layer 3 gateways for the data center VLANs (and the point-to-point links for OSPF to branch sites) exist in only one physical location, so losing the primary data center would take down the secondary as well.

  • There is no DIA circuit or firewall in the secondary data center, so an outage that impacts Internet traffic in the primary data center would also affect workloads in the secondary data center.

  • Traffic tromboning occurs for VMs in the secondary data center that sit in the same data center but on different subnets: inter-subnet traffic has to cross the interconnect to the gateway in the primary data center and come back.

  • The data center interconnect is a single point of failure for connectivity between the two sites.

I'd like to solve those issues with the following configuration changes:

  • Separate address space for the primary and secondary data centers, with L3 gateways on equipment local to each data center

  • Move the backup (2xx range) VLANs to the secondary data center core router, using OSPF interface costs to steer traffic and OSPF priorities to control DR/BDR election. The backup VLANs would then be routed through our provider's DR router to our secondary site.

  • Use BFD to detect OSPF neighbor failures more quickly and speed up reconvergence

  • Deploy an additional HA pair of firewalls in the secondary data center

  • Install an additional DIA circuit for Internet traffic to and from the secondary data center

  • Set up an IPsec VPN between the primary and secondary firewalls as a backup in case the data center interconnect fails
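
Two of these changes are easy to reason about with a small model. The sketch below makes several assumptions:

- It runs a fresh DR/BDR election on one segment, where the highest priority wins, the highest router ID (compared numerically) breaks ties, and priority 0 is never eligible.
- It compares BFD failure detection (transmit interval x detect multiplier) against a 40-second OSPF dead interval, which is the common Cisco default on broadcast links.

Router names, IDs, priorities, and the 300 ms / x3 BFD timers are all made up for illustration. Real OSPF elections are non-preemptive, which this sketch ignores.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class OspfRouter:
    name: str        # hypothetical label, e.g. "secondary-core"
    router_id: str   # OSPF router ID in dotted-decimal form
    priority: int    # interface priority 0-255; 0 means never DR or BDR


def elect_dr_bdr(routers: List[OspfRouter]) -> Tuple[Optional[str], Optional[str]]:
    """Simplified fresh election on one segment: highest priority wins,
    highest router ID (compared as a 32-bit number) breaks ties.

    Real OSPF is non-preemptive: a sitting DR keeps the role until it fails,
    even if a higher-priority router comes up later.
    """
    eligible = [r for r in routers if r.priority > 0]
    ranked = sorted(
        eligible,
        key=lambda r: (r.priority, ipaddress.IPv4Address(r.router_id)),
        reverse=True,
    )
    dr = ranked[0].name if ranked else None
    bdr = ranked[1].name if len(ranked) > 1 else None
    return dr, bdr


# Assumed Cisco IOS default dead interval on broadcast links (4 x 10 s hello).
OSPF_DEAD_INTERVAL_MS = 40_000


def bfd_detection_ms(tx_interval_ms: int, multiplier: int) -> int:
    """BFD declares the neighbor down after `multiplier` consecutive misses."""
    return tx_interval_ms * multiplier


if __name__ == "__main__":
    backup_segment = [
        OspfRouter("provider-router", "10.255.0.1", 100),
        OspfRouter("secondary-core", "10.255.0.3", 1),
    ]
    print(elect_dr_bdr(backup_segment))  # ('provider-router', 'secondary-core')
    # Tie on priority: 10.255.0.10 beats 10.255.0.9 numerically.
    tie = [OspfRouter("a", "10.255.0.9", 1), OspfRouter("b", "10.255.0.10", 1)]
    print(elect_dr_bdr(tie))  # ('b', 'a')
    print(bfd_detection_ms(300, 3), "ms vs", OSPF_DEAD_INTERVAL_MS, "ms")  # 900 ms vs 40000 ms
```

Because elections are non-preemptive in practice, the router that comes up first on a segment may hold DR even with a lower priority. Priorities behave as expected only after the segment is reset.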

A quick and admittedly rough diagram of what I think the proposed design would look like: https://imgur.com/a/775YZ

Now, some questions:

  • I've created the data center VLANs in the secondary data center. Does anyone see any caveats to simply adding those L3 interfaces to the existing area 0 OSPF configuration, with the adjacency running over the existing L2 link? OSPF is already configured for the secondary data center, but because of the L2 connectivity, directly connected routes are currently preferred over OSPF-learned ones (connected routes have a lower administrative distance).

  • I have worked with our provider to run an additional 10Gb handoff so that we can migrate our backup stub area VLANs one at a time for testing. Since our existing connection is also a simple L2 link, what stops all traffic from traversing the new L2 link instead of our existing 2x10Gb link once we turn up the new handoff? I feel like I might be overthinking this one.

  • Am I missing any glaring failure scenarios or configuration changes in the proposed design?
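
The first two questions both come down to how a router chooses among candidate routes for the same destination. The sketch below is a simplified model of that lookup: it picks the longest prefix first, then the lowest administrative distance, then the lowest metric, and keeps any remaining ties as ECMP. It assumes Cisco IOS default administrative distances. All prefixes, path costs, and next-hop labels are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List

# Assumed Cisco IOS default administrative distances; other vendors differ.
ADMIN_DISTANCE = {"connected": 0, "static": 1, "ospf": 110}


@dataclass(frozen=True)
class Route:
    prefix: str    # hypothetical prefix, e.g. "10.20.0.0/24"
    source: str    # "connected", "static" or "ospf"
    metric: int    # total OSPF path cost; 0 for connected routes
    via: str       # hypothetical interface or next-hop label


def best_routes(rib: List[Route], destination: str) -> List[Route]:
    """Simplified lookup: longest prefix, then lowest AD, then lowest metric.

    Routes still tied at the end are all returned, modelling OSPF ECMP.
    """
    dst = ipaddress.ip_address(destination)
    matches = [r for r in rib if dst in ipaddress.ip_network(r.prefix)]
    if not matches:
        return []
    longest = max(ipaddress.ip_network(r.prefix).prefixlen for r in matches)
    matches = [r for r in matches if ipaddress.ip_network(r.prefix).prefixlen == longest]
    lowest_ad = min(ADMIN_DISTANCE[r.source] for r in matches)
    matches = [r for r in matches if ADMIN_DISTANCE[r.source] == lowest_ad]
    lowest_metric = min(r.metric for r in matches)
    return [r for r in matches if r.metric == lowest_metric]


if __name__ == "__main__":
    rib = [
        # Stretched VLAN: connected on the primary core, also learned via OSPF.
        Route("10.20.0.0/24", "connected", 0, "vlan-stretched"),
        Route("10.20.0.0/24", "ospf", 20, "secondary-core"),
        # Branch prefix reachable over the existing and the new handoff.
        Route("10.50.0.0/24", "ospf", 20, "existing-2x10g"),
        Route("10.50.0.0/24", "ospf", 120, "new-10g-handoff"),
    ]
    print([r.via for r in best_routes(rib, "10.20.0.5")])  # ['vlan-stretched']
    print([r.via for r in best_routes(rib, "10.50.0.7")])  # ['existing-2x10g']
```

In this model, the new handoff carries no traffic for the branch prefix as long as its total OSPF cost stays higher than the existing path's. Setting the costs equal would make both paths ECMP candidates. The stretched VLAN keeps winning over the OSPF-learned copy until its connected interface is removed.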

Disclaimer: I'm a jack of all trades, and I don't consider multi-site OSPF design or networking in general to be my strong suit. We will likely be engaging a VAR to help us with final design and/or implementation of this plan, but I like to do as much of my own legwork as is practical and of course am looking to learn and grow from the experience. Thank you for the feedback, and please let me know if you need any additional context.


