Wednesday, July 18, 2018

OSPF over MPLS (with a dash of imposter syndrome)

Hello,
First, a little background. I've been working in network engineering for about 11 years: 5 in a NOC and 5+ in actual engineering. Before that I spent 10+ years in ISP helpdesk and various IT, PC break/fix, and small network support roles.
Early last year I got laid off from a large national ISP (rhymes with Barter) where I was doing commercial network engineering, mostly building and troubleshooting L2 fiber circuits (and the supporting routers, etc). I was doing well in that job, and felt confident in my knowledge. The company paid for my JNCIA (which just expired).
Late last year I started a new job in a MUCH smaller organization, with 2 main locations and <500 employees. I'm running into some difficulty because now I'm expected to design and implement things that I only worked on peripherally (or not at all) before (full network design, dynamic routing, QoS, VoIP, Wi-Fi).
Add to this that at my previous job, the team of 5 was extremely supportive and worked well together, always bouncing ideas off each other, helping out, etc. My new team, by contrast, is me; a smart young guy with excellent schooling but barely 2 years of experience; another guy with the same or slightly more experience than me, but who hates working with anyone but himself; and my boss, who is a good guy but has way too much on his plate. As for the network, I'm not even gonna get into the near-complete lack of documentation or even simple port descriptions.

So I was tasked with updating ~50 field sites that connect back to the main office (Site A) via hardware VPN. We upgraded all the sites with SonicWall TZ500s, which VPN back to an NSA6600 at the main site. One of the main points of doing this was to build redundancy into the VPN tunnels by setting up secondary tunnels to the 2nd main office (Site B), so that each remote site has a primary connection to Site A but, if A goes down, establishes its VPN tunnel to Site B instead.

We have this working at the SonicWall level. The tunnels are all running to Site A now and working MUCH better than their previous implementation.

The problem is that the internal network is currently running static routes only. When a VPN tunnel moves from one main site to the other, the route to that remote site now goes through different devices. The internal network won't learn the changed route unless we update the static routes manually.
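
To make that concrete, here is a rough Python sketch of the failure mode. Everything in it is made up for illustration: `nsa_a` and `nsa_b` stand in for the Site A and Site B firewalls, and the subnet is a hypothetical field-site range. It only models the idea that a static next hop stays put while the tunnel moves; it is not a model of any real device.

```python
# Hypothetical names and addressing, for illustration only.
REMOTE_SITE = "10.50.1.0/24"  # made-up field-site subnet


def active_tunnel_endpoint(site_a_up):
    """The field-site firewall prefers Site A and falls back to Site B."""
    return "nsa_a" if site_a_up else "nsa_b"


def static_next_hop(site_a_up):
    """A static route ignores tunnel state: it always points at Site A."""
    return "nsa_a"


def dynamic_next_hop(site_a_up):
    """A routing protocol would follow whichever firewall currently
    advertises the remote site."""
    return active_tunnel_endpoint(site_a_up)


def reachable(next_hop, site_a_up):
    """Traffic only arrives if it is sent to the firewall holding the tunnel."""
    return next_hop == active_tunnel_endpoint(site_a_up)


for site_a_up in (True, False):
    print(
        REMOTE_SITE,
        "site_a_up=%s" % site_a_up,
        "static ok=%s" % reachable(static_next_hop(site_a_up), site_a_up),
        "dynamic ok=%s" % reachable(dynamic_next_hop(site_a_up), site_a_up),
    )
```

With Site A up, both approaches reach the remote site. With Site A down, only the dynamic lookup still points at the firewall that actually holds the tunnel.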

We thought OSPF on the Layer 3 switches (Dell/Force10 S4810s) at Site A & B would solve this. However, the connection between Site A and Site B runs over 2 different MPLS (SDWAN?) connections from 2 ISPs (Verizon & Windstream, both managed by VZ). Because of that, we hand static routes to the MPLS routers, which statically point them to the other end of the MPLS and then hand them back to our Site B Layer 3 switch. When we put the return routes in, the traffic gets bounced back by the MPLS router, because the default (zero) route in the MPLS router points back to our L3 switch. Somehow we missed this during our initial testing phase.
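
Here is a minimal Python sketch of how I understand the bounce. It assumes a simplified topology with invented device names (`l3_a`, `mpls_a`, `mpls_b`, `l3_b`, `nsa_b`) and a hypothetical remote subnet, and it uses plain longest-prefix matching. The real provider routers surely hold more routes than this; the point is only that a router with nothing more specific than a default route back to us will return the packet.

```python
import ipaddress

REMOTE = "10.50.1.0/24"  # hypothetical field-site subnet


def build(routes):
    """Turn {'prefix': 'next hop'} into a table keyed by network objects."""
    return {ipaddress.ip_network(p): hop for p, hop in routes.items()}


def next_hop(table, dst):
    """Longest-prefix match: the most specific matching route wins."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


def trace(tables, start, dst, max_hops=10):
    """Follow next hops from `start` until delivery, a loop, or no route."""
    path, node, seen = [start], start, {start}
    for _ in range(max_hops):
        hop = next_hop(tables[node], dst)
        if hop is None:
            return path, "no route"
        if hop == "local":
            return path, "delivered"
        path.append(hop)
        if hop in seen:
            return path, "loop"
        seen.add(hop)
        node = hop
    return path, "hop limit"


def make_tables(provider_knows_remote):
    """Assumed topology after a tunnel has failed over to Site B."""
    mpls_a = {"0.0.0.0/0": "l3_a"}  # default route points back at our switch
    if provider_knows_remote:
        mpls_a[REMOTE] = "mpls_b"   # specific route (static or learned)
    return {
        "l3_a": build({REMOTE: "mpls_a"}),
        "mpls_a": build(mpls_a),
        "mpls_b": build({"0.0.0.0/0": "l3_b"}),
        "l3_b": build({REMOTE: "nsa_b"}),
        "nsa_b": build({REMOTE: "local"}),
    }


print(trace(make_tables(False), "l3_a", "10.50.1.10"))
# (['l3_a', 'mpls_a', 'l3_a'], 'loop')
print(trace(make_tables(True), "l3_a", "10.50.1.10"))
# (['l3_a', 'mpls_a', 'mpls_b', 'l3_b', 'nsa_b'], 'delivered')
```

In this toy model the loop disappears as soon as the provider-side router has a more specific route for the remote subnet, which is why the options further down all come down to either getting routes into the MPLS routers or tunneling past them.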

All the field site equipment is in place, but now we can't do the redundancy, so it's a serious roadblock. Even though the new equipment and VPN tunnels are better than what they had, I'm feeling like a bit of a failure. I'd like to make sure I'm not missing something, and see if I can pull out some kind of solution to maybe save some face.
(Ideally, I'd love to scrap the MPLS links entirely, go to a flat L2 VPLS solution, and put in a couple of actual routers so we can do all the routing in house, but that's not gonna happen in less than 2-3 years due to contracts.)

I've been researching several options:

Is it possible to run OSPF virtual links over the MPLS connections? 

Would it work to turn up OSPF between our L3 switches and the provider MPLS routers? (The MPLS links are managed services, so it appears to be an option, although it'll cost more.)
I thought another option might be to create a GRE tunnel between the two main office L3 switches. This would eliminate having to work with the MPLS ISPs (very preferable). However, I'm having trouble finding config documentation on how to do that on a Dell S4810.
Would BGP be a better option, either with a handoff to the ISP or through GRE tunnels?

Thank you for your time. Please let me know if I can clarify anything.


