Thursday, January 3, 2019

Strange connection stop

Hello guys,

First, Happy New Year to you all!

Now, about the question I would like to ask you.

I have run into a strange problem at work, a network connection problem to be precise. I'll try to explain the situation.

We have two infrastructures in the same city but in different datacenters. Each infrastructure consists of a main router, switches, a hypervisor and VMs. Let's say the two are almost identical. The task is to migrate VMs from one location to the other without changing their IPs.

For that reason, I have created a GRE tunnel from one router, a Juniper MX960, to the other, a Juniper SRX3400. The GRE tunnel has two purposes: it is the link for a BGP connection, and it is the link that transmits traffic from already migrated VMs.

The BGP connection is used for advertising already migrated IPs to the old location. In the diagram below you can see that the GRE tunnel is actually set up between routing instances on the Juniper routers (for isolation purposes).

So in short, the main route that we are interested in is: VM 1 <--> Hypervisor A <--> MX960(VM-PUB routing instance) <--> SRX3400(RT-PUB2 routing instance) <--> Uplink C

I should also mention that the GRE tunnel is established like this: MX960(VM-PUB routing instance <--> RT-PUB1 routing instance) <--> Uplink A <--> Uplink C <--> SRX3400(RT-PUB2 routing instance)

Now about the problem. At first, everything seemed to be fine, but a few days ago we ran into trouble. When a migrated VM (one whose traffic goes via GRE) sends a larger amount of traffic, its connection stalls. Examples are SSH, HTTP, SCP and so on. With an SCP test I see that it always stops at exactly 2112 KB. Meanwhile, when the migrated VM receives data there is no problem: it can go on for gigabytes and all is fine.

I've already checked the MTU and all seems good: on the GRE tunnel it is 1476, and it matches on the hypervisor-to-router path as well.
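As a sanity check on the 1476 figure, here is a small sketch of the arithmetic. It assumes a 1500-byte MTU on the underlying path and plain GRE over IPv4 (no GRE key or sequence fields, no IP options); neither assumption is confirmed for this setup, and the function names are just illustrative.

```python
# Standard header sizes in bytes, assuming no IP options and a basic GRE header.
IPV4_HEADER = 20
GRE_HEADER = 4
TCP_HEADER = 20
ICMP_HEADER = 8


def gre_tunnel_mtu(physical_mtu: int) -> int:
    """Largest inner packet that fits into one outer packet unfragmented."""
    return physical_mtu - IPV4_HEADER - GRE_HEADER


def tcp_mss(path_mtu: int) -> int:
    """Largest TCP payload per segment for a given path MTU."""
    return path_mtu - IPV4_HEADER - TCP_HEADER


def ping_payload(path_mtu: int) -> int:
    """ICMP echo payload size that exactly fills a packet of path_mtu bytes."""
    return path_mtu - IPV4_HEADER - ICMP_HEADER


if __name__ == "__main__":
    # 1500 is an assumed MTU for the path between the two routers.
    mtu = gre_tunnel_mtu(1500)
    print("tunnel MTU:", mtu)                      # 1476
    print("TCP MSS through tunnel:", tcp_mss(mtu))  # 1436
    print("ping payload to test:", ping_payload(mtu))  # 1448
```

If the numbers hold, a ping with the don't-fragment bit set and a 1448-byte payload should pass through the tunnel in both directions, and one that is a byte larger should not.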

I'm thinking that this may be some kind of limitation on the SRX3400. We have already had problems with it because of asymmetric connections and had to apply a workaround.

Does any of you have an idea why a connection could behave like that? Everything works perfectly, but as soon as VM 1 (migrated) sends traffic outside, that connection stops working.

            Uplink A   Uplink B                                  Uplink C
                +          +                                         +
                |          |                                         |
                |          |                                         |
                |          |                                         |
                |          |                                         |
                |          |                                         |
                |          |                                         |
                |          |                                         |
   +----------------------------------+            +--------------------------------+
   | MX 960     |          |          |            | SRX 3400        |              |
   |         +--+----------+-----+    |            |                 |              |
   |         |RT-PUB1            |    |            |  +--------------+------+       |
   |         |                   |    |            |  |RT-PUB2              |       |
   |         +----+--------------+    |            |  |                     |       |
   |              | lt tunnel         |            |  |                     |       |
   |  +-----------+---------------+   | GRE tunnel |  |                     |       |
   |  |VM-PUB                     +-------------------+                     |       |
   |  |                           |   |            |  +--------+------------+       |
   |  +--+------------------------+   |            |           |                    |
   |     |                            |            +--------------------------------+
   |     |                            |                        |
   +----------------------------------+                        |
         |                                                     |
         |                                                     |
         |                                                     |
         |                                                     |
         |                                                     |
+--------+--------------+                             +--------+--------------+
|Hypervisor A           |                             |Hypervisor B           |
|                       |                             |                       |
|                       |                             |                       |
|                       |                             |                       |
|   +--------------+    |                             |   +-------------+     |
|   |VM 1(migrated)|    |                             |   |VM 2         |     |
|   |              |    |                             |   |             |     |
|   |              |    |                             |   |             |     |
|   +--------------+    |                             |   +-------------+     |
+-----------------------+                             +-----------------------+

