Friday, December 20, 2019

War story: Camera system wants to route traffic to default gateway despite having a directly connected secondary interface.

Thought I'd tell y'all a story about a problem that had me stumped for a few days. That, and let Google index it so it saves someone else this experience.

I get put on a project to help a customer migrate from their internal IP address scheme based on 192.0.2.0/24 to a more appropriate RFC 1918-compliant subnet. Step one is to clear the low-hanging fruit: We're moving the IP surveillance cameras from 192.0.25.0/24 to 10.1.25.0/24.

So the Exacqvision server (Windows 10 appliance) has two interfaces, one for management and access (192.0.2.20/24 on vlan 1), one just to talk to cameras (192.0.25.3 on vlan 25). I create the new 10.1.25.0/24 network on the core router (which is a SonicWall firewall but hey, that's the way small offices with Netgear switching roll) and attach it to vlan 25. Move the first camera, update its IP in Exacqvision, and all is well. So I move the rest of the cameras. Check the firewall and of course it's routing 150 Mbit/s of camera traffic, so I switch the server's vlan 25 NIC to its new IP address and... nothing happens. 150 Mbit/s still going through the firewall.

I disable and re-enable the cameras, no dice. Reboot, no dice. netstat confirms that Exacqvision is using the 192.0.2.20 NIC to route to 10.1.25.0/24 even though 10.1.25.3/24 is DIRECTLY CONNECTED and UP. The route table shows 10.1.25.0/24 on the correct interface. arp shows nothing but the firewall IP and MAC. Traceroute to camera 10.1.25.100 shows it's directly connected. And yet... every camera is being routed from the public vlan 1 interface.
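For anyone wondering why I was so sure the traffic should have left on the camera NIC: the OS picks the most specific route that matches the destination (longest prefix match). Here's a minimal sketch of that rule in Python. The interface labels are made up for illustration, the table is a simplified version of what the route table showed, and real Windows routing also weighs metrics, which this ignores.

```python
import ipaddress

# Simplified route table: (destination prefix, interface label).
# The labels are hypothetical; they are not the real adapter names.
ROUTES = [
    ("0.0.0.0/0", "vlan1-nic"),      # default route via the firewall
    ("192.0.2.0/24", "vlan1-nic"),   # on-link management subnet
    ("10.1.25.0/24", "vlan25-nic"),  # on-link camera subnet
]


def pick_interface(dest, routes=ROUTES):
    """Return the interface of the most specific route matching dest."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for prefix, iface in routes:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network.prefixlen, iface))
    if not matches:
        raise LookupError(f"no route to {dest}")
    # Longest prefix wins; metrics are ignored in this sketch.
    return max(matches)[1]


if __name__ == "__main__":
    print(pick_interface("10.1.25.100"))
```

With that table, a camera at 10.1.25.100 matches both the default route and the /24, and the /24 wins, so the camera NIC is the expected exit. That's what made the netstat output so confusing.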

I figure it has to be the Exacqvision service picking the wrong source interface, even though I'm not sure that's something the OS would even let an application pick, because traceroute indicates that other applications are using the correct interface.
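One way to ask the OS which source address it would choose for a given destination, without involving Exacqvision at all, is the UDP connect trick: connecting a UDP socket sends no packets, it just makes the OS do the route lookup and bind a local address. This is a rough sketch, not something I ran on the appliance. The helper name and the port number are arbitrary choices of mine.

```python
import socket


def source_ip_for(dest, port=9):
    """Return the local IP the OS would use as the source for traffic to dest."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # connect() on a UDP socket sends nothing on the wire; it only
        # performs the route lookup and selects a local address.
        s.connect((dest, port))
        return s.getsockname()[0]


if __name__ == "__main__":
    # Camera address taken from the story above.
    print(source_ip_for("10.1.25.100"))
```

Keep in mind a check like this only reflects the host's own routing decision. It says nothing about which vlan the switch port on the other end of the cable is in.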

As it turns out, however, that's not the case. After a few days to clear the head and return to the site, I did a test that found the real problem: Unplugging the 192.0.2.20 interface stopped connectivity to the cameras.

The inside vlan 25 NIC... was plugged into a Netgear switch port configured for vlan 1. 150 Mbit/s of camera data was always being routed by the firewall, even before I got there.

Once the vlan issue was resolved, arp started showing all the camera MAC addresses. New connections from Exacqvision started going out the correct interface.

Why did I get an arp from the SonicWall on that interface? No idea.
Why did traceroute show directly connected to the camera? No idea.
Why did Windows fail silently to access 10.1.25.0/24 through a directly connected interface and then decide on its own to use a different interface to try to route it? No idea.
(Also: Why was the TTL of the pings I was getting the SAME when it was going through the firewall versus when it was directly connected after the fix? No idea.)

That's my story. Quite a walk for a problem that... kinda didn't turn out to be the network in the first place. But that's why network engineers tend to be good at so many other things... we have to be able to point to where the problem really lies because so many things look like network problems.

Key phrases for the Google: Exacqvision accesses camera through wrong interface. Windows 10 uses wrong interface.


