Let me get this out of the way: yes, it's a SonicWall. At this point I'm not sure whether that is part of the issue (I'm sure there will be jokes that it is the issue) or whether this would happen regardless of the router/firewall in use.
I will do my best to keep it as short as I can, but I do want to provide enough details and information.
Quick diagram, https://i.imgur.com/Yo3Lu0u.png
The Problem- PC1 is a Windows 7 computer running a camera program that displays 10 cameras. 8 cameras are local to the PC1 network/LAN and 2 are remote (same overall network, different subnet), on the network that PC2 (Windows 10) is connected to on the other side of the wireless bridge. The ptp bridge that links the two buildings resets each night (a soft reset: the radios reboot at 1:00am and 1:05am, respectively). The daily reboot is a recommendation from the manufacturer of the wireless bridge (EZ Bridge). Every time I log into PC1 (a remote connection using VNC) I see that only the 8 local cameras are displayed and the 2 remote cameras show as disconnected/no video/etc.
At first I assumed the link was bad or needed to be reset, but the other end of this link carries VoIP traffic, internet access, LAN file shares, printing, etc., each on its own VLAN. The remote building only has a couple of users, who are not always in the office, so it is not high-volume traffic. I can successfully ping anything on the PC2 network from PC1, and PC2 from anything on the PC1 network. The link is up, and there are no other issues with it other than the 2 cameras appearing as disconnected.
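Since ping succeeds while the camera streams stay down, it may help to test a full TCP handshake to the cameras rather than ICMP alone. The following is a minimal sketch, not anything taken from the camera software: the addresses are placeholders from the documentation range, and port 554 is only an assumption (RTSP); substitute whatever IPs and port the cameras actually use.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a full TCP handshake and report how it ended."""
    try:
        # create_connection performs the SYN / SYN-ACK / ACK exchange
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The far end (or a device in the path) answered with a RST
        return "refused"
    except socket.timeout:
        # No answer at all, e.g. the SYN was silently dropped
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    # Hypothetical camera addresses and port; replace with the real ones.
    cameras = {
        "remote-cam-1": ("192.0.2.11", 554),
        "remote-cam-2": ("192.0.2.12", 554),
    }
    for name, (host, port) in cameras.items():
        print(name, tcp_probe(host, port))
```

Run from PC1 while the cameras show as disconnected: "open" would suggest new connections still pass and only the old session is stuck, while "timeout" would suggest new SYNs are being dropped as well.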
This is what I do to get the 2 cameras back online on PC1 (which is in the main building and is being used to display all 10 cameras between both properties).
- I leave the camera program running on PC1 and remote into PC2 and reboot PC2.
- I wait for PC2 to reboot, and one of two things happens: either the camera views come back online on PC1, or I have to close the program on PC1 and relaunch it before the cameras come back. I've seen both scenarios. 90% of the time rebooting PC2 is all that is needed.
Here is what I see when I perform a packet capture on the SonicWall:
DROPPED, Drop Code 736 (Packet dropped - cache add cleanup drop the pkt), Module ID: 25 (network)
Google takes me here, https://www.sonicwall.com/support/knowledge-base/how-can-i-resolve-drop-code-cache-add-cleanup/180118173647344/
The options on the page are as follows:
- Review the TCP conversation using the packet monitor. If the dropped packet is received after the connection was closed (FIN or RST Packet), the drop is legitimate. If so, you will need to find out why the connection was closed/reset before the end of it by checking the machine that is sending the FIN/RST packet.
- Try by disabling Enforce strict TCP compliance with RFC 793 and RFC 1122 in Firewall Settings | Flood Protection. CAUTION: This will reduce the security of your network.
- Make sure that the TCP Connection Timeout on the specific Access Rule or on Manage | Firewall Settings | Flood Protection is not too low (by default it is set to 15 minutes).
- Try to disable "Enable TCP sequence number randomization" from the diag page of the firewall (https://IP of the SonicWall/diag.html).
- If the dropped traffic is VPN, make sure that you have a public IP set on the WAN Interface: a double NAT condition may cause the firewall to drop the traffic as "Cache Add Cleanup" due to the change in the packet header.
In reference to the SonicWall support article:
- Option 1: I believe this is what is happening. I do see packets with the reset flag set, and I think the daily soft reboot is causing a TCP handshake issue, since the network has to re-establish the link after the reboot completes. However, I recall testing this by setting the radios not to reboot for a couple of days, and I believe the issue still happened. I've had this issue for a while, have been too busy to focus on it, and don't remember everything I've already tried. I have no problem testing this again.
- Option 2: This was already disabled. I don't think it is unchecked by default, but I am not the only one who logs into this firewall/router. Since it is unchecked, I know option 2 is not the cause.
- Option 3: I believe the default of 15 minutes should be fine. If it had been changed to a very low value I could change it, but the current setting looks normal.
- Option 4: I have not disabled 'TCP sequence number randomization'. When I looked into that setting, it says it does not apply to packets over an L2 bridge, which is what these packets are passing over.
- Option 5: The dropped packets are not part of a VPN tunnel.
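The first check in the article (is a dropped packet arriving after a FIN or RST on the same connection?) can be sketched as a small script over packets copied by hand from the packet monitor. This is only an illustration under assumptions: the `Packet` record layout is hypothetical and is not a SonicWall export format, and the addresses are placeholders. The flag bit values themselves are the standard TCP ones.

```python
from dataclasses import dataclass

# Standard TCP flag bits (RFC 793)
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


@dataclass(frozen=True)
class Packet:
    """Hypothetical record, filled in by hand from the capture."""
    time: float          # seconds since the start of the capture
    src: str             # "ip:port"
    dst: str             # "ip:port"
    flags: int           # TCP flags byte
    dropped: bool = False


def classify_drops(packets):
    """Return (packet, after_close) for every dropped packet.

    after_close is True when a FIN or RST was already seen on the
    same connection (either direction) before the dropped packet.
    """
    closed = set()       # connections that have seen a FIN or RST
    results = []
    for p in sorted(packets, key=lambda pkt: pkt.time):
        key = frozenset((p.src, p.dst))   # direction-independent key
        if p.flags & SYN and not p.flags & ACK:
            closed.discard(key)           # a fresh SYN starts a new connection
        if p.dropped:
            results.append((p, key in closed))
        if p.flags & (FIN | RST):
            closed.add(key)
    return results


if __name__ == "__main__":
    capture = [
        Packet(0.0, "192.0.2.10:50000", "192.0.2.11:554", SYN),
        Packet(0.1, "192.0.2.11:554", "192.0.2.10:50000", SYN | ACK),
        Packet(5.0, "192.0.2.11:554", "192.0.2.10:50000", RST),
        Packet(6.0, "192.0.2.10:50000", "192.0.2.11:554", PSH | ACK, dropped=True),
    ]
    for pkt, after_close in classify_drops(capture):
        print(pkt.time, pkt.src, "->", pkt.dst, "after FIN/RST:", after_close)
```

Drops flagged as "after FIN/RST" would match the article's "legitimate drop" case, which shifts the question to which machine sent the FIN/RST and why. Drops with no earlier FIN/RST would point elsewhere.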
I did call SonicWall about this (months ago). The packet capture shows the packet coming in on x3 (normal) with a source IP of PC1 and a destination IP of PC2, but the egress column is blank. SonicWall says it should list the parent interface, x21, with the VLAN of PC2, which I agree is correct, but nothing shows up under egress. SonicWall support states that this is why the packet is being dropped: the firewall doesn't know which interface to send the traffic to.
At this time I'm not able to change building 2 to a layer 3 connection; right now I'm simply extending my L2 networks. I'd like to set up another wireless link between the buildings to allow for some redundancy and/or to test L3 connectivity with test equipment, while keeping the L2 link online until everything can be converted to L3.
Another strange observation: regardless of whether the traffic is VPN or non-VPN, from another office (behind an ISP internet connection) I can connect to the cameras on PC2 (WAN and LAN connectivity is enabled for both VPN and non-VPN traffic), and I can leave my computer connected for weeks at a time without the cameras dropping/disconnecting, even with the daily reboot of the wireless ptp bridge. I don't understand why the LOCAL PC1 connection is apparently affected by the ptp reboot while a computer over the internet (VPN or non-VPN) is able to re-establish its connection. Unless, of course, this points back to the SonicWall locking up and not knowing what's going on with the traffic.
Hopefully I've included enough information to explain the problem. It would be great to get perspective from someone else. I'm not an expert and I don't have anyone internal to bounce this off of; the rest of the staff consists of programmers and help desk.
Thanks.