Sunday, March 8, 2020

YSK about firewalls and how they work

Inspired by the famous "YSK about fiber" post.

I realized there's a certain aura about firewalls in this field. Firewalls can be tricky. They are born to drop traffic and sometimes they do it unexpectedly, leading to long troubleshooting sessions.

Also, while everybody here agrees that routing and switching are part of the job, not everybody agrees on firewalls. Sometimes they are managed by a separate team entirely.

So I decided to write down something, in an attempt to clear up some things about how firewalls work. I hope this helps.

A firewall is a default-deny device. What is not explicitly allowed is forbidden.

Basic concepts

I think some of the mystery around firewalls comes from the whole vocabulary of new concepts. And we all know how vendors make things easier when it comes to naming things /s

Policy/rule. Identifies source zone, destination zone (if any), source, destination, port, application, action, profile.

Zones. A zone is a group of physical or virtual interfaces with similar functions and which usually have the same trust level. When you set up zones, firewalls put a default-allow rule for traffic within the zone itself. However, you can also set up fine-grained policies which can deny even traffic that stays in the same zone (so-called "intra-zone blocking"). Microsegmentation relies on this.
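To make policies and zones a bit more concrete, here is a minimal sketch in Python. It assumes top-down, first-match evaluation (common, but check your vendor), and the `Rule` fields, zone names and addresses are all made up for illustration. It leaves out application, logging and profiles.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    src_zone: str
    dst_zone: str
    src_net: str            # CIDR, e.g. "10.0.0.0/8"
    dst_net: str
    port: Optional[int]     # None matches any port
    action: str             # "permit" or "deny"


def evaluate(rules, src_zone, dst_zone, src_ip, dst_ip, port):
    """Return the action of the first matching rule, else the implicit default."""
    for rule in rules:
        if (rule.src_zone == src_zone
                and rule.dst_zone == dst_zone
                and ip_address(src_ip) in ip_network(rule.src_net)
                and ip_address(dst_ip) in ip_network(rule.dst_net)
                and rule.port in (None, port)):
            return rule.action
    # No explicit rule matched: intra-zone traffic falls back to allow,
    # everything else hits the implicit default deny.
    return "permit" if src_zone == dst_zone else "deny"


rules = [
    Rule("lan", "dmz", "10.0.0.0/8", "192.168.44.0/27", 443, "permit"),
    # Explicit intra-zone block: overrides the intra-zone default allow.
    Rule("lan", "lan", "10.1.0.0/16", "10.2.0.0/16", None, "deny"),
]
```

With these two rules, lan-to-dmz traffic on port 443 is permitted, the same flow on port 22 is dropped by the implicit deny, and traffic between the two lan subnets is blocked even though it never leaves the zone.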

Application. An application (sometimes called service) is an actual Layer 7 protocol (or application, more on that later) that is recognized by the firewall. This allows you to set up policies that don't just allow traffic to tcp/8443, but only https traffic to that port.

Some firewalls take this a step further, and are able to recognize actual applications like Facebook or Office365, mostly by checking domains and IPs plus some heuristics. This means you can set up a policy like "10.0.0.0/8 to Internet, application=Facebook, deny". Of course these things are not 100% foolproof.

Obviously, you can still use Layer4 ports in your policies, it's just an additional tool.

Action. Permit/deny/log.

Profile. A profile is a security capability of the firewall that goes beyond simple L3/L4 permit/deny. Examples: antivirus scanning, TLS inspection, URL filtering.

This means you can set up a policy that permits traffic but, for example, performs an AV scan on what's being sent.

The implementation of these features can be tricky and often relies on weaknesses in the protocols that are later fixed. Basically, TLS inspection is a MitM attack performed by a firewall. So, as protocols are fixed to improve privacy and make MitM attacks harder, NGFW vendors race to make their own MitM possible... or suggest that people disable these new features.

AV scanning can be either proxy-based or stream-based, in which case the firewall examines files block by block. You can also set up your firewalls to send suspicious files to a sandbox, or to an external antivirus platform (Virustotal-like).

URL filtering relies on the fact that HTTPS doesn't usually hide the domain; if TLS inspection is enabled, the firewall can also examine the actual URL in the payload. Firewalls then keep a database which classifies URLs into categories (travel, gambling, health, cryptomining, ...). This is also useful for compliance reasons. Example: you don't want to carry out TLS inspection when people access health-related websites.
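As a rough illustration of category-based filtering, the sketch below decides what to do with a connection based only on the hostname visible in the TLS handshake. The category database, the hostnames and the `BLOCKED` / `NO_DECRYPT` sets are hypothetical; real firewalls use a vendor-maintained database and their own matching logic.

```python
# Hypothetical category database: real firewalls ship a vendor-maintained one.
CATEGORY_DB = {
    "casino.example": "gambling",
    "clinic.example": "health",
    "miner.example": "cryptomining",
}

BLOCKED = {"gambling", "cryptomining"}
NO_DECRYPT = {"health"}   # compliance: never run TLS inspection on these


def classify(hostname):
    """Look up the hostname, then each parent domain, in the category DB."""
    labels = hostname.lower().rstrip(".").split(".")
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in CATEGORY_DB:
            return CATEGORY_DB[candidate]
    return "uncategorized"


def decide(sni_hostname):
    """Return (action, decrypt) based only on the hostname seen in the handshake."""
    category = classify(sni_hostname)
    if category in BLOCKED:
        return ("deny", False)
    return ("permit", category not in NO_DECRYPT)
```

Here a gambling site is denied outright, a health site is permitted but exempted from TLS inspection, and anything uncategorized is permitted and inspected.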

Instances. Firewalls can often act as L3 devices, so you can sometimes set up VRFs on them. You can also set up "partitions", basically multiple virtual firewalls that can be managed separately, by different people.

As usual, vendors like to give the same thing different names. So, partitions are called "contexts" by Cisco (ASA and FTD), "logical systems" by Juniper, "virtual systems" by Palo Alto, "virtual domains" (VDOM) by Fortinet. But, guess what? Conceptually, they are the same thing.

Objects and groups. An object is an IP or IP range that is given a name. This allows you to:

1) create human-readable policies, which don't rely on rote memorization of IP ranges to be understood. Instead of "192.168.44.0/27" you can write "domain-controller-net" in the policy (provided you defined the object, of course), and everybody who has to read the policy is happier

2) re-IP servers without manually changing the policies, just by modifying the corresponding object.

BIG CAVEAT here: if you re-IP a server on a range that falls within another zone, you're typically SOL and you have to manually change policies. Traffic between different zones is by default blocked, and you end up staring at a configuration that seems good and asking yourself "why doesn't it work?... Oh wait, the zone is wrong".

You can set up object groups, to make policies even more readable and flexible. You can also define application groups (for example: all the ports required for NFS).
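Here is a minimal sketch of why objects and groups make re-IPing easier. The object and group names are hypothetical, and a real firewall resolves them internally rather than through a lookup like this one.

```python
from ipaddress import ip_address, ip_network

# Named objects and groups; all names here are hypothetical.
objects = {
    "domain-controller-net": "192.168.44.0/27",
    "file-server": "10.20.0.15/32",
}
groups = {"core-services": ["domain-controller-net", "file-server"]}


def resolve(name):
    """Expand an object or group name into a list of networks."""
    if name in groups:
        return [net for member in groups[name] for net in resolve(member)]
    return [ip_network(objects[name])]


def matches(name, ip):
    """Return True if the IP falls inside the named object or group."""
    return any(ip_address(ip) in net for net in resolve(name))
```

A policy that references "core-services" never needs to change when the file server moves: updating `objects["file-server"]` is enough. Note that this only covers addresses; it does nothing about the zone caveat above.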

Session timeouts. Firewalls are stateful devices, so they have to keep a session table. When a session stays idle longer than the session timeout, the firewall usually removes it from the table, and later packets on that TCP connection get dropped. This can be a problem for long-lived connections, especially during low traffic times.

There are all kinds of issues that can happen because of this, including not being able to access your Outlook address book.

In most cases, default session timeouts are fine, but keep in mind special cases like databases or Exchange.

As always, there's a tradeoff. Setting up high session timeouts means DoS attacks on the firewall (by exhausting its resources) are easier. However, you can mitigate this risk by properly configuring DoS protection on the firewall.
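The idle-timeout behaviour can be sketched as a toy session table. This is only an illustration: the class and method names are made up, timestamps are passed in explicitly, and real firewalls track much more state (TCP flags, per-protocol timers, and so on).

```python
class SessionTable:
    """Toy stateful session table with an idle timeout (seconds)."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.last_seen = {}   # flow 5-tuple -> timestamp of last packet

    def open(self, flow, now):
        """Record a new session that a policy has already permitted."""
        self.last_seen[flow] = now

    def packet(self, flow, now):
        """Return True if the packet belongs to a live session, else False."""
        seen = self.last_seen.get(flow)
        if seen is None:
            return False
        if now - seen > self.idle_timeout:
            # Idle too long: the entry is aged out and the packet is dropped.
            del self.last_seen[flow]
            return False
        self.last_seen[flow] = now   # traffic refreshes the timer
        return True
```

A database connection that sends a packet every half hour survives a one-hour timeout; the same connection sitting idle overnight does not, and the application only finds out when its next packet goes nowhere. Every entry also costs memory, which is why very long timeouts make resource exhaustion easier.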

High availability

Asymmetric routing is notoriously bad, especially when hitting stateful devices like firewalls. As a result, active/standby is the standard way to handle firewall HA.

Active/active is not supported by all devices. Sometimes certain features are not available in that mode, and sometimes you have to ask your vendor, so it's certainly not an out-of-the-box solution.

When you do need active/active, things get more complex because you have to avoid, or at least manage, asymmetric routing.

Common approaches rely on:

  • active/active as a combination of two active/standby groups with inverted priorities. Let's say you have firewalls A and B. With this approach, you set up two HA groups.

Group 1: master = A, slave = B. Group 2: master = B, slave = A.

  • session synchronization between the two firewalls. In this way, even if a packet actually travels asymmetrically, both firewalls have the same information, so nothing wrong should happen. A caveat here is that, depending on how often you synchronize session state and when that packet actually arrives, you may still have some issues.
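The "two active/standby groups with inverted priorities" idea can be sketched like this. The group names, priority values and the "highest priority wins" election are assumptions for illustration, not any vendor's actual HA protocol.

```python
# Priorities per HA group; higher wins. Names and values are hypothetical.
ha_groups = {
    "group1": {"A": 200, "B": 100},
    "group2": {"A": 100, "B": 200},
}


def active_member(group, healthy):
    """Return the healthy member with the highest priority, or None."""
    candidates = {fw: prio for fw, prio in ha_groups[group].items() if fw in healthy}
    return max(candidates, key=candidates.get) if candidates else None


def assignment(healthy):
    """Map each HA group to the firewall currently active for it."""
    return {group: active_member(group, healthy) for group in ha_groups}
```

With both firewalls healthy, A is active for group 1 and B for group 2, so both boxes forward traffic. If A fails, B becomes active for both groups, which is just ordinary active/standby failover applied twice.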

If you're doing active/active and your firewalls also act as default gateways, you also have to introduce a way to manage ARP.

Again, there are two approaches to this:

1) you share ARP among the firewalls based on some load-balancing mechanism, and use gratuitous ARP in case of failure (Palo Alto, "ARP load sharing"), or

2) you prevent ARP requests from hitting the other data center. ARP traffic is broadcast, so you can benefit from BUM (Broadcast, Unknown Unicast, Multicast) suppression techniques used by OTV, EVPN, and so on.
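For the first approach, here is a toy sketch of sharing ARP replies between two firewalls. Picking the responder from the client IP modulo the number of firewalls is an assumption made for illustration; the actual selection algorithm depends on the vendor and configuration.

```python
from ipaddress import ip_address

FIREWALLS = ["A", "B"]   # hypothetical device names


def arp_responder(client_ip, healthy):
    """Pick which firewall answers ARP for the shared gateway IP."""
    preferred = FIREWALLS[int(ip_address(client_ip)) % len(FIREWALLS)]
    if preferred in healthy:
        return preferred
    # Peer failed: the survivor takes over (and would send gratuitous ARP).
    survivors = [fw for fw in FIREWALLS if fw in healthy]
    return survivors[0] if survivors else None
```

Each client consistently learns the MAC address of one firewall, so its traffic always goes through the same box. When that box fails, the survivor answers instead, and the gratuitous ARP is what tells clients to update their caches.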


