Traffic analysis for an NSX micro-segmentation design using Syslog
In a recent project I had to set up micro-segmentation using the Distributed Firewall (DFW) of VMware NSX in a greenfield environment. The approach I chose was to design and configure the DFW rule base using the Service Composer, based on all the information I could gather within the environment and within the project team (i.e. which services are running, which ports should be allowed, and so on). The truth is, in a greenfield environment you are always going to miss certain traffic flows. To identify these unknown flows, I configured each ESXi host to send its logging to a centralized Syslog server. Unfortunately, the customer did not want to use vRealize Log Insight for now (or Network Insight, for that matter) and instead provided me with a vanilla RHEL instance running Rsyslog.
Not breaking the environment during the project phase
Because we were building the new environment with a fairly large team working together on different aspects of the overall design, I did not want to break the environment by unintentionally blocking traffic with the DFW. So, each ‘block’ rule in the DFW rulebase was configured with the “Log” option selected and the Action was set to ‘Allow’ during the project phase.
This way I could analyse the DFW logs and see which flows would be blocked as soon as I switched the Action to ‘Block’, without frustrating my team members with broken connectivity in the meantime.
Accessing the dfwpktlogs logfile
The NSX DFW normally logs to a local file on the ESXi host in /var/log/dfwpktlogs.log. By configuring the ESXi host with a central logging host (see the VMware KB article Configuring syslog on ESXi (2003322); don’t forget to open the firewall ports on the ESXi host and restart the Syslog daemon!), all ESXi logging is sent to the Syslog server. I could then periodically copy the logfiles from the Syslog server to a local system via SFTP and analyse them there. There are tons of Syslog analysis tools available, but I simply stuck with the macOS Console application, which suited my purpose. Because all ESXi logging is aggregated in a single logfile, I filtered on the dfwpktlogs tag.
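The same filtering step can also be scripted instead of done in Console. The minimal Python sketch below keeps only the lines tagged `dfwpktlogs:`, assuming the logfile is a plain-text copy pulled from the Rsyslog server. The function names `dfw_lines` and `read_dfw_lines` are hypothetical, not part of any NSX or Rsyslog tooling.

```python
from typing import Iterable, Iterator, List

DFW_TAG = "dfwpktlogs:"  # tag the NSX DFW puts in front of its packet-log entries


def dfw_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the DFW packet-log entries from an aggregated ESXi log."""
    for line in lines:
        if DFW_TAG in line:
            yield line.rstrip("\r\n")


def read_dfw_lines(path: str) -> List[str]:
    """Read a local copy of the Syslog file and keep only the DFW entries."""
    # errors="replace" keeps the scan going if the file contains stray non-UTF-8 bytes
    with open(path, encoding="utf-8", errors="replace") as fh:
        return list(dfw_lines(fh))
```

For example, `read_dfw_lines("esxi-hosts.log")` returns the DFW entries from a local copy named `esxi-hosts.log` (a hypothetical filename). Using a generator for `dfw_lines` keeps memory use flat even for large logfiles.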
Interpreting the dfwpktlogs logfile
Looking at hundreds of logfile entries can be pretty daunting. The first important step is understanding what you are actually looking at. Let's describe the fields in a single entry:
2016-12-09T07:12:26.543Z xxxxxxxxxxxxx dfwpktlogs: 26773 INET match PASS domain-c7/1001 IN 60 TCP 10.33.24.50/45926->10.33.24.9/8140 S
(The information below is primarily gathered from the NSX 6.2.4 Administration Guide at https://pubs.vmware.com/NSX-6/index.jsp?topic=%2Fcom.vmware.nsx.admin.doc%2FGUID-ECEE0A32-88D5-4E82-A9B1-4847A91E1EBF.html)
- 2016-12-09T07:12:26.543Z = Timestamp
- xxxxxxxxxxxxx = Hostname
- dfwpktlogs = NSX Distributed Firewall logging
- 26773 = This field was added in v6.2.4 and can be used to trace the log entry back to a particular vNIC and VM. Dale Coghlan explains how to trace this information back in this blog post.
- INET = AF (Address Family) value. This can be either INET (for IPv4) or INET6 (for IPv6).
- match = Reason for the log entry. This can be match, bad-offset, fragment, short, normalize, memory, bad-timestamp, congestion, ip-option, proto-cksum, state-mismatch, state-insert, state-limit, src-limit, synproxy, spoofguard. In this case a match was found against a DFW rule.
- PASS = Action. This can be PASS, DROP, PUNT, REDIRECT, COPY, TERMINATE. In this case the traffic is allowed because all my future Block rules are still configured with the Action ‘Allow’.
- domain-c7 = Rule set value. This is a MoRef ID (Managed Object Reference ID), which vSphere uses internally to identify objects.
- 1001 = Rule ID. This is an internal ID for the specific DFW rulebase entry that was matched to a certain flow.
- IN = Direction of the flow. This can be OUT or IN.
- 60 = Packet length in bytes.
- TCP = Protocol. This can be TCP, UDP or PROTO
- 10.33.24.50/45926 = Source IP address/Source Port
- 10.33.24.9/8140 = Destination IP address/Destination Port
- S = TCP flag. S stands for SYN, so this packet is the opening packet of a new TCP connection. The TCP flags are explained in the image below (provided by Cisco).
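To work with these fields in a script rather than by eye, an entry can be split into named fields with a regular expression. This is a sketch that assumes the exact 6.2.4 layout shown above and only handles TCP and UDP entries with an IP/port pair; entries with a different layout (other protocols, or extra fields added in later NSX versions) simply return `None`. The field names, such as `trace_id` for the 26773 value, are my own labels.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed NSX 6.2.4 layout: timestamp host dfwpktlogs: trace-id AF reason action
# ruleset/rule-id direction length protocol src/port->dst/port [tcp-flags]
DFW_PATTERN = re.compile(
    r"(?P<timestamp>\S+)\s+(?P<host>\S+)\s+dfwpktlogs:\s+"
    r"(?P<trace_id>\d+)\s+(?P<af>INET6?)\s+(?P<reason>\S+)\s+(?P<action>[A-Z]+)\s+"
    r"(?P<ruleset>[^/\s]+)/(?P<rule_id>\d+)\s+(?P<direction>IN|OUT)\s+"
    r"(?P<length>\d+)\s+(?P<protocol>TCP|UDP)\s+"
    r"(?P<src_ip>[0-9A-Fa-f.:]+)/(?P<src_port>\d+)->"
    r"(?P<dst_ip>[0-9A-Fa-f.:]+)/(?P<dst_port>\d+)"
    r"(?:\s+(?P<tcp_flags>\S+))?\s*$"
)
INT_FIELDS = {"trace_id", "rule_id", "length", "src_port", "dst_port"}


@dataclass(frozen=True)
class DfwEntry:
    timestamp: str
    host: str
    trace_id: int
    af: str
    reason: str
    action: str
    ruleset: str
    rule_id: int
    direction: str
    length: int
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    tcp_flags: Optional[str]  # None for UDP entries


def parse_dfw_line(line: str) -> Optional[DfwEntry]:
    """Parse one DFW log line; return None if it does not match the assumed layout."""
    match = DFW_PATTERN.search(line)
    if match is None:
        return None
    fields = {k: int(v) if k in INT_FIELDS else v for k, v in match.groupdict().items()}
    return DfwEntry(**fields)
```

Applied to the example entry above, `parse_dfw_line` yields `rule_id=1001`, `action="PASS"`, `dst_port=8140` and `tcp_flags="S"`.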
Matching the logfile entry with a specific DFW rule
My logfiles contained hundreds of entries, so I found it difficult to tell from the logfile alone which of my potential block rules a specific flow was matching. The Rule ID in the log entry is an internal ID and does not show up in the DFW GUI. There may be an easier way to do this, but I eventually exported the entire DFW rulebase to an XML file, which does show the Rule ID.
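Looking up a Rule ID in the export can also be scripted. The sketch below uses Python's standard `xml.etree.ElementTree` to index every rule by its ID. The element and attribute names (`section` with a `name` attribute, `rule` with an `id` attribute, child `name` and `action` elements) are assumptions about the export layout, and `SAMPLE_EXPORT` is a hypothetical fragment mirroring the Default Rule discussed here; check both against your own export before trusting the output.

```python
import xml.etree.ElementTree as ET
from typing import Dict, NamedTuple


class RuleInfo(NamedTuple):
    section: str
    name: str
    action: str


def index_rules(xml_text: str) -> Dict[int, RuleInfo]:
    """Map each DFW Rule ID to its section, name and action (assumed XML layout)."""
    root = ET.fromstring(xml_text)
    rules: Dict[int, RuleInfo] = {}
    for section in root.iter("section"):
        for rule in section.findall("rule"):
            rule_id = rule.get("id")
            if rule_id is None or not rule_id.isdigit():
                continue  # skip rules without a usable numeric ID
            rules[int(rule_id)] = RuleInfo(
                section=section.get("name", ""),
                name=rule.findtext("name", default=""),
                action=rule.findtext("action", default=""),
            )
    return rules


# Hypothetical fragment of an export; a real export contains many more elements.
SAMPLE_EXPORT = """<firewallConfiguration>
  <layer3Sections>
    <section name="Default Section Layer3">
      <rule id="1001" logged="true">
        <name>Default Rule</name>
        <action>allow</action>
      </rule>
    </section>
  </layer3Sections>
</firewallConfiguration>"""

rule = index_rules(SAMPLE_EXPORT)[1001]
print(f"{rule.section} / {rule.name}: {rule.action}")  # prints "Default Section Layer3 / Default Rule: allow"
```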
I now know that the Default Rule in the Default Section Layer3 section was responsible for this specific match and log entry. As you can see, its action is still configured as ‘allow’. If I changed this rule to ‘block’, this specific flow would be dropped. I can now use the information from the log entry (source, destination, ports) to determine whether this specific flow should be incorporated in the ruleset so it can be allowed, or whether the traffic is indeed breaking security policy and should be dropped.
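With hundreds of entries, it helps to collapse them into distinct flows per rule before deciding what to allow. The sketch below groups DFW entries by Rule ID and by (protocol, source IP, destination IP, destination port), dropping the ephemeral source port so that repeated connections between the same endpoints count as one flow. It assumes the 6.2.4 field order described above; `summarize_flows`, `print_report` and the `watch_rule_ids` parameter (the IDs of the block rules still set to ‘Allow’) are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Optional, Set, Tuple

# (protocol, source IP, destination IP, destination port)
Flow = Tuple[str, str, str, int]


def summarize_flows(
    lines: Iterable[str], watch_rule_ids: Optional[Set[int]] = None
) -> DefaultDict[int, Counter]:
    """Count log entries per DFW Rule ID, grouped by flow (source port ignored)."""
    summary: DefaultDict[int, Counter] = defaultdict(Counter)
    for line in lines:
        tokens = line.split()
        if "dfwpktlogs:" not in tokens:
            continue
        # Assumed order after the tag:
        # trace-id AF reason action ruleset/rule-id direction length protocol src->dst [flags]
        fields = tokens[tokens.index("dfwpktlogs:") + 1:]
        if len(fields) < 9 or "/" not in fields[4] or "->" not in fields[8]:
            continue
        try:
            rule_id = int(fields[4].rsplit("/", 1)[1])
            src, dst = fields[8].split("->", 1)
            src_ip = src.rsplit("/", 1)[0]
            dst_ip, dst_port = dst.rsplit("/", 1)
            flow: Flow = (fields[7], src_ip, dst_ip, int(dst_port))
        except ValueError:
            continue  # skip entries that do not follow the assumed layout
        if watch_rule_ids is None or rule_id in watch_rule_ids:
            summary[rule_id][flow] += 1
    return summary


def print_report(summary: DefaultDict[int, Counter]) -> None:
    """Print each rule's flows, most frequent first."""
    for rule_id in sorted(summary):
        print(f"Rule {rule_id}:")
        for (proto, src_ip, dst_ip, dst_port), hits in summary[rule_id].most_common():
            print(f"  {proto} {src_ip} -> {dst_ip}:{dst_port} ({hits} entries)")
```

For example, passing the lines of the logfile together with `{1001}` lists every distinct flow that would start being dropped once the Default Rule is switched to ‘Block’, which gives a short list of candidates to either add to the ruleset or accept as blocked.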
Other methods for analyzing traffic flows
Using Syslog, macOS Console and an XML export of the DFW rulebase is a pretty rudimentary method to analyze traffic flows for a micro-segmentation setup. It is quite efficient, however, and I found it very useful for validating my security design against the actual traffic flows. Of course, there are many other ways of doing this. Some examples:
NSX Flow Monitoring
NSX also provides ‘Flow Monitoring’. This is also a very useful tool for analyzing traffic flows, but working through thousands of sessions in the GUI is pretty hard. I primarily used Flow Monitoring as an additional source of information.
NSX Activity Monitoring
If you have the option of leveraging the Guest Introspection part of NSX, Activity Monitoring can provide you with a detailed view of traffic flowing between VMs. You have to deploy the Guest Introspection service appliance and make sure the required driver package is installed with VMware Tools.
I did not have access to Guest Introspection in my project so I could not use this solution.
vRealize Network Insight
Of course, the mother of all network traffic analysis tools is vRealize Network Insight. Unfortunately, I could not use this awesome tool in this project, but it is without doubt the tool I would have used if possible. It uses the Arkin technology VMware acquired some time ago to deliver intelligent analysis of your NSX environment. I highly recommend having a look at the VMware Hands-on Labs “HOL-1729-SDC-1 – Introduction to vRealize Network Insight” if you want to know more about Network Insight.