Firewall drops packets

The hypervisor firewall may drop packets that are routed through an upstream router and then back downstream to a destination on the same host as the source.

Symptom

When two virtual machines run on the same hypervisor host in the same cluster, traffic between them may fail to flow correctly. If debugging is enabled on the host firewall, the kernel log may show packets whose source and destination ports appear swapped, similar to the following entry:

Nov 23 15:44:08 cluster-name-hypercloud-compute-kvm-90e2ba8a9600 kernel: [15017663.647164] IN=u-INTERNET OUT=u-INTERNET PHYSIN=bond0.35 PHYSOUT=one-383-0 MAC=02:00:26:44:c1:4d:a2:fe:04:00:00:40:08:00 SRC=216.220.185.13 DST=48.68.193.77 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=48687 DF PROTO=TCP SPT=22 DPT=36314 WINDOW=63480 RES=0x00 ACK PSH URGP=0
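A log entry like this one can be screened automatically. The following is a minimal Python sketch, not part of the product: the function names are hypothetical, and the ephemeral range is an assumption based on the Linux default for net.ipv4.ip_local_port_range (32768 to 60999). It flags entries that carry the ACK flag, come from a low service port, and target an ephemeral port, which is the pattern seen in the log above.

```python
import re

# Assumption: Linux default ephemeral range (net.ipv4.ip_local_port_range).
EPHEMERAL_PORTS = range(32768, 61000)


def parse_fields(log_line: str) -> dict:
    """Return the KEY=VALUE fields of a netfilter LOG line as a dict of strings."""
    return dict(re.findall(r"(\w+)=(\S+)", log_line))


def looks_like_return_traffic(log_line: str) -> bool:
    """Heuristic: ACK set, source is a service port, destination is ephemeral."""
    fields = parse_fields(log_line)
    spt, dpt = int(fields["SPT"]), int(fields["DPT"])
    has_ack = "ACK" in log_line.split()  # TCP flags are logged as bare tokens
    return has_ack and spt < 1024 and dpt in EPHEMERAL_PORTS


# Shortened copy of the kernel log entry shown above.
sample = (
    "IN=u-INTERNET OUT=u-INTERNET PHYSIN=bond0.35 PHYSOUT=one-383-0 "
    "SRC=216.220.185.13 DST=48.68.193.77 LEN=84 PROTO=TCP "
    "SPT=22 DPT=36314 WINDOW=63480 RES=0x00 ACK PSH URGP=0"
)

print(looks_like_return_traffic(sample))
```

A match is only a hint that reply packets are being evaluated as new connections; it should be read together with the conditions listed in this article.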

Conditions

The following conditions must all be TRUE:

  1. Both virtual machines are deployed on the same hypervisor node.
  2. TCP traffic between them is not flowing as expected (it appears to be blocked).
  3. Live migrating one of the virtual machines to another node temporarily resolves the issue.
  4. Traffic is routed upstream to ANOTHER router before being routed back to the other VM (the VMs are on different ROUTED subnets).
  5. The hypervisor firewall drops the return traffic because the destination port appears to be the source port.
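Condition 4 can be checked quickly when the subnet layout is known. The sketch below uses Python's standard ipaddress module; the helper name, the addresses, and the subnet list are hypothetical placeholders (documentation ranges), to be replaced with the actual VM addresses and Virtual Network prefixes.

```python
import ipaddress


def on_different_subnets(ip_a: str, ip_b: str, subnets: list[str]) -> bool:
    """True when both addresses belong to known subnets, but not the same one."""
    networks = [ipaddress.ip_network(s) for s in subnets]

    def owner(ip: str):
        addr = ipaddress.ip_address(ip)
        # First configured subnet containing the address, or None if unknown.
        return next((net for net in networks if addr in net), None)

    net_a, net_b = owner(ip_a), owner(ip_b)
    return net_a is not None and net_b is not None and net_a != net_b


# Hypothetical routed subnets and VM addresses.
subnets = ["192.0.2.0/25", "192.0.2.128/25"]
print(on_different_subnets("192.0.2.10", "192.0.2.200", subnets))
print(on_different_subnets("192.0.2.10", "192.0.2.20", subnets))
```

If the two VMs share a subnet, traffic between them is not routed through the upstream device and this article likely does not apply.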

Root Cause

This issue is caused by the hypervisor's stateful firewall NOT associating the return traffic with the original connection. In some instances, upstream routers mangle TCP packets inline by randomizing the TCP sequence numbers. When this occurs, the Linux conntrack module does not consider the return traffic to be related to the connection already established in the state table. The return traffic is instead treated as a NEW connection whose destination port is the original connection's source port. That port is almost always in the ephemeral port range and will almost certainly be blocked by the security group, because it is not best practice to allow ephemeral port range traffic through the hypervisor firewall via the security group rules configured for the attached NICs.
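The effect can be illustrated with a toy model. The Python sketch below is not how conntrack or the hypervisor firewall is implemented: the Packet and InboundRule types and the filter_packet function are hypothetical, and the "state table" is reduced to a set of flow tuples. It only shows why a reply that is no longer matched to an established flow falls through to the security group rules and is dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int


@dataclass(frozen=True)
class InboundRule:
    """Hypothetical security group rule: allow TCP to one destination port."""
    dport: int


def filter_packet(pkt: Packet, established: set, rules: list) -> str:
    """Toy stateful filter: tracked flows pass, anything else needs a rule."""
    flow = (pkt.src, pkt.sport, pkt.dst, pkt.dport)
    if flow in established:
        return "ACCEPT (established)"
    if any(rule.dport == pkt.dport for rule in rules):
        return "ACCEPT (new, rule match)"
    return "DROP (new, no rule match)"


# Reply from the SSH server to the client's ephemeral port, as in the log entry.
reply = Packet(src="216.220.185.13", sport=22, dst="48.68.193.77", dport=36314)
rules = [InboundRule(dport=22)]  # assumed: only inbound SSH is allowed

tracked = {(reply.src, reply.sport, reply.dst, reply.dport)}
print(filter_packet(reply, tracked, rules))  # state matched: reply passes
print(filter_packet(reply, set(), rules))    # state not matched: reply dropped
```

In the second call the reply is evaluated as if it were a new inbound connection to port 36314, which no rule permits.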

Final Fix

To remediate this issue, disable TCP sequence number randomization on the inline router. For a Cisco ASA or Cisco Firepower Threat Defense device, the configuration is similar to the following:

  policy-map global_policy
       class class-default
          set connection random-sequence-number disable

Workaround

The definitive fix for this issue relies on upstream network configuration changes. Alternatively, this issue may be worked around by performing ONE of the following:

  1. Update the DESTINATION security group to allow TCP traffic on ALL ports from the SOURCE IP address.
  2. Update the DESTINATION security group to allow TCP traffic on ALL ports from the entire SOURCE Virtual Network subnet.

Details

Last Modified: 02/23/2021
Last Modified By: Kenneth Van Alstyne <kenny.vanalstyne@softiron.com>
Status: Root Cause Found, Final Fix Released
Known Affected Releases: All
Fossil Ticket ID: 4f863867ec
Jira Ticket ID: CS-356