TCP checksum errors on compute nodes

If you have nested VMs in one network that run through the VMware ESXi bare-metal hypervisor on different compute nodes, TCP-based services may not work, or the number of TCP checksum errors in the output of the dropstats command may keep increasing. A possible cause is that certain Network Interface Cards (NICs) do not support IP checksum calculation.

To identify the issue:

  1. Inspect the output of the dropstats command, which shows the number of Checksum errors.

  2. Inspect the output of the tcpdump command for a specific NIC, for example, enp2s0f1. If you find cksum incorrect entries, the issue exists in your environment.

    tcpdump -v -nn -l -i enp2s0f1.1162 host 10.0.2.162 | grep -i incorrect
    

    Example of system response:

    tcpdump: listening on enp2s0f1.1162, link-type EN10MB (Ethernet), capture size 262144 bytes
    10.254.19.231.80 > 192.168.100.3.45506: Flags [S.], cksum 0x43bf (incorrect -> 0xb8dc), \
    seq 1901889431, ack 1081063811, win 28960, options [mss 1420,sackOK,\
    TS val 456361578 ecr 41455995,nop,wscale 7], length 0
    10.254.19.231.80 > 192.168.100.3.45506: Flags [S.], cksum 0x43bf (incorrect -> 0xb8dc), \
    seq 1901889183, ack 1081063811, win 28960, options [mss 1420,sackOK,\
    TS val 456361826 ecr 41455995,nop,wscale 7], length 0
    10.254.19.231.80 > 192.168.100.3.45506: Flags [S.], cksum 0x43bf (incorrect -> 0xb8dc), \
    seq 1901888933, ack 1081063811, win 28960, options [mss 1420,sackOK,\
    TS val 456362076 ecr 41455995,nop,wscale 7], length 0
    
  3. If you do not find checksum errors using the tcpdump command, inspect the output of the flow -l command, which shows information about drops for an unknown reason.
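
The checks in steps 1 and 2 can be scripted so that they are easier to repeat on several compute nodes. The following is a minimal sketch, not a supported tool: the function names checksum_counters and count_incorrect are hypothetical, and it assumes that dropstats prints one counter per line with a name containing "cksum" or "checksum", and that tcpdump is run with -v so that each bad TCP segment produces one line containing "cksum ... (incorrect".

```shell
#!/bin/sh
# Hypothetical helper: print only the checksum-related counters from
# dropstats-style output read on stdin.
# Assumption: one counter per line, name contains "cksum" or "checksum".
checksum_counters() {
    grep -i -E 'cksum|checksum' || true
}

# Hypothetical helper: count lines of `tcpdump -v` output (read on stdin)
# that report an incorrect checksum. Prints 0 when there are none.
count_incorrect() {
    grep -c -i 'cksum.*incorrect' || true
}

# Example usage on a compute node (interface and host are taken from the
# tcpdump example in step 2; adjust them to your environment):
#   dropstats | checksum_counters
#   timeout 30 tcpdump -v -nn -l -i enp2s0f1.1162 host 10.0.2.162 2>/dev/null | count_incorrect
```

Running the dropstats pipeline twice, a few seconds apart, shows whether the counter is still increasing. A non-zero result from count_incorrect corresponds to the cksum incorrect entries described in step 2.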

Workaround:

Turn off transmit (TX) offloading on all compute nodes for the problematic NIC used by vRouter:

  1. Run the following command:

    ethtool -K <interface_name> tx off
    
  2. Verify the status of the TX checksumming:

    ethtool -k <interface_name>
    

    Example of system response:

    tx-checksumming: off
    tx-checksum-ipv4: off
    tx-checksum-ipv6: off
    tx-checksum-sctp: off
    tcp-segmentation-offload: off
    tx-tcp-segmentation: off [requested on]
    tx-tcp6-segmentation: off [requested on]
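
The two workaround steps can be combined so that the result is checked automatically. The following is a sketch under stated assumptions: the helper name tx_checksum_is_off is hypothetical, and it assumes that ethtool -k prints the tx-checksumming line in the "name: value" form shown in the example response above.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k <interface_name>` output on stdin and
# succeed (exit 0) only if the tx-checksumming line reports "off".
tx_checksum_is_off() {
    awk -F': *' '
        $1 ~ /^[[:space:]]*tx-checksumming$/ { state = $2 }
        END { if (state ~ /^off/) exit 0; exit 1 }
    '
}

# Example usage (run as root on each compute node; replace <interface_name>
# with the problematic NIC used by vRouter):
#   ethtool -K <interface_name> tx off
#   if ethtool -k <interface_name> | tx_checksum_is_off; then
#       echo "TX checksumming is off"
#   else
#       echo "TX checksumming is still on" >&2
#   fi
```

The helper reads from standard input rather than calling ethtool itself, so it can be tried against saved output before it is used on a live node.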