Optimization #2218

closed

Leave TSO enabled for Linux AF_PACKET runmode

Added by Bhavesh Davda over 6 years ago. Updated about 6 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
Effort:
Difficulty:
Label:

Description

Not sure why Suricata chooses to disable all NIC offloads by default on Linux, and spews out a nasty/scary warning in the logs:

29/9/2017 -- 15:00:42 - <Notice> - This is Suricata version 3.2.1 RELEASE
29/9/2017 -- 15:00:45 - <Warning> - [ERRCODE: SC_ERR_NIC_OFFLOADING(284)] - NIC offloading on eth0: SG: SET,  GRO: SET, LRO: unset, TSO: SET, GSO: SET. Run: ethtool -K eth0 sg off gro off lro off tso off gso off
29/9/2017 -- 15:00:45 - <Warning> - [ERRCODE: SC_ERR_AFP_CREATE(190)] - Using AF_PACKET with offloading activated leads to capture problems

I can understand why packet capture with various receive offloads like LRO/GRO, or even receive checksum offload, can make life difficult for packet analysis. But on the transmit side, Suricata can trust that the NIC driver will "do the right thing" (TM) for TSO packets, or drop them, if for example the TCP pseudo-header checksum is incorrect or something.

From a performance POV, TSO makes a huge difference, both in terms of CPU utilization and throughput. Anecdotally, we've measured CPU/throughput going from 453%/2131Mbps to 110%/9842Mbps (10GbE line rate) just by turning TSO on manually using "ethtool -K eth0 tso on".
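To put the two measurements above on a common scale, here is a small sketch that computes CPU cost per Gbps from the quoted figures. Only the four numbers come from the description; the helper name and the "CPU percent per Gbps" metric are chosen here for illustration.

```python
# Figures quoted in the description: CPU utilization (percent) and throughput (Mbps).
tso_off = {"cpu_pct": 453.0, "mbps": 2131.0}
tso_on = {"cpu_pct": 110.0, "mbps": 9842.0}


def cpu_pct_per_gbps(sample):
    """Return CPU percent consumed per Gbps of throughput (illustrative metric)."""
    return sample["cpu_pct"] / (sample["mbps"] / 1000.0)


cost_off = cpu_pct_per_gbps(tso_off)
cost_on = cpu_pct_per_gbps(tso_on)
throughput_gain = tso_on["mbps"] / tso_off["mbps"]
efficiency_gain = cost_off / cost_on

print(f"TSO off: {cost_off:.1f} CPU% per Gbps")
print(f"TSO on:  {cost_on:.1f} CPU% per Gbps")
print(f"Throughput gain: {throughput_gain:.2f}x, CPU cost per Gbps reduced {efficiency_gain:.1f}x")
```

On these numbers the throughput gain is a bit over 4.6x, while the CPU cost per unit of throughput drops by roughly 19x.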

I can propose a patch to leave TSO enabled in AF_PACKET runmode if you agree.


Files

0001-Leave-TSO-enabled-for-Linux-AF_PACKET-runmode.patch (3.87 KB): Proposed patch to leave TSO enabled for Linux AF_PACKET runmode. Bhavesh Davda, 09/29/2017 03:45 PM
0001-Leave-SG-enabled-for-Linux-AF_PACKET-runmode.patch (3.28 KB): Proposed patch to leave SG enabled for Linux AF_PACKET runmode. Bhavesh Davda, 10/02/2017 11:41 AM
Actions #1

Updated by Eric Leblond over 6 years ago

I agree TSO could be interesting to keep. What is your test?

Actions #2

Updated by Bhavesh Davda over 6 years ago

Test is iperf3:

Server:
iperf3 --server --port 20001

Client (running Suricata):
iperf3 -c <server-IPv4> -p 20001 -f m -i 1 -t 6 -O 2 -P 16

Result (truncated):
[ ID] Interval Transfer Bandwidth Retr
[SUM] 0.00-6.00 sec 6.55 GBytes 9381 Mbits/sec 6118 sender

Actions #3

Updated by Bhavesh Davda over 6 years ago

Proposed patch. Just to see if this is the right way to go about doing this.

Actions #4

Updated by Eric Leblond over 6 years ago

In your test, you are testing the local stack, not Suricata. In most cases, Suricata is handling a copy of the traffic on an interface and not acting as a server.

You can already stop Suricata from disabling the offloads by setting the following variable:

  capture:
    # disable NIC offloading. It's restored when Suricata exits.
    # Enabled by default
    disable-offloading: false

Actions #5

Updated by Bhavesh Davda over 6 years ago

Eric Leblond wrote:

In your test, you are testing the local stack, not Suricata. In most cases, Suricata is handling a copy of the traffic on an interface and not acting as a server.

Not sure I follow. Suricata was running in the background:

suricata -D -c /etc/suricata/suricata.yaml -i eth0 --pidfile /var/log/suricata/suricata.pid

You can already stop Suricata from disabling the offloads by setting the following variable:
[...]

Yes, I had already verified that, but note that:

1. It logs those warnings I mentioned in the original description of this issue
2. It leaves all offloads enabled, including GRO, LRO, GSO, SG, etc., not just TSO

What I'm proposing is specifically only leaving TSO enabled.

Actions #6

Updated by Victor Julien over 6 years ago

  • Description updated (diff)
  • Priority changed from High to Normal
Actions #7

Updated by Victor Julien over 6 years ago

I'm not sure why I added TSO, or if it is necessary. I guess my thinking was that when capturing egress traffic it was relevant. So the question comes down to: is TSO handled before or after Suricata would capture egress traffic?

Actions #8

Updated by Bhavesh Davda over 6 years ago

Turns out TSO needs SG (as do receive side offloads), but SG is a leaf-level feature (i.e. SG doesn't depend on anything else). So a 2nd patch to leave SG on as well.
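As an illustration of the resulting offload state (not of the attached patches, which change Suricata itself), the sketch below builds the equivalent `ethtool -K` argument list while keeping a chosen set of features on. The feature list and the "TSO needs SG" dependency come from this thread; the function name and the `REQUIRES` table are hypothetical, and the command is only constructed, never executed.

```python
# Offload features named in the warning from the issue description.
FEATURES = ["sg", "gro", "lro", "tso", "gso"]

# Dependency stated in this thread: TSO needs SG. Hypothetical table for this sketch.
REQUIRES = {"tso": {"sg"}}


def ethtool_disable_args(iface, keep=()):
    """Build an `ethtool -K` argument list that turns off every feature
    in FEATURES except those in `keep` and whatever they depend on."""
    kept = set(keep)
    for feature in list(kept):
        kept |= REQUIRES.get(feature, set())
    args = ["ethtool", "-K", iface]
    for feature in FEATURES:
        if feature not in kept:
            args += [feature, "off"]
    return args


# Keeping TSO implicitly keeps SG as well.
print(" ".join(ethtool_disable_args("eth0", keep=["tso"])))
```

With an empty `keep`, the result matches the command suggested by the warning in the description.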

Actions #9

Updated by Bhavesh Davda over 6 years ago

Victor Julien wrote:

I'm not sure why I added TSO, or if it is necessary. I guess my thinking was that when capturing egress traffic it was relevant. So the question comes down to: is TSO handled before or after Suricata would capture egress traffic?

So if I understand your question correctly, the actual TCP Segmentation is done after the protocol packet handlers are called, where pcap/AF_PACKET is hooked up in the Linux networking stack. So dev_queue_xmit (and helpers) first call the packet_rcv AF_PACKET protocol hook, before calling the NIC driver's ndo_start_xmit method.

For a TSO capable NIC, the NIC driver's ndo_start_xmit method will use the device-specific interface to set up TSO in hardware, and the NIC will DMA the un-segmented large TCP packet and do the TCP segmentation based on the MSS in hardware.
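To make the MSS-based split concrete, here is a minimal sketch of how one large TCP payload maps onto wire-sized segments. It models only the payload arithmetic (no headers, sequence numbers, or checksums), and the function name and the example MSS of 1448 bytes are assumptions for illustration, not values from this issue.

```python
def tso_segments(payload_len, mss):
    """Split a TCP payload of `payload_len` bytes into (offset, length)
    pairs of at most `mss` bytes each, as a segmentation engine would."""
    if payload_len < 0:
        raise ValueError("payload_len must be non-negative")
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [(off, min(mss, payload_len - off)) for off in range(0, payload_len, mss)]


# One 4000-byte send handed to the NIC, split with an assumed MSS of 1448 bytes.
for offset, length in tso_segments(4000, 1448):
    print(f"segment at offset {offset}: {length} bytes")
```

AF_PACKET sees the single 4000-byte unit; only the NIC produces the individual segments.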

Actions #10

Updated by Bhavesh Davda over 6 years ago

Please let me know if you agree with the proposed patches, or if I should make any changes.

I'm contemplating also leaving GSO enabled, because even though it's segmentation "offload" done in software, it was designed for better performance as described here: https://wiki.linuxfoundation.org/networking/gso. This will of course need to be tested, like I've already done for the other 2 patches to keep TSO and SG enabled.

Actions #11

Updated by Victor Julien over 6 years ago

I'm a bit confused about the goal here. If you want to keep offloads enabled, see https://redmine.openinfosecfoundation.org/issues/2218#note-4

The offloads give fake packets, and for accuracy we need the real packets. If that comes at a perf cost, so be it. Ppl that know what they are doing can disable the behavior.

Actions #12

Updated by Victor Julien over 6 years ago

I tested with TSO enabled, and it gives packets that are far bigger than the MTU. I guess AF_PACKET captures the larger packets that are sent to the NIC, that the NIC sends as real proper packets. So TSO needs to remain disabled so that capture of egress traffic works as expected. Not sure about sg. Disabling of gso is also needed on the receive side.
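A rough way to spot such oversized captures is to compare the captured frame length with the interface MTU. The sketch below assumes an untagged Ethernet header of 14 bytes and an MTU of 1500; both defaults and the function name are assumptions for illustration.

```python
ETH_HEADER_LEN = 14  # untagged Ethernet II header; an assumption for this sketch


def exceeds_mtu(frame_len, mtu=1500, l2_header_len=ETH_HEADER_LEN):
    """Return True if the L3 part of a captured frame is larger than the MTU,
    which suggests a not-yet-segmented (TSO/GSO) or coalesced packet."""
    return frame_len - l2_header_len > mtu


print(exceeds_mtu(1514))   # full-size frame at MTU 1500
print(exceeds_mtu(65226))  # large unsegmented send captured before the NIC splits it
```

A VLAN-tagged capture would need `l2_header_len=18` to avoid false positives.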

Actions #13

Updated by Bhavesh Davda over 6 years ago

I'm not sure what "fake packets" vs. "real packets" means. The packets that suricata will see over the AF_PACKET interface will be exactly the ones that the NIC driver will be getting for transmit, as far as transmit side offloads such as TSO are concerned.

After that, the NIC driver will set up the NIC hardware to do the offload "on-the-fly" as it DMAs the packets from system memory and before it puts them on the wire. But system memory will never see the final TSO-segmented packets that will be put on the wire.

For that matter, this is no different than say a router in the middle doing IP fragmentation/reassembly; it's a piece of equipment in the end-to-end datapath that is free to further munge the packets in flight.

But from suricata's POV, it is seeing exactly the same packet that is being handed off to the NIC hardware by the NIC driver.

My issue with using the "disable-offloading: false" setting is the warnings suricata logs when you do that. I understand that this would be an issue with receive offloads such as GRO/LRO, where the NIC will indicate to its driver that the checksum is "correct" without actually fixing the checksum in the packet delivered to system memory, and that information is lost across the AF_PACKET interface, making the checksum appear invalid to suricata.

But for transmit side offloads, I have yet to hear of a specific issue they can cause for suricata's ability to analyze packets.

Actions #14

Updated by Bhavesh Davda over 6 years ago

BTW even for receive checksum offloads, you can use the PACKET_AUXDATA socket option on AF_PACKET sockets, and look at TP_STATUS_CSUM* in aux.tp_status for what the NIC hardware determined w.r.t. the checksum. And it looks like suricata's AFPReadFromRing function is already doing this.
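A minimal sketch of how the TP_STATUS_CSUM* bits can be read out of a tp_status value is shown below. The two flag values mirror the definitions in linux/if_packet.h; the function name and the returned labels are hypothetical, and this is not Suricata's actual logic.

```python
# Flag values as defined in linux/if_packet.h.
TP_STATUS_CSUMNOTREADY = 1 << 3  # checksum still to be filled in (e.g. TX checksum offload)
TP_STATUS_CSUM_VALID = 1 << 7    # checksum already verified by the kernel or hardware


def checksum_hint(tp_status):
    """Classify what tp_status says about a packet's checksum."""
    if tp_status & TP_STATUS_CSUM_VALID:
        return "valid"       # safe to skip software validation
    if tp_status & TP_STATUS_CSUMNOTREADY:
        return "not-ready"   # checksum field in the captured bytes is not final
    return "unknown"         # no hint; validate in software


print(checksum_hint(TP_STATUS_CSUM_VALID | 1))  # also has TP_STATUS_USER (bit 0) set
print(checksum_hint(TP_STATUS_CSUMNOTREADY))
print(checksum_hint(0))
```

A capture tool would typically skip checksum validation for the first two cases and verify the packet itself only in the third.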

Actions #15

Updated by Victor Julien over 6 years ago

The point is that Suricata needs to see the packets as they are (& will be) on the wire.

Actions #16

Updated by Andreas Herz over 6 years ago

  • Assignee set to Bhavesh Davda
  • Target version set to TBD
Actions #17

Updated by Victor Julien about 6 years ago

  • Status changed from New to Closed
  • Assignee deleted (Bhavesh Davda)
  • Target version deleted (TBD)