Investigate removing memcpy() from pcap runmodes and possibly others.
Currently, in the pcap run modes, we do the following memcpy() inside the callback function:
memcpy(p->pkt, pkt, p->pktlen);
This is an artifact from NFQUEUE and is needed by that run mode, but may not be needed by libpcap. We should investigate modifying pkt to simply use the pointer given to us by libpcap. I'm not sure whether this is thread safe, etc. In my initial test, making the following code modification led to a 20% performance increase, but also a pretty big loss in accuracy.
//uint8_t pkt[IPV6_HEADER_LEN + 65536 + 28];
//memcpy(p->pkt, pkt, p->pktlen);
p->pkt = pkt;
#2 Updated by Jason Ish over 8 years ago
Defrag should only need to be modified with respect to storing the reassembled packet, so that's a pretty minor fix there.
But will this work? The buffer that is passed to us is managed by libpcap and will be reclaimed by libpcap at will. Is there a way to tell libpcap not to reclaim a buffer until we're ready to let it go?
I did a simple test app that made a copy of the packet and also stored a pointer to it. Both contained the same data only for about 32 packets (just a pcap app with very default settings). Increasing the pcap buffer size may push this out a bit, but it's still not under our control.
If you look at pcap_next_ex, which is essentially a wrapper around pcap_dispatch that lets you avoid supplying the callback yourself, it places the packet in a buffer you provide. So you still have the memcpy, but this time inside libpcap rather than in your app. This is required because once the callback has returned, you can't rely on the contents of the packet buffer that was passed to you inside the callback.
I have to admit I haven't kept fully up to date on the latest libpcap developments, so there may be a way around this.
#3 Updated by Jason MacLulich over 8 years ago
This is the same problem we encounter with DAG integration -- we have to memcpy packets out of the DAG buffers, because the DAG card writes into the same ring buffer.
We do have control over when the DAG card can overwrite packets, but this information would have to be propagated through the engine in some fashion. You would still need to copy packets out of the buffer for IP/TCP reassembly, but you might avoid the initial copy.
In the past, with more specialized applications, I've been able to keep packets in the buffer and only copy them out when needed; e.g. IMA (Inverse Multiplexing over ATM) lends itself nicely to this. Those are relatively simple applications compared with Suricata, though.
Could be interesting work to see if this is applicable. Will, what sort of HW did you do your testing on?
#4 Updated by Will Metcalf over 8 years ago
"If you look at pcap_next_ex, which is kind of a wrapper around pcap_dispatch so you can avoid using the callback yourself, it places the packet in a buffer you provided. So you still have the memcpy, but this time inside libpcap rather than in your app."
Does this buy us anything performance wise?
#7 Updated by Eric Leblond over 6 years ago
After a first read, it seems we can use zero copy under the same assumption as with af_packet: all processing must happen within the callback thread. This means only the 'single' and 'workers' modes can benefit from it. I'm preparing a patch to implement this.
#8 Updated by Eric Leblond over 6 years ago
- File 0001-pcap-enable-zero-copy-mode-in-some-running-mode.patch 0001-pcap-enable-zero-copy-mode-in-some-running-mode.patch added
- % Done changed from 0 to 80
The attached patch brings zero copy to the 'workers' and 'single' modes.