Bug #1805
closed
pfring: zero copy broken
Description
It appears that in certain setups, using PF_RING with multiple threads and zero copy mode is broken.
My test is simple: I blast ~9.6Gbps at the affected system. It does not crash on every run, but sooner or later it does.
I have made a test that triggers the issue very quickly. In our 'Packet' structure we have a pointer to the position in the packet where the ethernet header starts. I can see that in some cases the data it points to gets corrupted.
So the test I added does this:
Next to the pointer, I added a static data structure that holds the contents of the ethernet header. During ethernet layer decoding I copy the data from the pointer into the static struct. Then, just before the end of the packet's life inside suricata (so before the next pfring_recv call on that thread), I compare the data the pointer points to with my static copy. If they are not the same, I abort.
This test can be found here https://github.com/inliniac/suricata/pull/2144/files
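For readers who do not want to open the pull request, the check described above can be sketched roughly as follows. This is not the code from the PR: `DemoPacket` and the `Demo*` functions are hypothetical names made up for illustration, and only the snapshot-then-compare idea is taken from the description.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_ETH_HDR_LEN 14 /* dst MAC (6) + src MAC (6) + eth_type (2) */

/* Hypothetical stand-in for the real Packet structure; only the two
 * fields relevant to the check are shown. */
typedef struct DemoPacket_ {
    const uint8_t *ethh;                 /* points into the capture buffer (zero copy) */
    uint8_t ethh_copy[DEMO_ETH_HDR_LEN]; /* private snapshot taken at decode time */
} DemoPacket;

/* At ethernet decode time: remember the pointer and snapshot the header bytes. */
static void DemoSnapshotEthernet(DemoPacket *p, const uint8_t *pkt)
{
    p->ethh = pkt;
    memcpy(p->ethh_copy, pkt, DEMO_ETH_HDR_LEN);
}

/* Returns 1 if the bytes behind the pointer still match the snapshot, 0 otherwise. */
static int DemoEthernetIntact(const DemoPacket *p)
{
    return memcmp(p->ethh, p->ethh_copy, DEMO_ETH_HDR_LEN) == 0;
}

/* Just before the packet is released (before the next receive call on this
 * thread): abort if the capture buffer changed underneath us. */
static void DemoCheckBeforeRelease(const DemoPacket *p)
{
    if (!DemoEthernetIntact(p)) {
        fprintf(stderr, "ethernet header changed while packet was in use\n");
        abort();
    }
}
```

If the capture buffer is only ever reused after the thread asks for the next packet, `DemoCheckBeforeRelease` should never fire. A mismatch means something wrote to the buffer while the packet was still in use.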
When using more than one thread, it blows up within a minute. When I use one thread, it appears to work correctly, even when running for a long time.
On manual inspection I can see that the 'static' copy of the ethernet header is correct. It contains the proper eth_type. The packet has also been decoded correctly at the higher levels, which proves that the data behind the pointer was correct at one point in time as well. However, by the time the test runs, the pointer to the ethernet header shows junk values.
I suspect there is some synchronization issue in the kernel/pfring module/driver.
On the same hardware and running the same test both AF_PACKET(v3) and NETMAP behave correctly.
Setup:
Intel X710:
# ethtool -i ens2f1
driver: i40e
version: 1.4.25-k
firmware-version: 4.53 0x8000206e 0.0.0
expansion-rom-version:
bus-info: 0000:0f:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
It's an older (Nehalem) 4-core Xeon with Hyper-Threading:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 26
Model name: Intel(R) Xeon(R) CPU W3550 @ 3.07GHz
8 RSS queues:
[ 0.869890] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 1.4.25-k
[ 0.869892] i40e: Copyright (c) 2013 - 2014 Intel Corporation.
[ 0.885006] i40e 0000:0f:00.0: fw 4.40.35115 api 1.4 nvm 4.53 0x8000206e 0.0.0
[ 0.989150] i40e 0000:0f:00.0: MAC address: xxx
[ 0.993134] i40e 0000:0f:00.0: SAN MAC: xxx
[ 1.673081] i40e 0000:0f:00.0: PCI-Express: Speed 5.0GT/s Width x8
[ 1.673084] i40e 0000:0f:00.0: PCI-Express bandwidth available for this device may be insufficient for optimal performance.
[ 1.673086] i40e 0000:0f:00.0: Please move the device to a different PCI-e link with more lanes and/or higher transfer rate.
[ 1.679122] i40e 0000:0f:00.0: Features: PF-id[0] VFs: 64 VSIs: 66 QP: 8 RX: 1BUF RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[ 1.693104] i40e 0000:0f:00.1: fw 4.40.35115 api 1.4 nvm 4.53 0x8000206e 0.0.0
[ 1.795281] i40e 0000:0f:00.1: MAC address: xxx
[ 1.799253] i40e 0000:0f:00.1: SAN MAC: xxx
[ 2.043232] i40e 0000:0f:00.1: PCI-Express: Speed 5.0GT/s Width x8
[ 2.043237] i40e 0000:0f:00.1: PCI-Express bandwidth available for this device may be insufficient for optimal performance.
[ 2.043240] i40e 0000:0f:00.1: Please move the device to a different PCI-e link with more lanes and/or higher transfer rate.
[ 2.074505] i40e 0000:0f:00.1: Features: PF-id[1] VFs: 64 VSIs: 66 QP: 8 RX: 1BUF RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[ 2.075630] i40e 0000:0f:00.1 ens2f1: renamed from eth2
[ 2.093337] i40e 0000:0f:00.0 ens2f0: renamed from eth0
[ 3953.702730] i40e 0000:0f:00.1 ens2f1: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None
[ 3957.127461] i40e 0000:0f:00.1 ens2f1: NIC Link is Down
[ 3959.517008] i40e 0000:0f:00.1 ens2f1: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None
Using PF_RING 6.4.0
[18827] 9/6/2016 -- 11:01:34 - (runmode-pfring.c:343) <Info> (ParsePfringConfig) -- Using flow cluster mode for PF_RING (iface ens2f1)
[18827] 9/6/2016 -- 11:01:34 - (util-runmodes.c:295) <Info> (RunModeSetLiveCaptureWorkersForDevice) -- Going to use 2 thread(s)
[New Thread 0x7ffff3e18700 (LWP 18859)]
[18859] 9/6/2016 -- 11:01:34 - (source-pfring.c:472) <Info> (ReceivePfringThreadInit) -- Enabling zero-copy for ens2f1
[18859] 9/6/2016 -- 11:01:34 - (source-pfring.c:537) <Info> (ReceivePfringThreadInit) -- (W#01-ens2f1) Using PF_RING v.6.4.0, interface ens2f1, cluster-id 99
[New Thread 0x7ffff2f54700 (LWP 18860)]
[18860] 9/6/2016 -- 11:01:34 - (source-pfring.c:472) <Info> (ReceivePfringThreadInit) -- Enabling zero-copy for ens2f1
[18860] 9/6/2016 -- 11:01:34 - (source-pfring.c:537) <Info> (ReceivePfringThreadInit) -- (W#02-ens2f1) Using PF_RING v.6.4.0, interface ens2f1, cluster-id 99
[18827] 9/6/2016 -- 11:01:34 - (runmode-pfring.c:521) <Info> (RunModeIdsPfringWorkers) -- RunModeIdsPfringWorkers initialised
$ cat /proc/net/pf_ring/info
PF_RING Version : 6.4.0 (unknown)
Total rings : 2
Standard (non ZC) Options
Ring slots : 4096
Slot version : 16
Capture TX : Yes [RX+TX]
IP Defragment : No
Socket Mode : Standard
Total plugins : 0
Cluster Fragment Queue : 0
Cluster Fragment Discard : 0
$ cat /proc/net/pf_ring/19136-ens2f1.37
Bound Device(s) : ens2f1
Active : 1
Breed : Standard
Appl. Name : Suricata
Socket Mode : RX+TX
Capture Direction : RX+TX
Sampling Rate : 1
IP Defragment : No
BPF Filtering : Disabled
Sw Filt Hash Rules : 0
Sw Filt WC Rules : 0
Hw Filt Rules : 0
Sw Filt Hash Match : 0
Sw Filt Hash Miss : 0
Poll Pkt Watermark : 128
Num Poll Calls : 2
Channel Id Mask : 0xFFFFFFFFFFFFFFFF
Cluster Id : 99
Slot Version : 16 [6.4.0]
Min Num Slots : 4098
Bucket Len : 1524
Slot Len : 1728 [bucket+header]
Tot Memory : 7090176
Tot Packets : 9680214
Tot Pkt Lost : 9220222
Tot Insert : 458907
Tot Read : 448573
Insert Offset : 294608
Remove Offset : 297888
Num Free Slots : 0
TX: Send Ok : 0
TX: Send Errors : 0
Reflect: Fwd Ok : 0
Reflect: Fwd Errors: 0