Support #2900
Closed
alert 'SURICATA STREAM pkt seen on wrong thread' when run mode set to workers
Description
This alert is constantly triggered when run mode 'workers' is enabled.
Everything works as expected when set to AutoFP.
Can this alert be safely suppressed, or is this something that should be considered an issue?
Setup:
Suricata 4.1.2_1 inline IPS mode with netmap on FreeBSD 11.2
Updated by Victor Julien almost 6 years ago
- Related to Optimization #2725: stream/packet on wrong thread added
Updated by Victor Julien almost 6 years ago
Unfortunately this is a serious issue that can lead to missed alerts and logs, and resolving it should be a high priority. If autofp works well, I would recommend staying on it. We're tracking the larger issue in #2725.
Updated by Andreas Herz over 5 years ago
We're trying to narrow this issue down as best we can. Can you give us more details about your config/setup (I saw a pfSense/Netgate post from you; I guess it's related?) and the traffic you're seeing?
I have similar issues (but on Linux with AF_PACKET v3 + workers mode), and I'm trying to find a pattern in the traffic that might cause them.
Thanks
Updated by Andreas Herz over 5 years ago
- Assignee set to OISF Dev
- Target version set to Support
Updated by Cooper Nelson over 5 years ago
Adding notes from my recent 'deep dive'.
The root cause appears to be the hardware implementation of RSS in some NICs, confirmed in the ixgbe driver.
Fragmented TCP packets are hashed on 'sd' only (source and destination IP, in ethtool notation), because the TCP header is present only in the first fragment. As a result, all packets of a flow reach the same queue only if every TCP packet in that flow is fragmented.
In practice, however, it's very common for the handshake and the first packets of a large TCP flow to be unfragmented, with fragmentation occurring later in the flow, particularly as packet rates increase due to receive window scaling.
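To make the mechanism concrete, here is a minimal Python sketch of the queue selection described above: a Toeplitz hash (the hash RSS NICs use) over the 4-tuple for unfragmented TCP packets, and over source and destination IP only for fragments. The 40-byte key, the addresses and ports, the queue count, and the final `% n_queues` step are illustrative assumptions (real NICs index an indirection table with the low hash bits); this is not the ixgbe driver's code.

```python
import ipaddress

# Example 40-byte RSS key (assumption: a widely used default; a real NIC
# may be configured with a different or random key).
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash over data, as computed by RSS-capable NICs."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            # For every set input bit, XOR in the 32-bit key window at bit i.
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rss_queue(src, dst, sport, dport, is_fragment, n_queues=4):
    """Hash the 4-tuple for unfragmented TCP, but only 'sd' (IP source
    and destination) for IP fragments, whose TCP header may be absent.
    Simplification: hash % n_queues instead of an indirection table."""
    data = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    if not is_fragment:
        data += sport.to_bytes(2, "big") + dport.to_bytes(2, "big")
    return toeplitz_hash(RSS_KEY, data) % n_queues


if __name__ == "__main__":
    # One TCP flow: handshake unfragmented, later segments fragmented.
    flow = ("192.0.2.10", "198.51.100.20", 40000, 443)
    print("unfragmented ->", rss_queue(*flow, is_fragment=False))
    print("fragmented   ->", rss_queue(*flow, is_fragment=True))
```

For many port pairs the two calls return different queues, which is the situation described above: the worker that saw the handshake is not the one that receives the later fragments.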
The AF_PACKET documentation suggests that it is supposed to handle this case properly, but either it doesn't, or Suricata isn't setting it up correctly on all kernels:
http://man7.org/linux/man-pages/man7/packet.7.html
It also may be the case that this is describing a software implementation that is overridden by hardware RSS, if present. I think I remember regit mentioning that if there was a flow hash generated on the NIC, that is what cluster_flow used.
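The packet(7) page linked above describes the PACKET_FANOUT socket option. Below is a minimal Python sketch of joining a hash-mode fanout group with the defragmentation flag set. As far as I know, Suricata's af-packet `cluster-type: cluster_flow` corresponds to PACKET_FANOUT_HASH and its `defrag: yes` option adds PACKET_FANOUT_FLAG_DEFRAG, but treat that mapping as an assumption. The constants are copied from `<linux/if_packet.h>` because Python's socket module may not export them; the group id 99 and the interface name are placeholders.

```python
import socket
import sys

# Constants from <linux/if_packet.h> / <linux/if_ether.h>, hard-coded here.
SOL_PACKET = 263
PACKET_FANOUT = 18
PACKET_FANOUT_HASH = 0            # hash-based fanout (cluster_flow, assumed)
PACKET_FANOUT_FLAG_DEFRAG = 0x8000
ETH_P_ALL = 0x0003


def fanout_arg(group_id: int, mode: int, flags: int = 0) -> int:
    """Encode the PACKET_FANOUT value: group id in the low 16 bits,
    mode and flags in the high 16 bits (see packet(7))."""
    return (group_id & 0xFFFF) | ((mode | flags) << 16)


def join_fanout_group(ifname: str, group_id: int) -> socket.socket:
    """Open an AF_PACKET socket on ifname and join a hash fanout group
    that asks the kernel to defragment IP packets before hashing.
    Linux only; needs CAP_NET_RAW. The socket must be bound first."""
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    sock.bind((ifname, ETH_P_ALL))
    arg = fanout_arg(group_id, PACKET_FANOUT_HASH, PACKET_FANOUT_FLAG_DEFRAG)
    sock.setsockopt(SOL_PACKET, PACKET_FANOUT, arg)
    return sock


if __name__ == "__main__" and len(sys.argv) == 2:
    join_fanout_group(sys.argv[1], 99)
    print("joined fanout group 99 on", sys.argv[1])
```

Even with the defrag flag, this only controls how the kernel spreads packets across sockets; whether the kernel reuses a hash already supplied by the NIC is the open question raised in this comment.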
I do not think it is possible to force a 'sd' hash on the older 10Gbit Intel NICs, however I might be mistaken.
I'm thinking cluster_flow could be modified to handle fragmented TCP packets properly, or simply to hash on 'sd' only. However, in many cases the TCP packets would still be delivered out of order to the worker thread due to timing issues. I'm not sure how much of a problem this is for the stream tracker.
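The 'sd'-only option can be sketched as a direction-independent hash over the two IP addresses. This is a hypothetical worker-selection function, not Suricata code; `zlib.crc32` stands in for whatever hash a real implementation would use.

```python
import ipaddress
import zlib


def sd_worker(src: str, dst: str, n_workers: int) -> int:
    """Hypothetical 'sd'-only worker selection. Ports are ignored, so
    unfragmented packets and IP fragments of the same flow map to the
    same worker, in both directions."""
    a = ipaddress.ip_address(src).packed
    b = ipaddress.ip_address(dst).packed
    lo, hi = sorted((a, b))  # sort so A->B and B->A hash identically
    return zlib.crc32(lo + hi) % n_workers


if __name__ == "__main__":
    print(sd_worker("192.0.2.10", "198.51.100.20", 8))
    print(sd_worker("198.51.100.20", "192.0.2.10", 8))  # same worker
```

The trade-off is load balance: all traffic between one pair of hosts lands on a single worker, so a few heavy host pairs can overload one thread. It also does nothing about the out-of-order delivery mentioned above.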
Updated by Anonymous over 5 years ago
Andreas Herz wrote:
We're trying to narrow this issue down as best as we can. Can you give us more details about your config/setup (I saw a pfsense/netgate post from you, I guess that's related to that?) and the traffic seen?
I have similar issues (but on Linux with AFPacketv3+workers mode) and I'm trying to find a scheme for the traffic that might produce those issues.
Thanks
I can no longer replicate the issue.
I have recreated almost exactly the setup I had when I opened this issue:
Intel PRO/1000 PT NIC
pfSense 2.4.4-p3 (FreeBSD 11.2)
Hardware checksum offload, TCP segmentation offload and large receive offload disabled
Flow control disabled
Suricata 4.1.4_2
Netmap + worker mode
Changes:
pfSense 2.4.4-p2 -> 2.4.4-p3 (nothing major, still the same FreeBSD release.)
Suricata 4.1.2_1 -> 4.1.4_2
Updated by Anonymous over 5 years ago
Disregard my last update: the issue still persists on FreeBSD 11.2 with netmap and workers mode (Intel PRO/1000 PT NIC, em driver).
Updated by Andreas Herz over 5 years ago
Karel Van Hecke wrote:
Disregard my last update, issue still persists on FreeBSD 11.2 with netmap and worker mode. Intel pro/1000 PT NIC (em driver).
Could you check which options the NIC offers? On Linux we can use ethtool to control the relevant settings. I'm not sure how this is done on FreeBSD, and especially how it affects netmap, so it would be good to see which options are available.
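On Linux, the RSS hash fields are queried and set with `ethtool -n/-N <dev> rx-flow-hash <flow-type> [fields]`. The sketch below only builds and validates those command lines; the interface name `eth0` and the restricted set of flow types are assumptions, and whether a given driver accepts 'sd' for tcp4 is hardware-dependent (as noted earlier in the thread for older Intel 10Gbit NICs). It does not apply to FreeBSD or netmap.

```python
import shlex
from typing import List

# Field letters documented for `ethtool -N <dev> rx-flow-hash <type> <fields>`
# in ethtool(8): m = L2 destination, v = VLAN tag, t = L3 protocol,
# s/d = IP source/destination, f/n = L4 bytes 0-1 / 2-3 (the ports),
# r = discard packets of this flow type.
VALID_FIELDS = set("mvtsdfnr")
# Subset of the flow types ethtool accepts; enough for this discussion.
FLOW_TYPES = {"tcp4", "udp4", "tcp6", "udp6"}


def _check_flow_type(flow_type: str) -> None:
    if flow_type not in FLOW_TYPES:
        raise ValueError(f"unsupported flow type: {flow_type}")


def rx_flow_hash_query(dev: str, flow_type: str) -> List[str]:
    """argv that shows which fields the NIC currently hashes on."""
    _check_flow_type(flow_type)
    return ["ethtool", "-n", dev, "rx-flow-hash", flow_type]


def rx_flow_hash_cmd(dev: str, flow_type: str, fields: str) -> List[str]:
    """argv that asks the driver to hash flow_type on the given fields."""
    _check_flow_type(flow_type)
    if not fields or set(fields) - VALID_FIELDS:
        raise ValueError(f"invalid rx-flow-hash fields: {fields!r}")
    return ["ethtool", "-N", dev, "rx-flow-hash", flow_type, fields]


if __name__ == "__main__":
    # "eth0" is a placeholder interface name.
    print(shlex.join(rx_flow_hash_query("eth0", "tcp4")))
    print(shlex.join(rx_flow_hash_cmd("eth0", "tcp4", "sd")))
```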
Updated by Andreas Herz over 4 years ago
- Status changed from New to Closed
Hi, we're closing this issue since there have been no further responses.
If you think this bug is still relevant, try to test it again with the
most recent version of Suricata and reopen the issue. If you want to
improve the bug report please take a look at
https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Reporting_Bugs