Bug #1834
closed
Lost HTTP responses when using multiple af-packet threads
Description
The attached .pcap file contains a single TCP connection with 100 HTTP requests in it. The client closes the TCP connection after the last one.
In order to reproduce the bug, you'll need to enable extended HTTP logging in the default suricata.yaml config, and replay the attached dump like this. For some reason I was unable to reproduce it by feeding pcaps to Suricata directly or via a dummy interface, though.
tcprewrite --seed=$RANDOM --fixcsum --infile 100reqs.pcap --outfile - | tcpreplay --intf1=lo -
If multiple af-packet processing threads are enabled, there's a chance that Suricata will somehow lose all HTTP responses in a connection. HTTP requests will still be handled, though.
Most of the time, http.log will contain lines like these, which is expected.
06/29/2016-21:22:29.476586 localhost [**] /foo [**] <useragent unknown> [**] <no referer> [**] GET [**] HTTP/1.1 [**] 404 [**] 169 bytes [**] 93eb:ad81:93eb:ad81:93eb:ad81:93eb:ad7f:50992 -> 93eb:ad81:93eb:ad81:93eb:ad81:93eb:ad7f:80
But sometimes, Suricata will lose the responses, and http.log will contain the following:
06/29/2016-21:22:30.384542 localhost [**] /foo [**] <useragent unknown> [**] <no referer> [**] GET [**] HTTP/1.1 [**] <no status> [**] 0 bytes [**] 1703:0456:1703:0456:1703:0456:1703:0457:50992 -> 1703:0456:1703:0456:1703:0456:1703:0457:80
There's no reported packet drop on capture.
On my PC the chance of this occurring is about 1 in 50, but it can happen as early as after several replays. I have 4 cores, so 4 threads are spawned, but I was able to observe this with two threads as well.
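To spot affected replays without reading http.log by eye, the log can be scanned for entries whose status field is `<no status>`. The following is only a rough sketch: it assumes the extended log format shown in the two sample lines above (fields separated by ` [**] `, with the status in the seventh field), and the function name `count_lost_responses` is made up for illustration.

```python
SEP = " [**] "  # field separator seen in the sample http.log lines


def count_lost_responses(lines):
    """Return (total, lost) for an iterable of extended http.log lines.

    A line counts as "lost" when its status field reads "<no status>".
    Assumes the nine-field layout of the sample lines in this report.
    """
    total = 0
    lost = 0
    for line in lines:
        fields = line.rstrip("\n").split(SEP)
        if len(fields) < 9:
            continue  # not an extended-format entry, skip it
        total += 1
        if fields[6] == "<no status>":
            lost += 1
    return total, lost
```

Usage would be along the lines of `count_lost_responses(open("http.log"))` after each replay. A non-zero second value means at least one request was logged without its response.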
Updated by Victor Julien over 8 years ago
Can you try reducing the number of RSS queues to 1 both on the sending and the receiving NIC?
Updated by WGH WGH over 8 years ago
I'm not quite sure it's applicable to a loopback device.
Updated by Eric Leblond over 8 years ago
What is your kernel version? If it is >= 4.2, can you try with an older kernel?
Updated by WGH WGH over 8 years ago
Yes, my kernel is 4.6.2.
I've just tried 3.13.0-32, and have been unable to reproduce it so far. Fascinating :)
Updated by Eric Leblond over 8 years ago
Ok, as I suspected, your test triggers the "flow hash no longer symmetric" issue that appeared in 4.2 (see #1777). We are currently discussing it with the Linux network developers, and it should be fixed in an upcoming version. A backport to stable has not been discussed yet, but we will raise the issue. On the Suricata side, I'm currently working on a workaround, but I fear it will not be available before the 3.2 release.
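To illustrate why a non-symmetric flow hash matters here, the toy sketch below compares a direction-dependent hash with a symmetric one. It is not the kernel's actual hashing code: the CRC32-based functions and their names are assumptions made up for this example. The point is only that a symmetric hash maps both directions of a connection to the same thread index, while a direction-dependent hash gives no such guarantee, so requests and responses may end up in different threads.

```python
import zlib


def directional_hash(src, sport, dst, dport, n_threads):
    """Toy hash that depends on packet direction (illustrative only)."""
    key = f"{src}:{sport}>{dst}:{dport}".encode()
    return zlib.crc32(key) % n_threads


def symmetric_hash(src, sport, dst, dport, n_threads):
    """Toy hash that is identical for both directions of a flow.

    The two endpoints are put in a canonical order before hashing, so
    swapping source and destination does not change the result.
    """
    a, b = sorted([(src, sport), (dst, dport)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}".encode()
    return zlib.crc32(key) % n_threads


if __name__ == "__main__":
    client = ("::1", 50992)
    server = ("::1", 80)
    for name, fn in (("directional", directional_hash),
                     ("symmetric", symmetric_hash)):
        request_thread = fn(*client, *server, 4)
        response_thread = fn(*server, *client, 4)
        print(name, request_thread, response_thread)
```

With the symmetric variant the two printed indices always match. With the directional one they may or may not, which is consistent with the bug showing up only in some replays.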
Updated by Andreas Herz over 8 years ago
- Assignee set to Eric Leblond
- Target version set to TBD
Updated by Victor Julien over 8 years ago
Can you retest after applying the suggestions from Packet_Capture?
Updated by WGH WGH over 8 years ago
I'm now on Linux 4.7.1, and it seems that the problem is gone.
As for the other suggestions, again, I don't know how to apply them to a loopback device.
Updated by Victor Julien over 8 years ago
- Status changed from New to Closed
- Target version deleted (TBD)
Loopback probably doesn't use RSS anyway. Thanks for testing!