Bug #1834

closed

Lost HTTP responses when using multiple af-packet threads

Added by WGH WGH almost 8 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Target version: -

Description

The attached .pcap file contains a single TCP connection with 100 HTTP requests in it. The client closes the TCP connection after the last one.

To reproduce the bug, enable extended HTTP logging in the default suricata.yaml config and replay the attached dump with the command below. For some reason, I was unable to reproduce it by feeding the pcap to Suricata directly or via a dummy interface.

tcprewrite --seed=$RANDOM --fixcsum --infile 100reqs.pcap --outfile - | tcpreplay --intf1=lo -

If multiple af-packet processing threads are enabled, there's a chance that Suricata will somehow lose all HTTP responses in a connection. HTTP requests will still be handled, though.

Most of the time, http.log will contain lines like these, which is expected.

06/29/2016-21:22:29.476586 localhost [**] /foo [**] <useragent unknown> [**] <no referer> [**] GET [**] HTTP/1.1 [**] 404 [**] 169 bytes [**] 93eb:ad81:93eb:ad81:93eb:ad81:93eb:ad7f:50992 -> 93eb:ad81:93eb:ad81:93eb:ad81:93eb:ad7f:80

But sometimes, Suricata will lose the responses, and http.log will contain the following:

06/29/2016-21:22:30.384542 localhost [**] /foo [**] <useragent unknown> [**] <no referer> [**] GET [**] HTTP/1.1 [**] <no status> [**] 0 bytes [**] 1703:0456:1703:0456:1703:0456:1703:0457:50992 -> 1703:0456:1703:0456:1703:0456:1703:0457:80

There's no reported packet drop on capture.
On my PC the chance of this occurring is about 1 in 50, but it can happen after just a few replays. I have 4 cores, so 4 threads are spawned, but I was able to observe this with two threads as well.
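
A quick way to spot affected replays is to count http.log entries whose status field is `<no status>`. The following is only a minimal sketch, assuming the extended http.log layout of the two sample lines above, with fields separated by ` [**] ` and the status in the seventh field. The function name `count_lost_responses` and the `http.log` path in the usage note are illustrative assumptions, not part of Suricata.

```python
def count_lost_responses(lines):
    """Return (total, lost) for extended http.log lines.

    Assumes the field layout of the sample lines in this report:
    fields are separated by " [**] " and the status is the 7th field.
    """
    total = 0
    lost = 0
    for line in lines:
        fields = line.rstrip("\n").split(" [**] ")
        if len(fields) < 9:
            # Not an extended-format HTTP entry; skip it.
            continue
        total += 1
        if fields[6] == "<no status>":
            lost += 1
    return total, lost
```

It could be used after each replay as `total, lost = count_lost_responses(open("http.log"))`. A non-zero `lost` count would indicate that responses went missing.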


Files

100reqs.pcap (61.1 KB), added by WGH WGH, 06/29/2016 01:21 PM
Actions #1

Updated by Victor Julien almost 8 years ago

Can you try reducing the number of RSS queues to 1 both on the sending and the receiving NIC?

Actions #2

Updated by WGH WGH almost 8 years ago

I'm not quite sure that's applicable to a loopback device.

Actions #3

Updated by Eric Leblond almost 8 years ago

What is your kernel version? If it is >= 4.2, can you try with an older kernel?

Actions #4

Updated by WGH WGH almost 8 years ago

Yes, my kernel is 4.6.2.

I've just tried 3.13.0-32, and have been unable to reproduce it so far. Fascinating :)

Actions #5

Updated by Eric Leblond almost 8 years ago

OK, as I suspected, your test triggers the "flow hash no longer symmetric" issue that appeared in 4.2 (see #1777). We are currently discussing it with the Linux network developers, and it should be fixed in an upcoming version. A backport to stable has not been discussed yet, but we will raise the issue. On the Suricata side, I'm currently working on a workaround, but I fear it will not be available before the 3.2 release.
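
To illustrate why a non-symmetric flow hash can produce this symptom: when packets are spread across capture threads by hashing their addresses and ports, a hash that gives different values for the two directions of a connection can send requests to one thread and responses to another. The sketch below is a toy model only. It is not the kernel's actual hash, and the function names `asymmetric_hash`, `symmetric_hash` and `pick_thread` are hypothetical.

```python
import zlib

def asymmetric_hash(src, sport, dst, dport):
    """Toy hash that depends on packet direction (illustration only)."""
    key = f"{src}:{sport}->{dst}:{dport}".encode()
    return zlib.crc32(key)

def symmetric_hash(src, sport, dst, dport):
    """Toy hash that is identical for both directions of a flow.

    The two endpoints are sorted first, so swapping source and
    destination does not change the key that gets hashed.
    """
    a, b = sorted([(src, sport), (dst, dport)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}".encode()
    return zlib.crc32(key)

def pick_thread(hash_value, n_threads):
    """Map a hash value to a capture thread index."""
    return hash_value % n_threads
```

With `symmetric_hash`, a request from `("::1", 50992)` to `("::1", 80)` and the matching response always map to the same thread index. With `asymmetric_hash`, the two directions are hashed from different keys, so they may end up on different threads.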

Actions #6

Updated by Andreas Herz almost 8 years ago

  • Assignee set to Eric Leblond
  • Target version set to TBD
Actions #7

Updated by Victor Julien over 7 years ago

Can you retest after applying the suggestions from Packet_Capture?

Actions #8

Updated by WGH WGH over 7 years ago

I'm now on Linux 4.7.1, and it seems that the problem is gone.

As for the other suggestions, again, I don't know how to apply them to a loopback device.

Actions #9

Updated by Victor Julien over 7 years ago

  • Status changed from New to Closed
  • Target version deleted (TBD)

Loopback probably doesn't use RSS anyway. Thanks for testing!
