Bug #8442: capture-bypass: worker timeout of flows causes statistics inconsistencies

Added by Adam Kiripolsky 3 months ago. Updated 5 days ago.

Status: New
Priority: Normal
Target version:
Affected Versions:
Effort:
Difficulty:
Label:

Description

Problem:

When a worker times out a capture-bypassed flow, it does not call the necessary functions to update the flow statistics.
Gathering statistics can be a costly operation, as it depends on the BypassUpdate callback implementation (e.g., querying hardware).

Proposed solution:

Forbid workers from timing out capture-bypassed flows and allow only FlowManager to handle their timeouts and updates.
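The proposed division of responsibility can be illustrated with a toy model. This is a minimal Python sketch, not Suricata's C code: the names FlowTable, worker_walk, manager_pass and the counters are hypothetical, and the whole table is treated as a single hash bucket. It only shows that once workers skip capture-bypassed flows, every such flow goes through the path that runs the bypass update.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class FlowState(Enum):
    ESTABLISHED = auto()
    CAPTURE_BYPASSED = auto()


@dataclass
class Flow:
    flow_id: int
    state: FlowState
    last_seen: float


@dataclass
class FlowTable:
    """Hypothetical stand-in for the flow table, modeled as one bucket."""
    timeout: float
    flows: dict = field(default_factory=dict)
    bypass_updates: int = 0      # times the (costly) bypass update ran
    evicted_by_worker: int = 0
    evicted_by_manager: int = 0

    def add(self, flow):
        self.flows[flow.flow_id] = flow

    def worker_may_evict(self, flow, now):
        # Proposed rule: a worker never times out a capture-bypassed flow.
        if flow.state is FlowState.CAPTURE_BYPASSED:
            return False
        return now - flow.last_seen > self.timeout

    def worker_walk(self, now):
        # Stand-in for a worker evicting stale flows during a hash lookup.
        for flow in list(self.flows.values()):
            if self.worker_may_evict(flow, now):
                del self.flows[flow.flow_id]
                self.evicted_by_worker += 1

    def manager_pass(self, now):
        # Stand-in for the flow manager's periodic timeout scan.
        for flow in list(self.flows.values()):
            if now - flow.last_seen <= self.timeout:
                continue
            if flow.state is FlowState.CAPTURE_BYPASSED:
                # Only here are bypass statistics collected (assumption:
                # this models the BypassUpdate path in a simplified way).
                self.bypass_updates += 1
            del self.flows[flow.flow_id]
            self.evicted_by_manager += 1
```

In this model a stale capture-bypassed flow survives worker_walk() and is removed only by manager_pass(), so bypass_updates always matches the number of bypassed flows that were evicted.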


Files

port_any.pcap (4.12 MB): pcap for reproducibility test #1. Adam Kiripolsky, 06/12/2026 11:29 PM
port_443.pcap (4.12 MB): pcap for reproducibility test #2. Adam Kiripolsky, 06/12/2026 11:29 PM
suricata-worket-bypass-stats.yml (84.7 KB): suricata.yaml for reproducibility test. Adam Kiripolsky, 06/12/2026 11:35 PM

Subtasks 1 (1 open, 0 closed)

Bug #8443: capture-bypass: worker timeout of flows causes statistics inconsistencies (8.0.x backport). Status: Assigned. Assignee: Adam Kiripolsky

#1 Updated by OISF Ticketbot 3 months ago

  • Subtask #8443 added

#2 Updated by OISF Ticketbot 3 months ago

  • Label deleted (Needs backport to 8.0)

#3 Updated by Adam Kiripolsky 5 days ago

Reproducibility test

I used Suricata in AF_PACKET runmode with eBPF bypass.

To see the results more clearly, I have created a test branch: https://github.com/adaki4/suricata/tree/reproduce-wrong-worker-bypass-stats-v1
This branch adds counters for capture-bypassed flows that would be timed out by the function FlowIsTimedOut() and for the number of deletions from the bypass eBPF map.

I used two pcaps, port_443.pcap and port_any.pcap, both generated with scapy.

  • port_443.pcap contains 1000 TCP flows (each of 10 packets) with different IP addresses, all with port 443.
  • port_any.pcap contains 1000 TCP flows (each of 10 packets) with different IP addresses, all with different ports other than 443.
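For readers who want to build comparable pcaps, here is a minimal scapy sketch. It is not the script used for the attached files: the address ranges, source ports, MAC addresses, payload size and the one-directional SYN-plus-data packet layout are all assumptions, and build_flows and write_pcap are hypothetical helper names.

```python
import ipaddress


def build_flows(count=1000, dport=None):
    """Return (src_ip, dst_ip, sport, dport) tuples, one per flow.

    With dport=None every flow gets its own destination port starting at
    10000, so no port equals 443 (port ranges are assumptions).
    """
    src_base = ipaddress.IPv4Address("10.0.0.1")
    dst_base = ipaddress.IPv4Address("10.1.0.1")
    flows = []
    for i in range(count):
        port = dport if dport is not None else 10000 + i
        flows.append((str(src_base + i), str(dst_base + i), 20000 + i, port))
    return flows


def write_pcap(path, flows, packets_per_flow=10, payload=b"x" * 100):
    """Write packets_per_flow TCP packets per flow to path (requires scapy)."""
    from scapy.all import IP, TCP, Ether, Raw, wrpcap

    packets = []
    for src, dst, sport, dport in flows:
        # Fixed MACs avoid ARP lookups while scapy builds the frames.
        base = Ether(src="02:00:00:00:00:01", dst="02:00:00:00:00:02") / IP(
            src=src, dst=dst
        )
        # First packet is a bare SYN, the rest carry data (assumed layout).
        packets.append(base / TCP(sport=sport, dport=dport, flags="S", seq=1000))
        for n in range(packets_per_flow - 1):
            seq = 1001 + n * len(payload)
            packets.append(
                base
                / TCP(sport=sport, dport=dport, flags="PA", seq=seq)
                / Raw(payload)
            )
    wrpcap(path, packets)
```

Calling write_pcap("port_443.pcap", build_flows(1000, dport=443)) and write_pcap("port_any.pcap", build_flows(1000)) would produce two files with the flow counts described above.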

The Suricata rules are in the file drop-443.rules and look like this:

drop tcp any any -> any 443 (msg:"Dropping all HTTPS traffic (port 443)"; bypass; sid:1000004; rev:1;)
drop tcp any 443 -> any any (msg:"Dropping all HTTPS traffic (port 443)"; bypass; sid:1000005; rev:1;)

I have also reduced Suricata's flow table and the hash-size, as configured in the attached suricata.yaml.

I launched Suricata with:

sudo src/suricata -S ./rules/drop-443.rules -c suricata-worket-bypass-stats.yml -l /tmp/ -vvvv --af-packet

Note: In my setup, the interface I use to replay traffic is mirrored to Suricata's interface.
First, I replay port_443.pcap to Suricata's interface with tcpreplay:

sudo tcpreplay -i <if> port_443.pcap

When the replay ends, I immediately send port_any.pcap. It is necessary to send the pcap right after the first one, as the timeout for bypassed flows is set to only 20s.

sudo tcpreplay -i <if> port_any.pcap

After the replay ends, we can shut down Suricata. Two lines in the console output are important:

Info: af-packet: rules for bypass deleted: x [ReceiveAFPThreadDeinit:source-af-packet.c:2750]
Info: af-packet: capture bypassed flows timeouted by worker: y [ReceiveAFPThreadDeinit:source-af-packet.c:2751]

Here x is the number of capture-bypassed flows that were deinitialized correctly, i.e. their entry was deleted from the eBPF map. y is the number of capture-bypassed flows that were timed out in the function FlowGetFlowFromHash(). These flows are removed from the flow table, but their entries are not deleted from the eBPF map and their statistics are not collected. x + y gives the total number of flows that were supposed to be bypassed, which is 1000 in this case.

In the fixed version (e.g. by applying the fix in PR https://github.com/OISF/suricata/pull/15331), y is always 0 and x is 1000, meaning that all flows and their bypass data are being properly deinitialized.
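The check on x and y can be automated with a short script. This is a minimal sketch that assumes the log text contains the two message formats quoted above (which exist only in the test branch) and that counts from several capture threads, if present, should be summed. The function names are hypothetical.

```python
import re

# Patterns copied from the two log messages added by the test branch.
DELETED_RE = re.compile(r"rules for bypass deleted: (\d+)")
WORKER_RE = re.compile(r"capture bypassed flows timeouted by worker: (\d+)")


def summarize_bypass_counters(log_text):
    """Return (x, y), each summed over every matching line in log_text."""
    x = sum(int(n) for n in DELETED_RE.findall(log_text))
    y = sum(int(n) for n in WORKER_RE.findall(log_text))
    return x, y


def check_bypass_counters(log_text, expected_flows=1000):
    """Return (x, y, ok); ok is True when y == 0 and x == expected_flows."""
    x, y = summarize_bypass_counters(log_text)
    return x, y, (y == 0 and x == expected_flows)
```

Passing the captured console output to check_bypass_counters() reports whether every bypassed flow was removed through the path that deletes its eBPF map entry.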
