Bug #8667 » bug-report.textile

Shane Dugan, 06/17/2026 04:15 PM

 

Summary

Commit "af-packet: speed up thread sync during startup" (923ad6af, introduced in 8.0.0) moved the AFPPeersListReachedInc() call from after AFPCreateSocket() returns to immediately after bind(), before AFPSetupRing() and AFPSwitchState(AFP_STATE_UP). A worker thread released by AFPSynchronizeStart() (which spins until peerslist.turn == 0) can now call AFPWritePacket() while its peer's socket fd is still 0 (zero-initialized). sendto(0, ...) then fails with ENOTSOCK, producing:

SFE_N_RX: sending packet failed on socket 0: Socket operation on non-socket

The send_errors rate-limit silences all subsequent warnings. Every forwarded packet on that peer is silently dropped forever. The engine never self-recovers — AFPTryReopen is triggered by read-side failures only.
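The errno in that warning can be reproduced outside Suricata. The sketch below is a standalone illustration, not Suricata code: it calls sendto() on the read end of a pipe, which is an open descriptor that is not a socket, in the same way a zero-initialized fd refers to whatever happens to be open as descriptor 0. The function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Calls sendto() on a descriptor that is open but is not a socket (the read
 * end of a pipe) and returns the resulting errno, or 0 if the call
 * unexpectedly succeeded. */
int errno_for_send_on_non_socket(void)
{
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0); /* the demo needs a valid non-socket descriptor */

    const char payload[] = "x";
    int err = 0;
    if (sendto(fds[0], payload, sizeof(payload), 0, NULL, 0) < 0) {
        err = errno; /* capture before close() can overwrite it */
    }

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

On Linux the returned value should be ENOTSOCK, whose strerror() text is the "Socket operation on non-socket" string seen in the warning.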

Affected versions

  • Suricata 8.0.0 and later (any version containing commit 923ad6af)
  • Not present in Suricata 7.0.x — in 7.x, AFPPeersListReachedInc() was called in ReceiveAFPLoop after AFPCreateSocket() returned, i.e. after AFPSwitchState(AFP_STATE_UP) had already published the fd.

Root cause

AFPPeerUpdate() — the only site that publishes the peer socket fd — writes two atomics in this fixed order:

SC_ATOMIC_SET(ptv->mpeer->socket, ptv->socket);   /* written first  */
SC_ATOMIC_SET(ptv->mpeer->state,  ptv->afp_state); /* written second */

It is called only from AFPSwitchState(ptv, AFP_STATE_UP), which runs at the end of AFPCreateSocket(), after AFPSetupRing(). In Suricata 8, AFPPeersListReachedInc() runs right after bind(), so turn == 0 now means "all peers have bind()-ed", not "all peer fds are published". AFPWritePacket() has no peer->state guard in either version and has always relied on the barrier to imply readiness. That implication is broken in 8.
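The change in what the barrier implies can be shown with a small single-threaded model. This is only a sketch under stated assumptions: the struct, the `remaining` counter standing in for the turn bookkeeping, and all `model_`/`MODEL_` names are hypothetical and much simpler than the real peer list. It records which fd a writer would read at the instant the barrier is released, for each ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { MODEL_STATE_DOWN = 0, MODEL_STATE_UP = 1 };
enum { MODEL_FD = 7 }; /* pretend descriptor returned by socket() */

/* Hypothetical stand-in for the peer record; not Suricata's AFPPeer. */
struct model_peer {
    atomic_int socket; /* 0 until published */
    atomic_int state;
};

/* Publishes the fd first and the state second, the order the report
 * describes for AFPPeerUpdate(). */
static void model_publish(struct model_peer *peer, int fd)
{
    atomic_store(&peer->socket, fd);
    atomic_store(&peer->state, MODEL_STATE_UP);
}

/* Walks one peer through startup and returns the fd a released writer would
 * read at the moment the barrier opens.
 * signal_after_bind == true models the 8.0 ordering, false the 7.0 one. */
int fd_seen_at_barrier_release(bool signal_after_bind)
{
    struct model_peer peer;
    atomic_init(&peer.socket, 0);
    atomic_init(&peer.state, MODEL_STATE_DOWN);

    int remaining = 1; /* peers that have not yet reported to the barrier */
    int seen = -1;

    /* bind() has just returned. */
    if (signal_after_bind) {
        remaining--;                      /* barrier opens here in 8.0 */
        seen = atomic_load(&peer.socket); /* writer reads the fd now */
    }

    /* Ring setup (the slow mmap) would run here, then the fd is published. */
    model_publish(&peer, MODEL_FD);

    if (!signal_after_bind) {
        remaining--;                      /* barrier opens here in 7.0 */
        seen = atomic_load(&peer.socket);
    }

    assert(remaining == 0);
    return seen;
}
```

In this model the 8.0 ordering yields 0 and the 7.0 ordering yields the published descriptor. In the real engine the outcome depends on thread timing, so the 8.0 ordering only makes the bad read possible rather than certain.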

Trigger conditions

  • Cold restart only: AFPTryReopen passes peer_update = false, so the barrier never re-runs on a hot reopen. Any forced cold restart suffices: policy change that alters the config fingerprint, host patching/AMI refresh, etc.
  • Timing-racy — the window is the interval between barrier release and AFPSwitchState(AFP_STATE_UP) on the slowest peer. Long uptime with memory fragmentation widens the window by slowing the mmap in AFPSetupRing().

Observed symptoms

  • Runtime warning (often only one line per wedged peer due to rate-limiting): SFE_N_RX: sending packet failed on socket 0: Socket operation on non-socket
  • TX-ok / RX-dead asymmetry at stats: SFE_N_TX: packets: <large> while SFE_N_RX: packets: 0
  • Startup log sequence identical to a healthy start: "AF_PACKET IPS mode activated" and "Engine started." both appear normally, and the warning fires between them
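Why a wedged peer may show only a single warning line can be sketched with a log-on-first-error counter. This is an assumption about the shape of the send_errors rate limit, not a copy of Suricata's code; the struct, function names, and message text below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical per-peer error counter, loosely modeled on send_errors. */
struct model_peer_errors {
    atomic_uint send_errors;
};

/* Records one failed send. Returns 1 if this failure produced a log line,
 * 0 if it was only counted. */
int model_record_send_failure(struct model_peer_errors *errs, const char *iface)
{
    assert(errs != NULL && iface != NULL);
    /* atomic_fetch_add returns the value before the increment, so only the
     * first failure observes 0 and logs. */
    if (atomic_fetch_add(&errs->send_errors, 1u) == 0u) {
        fprintf(stderr, "%s: sending packet failed\n", iface);
        return 1;
    }
    return 0;
}

/* Replays `failures` consecutive send errors on a fresh counter and returns
 * how many of them were logged. */
unsigned model_logged_lines(unsigned failures)
{
    struct model_peer_errors errs;
    atomic_init(&errs.send_errors, 0u);

    unsigned logged = 0;
    for (unsigned i = 0; i < failures; i++) {
        logged += (unsigned)model_record_send_failure(&errs, "SFE_N_RX");
    }
    return logged;
}
```

Under this policy, any number of failed sends on one peer produces at most one log line, which would match the single warning followed by silent drops described above.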

Reproduction results

Reproduced on Linux with Suricata 8.0.3, 6 AF-PACKET interface pairs (dummy interfaces, copy-mode: ips, runmode workers, 2 threads per pair), and SURICATA_RING_SETUP_DELAY_US=500000 (env-gated widener holding the existing window open — see attached widener.patch):

|_. Metric |_. Value |
| Restart cycles | 10 |
| Cycles reaching "Engine started" | 8 |
| Cycles with ENOTSOCK | 7 (87.5% of the 8 cycles that started) |
| Total ENOTSOCK lines | 28 |

TX/RX asymmetry on a wedged engine (last cycle):

|_. Interface |_. Packets |
| SFE_0_TX | 3,727,852 |
| SFE_0_RX | 279 |
| SFE_1_TX | 19,696,196 |
| SFE_1_RX | 279 |
| SFE_4_TX | 19,874,739 |
| SFE_4_RX | 279 |
| SFE_5_TX | 36,548,604 |
| SFE_5_RX | 558 |

ENOTSOCK lines land at the same second as or 1 second before "Engine started" — confirming the race fires inside the startup window.

Proposed fix

Add a peer-state guard at the top of AFPWritePacket() in src/source-af-packet.c, before the socket fd is read:

if (SC_ATOMIC_GET(p->afp_v.peer->state) != AFP_STATE_UP) {
    return;  /* peer fd not yet published — drop cleanly during startup window */
}

Because AFPPeerUpdate() always writes socket before state, observing AFP_STATE_UP guarantees the fd is valid. This is race-free. It preserves the parallel AFPSetupRing() optimization introduced by the original commit.
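The ordering argument can be checked against a minimal model of the publish and guard pair. It is a sketch with hypothetical names, using C11 atomics with their default sequentially consistent ordering; whether the SC_ATOMIC_* macros give the same ordering guarantee on every supported build is an assumption here, not something the model verifies.

```c
#include <assert.h>
#include <stdatomic.h>

enum { MODEL_STATE_DOWN = 0, MODEL_STATE_UP = 1 };
enum { MODEL_DROPPED = -1 };

/* Hypothetical stand-in for the peer record; not Suricata's AFPPeer. */
struct model_peer {
    atomic_int socket; /* 0 until published */
    atomic_int state;
};

void model_peer_init(struct model_peer *peer)
{
    atomic_init(&peer->socket, 0);
    atomic_init(&peer->state, MODEL_STATE_DOWN);
}

/* Publisher side: fd first, state second. */
void model_peer_publish(struct model_peer *peer, int fd)
{
    assert(fd > 0);
    atomic_store(&peer->socket, fd);
    atomic_store(&peer->state, MODEL_STATE_UP);
}

/* Writer side: returns the fd that would be handed to sendto(), or
 * MODEL_DROPPED if the peer has not been published yet. The state is read
 * before the socket, mirroring the proposed guard. */
int model_guarded_write_fd(struct model_peer *peer)
{
    if (atomic_load(&peer->state) != MODEL_STATE_UP) {
        return MODEL_DROPPED;
    }
    return atomic_load(&peer->socket);
}
```

Because the state store follows the socket store and both are sequentially consistent, a writer that loads MODEL_STATE_UP cannot then load the stale 0 from socket. The cost of the guard is that packets arriving during the startup window are dropped rather than forwarded.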

Fix verification: with the fix applied and the 500 ms widener still active, 10 of 10 restart cycles produced zero ENOTSOCK lines.

Attachments

  • reproduce.sh — self-contained reproduction script (Linux, Suricata 8 binary + tcpreplay required)
  • widener.patch — env-gated usleep in AFPSetupRing() for deterministic reproduction; no-op unless SURICATA_RING_SETUP_DELAY_US is set
  • README.md — full technical writeup including race timeline and Fix A vs Fix B comparison