Bug #8667 » bug-report.textile
Summary
Commit "af-packet: speed up thread sync during startup" (923ad6af, introduced in 8.0.0) moved AFPPeersListReachedInc() from after AFPCreateSocket() returns to immediately after bind(), before AFPSetupRing() and AFPSwitchState(AFP_STATE_UP). A worker thread released by AFPSynchronizeStart() (which spins until peerslist.turn == 0) can now call AFPWritePacket() while its peer's socket fd is still 0 (zero-initialized). sendto(0, ...) returns ENOTSOCK, producing:
SFE_N_RX: sending packet failed on socket 0: Socket operation on non-socket
The send_errors rate-limit silences all subsequent warnings. Every forwarded packet on that peer is silently dropped forever. The engine never self-recovers — AFPTryReopen is triggered by read-side failures only.
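The failure mode can be illustrated outside Suricata. The following is a minimal sketch, not Suricata code: sendto_errno_on_non_socket() and should_log_error() are hypothetical helpers. The first calls sendto() on a descriptor that is open but is not a socket (a pipe, standing in for fd 0, which normally refers to stdin). The second models a log-only-the-first-error counter, on the assumption that the send_errors rate limit behaves this way.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno that sendto() reports on a
 * descriptor that is valid but is not a socket, or -1 if pipe() fails. */
static int sendto_errno_on_non_socket(void)
{
    int fds[2];
    const char byte = 0;
    int err = 0;

    if (pipe(fds) != 0)
        return -1;
    /* fds[1] is a pipe end, standing in for an unpublished fd of 0. */
    if (sendto(fds[1], &byte, sizeof(byte), 0, NULL, 0) < 0)
        err = errno;
    close(fds[0]);
    close(fds[1]);
    return err;
}

/* Assumed rate limit: only the first error on a peer is logged. */
static bool should_log_error(atomic_uint *send_errors)
{
    assert(send_errors != NULL);
    return atomic_fetch_add(send_errors, 1) == 0;
}
```

On Linux the first helper is expected to return ENOTSOCK, and the second returns true only on its first call per counter, which matches the single warning line seen per wedged peer.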
Affected versions
- Suricata 8.0.0 and later (any version containing commit 923ad6af)
- Not present in Suricata 7.0.x: in 7.x, AFPPeersListReachedInc() was called in ReceiveAFPLoop after AFPCreateSocket() returned, i.e. after AFPSwitchState(AFP_STATE_UP) had already published the fd.
Root cause
AFPPeerUpdate() — the only site that publishes the peer socket fd — writes two atomics in this fixed order:
SC_ATOMIC_SET(ptv->mpeer->socket, ptv->socket);   /* written first */
SC_ATOMIC_SET(ptv->mpeer->state, ptv->afp_state); /* written second */
It is called only from AFPSwitchState(ptv, AFP_STATE_UP), which runs at the end of AFPCreateSocket(), after AFPSetupRing(). In Suricata 8, AFPPeersListReachedInc() runs right after bind(), so turn == 0 now means "all peers have bind()-ed", not "all peer fds are published". AFPWritePacket() has no peer->state guard in either version and has always relied on the barrier to imply readiness. That implication is broken in 8.
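The change in what the barrier implies can be shown with a small single-threaded model. This is a sketch under stated assumptions, not the Suricata implementation: all type and function names are hypothetical, each peer's startup is reduced to three steps (bind, ring setup, publish), and a fixed schedule runs peer 0 to completion before the slow peer 1. The function reports how many peers still have socket == 0 at the moment the barrier releases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum step { STEP_BIND, STEP_RING, STEP_PUBLISH, STEP_DONE };

struct model_peer {
    int socket;      /* 0 until published, like the zero-initialized fd */
    enum step next;  /* next startup step this peer will run */
};

struct model_barrier {
    int turn;     /* peers expected; 0 means "released" */
    int reached;  /* peers that have checked in so far */
};

/* Check in at the barrier; the last peer to arrive releases it. */
static void barrier_reached_inc(struct model_barrier *b)
{
    if (b->turn == 0)
        return;
    assert(b->reached < b->turn);
    if (++b->reached == b->turn) {
        b->reached = 0;
        b->turn = 0;
    }
}

/* Run one startup step. inc_after_bind selects the 8.x ordering (check in
 * right after bind); otherwise the peer checks in only after publishing. */
static void peer_step(struct model_peer *p, struct model_barrier *b,
                      bool inc_after_bind, int fd)
{
    switch (p->next) {
    case STEP_BIND:
        if (inc_after_bind)
            barrier_reached_inc(b);
        p->next = STEP_RING;
        break;
    case STEP_RING: /* stands in for the slow ring setup */
        p->next = STEP_PUBLISH;
        break;
    case STEP_PUBLISH:
        p->socket = fd;
        if (!inc_after_bind)
            barrier_reached_inc(b);
        p->next = STEP_DONE;
        break;
    case STEP_DONE:
        break;
    }
}

/* Returns the number of peers whose fd is still 0 when the barrier releases,
 * or -1 if the barrier never releases under this schedule. */
static int unpublished_at_release(bool inc_after_bind)
{
    struct model_peer peers[2] = { { 0, STEP_BIND }, { 0, STEP_BIND } };
    struct model_barrier b = { 2, 0 };
    const int schedule[] = { 0, 0, 0, 1, 1, 1 };
    size_t i;

    for (i = 0; i < sizeof(schedule) / sizeof(schedule[0]); i++) {
        int who = schedule[i];
        peer_step(&peers[who], &b, inc_after_bind, 10 + who);
        if (b.turn == 0)
            return (peers[0].socket == 0) + (peers[1].socket == 0);
    }
    return -1;
}
```

With this schedule, unpublished_at_release(true) yields 1 (peer 1 has only reached bind when the barrier opens), while unpublished_at_release(false) yields 0.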
Trigger conditions
- Cold restart only: AFPTryReopen passes peer_update = false, so the barrier never re-runs on a hot reopen. Any forced cold restart suffices: policy change that alters the config fingerprint, host patching/AMI refresh, etc.
- Timing-racy: the window is the interval between barrier release and AFPSwitchState(AFP_STATE_UP) on the slowest peer. Long uptime with memory fragmentation widens the window by slowing the mmap in AFPSetupRing().
Observed symptoms
- Runtime warning (often only one line per wedged peer due to rate-limiting):
  SFE_N_RX: sending packet failed on socket 0: Socket operation on non-socket
- TX-ok / RX-dead asymmetry at stats:
  SFE_N_TX: packets: <large> while SFE_N_RX: packets: 0
- Startup log sequence identical to a healthy start: "AF_PACKET IPS mode activated" and "Engine started." both appear normally; the warning fires between them
Reproduction results
Reproduced on Linux with Suricata 8.0.3, 6 AF-PACKET interface pairs (dummy interfaces, copy-mode: ips, runmode workers, 2 threads per pair), and SURICATA_RING_SETUP_DELAY_US=500000 (env-gated widener holding the existing window open — see attached widener.patch):
| Metric | Value |
|---|---|
| Restart cycles | 10 |
| Cycles reaching "Engine started" | 8 |
| Cycles with ENOTSOCK | 7 (87.5% of the 8 cycles that started) |
| Total ENOTSOCK lines | 28 |
TX/RX asymmetry on a wedged engine (last cycle):
| Interface | Packets |
|---|---|
| SFE_0_TX | 3,727,852 |
| SFE_0_RX | 279 |
| SFE_1_TX | 19,696,196 |
| SFE_1_RX | 279 |
| SFE_4_TX | 19,874,739 |
| SFE_4_RX | 279 |
| SFE_5_TX | 36,548,604 |
| SFE_5_RX | 558 |
ENOTSOCK lines land at the same second as or 1 second before "Engine started" — confirming the race fires inside the startup window.
Proposed fix
Add a peer-state guard at the top of AFPWritePacket() in src/source-af-packet.c, before the socket fd is read:
if (SC_ATOMIC_GET(p->afp_v.peer->state) != AFP_STATE_UP) {
    return; /* peer fd not yet published: drop cleanly during startup window */
}
Because AFPPeerUpdate() always writes socket before state, observing AFP_STATE_UP guarantees the fd is valid. This is race-free. It preserves the parallel AFPSetupRing() optimization introduced by the original commit.
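The publish-then-guard argument can be sketched with plain C11 atomics. This is an illustrative model only: struct guard_peer, guard_peer_publish() and guard_get_send_fd() are hypothetical names, and the sketch assumes the SC_ATOMIC macros give at least the ordering of the default (sequentially consistent) atomic_store()/atomic_load() used here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { GUARD_STATE_DOWN = 0, GUARD_STATE_UP = 1 };

struct guard_peer {
    atomic_int socket; /* 0 until published */
    atomic_int state;  /* GUARD_STATE_DOWN until the fd is valid */
};

/* Writer side: publish the fd first, then flip the state. */
static void guard_peer_publish(struct guard_peer *peer, int fd)
{
    assert(fd > 0);
    atomic_store(&peer->socket, fd);
    atomic_store(&peer->state, GUARD_STATE_UP);
}

/* Reader side: only read the fd after observing the UP state.
 * Returns false (and leaves *fd_out untouched) while the peer is not ready. */
static bool guard_get_send_fd(struct guard_peer *peer, int *fd_out)
{
    if (atomic_load(&peer->state) != GUARD_STATE_UP)
        return false;
    *fd_out = atomic_load(&peer->socket);
    return true;
}
```

A reader that sees GUARD_STATE_UP has, by the store order, also seen the fd written before it. A reader that arrives between the two stores simply skips the send, which corresponds to the clean drop during the startup window.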
Fix verification: 10/10 restart cycles produced zero ENOTSOCK lines with the fix applied and the 500ms widener still active.
Attachments
- reproduce.sh: self-contained reproduction script (Linux, Suricata 8 binary + tcpreplay required)
- widener.patch: env-gated usleep in AFPSetupRing() for deterministic reproduction; no-op unless SURICATA_RING_SETUP_DELAY_US is set
- README.md: full technical writeup including race timeline and Fix A vs Fix B comparison
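For readers without the attachment, an env-gated delay of the kind described for widener.patch could look roughly like the sketch below. It is not the attached patch: delay_us_from_env() and maybe_delay() are hypothetical helpers, and nanosleep() is used here in place of usleep(). Only the environment variable name is taken from this report.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h>

/* Parse a microsecond delay from the named environment variable.
 * Returns 0 when the variable is unset, empty, negative, or not a number. */
static long delay_us_from_env(const char *name)
{
    const char *val;
    char *end = NULL;
    long us;

    assert(name != NULL);
    val = getenv(name);
    if (val == NULL || *val == '\0')
        return 0;
    errno = 0;
    us = strtol(val, &end, 10);
    if (errno != 0 || *end != '\0' || us < 0)
        return 0;
    return us;
}

/* Sleep for the configured delay; a no-op when the variable is not set. */
static void maybe_delay(const char *name)
{
    long us = delay_us_from_env(name);

    if (us > 0) {
        struct timespec ts;
        ts.tv_sec = us / 1000000L;
        ts.tv_nsec = (us % 1000000L) * 1000L;
        nanosleep(&ts, NULL);
    }
}
```

Calling maybe_delay("SURICATA_RING_SETUP_DELAY_US") at the point where the ring is set up would hold the existing window open without changing behavior when the variable is absent.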