Bug #8654
Updated by Shane Dugan about 16 hours ago
Issue Summary

In an IPS deployment with QUIC enabled, we are observing severe CPU spikes dominated by QUIC transaction iteration. At scale, this pins CPUs at 100%, introduces significant latency, and causes packet drops for our customers, while generating a flood of parser errors. (The parser errors are also present in Suricata 7.0.8, but there were no CPU spikes on Suricata 7.0.8.) CPU usage returned to normal after we rolled back to Suricata 7.0.8.

Reproduction Steps

We have attached a rules file (suricata.rules), a YAML config file (test.yaml), a packet capture file, and the scapy script used to generate that pcap (AI generated).

When we run Suricata 8.0.3 in af_packet mode and replay the packets from the pcap file with tcpreplay in a continuous loop, we can observe the CPU growth in a test environment. We also captured perf top -p <suricata-pid> output, which shows that the functions consuming the most CPU are related to app-layer and QUIC protocol parsing. See the perf output below:

<pre>
+ 39.48% [.] suricata::quic::quic::quic_state_get_tx_iterator - -
+ 31.30% [.] AppLayerParserTransactionsCleanup - -
+ 12.27% [.] AppLayerParserGetStateProgress - -
+ 6.92% [.] AppLayerParserGetTxData - -
+ 3.86% [.] FlowGetProtoMapping - -
+ 1.28% [.] suricata::quic::quic::quic_get_tx_data - -
+ 0.95% [.] suricata::rdp::rdp::rdp_tx_get_progress - -
+ 0.93% [.] __pthread_mutex_trylock - -
+ 0.86% [.] __pthread_mutex_unlock_usercnt - -
0.14% [.] HostTimeoutHash - -
0.11% [.] AFPReadFromRing - -
0.08% [k] audit_filter_syscall.constprop.0.isra.0
</pre>

We do not see the same CPU profile when we repeat this test with Suricata 7.0.8:

<pre>
6.90% suricata [.] AFPReadFromRing
5.53% [kernel] [k] copy_user_enhanced_fast_string
3.81% libc-2.26.so [.] __memmove_avx_unaligned_erms
3.71% libpthread-2.26.so [.] __pthread_mutex_unlock_usercnt
3.59% [vdso] [.] 0x0000000000000728
3.17% libpthread-2.26.so [.] __pthread_mutex_trylock
2.30% [kernel] [k] entry_SYSCALL_64
1.74% [kernel] [k] _raw_spin_lock_irqsave
1.68% suricata [.] suricata::quic::quic::QuicState::new_tx
1.61% [kernel] [k] audit_filter_syscall.constprop.0.isra.0
1.46% suricata [.] DecodeIPV4
1.31% libpthread-2.26.so [.] __pthread_disable_asynccancel
1.24% libpthread-2.26.so [.] __pthread_mutex_lock
</pre>

The high CPU utilization impacts our customers and has blocked our attempts to upgrade to Suricata 8.0.3. Can you investigate this performance issue in Suricata 8's QUIC transaction handling with high priority? The fact that the issue was not observed in Suricata 7.0.8 suggests that a fix is needed in Suricata 8.0.3.

Thank you!

Full setup used:

<pre>
# Setup: create a dummy interface pair
ip link add SFE_0_TX type dummy
ip link add SFE_0_RX type dummy
ip link set SFE_0_TX up mtu 9001
ip link set SFE_0_RX up mtu 9001

# Terminal 1: run suricata
suricata -c suricata.yaml -S rules.rules -k none --af-packet \
  --set af-packet.0.interface=SFE_0_TX \
  --set af-packet.1.interface=SFE_0_RX \
  -l logs/ -vvv

# Terminal 2: replay pcap
tcpreplay --intf1=SFE_0_TX --mbps=500 --loop=0 quic-cpu-repro.pcap

# Terminal 3: profile after 2-3 minutes
perf record -g -p $(pgrep suricata) -- sleep 60
perf report --no-children --sort=symbol
</pre>
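
To make the difference between the two profiles easier to quantify, the following is a minimal Python sketch that parses the perf lines quoted in this report and sums the share attributed to QUIC and app-layer parser symbols. The helper names (parse_profile, share) and the choice of the substrings "quic" and "AppLayerParser" as markers are illustrative assumptions, not part of Suricata or perf. The two profiles were captured in different perf layouts, so the comparison is only approximate.

```python
import re

# Matches both layouts quoted in this report:
#   "+ 39.48% [.] symbol - -"              (callchain view, no DSO column)
#   "6.90% suricata [.] AFPReadFromRing"   (flat view, with DSO column)
PERF_LINE = re.compile(r"(\d+(?:\.\d+)?)%\s+(?:\S+\s+)?\[[.k]\]\s+(\S+)")

# Hypothetical markers chosen for this report; adjust as needed.
MARKERS = ("quic", "AppLayerParser")


def parse_profile(text):
    """Return a list of (percent, symbol) tuples, one per matching line."""
    rows = []
    for line in text.splitlines():
        m = PERF_LINE.search(line)
        if m:
            rows.append((float(m.group(1)), m.group(2)))
    return rows


def share(rows, markers=MARKERS):
    """Sum the percentages of symbols containing any marker substring."""
    total = sum(pct for pct, sym in rows if any(mk in sym for mk in markers))
    return round(total, 2)


PROFILE_803 = """\
+ 39.48% [.] suricata::quic::quic::quic_state_get_tx_iterator - -
+ 31.30% [.] AppLayerParserTransactionsCleanup - -
+ 12.27% [.] AppLayerParserGetStateProgress - -
+ 6.92% [.] AppLayerParserGetTxData - -
+ 3.86% [.] FlowGetProtoMapping - -
+ 1.28% [.] suricata::quic::quic::quic_get_tx_data - -
+ 0.95% [.] suricata::rdp::rdp::rdp_tx_get_progress - -
+ 0.93% [.] __pthread_mutex_trylock - -
+ 0.86% [.] __pthread_mutex_unlock_usercnt - -
0.14% [.] HostTimeoutHash - -
0.11% [.] AFPReadFromRing - -
0.08% [k] audit_filter_syscall.constprop.0.isra.0
"""

PROFILE_708 = """\
6.90% suricata [.] AFPReadFromRing
5.53% [kernel] [k] copy_user_enhanced_fast_string
3.81% libc-2.26.so [.] __memmove_avx_unaligned_erms
3.71% libpthread-2.26.so [.] __pthread_mutex_unlock_usercnt
3.59% [vdso] [.] 0x0000000000000728
3.17% libpthread-2.26.so [.] __pthread_mutex_trylock
2.30% [kernel] [k] entry_SYSCALL_64
1.74% [kernel] [k] _raw_spin_lock_irqsave
1.68% suricata [.] suricata::quic::quic::QuicState::new_tx
1.61% [kernel] [k] audit_filter_syscall.constprop.0.isra.0
1.46% suricata [.] DecodeIPV4
1.31% libpthread-2.26.so [.] __pthread_disable_asynccancel
1.24% libpthread-2.26.so [.] __pthread_mutex_lock
"""

rows_803 = parse_profile(PROFILE_803)
rows_708 = parse_profile(PROFILE_708)
share_803 = share(rows_803)
share_708 = share(rows_708)

print(f"8.0.3 QUIC/app-layer parser share: {share_803}%")
print(f"7.0.8 QUIC/app-layer parser share: {share_708}%")
```

With the marker substrings above, the listed symbols account for 91.25% of samples in the 8.0.3 profile and 1.68% in the 7.0.8 profile.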