Bug #8654

Updated by Shane Dugan about 16 hours ago

Issue Summary 
  In an IPS deployment with QUIC enabled, we are observing severe CPU spikes dominated by QUIC transaction iteration. At scale, the issue pins CPUs at 100%, introduces significant latency, and causes packet drops for our customers, while generating a flood of parser errors. (The parser errors are also present in Suricata 7.0.8, but 7.0.8 showed no CPU spikes.) Rolling back to Suricata 7.0.8 returned CPU usage to normal. 


 Reproduction Steps 
 We have attached a rules file (suricata.rules), a YAML config file (test.yaml), a packet capture file, and the AI-generated Scapy script used to produce that pcap. 

 When we run Suricata 8.0.3 in af_packet mode and replay the packets from the pcap file with tcpreplay in a continuous loop, we observe the CPU growth in a test environment. We also captured perf top -p <suricata-pid> output, which shows that the functions consuming the most CPU are related to app-layer and QUIC protocol parsing. See the perf output below: 
 <pre> 
 +     39.48%    [.] suricata::quic::quic::quic_state_get_tx_iterator                               -        - 
 +     31.30%    [.] AppLayerParserTransactionsCleanup                                              -        - 
 +     12.27%    [.] AppLayerParserGetStateProgress                                                 -        - 
 +      6.92%    [.] AppLayerParserGetTxData                                                        -        - 
 +      3.86%    [.] FlowGetProtoMapping                                                            -        - 
 +      1.28%    [.] suricata::quic::quic::quic_get_tx_data                                         -        - 
 +      0.95%    [.] suricata::rdp::rdp::rdp_tx_get_progress                                        -        - 
 +      0.93%    [.] __pthread_mutex_trylock                                                        -        - 
 +      0.86%    [.] __pthread_mutex_unlock_usercnt                                                 -        - 
      0.14%    [.] HostTimeoutHash                                                                -        - 
      0.11%    [.] AFPReadFromRing                                                                -        - 
      0.08%    [k] audit_filter_syscall.constprop.0.isra.0      
 </pre> 
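
 To put a single number on the listing above, the share attributed to QUIC and app-layer parser symbols can be summed with a small shell helper. This is only a sketch: the function name sum_perf_pct and the pattern 'quic|AppLayerParser' are our own choices for illustration, not part of Suricata or perf, and it assumes one symbol per line with the percentage as a separate whitespace-delimited field.

```shell
# sum_perf_pct PATTERN
# Reads perf-style lines on stdin and prints the summed percentage of every
# line whose text matches PATTERN (an awk extended regex).
# Hypothetical helper for this report; not part of perf or Suricata.
sum_perf_pct() {
  awk -v pat="$1" '
    $0 ~ pat {
      # Use the first field that looks like "12.34%" as the overhead value.
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^[0-9.]+%$/) {
          sub(/%/, "", $i)
          total += $i
          break
        }
      }
    }
    END { printf "%.2f\n", total }
  '
}
```

 Applied to the 8.0.3 listing above with the pattern 'quic|AppLayerParser', the five matching symbols add up to 91.25% of samples. Call-graph lines in full perf report output would need to be filtered out before using it this way.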

 We do not see the same CPU profile when we repeat this in Suricata 7.0.8. 

 <pre> 
    6.90%    suricata              [.] AFPReadFromRing 
    5.53%    [kernel]              [k] copy_user_enhanced_fast_string 
    3.81%    libc-2.26.so          [.] __memmove_avx_unaligned_erms 
    3.71%    libpthread-2.26.so    [.] __pthread_mutex_unlock_usercnt 
    3.59%    [vdso]                [.] 0x0000000000000728 
    3.17%    libpthread-2.26.so    [.] __pthread_mutex_trylock 
    2.30%    [kernel]              [k] entry_SYSCALL_64 
    1.74%    [kernel]              [k] _raw_spin_lock_irqsave 
    1.68%    suricata              [.] suricata::quic::quic::QuicState::new_tx 
    1.61%    [kernel]              [k] audit_filter_syscall.constprop.0.isra.0 
    1.46%    suricata              [.] DecodeIPV4 
    1.31%    libpthread-2.26.so    [.] __pthread_disable_asynccancel 
    1.24%    libpthread-2.26.so    [.] __pthread_mutex_lock                                    
 </pre> 

 The high CPU utilization impacts our customers and has blocked our attempts to upgrade to Suricata 8.0.3. Can you investigate this performance issue in Suricata 8's QUIC transaction handling with high priority? Because the issue was not observed in Suricata 7.0.8, it appears to be a regression that needs a fix in Suricata 8.0.3. 

 Thank you! 

 Full setup used: 
 <pre> 
 # Setup: create a dummy interface pair 
 ip link add SFE_0_TX type dummy 
 ip link add SFE_0_RX type dummy 
 ip link set SFE_0_TX up mtu 9001 
 ip link set SFE_0_RX up mtu 9001 

 # Terminal 1: run suricata 
 suricata -c suricata.yaml -S rules.rules -k none --af-packet \ 
   --set af-packet.0.interface=SFE_0_TX \ 
   --set af-packet.1.interface=SFE_0_RX \ 
   -l logs/ -vvv 

 # Terminal 2: replay pcap 
 tcpreplay --intf1=SFE_0_TX --mbps=500 --loop=0 quic-cpu-repro.pcap 

 # Terminal 3: profile after 2-3 minutes 
 perf record -g -p $(pgrep suricata) -- sleep 60 
 perf report --no-children --sort=symbol 
 </pre> 
