Project

General

Profile

Actions

Bug #6619

closed

Profiling takes much longer to run than it used to

Added by Corey Thomas 5 months ago. Updated 3 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Target version:
Effort:
Difficulty:
Label:

Description

While trying to profile for something else, I noticed that profiling in 7.x.x+ takes longer compared to 6.x.x builds.

I think the source is this commit
https://github.com/OISF/suricata/commit/b591813b8690753ee4a678f8b10ff1928de1fe15

#ifdef PROFILE_RULES
    /* aggregate statistics */
    if (SCTIME_SECS(p->ts) != det_ctx->rule_perf_last_sync) {
        SCProfilingRuleThreatAggregate(det_ctx);
        det_ctx->rule_perf_last_sync = SCTIME_SECS(p->ts);
    }
#endif

This was done in the QA lab but should be reproducible by reading a PCAP with Suricata.

The slowdown can be anywhere from 2-10x longer. Runs were 2-3x longer when done with our CI but almost 10x longer when ssh'd and running in console, so I'm not what factors were at play there. ssh may be inherently slower as it is due to sending text back over the wire.


Files

icmp.pcap (138 Bytes) icmp.pcap Simeon Miteff, 01/10/2024 01:11 AM

Subtasks 1 (0 open1 closed)

Bug #6635: Profiling takes much longer to run than it used to (7.0.x backport)ClosedVictor JulienActions

Related issues 1 (1 open0 closed)

Related to Suricata - Feature #6550: Capability to have rules profiling on pcap runIn ReviewEric LeblondActions
Actions #1

Updated by Simeon Miteff 5 months ago

Corey Thomas wrote:

While trying to profile for something else, I noticed that profiling in 7.x.x+ takes longer compared to 6.x.x builds.

I think the source is this commit
https://github.com/OISF/suricata/commit/b591813b8690753ee4a678f8b10ff1928de1fe15

This was done in the QA lab but should be reproducible by reading a PCAP with Suricata.

I've had a slowness problem processing a large archive of PCAPs that also affects 6.x.x builds (and possibly older). I found that by reducing the number of threads from the maximum for the (4-socket / 176 hw thread) machine I'm testing on to 4 worker threads achieved the best (but still terrible) performance.

Last week Jeff helped me trace the problem to rules profiling (which I can't turn off because that is the purpose of the PCAP processing). Yesterday I used VTune threading analysis to discover that with 176 worker threads, they are spending almost all their time waiting for the lock in `SCProfilingRuleThreatAggregate()`.

It gets called 33,568 times in 44.712s (every 1.3ms) whereas 1s/176 threads should be every ~5.8ms, so I'm not confident that the 1hz logic in the commit you reference is working as intended.

OTOH, it could be working but with many workers the order-of-10ms frequency drives up the lock contention to result in pathelogical performance. Either way, it might explain why I didn't see any change in 7.x.x+ like you did.

I haven't fully grokked the code but I would personally be happy with per-worker thread rules profiling output (similar to threaded EVE logging) and aggregate them externally (since I already do this over multiple parallel suricata executions).

Actions #2

Updated by Victor Julien 5 months ago

Looks like this is related to how time is managed. In the previous setup, on a fixed interval of wall time the threads would do their profiling processing. In b591813b8690753ee4a678f8b10ff1928de1fe15 this time management was shifted to the packet threads, which use the packet time. In pcaps, packet time can move much faster than wall time, so in these cases the processing is triggered (way) more often.

Actions #3

Updated by Victor Julien 4 months ago

  • Status changed from New to In Review
  • Assignee changed from OISF Dev to Victor Julien
  • Target version changed from TBD to 8.0.0-beta1
  • Label Needs backport to 7.0 added
Actions #4

Updated by OISF Ticketbot 4 months ago

  • Subtask #6635 added
Actions #5

Updated by OISF Ticketbot 4 months ago

  • Label deleted (Needs backport to 7.0)
Actions #6

Updated by Simeon Miteff 4 months ago

I've tested this code and I'm seeing a 5x speedup processing PCAPs, which is excellent, but I think rules profiling is now broken, I'm getting empty output like this:

$ cat rule_perf.log 
{"timestamp":"2024-01-09T02:53:41.345566+0000","sort":"ticks","rules":[]}
{"timestamp":"2024-01-09T02:53:41.345590+0000","sort":"average ticks","rules":[]}
{"timestamp":"2024-01-09T02:53:41.345593+0000","sort":"average ticks (match)","rules":[]}
{"timestamp":"2024-01-09T02:53:41.345595+0000","sort":"average ticks (no match)","rules":[]}
{"timestamp":"2024-01-09T02:53:41.345597+0000","sort":"number of checks","rules":[]}
{"timestamp":"2024-01-09T02:53:41.345598+0000","sort":"number of matches","rules":[]}
{"timestamp":"2024-01-09T02:53:41.345600+0000","sort":"max ticks","rules":[]}
Actions #7

Updated by Victor Julien 4 months ago

  • Status changed from In Review to Resolved
Actions #8

Updated by Simeon Miteff 4 months ago

Victor Julien wrote in #note-7:

https://github.com/OISF/suricata/pull/10134/commits/bcb2b50cfc34430e0e91dea781c90d2259ef8f0d

@Simeon Miteff it works here, can you describe your test?

Built from commit 3df238d541793e8f55185e3cbd85f67bda1484f6 (HEAD, victor/detect-cleanups/v14) with:

./configure --enable-profiling --enable-profiling-locks --enable-profiling-rules --enable-debug

Build info:

This is Suricata version 7.0.0-dev (b06c0579f 2022-09-26)
Features: PCAP_SET_BUFF AF_PACKET HAVE_PACKET_FANOUT LIBCAP_NG HAVE_HTP_URI_NORMALIZE_HOOK PCRE_JIT HAVE_NSS HTTP2_DECOMPRESSION HAVE_LIBJANSSON PROFILING TLS TLS_C11 RUST 
SIMD support: SSE_4_2 SSE_4_1 SSE_3 
Atomic intrinsics: 1 2 4 8 16 byte(s)
64-bits, Little-endian architecture
GCC version 11.2.0, C version 201112
compiled with _FORTIFY_SOURCE=2
L1 cache line size (CLS)=64
thread local storage method: _Thread_local
compiled with LibHTP v0.5.40, linked against LibHTP v0.5.45

Suricata Configuration:
  AF_PACKET support:                       yes
  DPDK support:                            no
  eBPF support:                            no
  XDP support:                             no
  PF_RING support:                         no
  NFQueue support:                         no
  NFLOG support:                           no
  IPFW support:                            no
  Netmap support:                          no 
  DAG enabled:                             no
  Napatech enabled:                        no
  WinDivert enabled:                       no

  Unix socket enabled:                     yes
  Detection enabled:                       yes

  Libmagic support:                        no
  libjansson support:                      yes
  hiredis support:                         no
  hiredis async with libevent:             no
  PCRE jit:                                yes
  LUA support:                             no
  libluajit:                               no
  GeoIP2 support:                          no
  Non-bundled htp:                         yes
  Hyperscan support:                       no
  Libnet support:                          no
  liblz4 support:                          yes
  Landlock support:                        yes

  Rust support:                            yes
  Rust strict mode:                        no
  Rust compiler path:                      /home/simeon/.cargo/bin/rustc
  Rust compiler version:                   rustc 1.61.0 (fe5b13d68 2022-05-18)
  Cargo path:                              /home/simeon/.cargo/bin/cargo
  Cargo version:                           cargo 1.61.0 (a028ae4 2022-04-29)

  Python support:                          yes
  Python path:                             /usr/bin/python3
  Install suricatactl:                     yes
  Install suricatasc:                      yes
  Install suricata-update:                 no, not bundled

  Profiling enabled:                       yes
  Profiling locks enabled:                 no

  Plugin support (experimental):           yes

Development settings:
  Coccinelle / spatch:                     no
  Unit tests enabled:                      no
  Debug output enabled:                    no
  Debug validation enabled:                no
  Fuzz targets enabled:                    no

Generic build parameters:
  Installation prefix:                     /home/simeon/opt/suricata
  Configuration directory:                 /home/simeon/opt/suricata/etc/suricata/
  Log directory:                           /home/simeon/opt/suricata/var/log/suricata/

  --prefix                                 /home/simeon/opt/suricata
  --sysconfdir                             /home/simeon/opt/suricata/etc
  --localstatedir                          /home/simeon/opt/suricata/var
  --datarootdir                            /home/simeon/opt/suricata/share

  Host:                                    x86_64-pc-linux-gnu
  Compiler:                                gcc (exec name) / g++ (real)
  GCC Protect enabled:                     no
  GCC march native enabled:                yes
  GCC Profile enabled:                     no
  Position Independent Executable enabled: no
  CFLAGS                                   -g -O2 -fPIC -std=c11 -march=native -I${srcdir}/../rust/gen -I${srcdir}/../rust/dist
  PCAP_CFLAGS                               -I/usr/include
  SECCFLAGS 

Configuration is the ~/opt/suricata/etc/suricata.yaml from the build above (unmodified) which has rules profiling enabled.

Command line: suricata -vvv -k none --pcap-file-recursive -r ./pcap -S test.rules -c suricata.yaml

The test.rules ruleset:

alert icmp any any -> any any (msg:"Test"; sid:1000000;)

Input PCAP put into the pcap/ directory is attached.

The single rule matches and triggers an alert:

$ head -n1 eve.json | jq .
{
  "timestamp": "2012-04-12T05:49:23.977955+1000",
  "flow_id": 452525811821603,
  "pcap_cnt": 1,
  "event_type": "alert",
  "src_ip": "10.0.0.1",
  "src_port": 0,
  "dest_ip": "74.125.225.99",
  "dest_port": 0,
  "proto": "ICMP",
  "icmp_type": 8,
  "icmp_code": 0,
  "pkt_src": "wire/pcap",
  "alert": {
    "action": "allowed",
    "gid": 1,
    "signature_id": 1000000,
    "rev": 0,
    "signature": "Test",
    "category": "",
    "severity": 3
  },
  "flow": {
    "pkts_toserver": 1,
    "pkts_toclient": 0,
    "bytes_toserver": 98,
    "bytes_toclient": 0,
    "start": "2012-04-12T05:49:23.977955+1000" 
  }
}

Here are the last few lines of log output:

[1369548] 10/1/2024 -- 09:39:01 - (source-pcap-file.c:386) <Notice> (ReceivePcapFileThreadExitStats) -- Pcap-file module read 1 files, 1 packets, 98 bytes
[1369547] 10/1/2024 -- 09:39:01 - (tmqh-flow.c:213) <Perf> (TmqhOutputFlowFreeCtx) -- AutoFP - Total flow handler queues - 12
[1369547] 10/1/2024 -- 09:39:01 - (counters.c:871) <Info> (StatsLogSummary) -- Alerts: 1
[1369547] 10/1/2024 -- 09:39:01 - (ippair.c:296) <Perf> (IPPairPrintStats) -- ippair memory usage: 414144 bytes, maximum: 16777216
[1369547] 10/1/2024 -- 09:39:01 - (util-profiling.c:297) <Perf> (SCProfilingDump) -- Done dumping profiling data.
[1369547] 10/1/2024 -- 09:39:01 - (host.c:299) <Perf> (HostPrintStats) -- host memory usage: 398144 bytes, maximum: 33554432
[1369547] 10/1/2024 -- 09:39:01 - (util-profiling-rules.c:453) <Perf> (SCProfilingRuleDump) -- Dumping profiling data for 1 rules.
[1369547] 10/1/2024 -- 09:39:01 - (util-profiling-rules.c:535) <Perf> (SCProfilingRuleDump) -- Done dumping profiling data.
[1369547] 10/1/2024 -- 09:39:01 - (util-profiling-keywords.c:213) <Perf> (SCProfilingKeywordDump) -- Done dumping keyword profiling data.
[1369547] 10/1/2024 -- 09:39:01 - (util-profiling-rulegroups.c:257) <Perf> (SCProfilingSghDump) -- Done dumping rulegroup profiling data.
[1369547] 10/1/2024 -- 09:39:01 - (detect-engine-build.c:1780) <Info> (SigAddressCleanupStage1) -- cleaning up signature grouping structure... complete

And finally rule_perf.log:

{"timestamp":"2024-01-10T09:39:01.647509+1000","sort":"ticks","rules":[]}
{"timestamp":"2024-01-10T09:39:01.647524+1000","sort":"average ticks","rules":[]}
{"timestamp":"2024-01-10T09:39:01.647529+1000","sort":"average ticks (match)","rules":[]}
{"timestamp":"2024-01-10T09:39:01.647532+1000","sort":"average ticks (no match)","rules":[]}
{"timestamp":"2024-01-10T09:39:01.647537+1000","sort":"number of checks","rules":[]}
{"timestamp":"2024-01-10T09:39:01.647540+1000","sort":"number of matches","rules":[]}
{"timestamp":"2024-01-10T09:39:01.647543+1000","sort":"max ticks","rules":[]}

I did some poking around with GDB, and I can confirm:
  • The aggregate/merge functions do run both from the code you changed and on thread shutdown, so it appears the new wall-clock timing based logic is OK
  • profiling_rules_enabled is on, so I think it should not be empty
  • the rules profiling counters don't appear to be incremented, as if the RULE_PROFILING_START/END macro isn't being called

Unfortunately I couldn't get to the bottom of it with GDB.

Before I reproduced it with this minimal setup I ran the code on 1/2 TB of PCAPs with a full ETPro ruleset, and got the same result so I don't think it is ruleset/traffic related.

Actions #9

Updated by Victor Julien 4 months ago

  • Related to Feature #6550: Capability to have rules profiling on pcap run added
Actions #10

Updated by Jeff Lucovsky 3 months ago

@Simeon Miteff If might be related to issue 6550 (linked to this ticket). Can you check if adding that PR's changes help?

Actions #11

Updated by Victor Julien 3 months ago

  • Status changed from Resolved to Closed
Actions #12

Updated by Simeon Miteff 3 months ago

Jeff Lucovsky wrote in #note-10:

@Simeon Miteff If might be related to issue 6550 (linked to this ticket). Can you check if adding that PR's changes help?

Yes this looked super promising but I tried it and get the same result. I left a comment on the PR here: https://github.com/OISF/suricata/pull/9755#issuecomment-1892891926

Actions #13

Updated by Simeon Miteff 3 months ago

FYI after straightening out some issues with the way I was configuring the build for profiling I was able to make the combination of Victor and Eric's branches work for me.

Actions

Also available in: Atom PDF