Feature #4226
openbsize: apply as depth to patterns
Description
When reviewing rule profiling output of comparing the speed of bsize:x; checks to using depth:x; isdataat:!1,relative; in effort to achieve an "exact match" of a buffer, it has been observed that bsize variants of rules often consume considerably more ticks than alternative methods.
Tested on 7.0.0-dev (372fc2673 2020-12-11) with the default suricata.yaml, with minor adjustments for non-json output of the rule profiling output.
command used for testing
suricata -c suricata.yaml -S test.rules -r a564fbcf-e41d-494c-9ad8-7e44e22a03d9.pcap -l /tmp/perf_test/perf_1/
pcap from https://app.any.run/tasks/a564fbcf-e41d-494c-9ad8-7e44e22a03d9/#
test signatures
alert dns any any -> any any (msg:"bsize test"; dns.query; content:"yundol0727.kro.kr"; bsize:17; sid:1;) alert dns any any -> any any (msg:"depth and isdataat test"; dns.query; content:"yundol0727.kro.kr"; depth:17; isdataat:!1,relative; sid:2;) alert dns any any -> any any (msg:"depth and endswith test"; dns.query; content:"yundol0727.kro.kr"; depth:17; endswith; sid:3;)
It's been observed on other buffers as well, just using dns.query as a quick example.
I ran the above signatures through on the same pcap 3 times and have recorded the output of the
First Run
-------------------------------------------------------------------------- Date: 12/18/2020 -- 04:12:47. Sorted by: ticks. -------------------------------------------------------------------------- Num Rule Gid Rev Ticks % Checks Matches Max Ticks Avg Ticks Avg Match Avg No Match -------- ------------ -------- -------- ------------ ------ -------- -------- ----------- ----------- ----------- -------------- 1 1 1 0 45045 66.10 1 1 45045 45045.00 45045.00 0.00 2 2 1 0 12123 17.79 1 1 12123 12123.00 12123.00 0.00 3 3 1 0 10983 16.12 1 1 10983 10983.00 10983.00 0.00
Second Run
-------------------------------------------------------------------------- Date: 12/18/2020 -- 04:17:23. Sorted by: ticks. -------------------------------------------------------------------------- Num Rule Gid Rev Ticks % Checks Matches Max Ticks Avg Ticks Avg Match Avg No Match -------- ------------ -------- -------- ------------ ------ -------- -------- ----------- ----------- ----------- -------------- 1 1 1 0 26637 53.87 1 1 26637 26637.00 26637.00 0.00 2 2 1 0 11772 23.81 1 1 11772 11772.00 11772.00 0.00 3 3 1 0 11040 22.33 1 1 11040 11040.00 11040.00 0.00
Third Run
-------------------------------------------------------------------------- Date: 12/18/2020 -- 04:18:08. Sorted by: ticks. -------------------------------------------------------------------------- Num Rule Gid Rev Ticks % Checks Matches Max Ticks Avg Ticks Avg Match Avg No Match -------- ------------ -------- -------- ------------ ------ -------- -------- ----------- ----------- ----------- -------------- 1 1 1 0 27219 53.34 1 1 27219 27219.00 27219.00 0.00 2 2 1 0 12696 24.88 1 1 12696 12696.00 12696.00 0.00 3 3 1 0 11115 21.78 1 1 11115 11115.00 11115.00 0.00
Build Info
uricata --build-info This is Suricata version 7.0.0-dev (372fc2673 2020-12-11) Features: PCAP_SET_BUFF AF_PACKET HAVE_PACKET_FANOUT LIBCAP_NG LIBNET1.1 HAVE_HTP_URI_NORMALIZE_HOOK PCRE_JIT HAVE_NSS HAVE_LUA HAVE_LUAJIT HAVE_LIBJANSSON PROFILING TLS TLS_C11 MAGIC RUST SIMD support: SSE_4_2 SSE_4_1 SSE_3 Atomic intrinsics: 1 2 4 8 16 byte(s) 64-bits, Little-endian architecture GCC version 7.5.0, C version 201112 compiled with _FORTIFY_SOURCE=2 L1 cache line size (CLS)=64 thread local storage method: _Thread_local compiled with LibHTP v0.5.36, linked against LibHTP v0.5.36 Suricata Configuration: AF_PACKET support: yes eBPF support: no XDP support: no PF_RING support: no NFQueue support: no NFLOG support: no IPFW support: no Netmap support: no DAG enabled: no Napatech enabled: no WinDivert enabled: no Unix socket enabled: yes Detection enabled: yes Libmagic support: yes libnss support: yes libnspr support: yes libjansson support: yes hiredis support: no hiredis async with libevent: no Prelude support: no PCRE jit: yes LUA support: yes, through luajit libluajit: yes GeoIP2 support: no Non-bundled htp: no Hyperscan support: no Libnet support: yes liblz4 support: yes Rust support: yes Rust strict mode: no Rust compiler path: /usr/bin/rustc Rust compiler version: rustc 1.43.0 Cargo path: /usr/bin/cargo Cargo version: cargo 1.43.0 Cargo vendor: yes Python support: no Python path: not set Python distutils no Python yaml no Install suricatactl: requires python Install suricatasc: requires python Install suricata-update: not bundled Profiling enabled: yes Profiling locks enabled: no Plugin support (experimental): yes Development settings: Coccinelle / spatch: no Unit tests enabled: no Debug output enabled: no Debug validation enabled: no Generic build parameters: Installation prefix: /usr/local Configuration directory: /usr/local/etc/suricata/ Log directory: /usr/local/var/log/suricata/ --prefix /usr/local --sysconfdir /usr/local/etc --localstatedir /usr/local/var --datarootdir /usr/local/share Host: x86_64-pc-linux-gnu Compiler: gcc (exec name) / g++ (real) GCC Protect enabled: no GCC march native enabled: yes GCC Profile enabled: no Position Independent Executable enabled: no CFLAGS -g -O2 -std=c11 -march=native -I${srcdir}/../rust/gen -I${srcdir}/../rust/dist PCAP_CFLAGS -I/usr/include SECCFLAGS
Files
Updated by Victor Julien almost 3 years ago
- Tracker changed from Bug to Feature
- Subject changed from bsize is considerably slower than depth:x; isdataat:!1,relative to bsize: apply as depth to patterns
- Status changed from New to Assigned
- Assignee set to Jeff Lucovsky
- Target version set to 7.0.0-beta1
- Affected Versions deleted (
git master) - Label Needs backport added
If bsize setting is the exact length of a pattern, apply startwith
/endswith
logic. Otherwise, apply it as depth
. For dsize
(and urilen
?) we already do this IIRC.
I think this could be backported as well. Technically its not a bug, but it would be a virtually "free" optimization that should be low risk.
Updated by Victor Julien about 2 years ago
- Target version changed from 7.0.0-beta1 to 8.0.0-beta1
Updated by Brandon Murphy almost 2 years ago
Recently discovered that urilen is much faster than bsize applied to http.uri as well.
I'm not 100% sure if this feature, when implemented, would "solve" that.
urilen:9<>17
vs http.uri; bsize:9<>17;
yielded drastically different and unexpected results in favor of urilen.
Using urilen even seems to impact the number of checks a rule experiences. Is it possible that urilen supports some prefilter element that bsize when applied to http.uri does not?
Feel free to let me know if this should be a separate request - it just seems like it might be related.
Updated by Jeff Lucovsky almost 2 years ago
These do seem related so let's not create a new issue yet.
Updated by Jeff Lucovsky almost 2 years ago
If bsize setting is the exact length of a pattern, apply startwith/endswith logic. Otherwise, apply it as depth. For dsize (and urilen?) we already do this IIRC.
Is this due to the logic in DetectUrilenApplyToContent
?
Updated by Victor Julien about 1 year ago
- Related to Optimization #6375: detect: merge urilen and bsize implementations added
Updated by Jeff Lucovsky 8 months ago
- Related to Bug #6894: bsize validation FP on content negation with hex encoded 0d 0a added
Updated by Jeff Lucovsky 8 months ago
I've been investigating this and I think that the profiling is attributing too many ticks to the first rule.
I've modified the sids so that the rule using bsize is sid 35 -- from suricata master:
-------------------------------------------------------------------------- Date: 4/7/2024 -- 09:57:27. Sorted by: max ticks. -------------------------------------------------------------------------- Num Rule Gid Rev Ticks % Checks Matches Max Ticks Avg Ticks Avg Match Avg No Match -------- ------------ -------- -------- ------------ ------ -------- -------- ----------- ----------- ----------- -------------- 1 15 1 0 83365 82.15 1 1 83365 83365.00 83365.00 0.00 2 35 1 0 7501 7.39 1 1 7501 7501.00 7501.00 0.00 3 20 1 0 6926 6.83 1 1 6926 6926.00 6926.00 0.00 4 30 1 0 3682 3.63 1 1 3682 3682.00 3682.00 0.00
I added sid 15 -- only dns.query and content:
alert dns any any -> any any (msg:"w/out bsize test"; dns.query; content:"yundol0727.kro.kr"; sid:15;) alert dns any any -> any any (msg:"depth and isdataat test"; dns.query; content:"yundol0727.kro.kr"; depth:17; isdataat:!1,relative; sid:20;) alert dns any any -> any any (msg:"depth and endswith test"; dns.query; content:"yundol0727.kro.kr"; depth:17; endswith; sid:30;) alert dns any any -> any any (msg:"bsize test"; dns.query; content:"yundol0727.kro.kr"; bsize:17; sid:35;)
With modifications to bsize that make it a content modifier under certain circumstances (sid 35):
-------------------------------------------------------------------------- Date: 4/7/2024 -- 10:02:41. Sorted by: max ticks. -------------------------------------------------------------------------- Num Rule Gid Rev Ticks % Checks Matches Max Ticks Avg Ticks Avg Match Avg No Match -------- ------------ -------- -------- ------------ ------ -------- -------- ----------- ----------- ----------- -------------- 1 15 1 0 43219 68.17 1 1 43219 43219.00 43219.00 0.00 2 20 1 0 7700 12.15 1 1 7700 7700.00 7700.00 0.00 3 35 1 0 6618 10.44 1 1 6618 6618.00 6618.00 0.00 4 30 1 0 5860 9.24 1 1 5860 5860.00 5860.00 0.00
Profiling using the same ruleset without sid 15
-------------------------------------------------------------------------- Date: 4/7/2024 -- 10:14:31. Sorted by: max ticks. -------------------------------------------------------------------------- Num Rule Gid Rev Ticks % Checks Matches Max Ticks Avg Ticks Avg Match Avg No Match -------- ------------ -------- -------- ------------ ------ -------- -------- ----------- ----------- ----------- -------------- 1 20 1 0 103625 90.32 1 1 103625 103625.00 103625.00 0.00 2 30 1 0 6522 5.68 1 1 6522 6522.00 6522.00 0.00 3 35 1 0 4588 4.00 1 1 4588 4588.00 4588.00 0.00
I am continuing to investigate but it appears that the first sid (numerical order) may have extra ticks associated with it.
Updated by Brandon Murphy 8 months ago
FWIW, we have observed similar "oddness" within the profiler since the creation of this ticket. We have noticed it enough that at this point we often individually test each rule when comparing variations to determine the fastest method.
I just tested these two rules within our QA environment
alert http $HOME_NET any -> $EXTERNAL_NET any (msg:"UA depth/isdataat test"; flow:established,to_server; http.user_agent; content:"Mozilla"; depth:7; isdataat:!1,relative; sid:1; classtype:unknown; rev:1;) alert http $HOME_NET any -> $EXTERNAL_NET any (msg:"UA bsize test"; flow:established,to_server; http.user_agent; bsize:7; content:"Mozilla"; sid:1; classtype:unknown; rev:1;)
The total ticks were within ~5% of each other when tested individually, which seems an acceptable margin of error between runs given the scale of total ticks (9.4bn for bsize and and 8.9bn for depth/isdata)
I also went back and tested the urilen vs bsize applied to http.uri and tested them individually.
alert http $HOME_NET any -> $EXTERNAL_NET any (msg:"bsize on http.uri"; flow:established,to_server; http.uri; bsize:17; content:"/js/jquery.min.js"; sid:1; classtype:unknown; rev:1;) alert http $HOME_NET any -> $EXTERNAL_NET any (msg:"urilen no bsize"; flow:established,to_server; urilen:17; http.uri; content:"/js/jquery.min.js"; sid:1; classtype:unknown; rev:1;) alert http $HOME_NET any -> $EXTERNAL_NET any (msg:"urilen with bsize"; flow:established,to_server; urilen:17; http.uri; bsize:17; content:"/js/jquery.min.js"; sid:1; classtype:unknown; rev:1;)
urilen had considerably less checks (334 vs 1422), though the avg ticks is actually lower with only the bsize 19040 vs 27k/28k when using urilen. total ticks were in favor of urilen, mostly contributed to the decreased number of checks.
I will note though, that in this specific case where the string being searched for is common within the buffer, but uncommon when is the exact content of the buffer, it is often MUCH faster to use a more generic buffer and matching on a more complete string, than it is to use a more specific buffer.
consider this rule, which had about 4million less checks than the other two above and total ticks were a ~98% reduction.
alert http $HOME_NET any -> $EXTERNAL_NET any (msg:"UA header test"; flow:established,to_server; http.header; content:"User-Agent|3a 20|Mozilla|0d 0a|"; sid:1; classtype:unknown; rev:1;)
----
All of that completed, I do believe the following questions still stand. Perhaps we need another ticket to investigate profiler accuracy?
1. Can you confirm that the optimization that Victor referenced is already being applied when bsize is used?
If bsize setting is the exact length of a pattern, apply startwith/endswith logic. Otherwise, apply it as depth. For dsize (and urilen?) we already do this IIRC.
2. have you been able to find anything that would explain why urilen impacts the number of checks and speed compared to bsize.
Updated by Victor Julien 8 months ago
I think the first entry being seemly extra expensive is probably just a start up cost. Cold data and/or instruction caches, a memory lookup, that kind of thing. Reordering rules moves the cost to the first rule being evaluated.
Updated by Jeff Lucovsky 8 months ago
"All of that completed, I do believe the following questions still stand. Perhaps we need another ticket to investigate profiler accuracy?
1. Can you confirm that the optimization that Victor referenced is already being applied when bsize is used?
If bsize setting is the exact length of a pattern, apply startwith/endswith logic. Otherwise, apply it as depth. For dsize (and urilen?) we already do this IIRC.
2. have you been able to find anything that would explain why urilen impacts the number of checks and speed compared to bsize."
The bsize optimization is not yet applied in master/main-7.0.x
Do you have a pcap the urilen rule example will trigger on?
Updated by Brandon Murphy 8 months ago
Jeff Lucovsky wrote in #note-11:
If bsize setting is the exact length of a pattern, apply startwith/endswith logic. Otherwise, apply it as depth. For dsize (and urilen?) we already do this IIRC.
Sorry, just to be clear, this is currently happening with bsize in master/7.0.x?
Updated by Brandon Murphy 8 months ago
Jeff Lucovsky wrote in #note-11:
Do you have a pcap the urilen rule example will trigger on?
I do! Took me awhile to find one, but I was able to.
*Edit - I should say, it shows the condition, it does not alert on the example rules.
Using the 3 rules above and enabling rule profiling, the following result snippet shows that only the rule without urilen had checks (i did have to renumber the sids)
-------------------------------------------------------------------------- Date: 4/9/2024 -- 15:27:28. Sorted by: number of checks. -------------------------------------------------------------------------- Num Rule Gid Rev Ticks % Checks Matches Max Ticks Avg Ticks Avg Match Avg No Match -------- ------------ -------- -------- ------------ ------ -------- -------- ----------- ----------- ----------- -------------- 1 1 1 1 26541 100.00 1 0 26541 26541.00 0.00 26541.00 -------------------------------------------------------------------------- Date: 4/9/2024 -- 15:27:28. Sorted by: number of matches. -------------------------------------------------------------------------- Num Rule Gid Rev Ticks % Checks Matches Max Ticks Avg Ticks Avg Match Avg No Match -------- ------------ -------- -------- ------------ ------ -------- -------- ----------- ----------- ----------- -------------- 1 1 1 1 26541 100.00 1 0 26541 26541.00 0.00 26541.00
Updated by Jeff Lucovsky 8 months ago
Thanks; the urilen is 42; with that change, all 3 rules are evaluated.
Updated by Brandon Murphy 8 months ago
Jeff Lucovsky wrote in #note-15:
Thanks; the urilen is 42; with that change, all 3 rules are evaluated.
Correct, to that pcap demonstrates that there is some prefilter logic within urilen that, despite the fast_pattern being the same in all the rules, prevents the rules from being evaluated when the urilen condition is not met.