Feature #6808
Quantify how a Suricata rule matches against a PCAP
Status: open
Description
Have Suricata return a rule's content match percentage against a PCAP.
This match percentage could build on the Suricon 2022 and 2023 ask, "visualizing how Suricata rules and their contents and/or pcres match step by step" (https://redmine.openinfosecfoundation.org/issues/5666).
In the referenced ticket, there is a suggestion to "...dump the matching steps from the detection engine...logging...content inspection step by step, including matching offsets, etc...". I believe there is a great opportunity to quantify this inspection work!
Consider the following...
- For a given content inspection, set total_available_content to the number of contents in the rule's content array.
- Tag each inspected content with IS_MATCHED: [TRUE|FALSE].
- After inspection completes, set total_matched_content to the count of contents tagged IS_MATCHED: TRUE.
- Compute the rule's content match percentage as total_matched_content / total_available_content * 100.
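The steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Suricata code: the function names are hypothetical, the two patterns are made up for the example, and each content is treated as an independent substring search, ignoring modifiers such as offset, depth, distance, and within.

```python
def inspect_contents(buffer: bytes, contents: list[bytes]) -> list[tuple[bytes, bool]]:
    """Tag each content with IS_MATCHED (True/False).

    Assumption: a content "matches" if it appears anywhere in the buffer.
    """
    return [(content, content in buffer) for content in contents]


def content_match_percentage(tagged: list[tuple[bytes, bool]]) -> float:
    """total_matched_content / total_available_content, as a percentage."""
    total_available_content = len(tagged)
    if total_available_content == 0:
        return 0.0  # a rule without contents has nothing to score
    total_matched_content = sum(1 for _, is_matched in tagged if is_matched)
    return 100.0 * total_matched_content / total_available_content


# Hypothetical rule with two contents; only the second is in the buffer.
buffer = b"Host: localhost\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0)\r\n"
tagged = inspect_contents(buffer, [b"evil.example", b"Mozilla/5.0"])
print(f"{content_match_percentage(tagged):.0f}%")  # prints 50%
```

The guard for an empty content list avoids a division by zero for rules that have no content keywords at all.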
This percentage could be expressed in a summary log line after the visualized story log lines.
Here's an oversimplified draft of the story and percentage lines. Assume a rule with two contents, only one of which matches the given pcap, xyz.pcap.
"""
Header name: ptr ...
0lx 48 6f 73 74 |Host|
Header value: ...
0lx 6c 6f 63 61 6c 68 6f 73 74 |localhost|
Debug: IS_MATCHED: FALSE. Match failure at offset:4.
Header name: ptr 0x...
0lx 55 73 65 72 2d 41 67 65 6e 74 |User-Agent|
...
Header value: ptr 0x...
0lx 4d 6f 7a 69 6c 6c 61 2f 35 2e 30 20 28 57 69 6e |Mozilla/5.0 (Win|
10lx 64 6f 77 73 20 4e 54 20 31 30 2e 30 3b 20 57 69 |dows NT 10.0; Wi|
20lx 6e 36 34 3b 20 78 36 34 29 20 41 70 70 6c 65 57 |n64; x64) AppleW|
30lx 65 62 4b 69 74 2f 35 33 37 2e 33 36 20 28 4b 48 |ebKit/537.36 (KH|
40lx 54 4d 4c 2c 20 6c 69 6b 65 20 47 65 63 6b 6f 29 |TML, like Gecko)|
50lx 20 43 68 72 6f 6d 65 2f 31 31 35 2e 30 2e 35 37 | Chrome/115.0.57|
60lx 39 30 2e 31 31 30 20 53 61 66 61 72 69 2f 35 33 |90.110 Safari/53|
70lx 37 2e 33 36 |7.36|
Debug: IS_MATCHED: TRUE
Debug: Content Inspection Match Percentage for SID:1234 against xyz.pcap:\t50%.
"""
Caveats
- The rules being evaluated are assumed to have already matched on protocol, source IP/port, and destination IP/port. The Content Inspection Match Percentage should be based only on content within the sig group head (see Example 4 of https://docs.suricata.io/en/latest/configuration/suricata-yaml.html#inspection-configuration).
- This feature should NOT run in a production environment, as I assume it carries a high performance and memory cost. Instead, this feature would be available only if Suricata was compiled with a configuration option like --enable-debug.
After reading https://github.com/OISF/suricata/blob/master/doc/userguide/performance/tuning-considerations.rst, I assume that users would also need to set "profile: high" and update their group size variables, e.g.:
custom-values:
toclient-groups: ?
toserver-groups: ?
Use Case:
There exists a PCAP with suspicious traffic but no ET Rules matched against it.
A rule writer compiles Suricata with the configuration option to report rules' content match percentage. After running Suricata against the PCAP, the rule writer gets the visualization log from Suricata as suggested in https://redmine.openinfosecfoundation.org/issues/5666.
The rule writer greps the log for “Content Inspection Match Percentage for SID”, then sorts the output with the highest percentage first.
This ranking gives insight into which known malware the traffic most closely resembles. The rule writer can then review the “rule match story” of the top-ranked rules in more detail and consider writing new rules or adjusting old ones.
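The grep-and-sort step of this use case could look roughly like the Python sketch below. It assumes the summary line has exactly the format drafted earlier in this ticket, which is a proposal and not an existing Suricata log format. The function name and the sample SIDs and percentages are invented for illustration.

```python
import re

# Matches the summary line drafted in this ticket (hypothetical format).
SUMMARY_RE = re.compile(
    r"Content Inspection Match Percentage for SID:(\d+) "
    r"against (\S+?):\s*(\d+(?:\.\d+)?)%"
)


def rank_by_match_percentage(log_lines):
    """Return (sid, pcap, percentage) tuples, highest percentage first."""
    ranked = []
    for line in log_lines:
        m = SUMMARY_RE.search(line)
        if m:
            ranked.append((int(m.group(1)), m.group(2), float(m.group(3))))
    ranked.sort(key=lambda entry: entry[2], reverse=True)
    return ranked


# Made-up sample log; only the summary lines are picked up.
sample_log = [
    "Debug: IS_MATCHED: TRUE",
    "Debug: Content Inspection Match Percentage for SID:1234 against xyz.pcap:\t50%.",
    "Debug: Content Inspection Match Percentage for SID:5678 against xyz.pcap:\t75%.",
    "Debug: Content Inspection Match Percentage for SID:9012 against xyz.pcap:\t0%.",
]
for sid, pcap, pct in rank_by_match_percentage(sample_log):
    print(f"SID {sid}: {pct:.0f}% ({pcap})")
```

The rules at the top of this list are the ones whose contents most nearly match the traffic, so they are the natural starting point for reviewing the visualized match steps.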
Updated by Victor Julien 11 months ago
A complicating factor is the recursive scanning nature of content matching when distance/within are used. In content:a; content:b; distance:0; within:1; we might look for each occurrence of "a", then check whether "b" follows it; if not, go to the next "a", and so on until the end of the buffer or some other limit is reached.
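To make the complication concrete, here is a simplified Python model of that retry loop for the two-content case. It is not how Suricata's detection engine is implemented. The function name is hypothetical, and the search window for "b" is assumed to start at distance bytes after the end of "a" and span within bytes.

```python
def match_a_then_b(buf: bytes, a: bytes, b: bytes,
                   distance: int = 0, within: int = 1):
    """Try every occurrence of `a`; succeed once `b` is in the window after it.

    Returns (matched, attempts), where attempts lists
    (offset_of_a, b_found_in_window) for each occurrence of `a` tried.
    """
    attempts = []
    pos = buf.find(a)
    while pos != -1:
        window_start = pos + len(a) + distance  # assumed window semantics
        window = buf[window_start:window_start + within]
        found = b in window
        attempts.append((pos, found))
        if found:
            return True, attempts
        pos = buf.find(a, pos + 1)  # move on to the next occurrence of `a`
    return False, attempts


matched, attempts = match_a_then_b(b"axaxab", b"a", b"b")
print(matched, attempts)  # True [(0, False), (2, False), (4, True)]
```

In this model "a" is found three times and "b" fails twice before it succeeds, so a single IS_MATCHED flag per content does not capture what happened. Any percentage would need to define how such retries are counted.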