Optimization #1094

open

Special check for first character of buffer

Added by Ken Steele about 10 years ago. Updated over 8 years ago.

Status: New
Priority: Normal
Assignee:
Target version:
Effort:
Difficulty:
Label:

Description

I was looking at how 1-character MPM patterns are used in rules, since they can be bad for MPM matching. I have seen several rules that match on one specific first character. I'm wondering whether it would be worth adding a special test that checks only the first character of the buffer.

Options:
1) A table indexed by the first character of the buffer that returns the set of rules to enable. If a rule's only content is that first character, the rule would be enabled.
2) Use the first character when selecting the Pattern Group. For example, if the first character is one that has an exact match, use one group; otherwise use a second group.

For example, rule 2010486 has only one content, |17|, with depth:1.
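As a rough illustration of option 1, here is a minimal Python sketch of a table indexed by the first byte of the buffer. The class and method names (FirstByteTable, add_rule, candidates) are hypothetical and are not taken from the Suricata code base; the only value from this ticket is sid 2010486 with content |17|. A real implementation would be in C and would have to fit into the existing prefilter/MPM path.

```python
class FirstByteTable:
    """Hypothetical lookup table: first byte of a buffer -> rule sids."""

    def __init__(self):
        # One slot per possible byte value (0..255).
        self.slots = [set() for _ in range(256)]

    def add_rule(self, sid, first_byte):
        # Register a rule whose only content is this single first byte.
        self.slots[first_byte].add(sid)

    def candidates(self, buf):
        # An empty buffer has no first byte, so no rule can be enabled.
        if not buf:
            return frozenset()
        # A single indexed read replaces a 1-byte pattern search.
        return frozenset(self.slots[buf[0]])


table = FirstByteTable()
table.add_rule(2010486, 0x17)  # content:|17|; depth:1;
print(table.candidates(b"\x17\x03\x01"))
print(table.candidates(b"GET / HTTP/1.1"))
```

The point of the sketch is that the lookup costs one array read per buffer, regardless of how many single-byte rules are registered.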

In a recent install-full set of rules, I count 204 rules that are not commented out, using grep "depth:1;" | grep -v offset | wc -l.
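The same count can be reproduced in a script that also skips commented-out rules in one pass. The sketch below mirrors the grep pipeline with plain substring tests; the function name count_depth_rules and the sample rule lines are made up for illustration and are not taken from any real rule set. Like the grep version, it assumes the keyword is written exactly as "depth:N;" with no extra whitespace.

```python
def count_depth_rules(lines, depth):
    """Count active rules containing 'depth:<depth>;' and no 'offset'."""
    needle = f"depth:{depth};"
    count = 0
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and commented-out rules.
        if not stripped or stripped.startswith("#"):
            continue
        # Same filter as: grep "depth:N;" | grep -v offset
        if needle in stripped and "offset" not in stripped:
            count += 1
    return count


# Made-up sample rules, for illustration only.
sample_rules = [
    'alert tcp any any -> any any (msg:"a"; content:"|17|"; depth:1; sid:1;)',
    '#alert tcp any any -> any any (msg:"b"; content:"|16|"; depth:1; sid:2;)',
    'alert tcp any any -> any any (msg:"c"; content:"A"; offset:4; depth:1; sid:3;)',
    'alert tcp any any -> any any (msg:"d"; content:"GET "; depth:4; sid:4;)',
]
print(count_depth_rules(sample_rules, 1))
```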

Actions #1

Updated by Ken Steele about 10 years ago

This could also help with rules that have content:"xxx" with offset:0 and depth:N, where N = len(xxx). In that case the buffer must start with xxx, which exactly specifies the first character of the buffer.
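The condition above can be written as a small check. This is a sketch under the assumption that a missing offset means 0; the function name anchored_first_byte is hypothetical and not part of Suricata.

```python
from typing import Optional


def anchored_first_byte(content: bytes, depth: int, offset: int = 0) -> Optional[int]:
    """Return the byte the buffer must start with, or None if not anchored.

    A content is anchored to the buffer start when offset is 0 and depth
    equals the content length, so the match can only begin at position 0.
    """
    if content and offset == 0 and depth == len(content):
        return content[0]
    return None


print(anchored_first_byte(b"GET ", depth=4))   # anchored at the buffer start
print(anchored_first_byte(b"GET ", depth=8))   # may start later: not anchored
```

A rule that passes this check could be registered in a first-character table under the returned byte, in the same way as a depth:1 rule.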

Actions #2

Updated by Ken Steele about 10 years ago

Rule counts for larger depths (depth - number of rules):
2 - 85 rules
3 - 60 rules
4 - 507 rules
5 - 76 rules
6 - 34 rules
7 - 33 rules
8 - 56 rules
9 - 17 rules
10 - 18 rules
11 - 19 rules
12 - 13 rules
13 - 12 rules
14 - 14 rules
15 - 3 rules
16 - 16 rules
17 - 5 rules
18 - 6 rules
19 - 2 rules
20 - 9 rules
21 - 7 rules
22 - 2 rules
23 - 2 rules
24 - 9 rules
25 - 1 rule
26 - 3 rules
27 - 2 rules
28 - 3 rules
29 - 4 rules
30 - 5 rules
31 - 1 rule
32 - 60 rules
33 - 3 rules
34 - 2 rules
35 - 2 rules

Actions #3

Updated by Andreas Herz over 8 years ago

  • Assignee set to OISF Dev
  • Target version set to TBD