Project

General

Profile


This documentation is no longer maintained and exists for historical purposes. The current documentation is located at http://suricata.readthedocs.io/.


Suricata Fast Pattern Determination Explained

If the 'fast_pattern' keyword is explicitly set in a rule, Suricata will use that as the fast pattern match. The 'fast_pattern' keyword can only be set once per rule. If 'fast_pattern' is not set, Suricata automatically determines the content to use as the fast pattern match.

The following explains the logic Suricata uses to automatically determine the fast pattern match to use.

Be aware that if there are positive (i.e. non-negated) content matches, then negated content matches are ignored for fast pattern determination. Otherwise, negated content matches are considered.

Suricata 1.1.x - 1.4.x

  1. The longest (in terms of character/byte length) content match is used as the fast pattern match.
  2. If multiple content matches qualify for the longest length, the one with the highest character/byte diversity score ("Pattern Strength") is used as the fast pattern match. See Appendix C for details on the algorithm used to determine Pattern Strength.
  3. If multiple content matches qualify for the longest length and have the same highest Pattern Strength, the buffer that has the lower "list_id" is used as the fast pattern match. See Appendix A for the list_id of each buffers/list.
  4. If multiple content matches qualify for the longest length and have the same highest Pattern Strength, and have the same list_id (i.e. are looking in the same buffer), then the one that comes first (from left-to-right) in the rule is used as the fast pattern match.

It is worth noting that for content matches that have the same length and Pattern Strength, regular 'content' matches take precedence over matches that use the 'http_*' buffers.

Suricata 2.0.x

  1. Suricata first identifies all content matches that have the highest "priority" that are used in the signature. The priority is based off of the buffer being matched on and generally 'http_*' buffers have a higher priority (lower number is higher priority). See Appendix B for details on which buffers have what priority.
  2. Within the content matches identified in step 1 (the highest priority content matches), the longest (in terms of character/byte length) content match is used as the fast pattern match.
  3. If multiple content matches have the same highest priority and qualify for the longest length, the one with the highest character/byte diversity score ("Pattern Strength") is used as the fast pattern match. See Appendix C for details on the algorithm used to determine Pattern Strength.
  4. If multiple content matches have the same highest priority, qualify for the longest length, and the same highest Pattern Strength, the buffer ("list_id") that was registered last is used as the fast pattern match. See Appendix B for the registration order of the different buffers/lists.
  5. If multiple content matches have the same highest priority, qualify for the longest length, the same highest Pattern Strength, and have the same list_id (i.e. are looking in the same buffer), then the one that comes first (from left-to-right) in the rule is used as the fast pattern match.

It is worth noting that for content matches that have the same priority, length, and Pattern Strength, 'http_stat_msg', 'http_stat_code', and 'http_method' take precedence over regular 'content' matches.

Appendices

Appendix A - Buffers, list_id values, and Registration Order for Suricata 1.3.4

This should be pretty much the same for Suricata 1.1.x - 1.4.x.

list_id Content Modifier Keyword Buffer Name Registration Order
1 <none> (regular content match) DETECT_SM_LIST_PMATCH 1 (first)
2 http_uri DETECT_SM_LIST_UMATCH 2
6 http_client_body DETECT_SM_LIST_HCBDMATCH 3
7 http_server_body DETECT_SM_LIST_HSBDMATCH 4
8 http_header DETECT_SM_LIST_HHDMATCH 5
9 http_raw_header DETECT_SM_LIST_HRHDMATCH 6
10 http_method DETECT_SM_LIST_HMDMATCH 7
11 http_cookie DETECT_SM_LIST_HCDMATCH 8
12 http_raw_uri DETECT_SM_LIST_HRUDMATCH 9
13 http_stat_msg DETECT_SM_LIST_HSMDMATCH 10
14 http_stat_code DETECT_SM_LIST_HSCDMATCH 11
15 http_user_agent DETECT_SM_LIST_HUADMATCH 12 (last)

Note: registration order doesn't matter when it comes to determining the fast pattern match for Suricata 1.3.4 but list_id value does.

Appendix B - Buffers, list_id values, Priorities, and Registration Order for Suricata 2.0.7

This should be pretty much the same for Suricata 2.0.x.

Priority (lower number is higher priority) Registration Order Content Modifier Keyword Buffer Name list_id
3 11 <none> (regular content match) DETECT_SM_LIST_PMATCH 1
3 12 http_method DETECT_SM_LIST_HMDMATCH 12
3 13 http_stat_code DETECT_SM_LIST_HSCDMATCH 9
3 14 http_stat_msg DETECT_SM_LIST_HSMDMATCH 8
2 1 (first) http_client_body DETECT_SM_LIST_HCBDMATCH 4
2 2 http_server_body DETECT_SM_LIST_HSBDMATCH 5
2 3 http_header DETECT_SM_LIST_HHDMATCH 6
2 4 http_raw_header DETECT_SM_LIST_HRHDMATCH 7
2 5 http_uri DETECT_SM_LIST_UMATCH 2
2 6 http_raw_uri DETECT_SM_LIST_HRUDMATCH 3
2 7 http_host DETECT_SM_LIST_HHHDMATCH 10
2 8 http_raw_host DETECT_SM_LIST_HRHHDMATCH 11
2 9 http_cookie DETECT_SM_LIST_HCDMATCH 13
2 10 http_user_agent DETECT_SM_LIST_HUADMATCH 14
2 15 (last) dns_query DETECT_SM_LIST_DNSQUERY_MATCH 20

Note: list_id value doesn't matter when it comes to determining the fast pattern match for Suricata 2.0.7 but registration order does.

Appendix C - Pattern Strength Algorithm

From detect-engine-mpm.c. Basically the Pattern Strength "score" starts at zero and looks at each character/byte in the passed in byte array from left to right. If the character/byte has not been seen before in the array, it adds 3 to the score if it is an alpha character; else it adds 4 to the score if it is a printable character, 0x00, 0x01, or 0xFF; else it adds 6 to the score. If the character/byte has been seen before it adds 1 to the score. The final score is returned.

/** \brief Predict a strength value for patterns
 *
 *  Patterns with high character diversity score higher.
 *  Alpha chars score not so high
 *  Other printable + a few common codes a little higher
 *  Everything else highest.
 *  Longer patterns score better than short patters.
 *
 *  \param pat pattern
 *  \param patlen length of the patternn
 *
 *  \retval s pattern score
 */
uint32_t PatternStrength(uint8_t *pat, uint16_t patlen) {
    uint8_t a[256];
    memset(&a, 0 ,sizeof(a));
    uint32_t s = 0;
    uint16_t u = 0;
    for (u = 0; u < patlen; u++) {
        if (a[pat[u]] == 0) {
            if (isalpha(pat[u]))
                s += 3;
            else if (isprint(pat[u]) || pat[u] == 0x00 || pat[u] == 0x01 || pat[u] == 0xFF)
                s += 4;
            else
                s += 6;
            a[pat[u]] = 1;
        } else {
            s++;
        }
    }
    return s;
}

locked