Feature #2076

closed

Strip whitespace from buffers

Added by Jason Williams about 7 years ago. Updated about 6 years ago.

Status: Closed
Priority: Normal
Assignee:
Target version:
Effort:
Difficulty:
Label:

Description

It would be very useful to have a modifier that could be applied to buffers to normalize or eliminate whitespace. This would be most useful with file_data; and would significantly reduce pcre usage in HTML and JavaScript signatures.

For example, in JavaScript you can have:

'window . location = '

We have to write pcres to account for this possible whitespace, such as:

content:"window"; pcre:"/^\s*\.\s*location\s*=\s*/";

It would be very useful if we could write this as:

file_data; content:"window.location="; ignore_whitespace;
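
As a rough illustration of the equivalence this request relies on, here is a small Python sketch (not Suricata code). `strip_whitespace` is a hypothetical helper standing in for the proposed modifier, and it assumes "whitespace" means the ASCII bytes matched by `\s`. The regex only approximates the content-plus-pcre pair above by folding both into one pattern.

```python
import re

def strip_whitespace(buf: bytes) -> bytes:
    """Hypothetical stand-in for the proposed modifier: drop ASCII whitespace."""
    return re.sub(rb"\s+", b"", buf)

# Current approach, folded into one regex: content "window" plus the pcre.
PCRE_STYLE = re.compile(rb"window\s*\.\s*location\s*=\s*")

def match_with_pcre(buf: bytes) -> bool:
    return PCRE_STYLE.search(buf) is not None

def match_on_stripped(buf: bytes) -> bool:
    # Proposed approach: a plain substring match on the stripped buffer.
    return b"window.location=" in strip_whitespace(buf)

samples = [
    b"window . location = 'x'",
    b"window.location='x'",
    b"window\t.\r\nlocation\n= 'x'",
]
for s in samples:
    print(match_with_pcre(s), match_on_stripped(s))
```

Both approaches agree on these samples. The stripped match is slightly broader, though: `b"win dow . location ="` matches after stripping, while the regex rejects it.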

Actions #1

Updated by Victor Julien about 7 years ago

How do you see this interacting with other keywords?

file_data; content:"window.location="; ignore_whitespace; content:"something"; distance:0; within:10; isdataat:!1,relative;

Would the second content and the isdataat also run on the stripped buffer? If so, it might make more sense to have something like:

file_data; ignore_whitespace; content:"window.location="; content:"something"; distance:0; within:10; isdataat:!1,relative;

Or even something ugly like:

file_data_ignore_whitespace; content:"window.location="; content:"something"; distance:0; within:10; isdataat:!1,relative;

If we preprocess the file_data buffer to strip whitespace or do some other transformation, we're essentially creating a new buffer and a new inspect engine internally. Related ticket #1006.
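
To make the question about relative keywords concrete, here is a simplified Python model (a sketch, not Suricata's engine). `rule_matches` is a hypothetical function that approximates content, distance:0, within:10 and isdataat:!1,relative on whatever buffer it is given. It assumes isdataat:!1,relative means "no data follows the last match". Running it on the raw and on the stripped buffer shows that all offsets are measured in the buffer being inspected.

```python
import re

def strip_whitespace(buf: bytes) -> bytes:
    """Hypothetical transform: remove ASCII whitespace, producing a new buffer."""
    return re.sub(rb"\s+", b"", buf)

def rule_matches(buf: bytes) -> bool:
    """Simplified model of:
    content:"window.location="; content:"something"; distance:0; within:10;
    isdataat:!1,relative;
    """
    first, second = b"window.location=", b"something"
    start = buf.find(first)
    while start != -1:
        end_first = start + len(first)
        # distance:0; within:10 -> second match must lie in the next 10 bytes.
        window = buf[end_first:end_first + 10]
        pos = window.find(second)
        if pos != -1:
            end_second = end_first + pos + len(second)
            # Assumed meaning of isdataat:!1,relative: nothing after the match.
            if end_second == len(buf):
                return True
        start = buf.find(first, start + 1)
    return False

raw = b"window.location=  something \n"
print(rule_matches(raw))                    # False
print(rule_matches(strip_whitespace(raw)))  # True
```

On the raw buffer the two extra spaces push "something" outside the 10-byte window and the trailing bytes defeat the end-of-data check. On the stripped buffer both checks pass, which is the behavior the second and third rule forms imply.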

Actions #2

Updated by Jason Williams about 7 years ago

I believe the second option would be most practical for the purpose of reducing pcre usage.

Actions #3

Updated by Andreas Herz almost 7 years ago

  • Assignee set to OISF Dev
  • Target version set to TBD
Actions #4

Updated by Francis Trudeau almost 7 years ago

Will this also eliminate nulls (0x00)? This would help in matching on Unicode text, among other things.

Actions #5

Updated by Victor Julien almost 7 years ago

In a buffer like " a b c d", would the expected result be "abcd" or something else? Would all whitespace be stripped?

Actions #6

Updated by Jason Williams almost 7 years ago

Victor Julien wrote:

In a buffer like " a b c d", would the expected result be "abcd" or something else? Would all whitespace be stripped?

Well, I think we should either remove all whitespace and smush the buffer together, or replace each run of whitespace with a single space, so [\t\r\n\s\x00]+ becomes a single space (\x20). I don't think it really matters on the sig-writing side; whichever has the least overhead on the sensor would be best.
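
The two options can be compared with a short Python sketch. The helper names `strip_all` and `collapse` are hypothetical, and the character class (ASCII whitespace plus \x00) is an assumption taken from the comment above.

```python
import re

# Runs of ASCII whitespace and/or null bytes.
RUN = re.compile(rb"[\s\x00]+")

def strip_all(buf: bytes) -> bytes:
    """Option 1: remove every run entirely."""
    return RUN.sub(b"", buf)

def collapse(buf: bytes) -> bytes:
    """Option 2: replace every run with a single space."""
    return RUN.sub(b" ", buf)

for buf in (b" a b  c\t\r\nd", b"window . location = ", b"window.location="):
    print(strip_all(buf), collapse(buf))
```

With stripping, the spaced and unspaced spellings of the JavaScript example both become `window.location=`. With collapsing they remain different strings, so a single content match could not cover both.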

Actions #7

Updated by Victor Julien almost 7 years ago

In your original example of 'window . location = ' the best result would probably be 'window.location=' ?

Actions #8

Updated by Jason Williams almost 7 years ago

Yes, I think that would be the best result.

Actions #9

Updated by Francis Trudeau almost 7 years ago

I agree, stripping out whitespace would be best, especially for \x00. Turning \x00+ into \x20 would defeat the purpose of handling \x00 at all.
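
The point about nulls can be checked with a small Python sketch. It assumes the "unicode text" in question is UTF-16LE, where ASCII characters are each followed by a \x00 byte. The helper names are hypothetical.

```python
import re

# Runs of ASCII whitespace and/or null bytes.
NULLS_AND_WS = re.compile(rb"[\s\x00]+")

def strip_all(buf: bytes) -> bytes:
    return NULLS_AND_WS.sub(b"", buf)

def collapse_to_space(buf: bytes) -> bytes:
    return NULLS_AND_WS.sub(b"\x20", buf)

# UTF-16LE encodes each ASCII character as the byte followed by \x00.
wide = "window.location=".encode("utf-16-le")
needle = b"window.location="

print(needle in strip_all(wide))
print(needle in collapse_to_space(wide))
```

Stripping recovers the plain ASCII string, so an ordinary content match works. Collapsing leaves a space between every character, which is no easier to match than the original nulls.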

Actions #10

Updated by Victor Julien over 6 years ago

  • Status changed from New to Assigned
  • Assignee changed from OISF Dev to Victor Julien
  • Target version changed from TBD to 70
Actions #11

Updated by Victor Julien about 6 years ago

  • Status changed from Assigned to Closed
  • Target version changed from 70 to 4.1beta1