Feature #2076
closed

Strip whitespace from buffers

Added by Jason Williams over 8 years ago. Updated over 7 years ago.

Description

It would be very useful to have a modifier that could be applied to buffers to normalize or eliminate whitespace. This would be most useful in the file_data; section and would significantly reduce pcre usage when dealing with HTML and JavaScript signatures.
For example, in JavaScript you can have:
	'window    .                 location   =   '
We have to write pcres to account for this possible whitespace, such as:
	content:"window"; pcre:"/^\s*\.\s*location\s*=\s*/";
It would be very useful if we could write this as:
	file_data; content:"window.location="; ignore_whitespace;
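
To make the request concrete, here is a minimal Python sketch (not Suricata code) that compares a whitespace-tolerant regex on the raw buffer with a plain substring match on a stripped copy. The helper name `strip_whitespace` and the sample buffer are assumptions made up for illustration.

```python
import re

def strip_whitespace(buf: bytes) -> bytes:
    """Hypothetical helper: return buf with all ASCII whitespace bytes removed."""
    return re.sub(rb"\s+", b"", buf)

# Sample buffer (assumed), padded with whitespace as in the example above.
raw = b"window    .                 location   =   'http://example.com'"

# Current approach: a regex that tolerates whitespace between the tokens.
pcre_hit = re.search(rb"window\s*\.\s*location\s*=\s*", raw) is not None

# Requested approach: a plain content match against the stripped buffer.
content_hit = b"window.location=" in strip_whitespace(raw)
```

Both checks find the assignment, but the second needs no regex engine once the buffer has been normalized.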
   
 
 
  
  
    
    
    
How do you see this interacting with other keywords?
	file_data; content:"window.location="; ignore_whitespace; content:"something"; distance:0; within:10; isdataat:!1,relative;
Would the second content and the isdataat also run on the stripped buffer? If so, it might make more sense to have something like:
	file_data; ignore_whitespace; content:"window.location="; content:"something"; distance:0; within:10; isdataat:!1,relative;
Or even something ugly like:
	file_data_ignore_whitespace; content:"window.location="; content:"something"; distance:0; within:10; isdataat:!1,relative;
If we preprocess the file_data buffer to strip whitespace or do some other transformation, we're essentially creating a new buffer and a new inspect engine internally. Related ticket: #1006.
 
   
  
  
    
    
    
    I believe the second option would be most practical for the purpose of reducing pcre usage.
 
   
  
  
    
    
    
    
       - Assignee set to OISF Dev
- Target version set to TBD
 
   
  
  
    
    
    
Will this also eliminate nulls (0x00)? This would help in matching on Unicode text, among other things.
 
   
  
  
    
    
    
In a buffer like " a b c   d", would the expected result be "abcd" or something else? Would all whitespace be stripped?
 
   
  
  
    
    
    
    Victor Julien wrote:
	in a buffer like " a b c   d" would the expected result be "abcd" or something else? Would all whitespace be stripped?
	Well, I think we should either remove all whitespace and smush the buffer together, or replace each run of whitespace with a single space. So [\t\r\n\s\x00]+ becomes a single space (\x20). I don't think it really matters on the sig writing side; whichever has the least overhead on the sensor would be best.
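
The two options in this comment can be sketched in Python as follows. This is only an illustration of the behaviours under discussion, not the Suricata implementation. The function names are hypothetical, and treating NUL (\x00) as whitespace follows the suggestion in this thread rather than any confirmed definition.

```python
import re

# Runs of ASCII whitespace or NUL bytes (assumed set, taken from the comments above).
WS_RUN = re.compile(rb"[\s\x00]+")

def strip_all(buf: bytes) -> bytes:
    """Option 1: remove every whitespace/NUL run, joining the remaining bytes."""
    return WS_RUN.sub(b"", buf)

def compress(buf: bytes) -> bytes:
    """Option 2: replace every whitespace/NUL run with a single space (0x20)."""
    return WS_RUN.sub(b" ", buf)

sample = b" a b c   d"
stripped = strip_all(sample)    # b"abcd"
compressed = compress(sample)   # b" a b c d"
```

With option 1 a signature can use content:"window.location=" directly. With option 2 the signature would still have to allow for a single optional space around each token.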
 
   
  
  
    
    
    
In your original example of 'window . location = ', the best result would probably be 'window.location='?
 
   
  
  
    
    
    
I think that would be the best result.
 
   
  
  
    
    
    
I agree, stripping out whitespace would be best, especially for \x00. Turning \x00+ into \x20 would negate the benefit of handling \x00 at all.
 
   
  
  
    
    
    
    
       - Status changed from New to Assigned
- Assignee changed from OISF Dev to Victor Julien
- Target version changed from TBD to 70
 
   
  
  
    
    
    
    
       - Status changed from Assigned to Closed
- Target version changed from 70 to 4.1beta1
 
   
  
 
  
  
 