Optimization #5083
openProposal: new and compact rule parser for Suricata in Rust
Description
Current Scenario
Currently, every rule keyword that we support has its own parser written somewhere in its detect file with its own combinations of options and other keywords. Some keyword parsers are written with the help of libpcre2 and others are simpler with C fn strtok_r .
Issues
Recently with some bug reports, it came to our attention that there may still be missing edge cases when it came to parsing the rules. The edge cases could be as simple as improper space handling and as complex as conflicting keywords allowed as a part of the same rule causing issues elsewhere and, maybe more. While working on #4786, I realized that while it's easy for developers to make/break restrictions for certain keywords, it could be very difficult for rule writers to keep up with the minor differences causing major problems. With Brandon's comment on the issue about keeping things consistent for all rule keywords and testing on all combinations, it seemed like it was too difficult of a task with a possibility of always missing out on edge cases.
The proposal
Hence, the idea of this new rule parser came in. In idea, the parser is simple, not sure about converting this idea into reality (yet). The parser will require us to define a JSON schema that not only validates types but also other attributes like the position of an option to a keyword, whether it is a required argument, etc. For example, for flowbits , we may have
{
flowbits: {
[set, isset,...]: {
required: true,
position: 0,
expects: String,
},
noalert: {
required: true,
position: 0,
}
}
}
- w.r.t. this example, there will have to be a special handling case for flowbits ORing
- Generally, there would have to be a check to see if either of the same position required arguments are present at a time
This is a small idea, the schema is not practical yet but just here for reference. It'll have to be given much more thought to accomodate all the other keywords.
Before parsing
We should be able to send the rule keyword and the part of the signature relevant to the keyword from C.
After parsing
We should be able to send the result of parsing to a generic "Setup" fn on C side as the setup seems to be complex done on Rust side.
- Adding a new rule keyword would only mean updating the schema and any special handling that may be required. (yay for Devs)
- The "behavior" of rule keywords will be consistent everywhere. e.g. we could fix the order of all options to all keywords with `position` key mentioned above. (yay for Rule writers)
- It'll be much safer and faster to add and rely on in built stable Rust capabilities for parsing (fns as simple as split and trim) compared to pcre2 and other ways in C. (yay for Devs)
- Less C and more compact Rust code. (yay for everyone?)
- Conflicting keywords would be much easier to maintin. Probably just with a special key in the schema.
- If there's a bug with the generic parser, all rules will be broken in the same manner.
Origin of the idea
This idea of schema and making things compact came from the way we already implement command line keywords for suricatasc (https://github.com/OISF/suricata/blob/master/python/suricata/sc/specs.py) and also the difficulty level there is to change and track the behavior of all rule keywords and all of their possible combinations in the current scenario.
Comments from Jason Ish
Your output would like be a vector of enum variants that you could then have From impls to convert into structs.
But then I wonder if it would just be better to write a serde deserializer and generate the code from a struct:
https://github.com/jasonish/rust-suricata-rule-parser/blob/main/src/types.rs#L133 (I've been meaning to write a serializer/to_string for these struct, never really thought about doing an input one)
I'd love to see a day where Suricata-Update and Suricata can use the same rule parsing code.