Optimization #426


sid based thresholding data structure improvement

Added by Victor Julien over 10 years ago. Updated over 3 years ago.

Target version:


In recent changes most of the thresholding code moved away from one big lock for the entire thresholding table. With the introduction of a host table, the thresholding that tracks per src or dst is now done in a fine grained manner, reducing lock contention.

For thresholding tracked per rule however, we still use one big evil lock. So we need to fix that. Currently the data structure for storing these threshold entries is an array, with a size of the number of signatures. The signatures internal id (s->num) is the index. The array is stored in the de_ctx.

We need to move away from this. As we are going to support rule reloads, the internal id can no longer be used here. The thresholding table should survive the rule reload. The signatures internal id's will change on a reload so they can't be used as index.

So ideally we'd have some form of hash that uses the gid+sid to lookup the thresholding data struct without a global lock.

Options would be:
- have a hash with per row locking and update threshold entry under row lock.
- have a hash with per row locking for lookups and a per threshold entry lock for checks and updates (this is how flow and host tables work).
- make sure internal id's for sigs remain the same with reloads, using unique id's for new sigs -- too complicated I think.

Other options?

Actions #1

Updated by Victor Julien over 10 years ago

  • Description updated (diff)
Actions #2

Updated by Victor Julien over 10 years ago

Thinking of another option: most signatures don't have a threshold setting and most setups will not use large number of thresholds through the thresholds.conf file. What this means is that the number of threshold rules is relatively low.

We can also do a separate "threshold id" per rule, that is unique per "gid + sid". During the init stage we can set this by having a simple hash. Due to the small number of threshold enabled rules the global lookup array can be small. Each threshold entry would have it's own lock. The only contention would happen when 2 or more threads set/check a threshold for the same rule, which should be relatively rare.

Then on a rule reload, we can use the same simple hash to lookup the existing id's. Then we'd assign new id's to new rules and resize the lookup array somehow. The resize would require some form of locking though, this might bring back the big evil lock... Maybe a rwlock would be an option:

Lookup(Signature *s) {
  entry = s->threshold_id;
  // check/update

Update() {
  // resize and init new entries


Actions #3

Updated by Victor Julien over 10 years ago

  • Assignee set to OISF Dev
  • Target version set to TBD
Actions #4

Updated by Victor Julien almost 9 years ago

  • Tracker changed from Feature to Optimization
Actions #5

Updated by Victor Julien about 4 years ago

  • Assignee changed from OISF Dev to Anonymous
  • Effort set to low
  • Difficulty set to medium
Actions #6

Updated by Andreas Herz over 3 years ago

  • Assignee set to Community Ticket

Also available in: Atom PDF