Optimization #3322
openUse standard CRC32 for hash-like functions
Added by Philippe Antoine about 5 years ago. Updated about 2 months ago.
Description
Instead of a custom one (as CRC32 was I think designed for kind of avoiding collisions)
One such function is StringHash
cf https://github.com/OISF/suricata/pull/4337
Updated by Victor Julien about 5 years ago
As we discussed offline, it would be nice to create some kind of benchmarking framework where we could validate such changes. Pure pcap tests may not always give enough insight. For example with the bm optimizations pcap based tests showed no difference, while I think more micro level benchmarks would have shown something.
Updated by Philippe Antoine about 5 years ago
It would be nice if this benchmarking framework handles caches realistically.
With the example of Boyer-Moore optimizations (one less call to alloc), I am not sure a naive benchmarking would shows much difference as the additional call to alloc would grab repeatedly the same cached memory area, whereas in a real Suricata execution, this would not be the case
Updated by Andreas Herz about 5 years ago
- Assignee set to Philippe Antoine
- Target version set to TBD
Updated by Philippe Antoine about 5 years ago
MurmurHash may be the best function
https://en.wikipedia.org/wiki/MurmurHash
Current function is not random
It is DJB hash
https://stackoverflow.com/questions/10696223/reason-for-5381-number-in-djb-hash-function
Here is a comparison
https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed
Updated by Philippe Antoine about 5 years ago
I put a first benchmark here
https://github.com/OISF/suricata/pull/4371
Updated by Philippe Antoine almost 5 years ago
- Status changed from New to In Review
Updated by Roland Fischer over 4 years ago
There are a few other hash functions that might be interesting. Google's CityHash comes to mind as it's their "general purpose" hash. SpookyHash/xxhash might be options as well. The stackexchange page lists them as well I think.
Ultimately, it depends on how much time you want to spend on this vs how happy you are with the current hash. Plus, what the usage patterns of the hash are as well as the data to be hashed. ;)
Updated by Victor Julien almost 4 years ago
- Blocked by Bug #4265: QA lab: add possibility to do repeatable replay tests added
Updated by Philippe Antoine almost 4 years ago
Closing https://github.com/OISF/suricata/pull/4816 waiting for QA lab
Updated by Philippe Antoine almost 2 years ago
- Assignee changed from Philippe Antoine to Community Ticket
- Priority changed from Normal to Low
This does not seem the most important area to optimize...
Updated by Philippe Antoine about 2 months ago
- Related to Security #7209: thash: random factor not used; possible abusive hash collisions added
Updated by Philippe Antoine about 2 months ago
SipHash seems to be the current standard, used by rust and other internally...