Bug #2220
openWhen running on a single-CPU machine, pcap processing takes a long time
Description
I was attempting to batch-process a list of PCAPs, but I noticed the speed was very low on one of my worker machines. It is a virtual machine with only a single CPU. Increasing the number of CPUs resolves the issue.
I started Suricata in socket mode:suricata --unix-socket=/tmp/suricata/socket --pidfile=/tmp/suricata/pid -S /tmp/suricata/rules
With the rules file being empty, or containing a single simple rule.
Then, I submit a lot of PCAPs to process using a very simple Python script, and interrupting it using Ctrl+C after a while:
import suricatasc
s = suricatasc.SuricataSC("/tmp/suricata/socket")
s.connect()
while True:
s.send_command("pcap-file", {"filename": "empty.pcap", "output-dir": "/tmp/suricata/"})
On a machine with 1 CPU and 1 core, it takes almost exactly 1 second to process a single PCAP file, even if it is empty. The script above will be successful in sending two commands to the socket, so after 30s of running this script, 60 pcaps will be submitted (and 30 will still need processing).
On a machine with more CPU threads, the above works very fast.
This seems to be an issue with the threading library or so. Since my testing showed the processing taking 1 second per PCAP, I suspected a random sleep somewhere, but couldn't find it.
This issue occurs with at least Suricata 3.0, 3.2 and 4.0.
Updated by Peter Manev about 7 years ago
Do you experience the same if you try runmode single?
Updated by Mathijs van de Nes about 7 years ago
Peter Manev wrote:
Do you experience the same if you try runmode single?
That does not seem to change anything.
It appears to be that the flow manager is delaying the analysis. When reducing the timeout FLOW_NORMAL_MODE_UPDATE_DELAY_* in flow-manager.c to 1 us instead of the default of 1 s, my setup works 33 times as fast on a single CPU. (1000 samples in 30 seconds instead of just 30)
This is just a workaround, but I hope it can point someone who is more familiar with Suricata in the right direction.
Updated by Andreas Herz almost 7 years ago
- Assignee set to OISF Dev
- Target version set to TBD
Updated by Danny Browning almost 7 years ago
Flow manager will encounter this wait at https://github.com/OISF/suricata/blob/master/src/flow-manager.c#L790
Normally the condition variable would be woken up by https://github.com/OISF/suricata/blob/master/src/flow-manager.c#L143, however unix socket doesn't hit this code path as a result of https://github.com/OISF/suricata/blob/master/src/suricata.c#L2290.
This means at the end of a unix run mode, the flow manager will have to wait on the time remaining for the condition variable. Suggest adding a WakeupFlowMangerThread function to FlowManager, and calling that from https://github.com/OISF/suricata/blob/master/src/runmode-unix-socket.c#L379 or https://github.com/OISF/suricata/blob/master/src/suricata.c#L2290 so that processing is not waiting on the flow manager condition variable wakeup.
Updated by Danny Browning almost 7 years ago
- Status changed from New to Assigned
- Assignee changed from OISF Dev to Danny Browning
Updated by Danny Browning almost 7 years ago
Updated by Danny Browning almost 7 years ago
- Assignee changed from Danny Browning to Ralph Broenink
Ralph can you verify that slowness is not seen in this PR?
Updated by Mathijs van de Nes almost 7 years ago
Danny Browning wrote:
Ralph can you verify that slowness is not seen in this PR?
Unfortunately this does not seem to affect the behaviour.
With one CPU core the interval is still 1 second, even though the wake-up function is called.
Updated by Andreas Herz over 5 years ago
- Related to Bug #2221: Suricata batch processing slowed down by 0.2s intervals added
Updated by Andreas Herz over 2 years ago
- Status changed from Assigned to Closed
More or less the exact same thing as #2221
Updated by Ralph Broenink over 2 years ago
Though the issues look very similar, the issue described in #2221 is that the socket has a 0.2s read timeout. This issue concerns the fact that processing in general is very slow and is not concerned about the socket integration, though the issue can be reproduced in a similar way.
I feel that two issues are warranted, especially given some work was already done on this one, and should be reopened.
Updated by Ralph Broenink over 2 years ago
- Status changed from Closed to In Review
- Assignee deleted (
Ralph Broenink)
Updated by Philippe Antoine about 1 month ago
- Status changed from In Review to New