Bug #2423

open

Suricata 4.0.3 and Napatech crashing

Added by Steve Castellarin almost 4 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
Affected Versions:
Effort:
Difficulty:
Label:

Description

We were running an Ubuntu 14.04.5 64-bit system on a Dell R630 1U (dual E5-2660 v3 @ 2.60GHz processors with 128GB of memory) with a Napatech NT20E2-PTP capture card (one port active) and Suricata 3.1.1. We upgraded our environment to use Napatech's latest driver set (10.0.4 - Huntington Beach 3) and Suricata 4.0.3. Suricata runs in daemon mode with the command: /usr/bin/suricata -c /etc/suricata/suricata.yaml --napatech --runmode workers -D.

Suricata will run for some time, then one CPU (a CPU defined as a worker) hits 100% and stays there, while one Napatech host buffer (seen by running the Napatech "profiling" command) hits 100% and drops packets. This continues without stopping. Then a second CPU (again a worker CPU) hits 100% and another Napatech host buffer hits 100% and drops packets. This goes on, with more and more CPUs and host buffers pegged, until I issue a "kill `pidof suricata`". Many times this ends Suricata gracefully, but it takes 5-10 minutes to do so. When Suricata does end, it does not remove the /var/run/suricata.pid file.

Attached is the stats.log from a running Suricata 4.0.3 session. The first packet drop was seen at the 12:20:51 mark, with "nt12.drop" incrementing. During this time one of the CPUs acting as a "worker" was at 100%. These drops recovered at the 12:20:58 mark, where "nt12.drop" stays constant at 13803. The big issue triggered at the 12:27:05 mark in the file, where one worker CPU was stuck at 100%, followed by packet drops in host buffer "nt3.drop". Then a second CPU (another "worker" CPU) hit 100%, with packet drops in buffer "nt2.drop" at 12:27:33. Suricata was killed via "kill `pidof suricata`" just before 12:27:54, where you see all host buffers beginning to drop packets.

Also attached is the suricata.yaml configuration file as well as the output from a "suricata --dump-config" command.


Files

statlog.zip (57.2 KB) - stats.log from Suricata 4.0.3 run - Steve Castellarin, 01/18/2018 08:46 AM
4.0.3.cfg (11 KB) - output from suricata --dump-config - Steve Castellarin, 01/18/2018 08:47 AM
suricata.yaml (28.8 KB) - configured 4.0.3 yaml file - Steve Castellarin, 01/18/2018 08:47 AM
suricata.log (17.5 KB) - Steve Castellarin, 01/24/2018 12:49 PM
nts-1002540_1.pcapng (3.18 KB) - Phil Young, 02/19/2020 08:01 PM
Actions #1

Updated by Peter Manev almost 4 years ago

Hi,

Does it crash (i.e. segfault or produce a core dump), or do you mean it becomes unresponsive?
If it crashes, you could try recompiling with debugging enabled and share the info from the generated core.
If needed, you can find some info on how to do that here - https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Reporting_Bugs

Actions #2

Updated by Steve Castellarin almost 4 years ago

Hey Peter,

That was a poor choice of wording on my part. Suricata is not crashing - no core dumps or the like are being produced. Suricata becomes unresponsive.

Actions #3

Updated by Andreas Herz over 3 years ago

  • Assignee set to OISF Dev
  • Target version set to TBD

Can you reproduce it with debug enabled?

Actions #4

Updated by Steve Castellarin over 3 years ago

Hi Andreas, I will configure/re-compile with the --enable-debug option. What do I need to do in the YAML configuration to get the level of detail you're looking for?

Actions #5

Updated by Steve Castellarin over 3 years ago

I'm adding a suricata.log file which has an error at the bottom. The error notes an issue accessing Napatech host buffer "nt8". This same host buffer appears earlier in the .log file: its host buffer fill level climbs to 100%, then the adapter SDRAM fill level climbs to 100% as well, until the buffer is rendered unusable (packets continually drop and the CPU pinned to that host buffer is stuck at 100%).

I always see this error message when this issue begins under Suricata 4.0.3 - just with a different host buffer number each time.

Actions #6

Updated by Victor Julien over 2 years ago

  • Status changed from New to Assigned
  • Assignee changed from OISF Dev to Phil Young
Actions #7

Updated by Phil Young over 1 year ago

This was very difficult to reproduce and debug. We were finally able to reproduce it as follows:
1. Input the attached pcap file (nts-1002540_1.pcapng)
2. Wait for the time specified in the conf file at: flow-timeouts.tcp.closed
3. Input the file a second time.
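
The three steps above can be scripted. The following is only a sketch under stated assumptions: `send_pcap` is a hypothetical callable supplied by the tester that injects a capture file into the monitored port (for example a wrapper around a traffic generator), `config` is suricata.yaml already parsed into nested dictionaries, and the extra `margin` seconds are an arbitrary safety allowance, not something taken from this report.

```python
import time


def lookup(config, dotted_key):
    """Walk nested dicts using a dotted path such as 'flow-timeouts.tcp.closed'."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def replay_twice(send_pcap, config, pcap_path, margin=5, sleep=time.sleep):
    """Send the pcap, wait out the closed-flow timeout, then send it again.

    Returns the number of seconds waited between the two sends.
    """
    wait = int(lookup(config, "flow-timeouts.tcp.closed")) + margin
    send_pcap(pcap_path)   # step 1: first pass of the capture
    sleep(wait)            # step 2: let the closed TCP flow time out
    send_pcap(pcap_path)   # step 3: second pass of the same capture
    return wait
```

Passing `sleep` as a parameter keeps the helper easy to dry-run without actually waiting.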

At this point Suricata gets stuck in a loop in TmThreadsSlotVarRun() (lines 128-131), where it attempts to read packets from tv->decode_pq but the PacketDequeueNoLock() function always returns NULL.

The solution was to move PacketFreeOrRelease() in the Napatech code out of the PacketRelease callback function so that it is called only after TmThreadsSlotProcessPkt() has completed. It appears that releasing the packet from within the callback corrupts the queue, causing the failure.
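
The ordering change can be pictured with a toy model. This is not Suricata code and does not reproduce its data structures; every name below (`Packet`, `Pool`, `run_pipeline`, `release_hook`) is hypothetical. It also assumes, purely for illustration, that the processing step still touches the packet after the release callback has fired, which is one generic way that releasing from inside a callback can go wrong.

```python
class Packet:
    def __init__(self):
        self.in_pool = False      # True once handed back to the pool
        self.release_hook = None  # hypothetical per-packet release callback


class Pool:
    def __init__(self):
        self.free = []

    def give_back(self, pkt):
        """Return the packet to the pool for reuse."""
        pkt.in_pool = True
        self.free.append(pkt)


def run_pipeline(pkt, events):
    """Toy stand-in for the processing call: fires the hook, then keeps using pkt."""
    if pkt.release_hook is not None:
        pkt.release_hook(pkt)
    events.append("touched after give_back" if pkt.in_pool else "touched while owned")


def release_in_callback(pool, events):
    """Old ordering: the callback itself hands the packet back."""
    pkt = Packet()
    pkt.release_hook = pool.give_back
    run_pipeline(pkt, events)


def release_after_processing(pool, events):
    """New ordering: the callback does nothing; release happens afterwards."""
    pkt = Packet()
    pkt.release_hook = lambda p: None
    run_pipeline(pkt, events)
    pool.give_back(pkt)
```

In the first variant the pipeline ends up working on a packet that already sits in the free pool; in the second the packet stays owned by the caller until processing has finished.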
