Project

General

Profile

Actions

Bug #2306

closed

suricata 4 deadlocks during failed output log reopening

Added by Michael Stone over 6 years ago. Updated about 6 years ago.

Status:
Closed
Priority:
High
Assignee:
Target version:
Affected Versions:
Effort:
Difficulty:
Label:

Description

I've observed a consistent pattern of behavior in which after some period of operation suricata 4 will stop responding to inputs. For example, rule reloads fail to occur either using a signal or the socket interface. When suricata is in this state it is still generating alerts, still generating protocol logs, still generating stat logs, and otherwise appears to be operating normally, but it ignores requests to reload or shut down. I don't recall this ever happening with suricata 3, but on 4 (either 4.0 or 4.0.1) it seems that the process will eventually go catatonic. This is a major impact when switching from suri 3 to suri 4, because the only way to ensure that a rule reload actually occurs is to restart the entire process. There are no syslog or other management logs associated with this behavior. (At some point it goes from normal logging that it is reloading rules to not logging anything when asked to reload rules.)


Related issues 1 (0 open1 closed)

Related to Suricata - Bug #2360: possible deadlock with signal handlingClosedVictor Julien12/14/2017Actions
Actions #1

Updated by Victor Julien over 6 years ago

When this happens again, can you attach gdb to the process and store the output of 'thread apply all bt' and post that here?

See https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Reporting_Bugs

Actions #2

Updated by Michael Stone over 6 years ago

I think I may have identified the problem. gdb backtrace includes this:

Thread 1 (Thread 0x7f530f940540 (LWP 12965)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f530c808b95 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x563433e5ad30) at ../nptl/pthread_mutex_lock.c:80
#2  0x0000563433408262 in SCLogMessage (log_level=log_level@entry=SC_LOG_ERROR, file=file@entry=0x56343349aee8 "util-debug.c", line=line@entry=522,
    function=function@entry=0x56343349b7b8 <__FUNCTION__.18752> "SCLogReopen", error_code=error_code@entry=SC_ERR_FOPEN,
    message=message@entry=0x7ffc68e2bd50 "Erroring re-opening file \"/var/log/suricata.log\": Permission denied") at util-debug.c:580
#3  0x0000563433408424 in SCLogReopen (op_iface_ctx=0x563433e5ad00, op_iface_ctx=0x563433e5ad00) at util-debug.c:521
#4  SCLogMessage (log_level=log_level@entry=SC_LOG_NOTICE, file=file@entry=0x563433480d91 "detect-engine.c", line=line@entry=2906,
    function=function@entry=0x5634334818c0 <__FUNCTION__.21074> "DetectEngineReload", error_code=error_code@entry=SC_OK, message=message@entry=0x7ffc68e2c620 "rule reload starting")
    at util-debug.c:582
#5  0x00005634333082cf in DetectEngineReload (suri=0x7ffc68e2ce80) at detect-engine.c:2906
#6  0x0000563433267591 in SuricataMainLoop (suri=<optimized out>) at suricata.c:2800
#7  main (argc=<optimized out>, argv=<optimized out>) at suricata.c:2936

The config file sets /var/log/suricata for a file log, as well as syslog. It also has run-as suricata:suricata. In suri 3 this put early startup messages in /var/log/suricata, which was created as root and couldn't be written after switching user, but left all messages logging to syslog and otherwise operated normally. In suri 4 it seems that when it can't write to /var/log/suricata it just barfs (and doesn't log that to syslog). So far it seems that if I change the owner of /var/log/syslog, things work ok. If that is the case, then the behavior should change somehow (probably, the failure to open the log file shouldn't be fatal).

Actions #3

Updated by Victor Julien over 6 years ago

  • Status changed from New to Assigned
  • Assignee set to Victor Julien
  • Target version set to 4.0.4

Thanks, I see the issue.

Actions #4

Updated by Victor Julien over 6 years ago

  • Subject changed from suricata 4 stops responding after long runtime to suricata 4 deadlocks during failed output log reopening
Actions #5

Updated by Victor Julien over 6 years ago

  • Related to Bug #2360: possible deadlock with signal handling added
Actions #6

Updated by Victor Julien about 6 years ago

  • Status changed from Assigned to Closed
Actions

Also available in: Atom PDF