Bug #2189

closed

PID file removal at shutdown broken on 4.0.0-rc2

Added by Duane Howard over 5 years ago. Updated about 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-

Description

I can reproduce this behavior on two test machines. The two variables in play seem to be a custom pid-file name and running in daemon + socket mode.
This did not appear to occur on 3.2.X.
It occurs when I kill the process directly with SIGTERM, and also when I send a 'shutdown' command to the socket.

# Configuration bits
CONF=/etc/suricata/suricata-socket.yaml
SOCKET=/var/run/suricata/suricata-test-command.socket
SURICATA=/usr/bin/suricata
SURICATASC=/usr/bin/suricatasc

root@suricata-test:# grep suricata-test.pid $CONF
pid-file: /var/run/suricata-test.pid

# Start suricata in unix-socket/daemon mode
root@suricata-test:# "$SURICATA" -c "$CONF" --unix-socket -D
20/7/2017 -- 14:24:05 - <Notice> - This is Suricata version 4.0.0-rc2 RELEASE
20/7/2017 -- 14:24:05 - <Error> - [ERRCODE: SC_ERR_INITIALIZATION(45)] - pid file '/var/run/suricata-test.pid' exists but appears stale. Make sure Suricata is not running and then remove /var/run/suricata-test.pid. Aborting!

# send SIGTERM to process in question:
root@suricata-test:# kill 97554

# suricata-test.pid is still on disk.
root@suricata-test:# ls -la /var/run/suricata-test.pid
-rw-r----- 1 root root 6 Jul 20 14:25 /var/run/suricata-test.pid

# remove it and suricata starts fine.
root@suricata-test:# rm /var/run/suricata-test.pid
root@suricata-test:# "$SURICATA" -c "$CONF" --unix-socket -D
20/7/2017 -- 14:27:19 - <Notice> - This is Suricata version 4.0.0-rc2 RELEASE

# try killing via shutdown command on socket
root@suricata-test:# $SURICATASC $SOCKET
Command list: shutdown, command-list, help, version, uptime, running-mode, capture-mode, conf-get, dump-counters, reload-rules, register-tenant-handler, unregister-tenant-handler, register-tenant, reload-tenant, unregister-tenant, add-hostbit, remove-hostbit, list-hostbit, pcap-file, pcap-file-number, pcap-file-list, pcap-current, quit
>>> version
Success:
"4.0.0-rc2 RELEASE"
>>> shutdown
Success:
"Closing Suricata"

# suricata-test.pid is still on disk again
root@suricata-test:# ls -la /var/run/suricata-test.pid
-rw-r----- 1 root root 6 Jul 20 14:27 /var/run/suricata-test.pid

# Suricata fails to start with same error. Remove file, and all is well again though.
root@suricata-test:# "$SURICATA" -c "$CONF" --unix-socket -D
20/7/2017 -- 14:31:35 - <Notice> - This is Suricata version 4.0.0-rc2 RELEASE
20/7/2017 -- 14:31:35 - <Error> - [ERRCODE: SC_ERR_INITIALIZATION(45)] - pid file '/var/run/suricata-test.pid' exists but appears stale. Make sure Suricata is not running and then remove /var/run/suricata-test.pid. Aborting!
root@suricata-test:# rm /var/run/suricata-test.pid
root@suricata-test:# "$SURICATA" -c "$CONF" --unix-socket -D
20/7/2017 -- 14:44:17 - <Notice> - This is Suricata version 4.0.0-rc2 RELEASE
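
For anyone reproducing this, it can help to confirm that the leftover file really is stale before deleting it. The following is a minimal sketch, not Suricata's own check: `pidfile_state` is a hypothetical helper name, and it assumes the PID file holds a single decimal PID, as the 6-byte file in the listing above suggests.

```shell
#!/bin/sh
# Hypothetical helper (not part of Suricata): classify a PID file.
# Prints "missing", "stale", or "running".
pidfile_state() {
    pidfile=$1
    # No file at all: nothing to clean up.
    [ -f "$pidfile" ] || { echo missing; return 0; }
    # Keep only the digits; this drops the trailing newline and any junk.
    pid=$(tr -cd '0-9' < "$pidfile")
    # kill -0 delivers no signal; it only checks that we could signal the PID.
    # Caveat: for a non-root caller it also fails when the process exists but
    # belongs to another user, so run this as root for a reliable answer.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo running
    else
        echo stale
    fi
}
```

Run as root, `pidfile_state /var/run/suricata-test.pid` reports `stale` when the file exists but no matching process does, which is the state described in this report.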

Files

suricata (4.16 KB) suricata /etc/init.d/suricata Dylan Walter, 11/06/2019 04:37 PM
Actions #1

Updated by Jason Ish over 5 years ago

Do you have Suricata dropping privileges and running as another user?

Actions #2

Updated by Duane Howard over 5 years ago

In this particular test case, no. It continues running as root, and all commands here are issued as root.

Actions #3

Updated by Jason Ish over 5 years ago

  • Assignee set to Jason Ish
  • Target version set to 70
Actions #4

Updated by Jason Ish over 5 years ago

I could not replicate this with git master or Suricata 4.0.0-rc2 built from the archive. The only way I could replicate it was to have Suricata drop privileges to a non-root user, either by adding --user on the command line or through the run-as section of the configuration file.

When dropping privileges this is intended behaviour, as Suricata no longer has enough privileges to remove the PID file it created. It is our understanding that this is the best practice when dealing with PID files.
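
To illustrate the privilege point in general POSIX terms: whether a process may delete a file depends on write and search permission on the containing directory, not on who created the file. The sketch below is an illustration under that assumption only; `can_remove` is a hypothetical helper, it ignores sticky-bit directories, and it says nothing about how Suricata itself handles the file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the current user could unlink FILE,
# i.e. has write and search (execute) permission on its directory.
# Sticky-bit directories add ownership rules that are not checked here.
can_remove() {
    dir=$(dirname "$1")
    [ -w "$dir" ] && [ -x "$dir" ]
}

# Example: report the result for the PID file path used in this issue.
if can_remove /var/run/suricata-test.pid; then
    echo "current user can remove files in /var/run"
else
    echo "current user cannot remove files in /var/run"
fi
```

Running this as root and again as the unprivileged user (for example via `su -s /bin/sh suricata`) shows the difference a privilege drop makes on a typical root-owned /var/run.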

Can you show me more info like the ps listing of Suricata while running, and the permissions of the socket file?

Actions #5

Updated by Duane Howard over 5 years ago

Could it be a side effect of having another Suricata running (actually sniffing traffic, etc.) at the same time?
To be fair, root owns the job and the socket, and the pid file doesn't disappear with a kill <pid> from root, so I'm not sure how this would be related to dropped privileges. (We do have another Suricata running that does real work and drops privileges, but not this one.)

Permissions on the socket

root@suricata-test:# ls -al /var/run/suricata/suricata-test-command.socket
srw-r----- 1 root suricata 0 Jul 20 18:54 /var/run/suricata/suricata-test-command.socket

ps listing:

root@zombie-lab.cam:/usr/local/google/home/duaneh# ps -ef | grep suricata | grep -v grep
suricata  76609  25951 99 16:46 ?        03:17:34 /usr/bin/suricata -c /etc/suricata/suricata.yaml --af-packet --user suricata --group suricata -F /etc/suricata/bpf.conf
root     118425      1  5 18:54 ?        00:00:10 /usr/bin/suricata -c /etc/suricata/suricata-socket.yaml --unix-socket -D

Actions #6

Updated by Duane Howard over 5 years ago

friendly ping? Any other data I can provide to help?

Actions #7

Updated by Jason Ish over 5 years ago

No, I have not been able to replicate with 4.0.0-rc or 4.0.0. I too have another instance running with af-packet, as user suricata (run from systemd).

Can you try not running in daemon mode and see what the exit code is? Maybe it's failing on exit before removing the PID file.
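
One way to collect what is asked for here is a small wrapper that runs the command in the foreground, then prints its exit status and whether the PID file is still there. This is only a sketch: `run_and_report` is a hypothetical helper, and the paths in the usage comment are taken from the original report.

```shell
#!/bin/sh
# Hypothetical helper: run a command in the foreground, then report its exit
# status and whether the given PID file survived.
# Usage: run_and_report PIDFILE COMMAND [ARGS...]
run_and_report() {
    pidfile=$1
    shift
    status=0
    "$@" || status=$?            # capture a failure instead of aborting
    echo "exit status: $status"
    if [ -e "$pidfile" ]; then
        echo "pid file still present: $pidfile"
    else
        echo "pid file removed"
    fi
    return "$status"
}

# Usage with the paths from the report (no -D, so Suricata stays attached):
# run_and_report /var/run/suricata-test.pid \
#     /usr/bin/suricata -c /etc/suricata/suricata-socket.yaml --unix-socket
```

Stopping the foreground process with the socket 'shutdown' command or SIGTERM from a second terminal then shows both pieces of information together.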

Actions #8

Updated by Andreas Herz over 4 years ago

  • Status changed from New to Closed

Hi, we're closing this issue since there have been no further responses.
If you think this bug is still relevant, try to test it again with the
most recent version of suricata and reopen the issue. If you want to
improve the bug report please take a look at
https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Reporting_Bugs

Actions #9

Updated by Victor Julien about 3 years ago

  • Assignee deleted (Jason Ish)
  • Target version deleted (70)
Actions #10

Updated by Dylan Walter about 3 years ago

I am fairly certain I'm having this issue.

I'm running Suricata 5.0.0 on Ubuntu 16.04 LTS installed from the apt repositories.
We run in af-packet from systemd as root.
I have ~55 identical devices (hardware, Ubuntu version, patch level, config file) standardized with Ansible.

We've only seen this occur at 2 of our locations (odd considering they're identical and update on the same regular schedule). We do signature updates with oinkmaster, and after the job runs (cron as root) it kicks off a kill -USR2 $(pidof suricata). It's at this point that it seems to enter the failed state. When we investigate, we see the service status as a green active (exited) state. If we restart the service we get an entry in suricata.log: "<Error> - [ERRCODE: SC_ERR_INITIALIZATION(45)] - pid file '/var/run/suricata.pid' exists but appears stale. Make sure Suricata is not running and then remove /var/run/suricata.pid. Aborting!"

Stopping the service, blowing away the stale pid as described in the log, and starting the service again clears the issue.
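
The cleanup step of this workaround can be made safer by deleting the PID file only when no live process matches it. The following is a rough sketch of that idea, not anything shipped with Suricata or the package: `remove_stale_pidfile` is a hypothetical helper, and it assumes it runs as root so that `kill -0` gives a reliable liveness answer.

```shell
#!/bin/sh
# Hypothetical helper: delete PIDFILE only if it does not point at a live
# process. Returns 1 and leaves the file alone when the process is alive.
remove_stale_pidfile() {
    pidfile=$1
    [ -f "$pidfile" ] || return 0          # nothing to clean up
    pid=$(tr -cd '0-9' < "$pidfile")       # keep digits only
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "refusing to remove $pidfile: process $pid is alive" >&2
        return 1
    fi
    rm -f "$pidfile"
}

# Usage, between stopping and starting the service:
# remove_stale_pidfile /var/run/suricata.pid
```

Refusing to delete the file while the process is alive avoids ending up with two instances that both believe they own the PID file.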

Let me know what I can do to help further.

Actions #11

Updated by Jason Ish about 3 years ago

Dylan Walter wrote:

I am fairly certain I'm having this issue.

I'm running Suricata 5.0.0 on Ubuntu 16.04 LTS installed from the apt repositories.
We run in af-packet from systemd as root.
I have ~55 identical devices (hardware, Ubuntu version, patch level, config file) standardized with Ansible.

We've only seen this occur at 2 of our locations (odd considering they're identical and update on the same regular schedule). We do signature updates with oinkmaster, and after the job runs (cron as root) it kicks off a kill -USR2 $(pidof suricata). It's at this point that it seems to enter the failed state. When we investigate, we see the service status as a green active (exited) state. If we restart the service we get an entry in suricata.log: "<Error> - [ERRCODE: SC_ERR_INITIALIZATION(45)] - pid file '/var/run/suricata.pid' exists but appears stale. Make sure Suricata is not running and then remove /var/run/suricata.pid. Aborting!"

Stopping the service, blowing away the stale pid as described in the log, and starting the service again clears the issue.

Let me know what I can do to help further.

Are you using our provided systemd unit file or creating your own? If using your own, can you add it here please?

Actions #12

Updated by Dylan Walter about 3 years ago

I'm fairly certain it's the provided one, but I'm attaching my /etc/init.d/suricata anyway.

Actions #13

Updated by Jason Ish about 3 years ago

Dylan Walter wrote:

I'm fairly certain it's the provided one, but I'm attaching my /etc/init.d/suricata anyway.

This looks like an issue with the init script, which is not provided by Suricata itself but by the package.

Suricata does contain a sample Systemd unit file that handles this case. I'll try to ping the appropriate people to see where to take this.

Actions #14

Updated by Jason Ish about 3 years ago

New issue created for this, https://redmine.openinfosecfoundation.org/issues/3330, as I believe it's specific to the init file.
