Bug #2189
Closed
PID file removal at shutdown broken on 4.0.0-rc2
Description
I can reproduce this behavior on two test machines. The two variables in play seem to be a custom pid-file name and running in daemon + socket mode.
This did not appear to occur on 3.2.X.
This seems to occur both when I kill the process with SIGTERM directly and when I send a 'shutdown' command to the socket.
# Configuration bits
CONF=/etc/suricata/suricata-socket.yaml
SOCKET=/var/run/suricata/suricata-test-command.socket
SURICATA=/usr/bin/suricata
SURICATASC=/usr/bin/suricatasc

root@suricata-test:# grep suricata-test.pid $CONF
pid-file: /var/run/suricata-test.pid

# Start suricata in unix-socket/daemon mode
root@suricata-test:# "$SURICATA" -c "$CONF" --unix-socket -D
20/7/2017 -- 14:24:05 - <Notice> - This is Suricata version 4.0.0-rc2 RELEASE
20/7/2017 -- 14:24:05 - <Error> - [ERRCODE: SC_ERR_INITIALIZATION(45)] - pid file '/var/run/suricata-test.pid' exists but appears stale. Make sure Suricata is not running and then remove /var/run/suricata-test.pid. Aborting!

# send SIGTERM to process in question:
root@suricata-test:# kill 97554

# suricata-test.pid is still on disk.
root@suricata-test:# ls -la /var/run/suricata-test.pid
-rw-r----- 1 root root 6 Jul 20 14:25 /var/run/suricata-test.pid

# remove it and suricata starts fine.
root@suricata-test:# rm /var/run/suricata-test.pid
root@suricata-test:# "$SURICATA" -c "$CONF" --unix-socket -D
20/7/2017 -- 14:27:19 - <Notice> - This is Suricata version 4.0.0-rc2 RELEASE

# try killing via shutdown command on socket
root@suricata-test:# $SURICATASC $SOCKET
Command list: shutdown, command-list, help, version, uptime, running-mode, capture-mode, conf-get, dump-counters, reload-rules, register-tenant-handler, unregister-tenant-handler, register-tenant, reload-tenant, unregister-tenant, add-hostbit, remove-hostbit, list-hostbit, pcap-file, pcap-file-number, pcap-file-list, pcap-current, quit
>>> version
Success: "4.0.0-rc2 RELEASE"
>>> shutdown
Success: "Closing Suricata"

# suricata-test.pid is still on disk again
root@suricata-test:# ls -la /var/run/suricata-test.pid
-rw-r----- 1 root root 6 Jul 20 14:27 /var/run/suricata-test.pid

# Suricata fails to start with same error. Remove file, and all is well again though.
root@suricata-test:# "$SURICATA" -c "$CONF" --unix-socket -D
20/7/2017 -- 14:31:35 - <Notice> - This is Suricata version 4.0.0-rc2 RELEASE
20/7/2017 -- 14:31:35 - <Error> - [ERRCODE: SC_ERR_INITIALIZATION(45)] - pid file '/var/run/suricata-test.pid' exists but appears stale. Make sure Suricata is not running and then remove /var/run/suricata-test.pid. Aborting!
root@suricata-test:# rm /var/run/suricata-test.pid
root@suricata-test:# "$SURICATA" -c "$CONF" --unix-socket -D
20/7/2017 -- 14:44:17 - <Notice> - This is Suricata version 4.0.0-rc2 RELEASE
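For anyone triaging the "exists but appears stale" error in the transcript above, it can help to confirm whether the PID recorded in the file still belongs to a live process. The following is a minimal sketch only: `pidfile_state` is a hypothetical helper name, and the `kill -0` probe is an assumption about how one might check liveness from the shell, not a description of the check Suricata itself performs.

```shell
#!/bin/sh
# Classify a PID file as "missing", "running", or "stale".
# Hypothetical helper for illustration; not part of Suricata.
pidfile_state() {
    pidfile=$1
    if [ ! -e "$pidfile" ]; then
        echo missing
        return 0
    fi
    # Read the first whitespace-delimited token, which should be the PID.
    # read returns non-zero when the file lacks a trailing newline, so ignore it.
    read -r pid _ < "$pidfile" || true
    case $pid in
        ''|*[!0-9]*) echo stale; return 0 ;;   # empty or non-numeric content
    esac
    # kill -0 delivers no signal; it only reports whether the PID can be
    # signalled. Caveat: a non-root caller gets a failure for a live process
    # owned by another user, which this sketch would report as "stale".
    if kill -0 "$pid" 2>/dev/null; then
        echo running
    else
        echo stale
    fi
}
```

Called as `pidfile_state /var/run/suricata-test.pid` right after the `kill` or the socket `shutdown`, a result of `stale` would match what the startup error reports.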
Updated by Jason Ish over 7 years ago
Do you have Suricata dropping privileges and running as another user?
Updated by Duane Howard over 7 years ago
In this particular test case, no. It continues running as root, and all commands here are issued as root.
Updated by Jason Ish over 7 years ago
- Assignee set to Jason Ish
- Target version set to 70
Updated by Jason Ish over 7 years ago
I could not replicate this with git master or with Suricata 4.0.0-rc2 built from the archive. The only way I could replicate it was to have Suricata drop privileges to a non-root user, either with --user on the command line or with the run-as section of the configuration file.
When dropping privileges this is intended behaviour, as Suricata no longer has enough privileges to remove the PID file it created. It is our understanding that this is the best practice when dealing with PID files.
Can you show me more info like the ps listing of Suricata while running, and the permissions of the socket file?
Updated by Duane Howard over 7 years ago
Could it be a side effect of having another Suricata running (actually sniffing traffic, etc.) at the same time?
To be fair, root owns the job and the socket, and the pid file doesn't disappear after a kill <pid> from root, so I'm not sure how this would be related to dropped privileges. (We do have another Suricata running and doing real work that drops privileges, but not this one.)
Permissions on the socket
root@suricata-test:# ls -al /var/run/suricata/suricata-test-command.socket
srw-r----- 1 root suricata 0 Jul 20 18:54 /var/run/suricata/suricata-test-command.socket
ps listing:
root@zombie-lab.cam:/usr/local/google/home/duaneh# ps -ef | grep suricata | grep -v grep
suricata 76609 25951 99 16:46 ? 03:17:34 /usr/bin/suricata -c /etc/suricata/suricata.yaml --af-packet --user suricata --group suricata -F /etc/suricata/bpf.conf
root 118425 1 5 18:54 ? 00:00:10 /usr/bin/suricata -c /etc/suricata/suricata-socket.yaml --unix-socket -D
Updated by Duane Howard over 7 years ago
Friendly ping? Is there any other data I can provide to help?
Updated by Jason Ish over 7 years ago
No, I have not been able to replicate with 4.0.0-rc or 4.0.0. I too have another instance running with af-packet, as user suricata (run from systemd).
Can you try not running in daemon mode and see what the exit code is? Maybe it's failing on exit before removing the PID file.
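One way to gather what is asked for here is to run the same command without -D and record both the exit status and whether the PID file survived. This is a rough sketch: `run_and_check_pidfile` is a hypothetical wrapper, not a Suricata tool, and the paths in the commented example are simply the ones from the original report.

```shell
#!/bin/sh
# Run a command in the foreground, then report its exit status and whether
# the given PID file is still on disk. Hypothetical helper for illustration.
run_and_check_pidfile() {
    pidfile=$1
    shift
    status=0
    "$@" || status=$?          # capture the status without aborting under set -e
    echo "exit status: $status"
    if [ -e "$pidfile" ]; then
        echo "pid file left behind: $pidfile"
    else
        echo "pid file removed: $pidfile"
    fi
    return "$status"
}

# Example (not run here): the command from the report, minus -D so that
# Suricata stays in the foreground until it is shut down.
# run_and_check_pidfile /var/run/suricata-test.pid \
#     /usr/bin/suricata -c /etc/suricata/suricata-socket.yaml --unix-socket
```

A non-zero status together with "pid file left behind" would support the idea that the process fails on exit before it reaches the PID file cleanup.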
Updated by Andreas Herz over 6 years ago
- Status changed from New to Closed
Hi, we're closing this issue since there have been no further responses.
If you think this bug is still relevant, try to test it again with the
most recent version of suricata and reopen the issue. If you want to
improve the bug report please take a look at
https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Reporting_Bugs
Updated by Victor Julien about 5 years ago
- Assignee deleted (Jason Ish)
- Target version deleted (70)
Updated by Dylan Walter about 5 years ago
I am fairly certain I'm having this issue.
I'm running Suricata 5.0.0 on Ubuntu 16.04 LTS installed from the apt repositories.
We run in af-packet from systemd as root.
I have ~55 identical devices (hardware, Ubuntu version, patch level, config file) standardized with Ansible.
We've only seen this occur at 2 of our locations (odd, considering they're identical and update on the same regular schedule). We do signature updates with oinkmaster, and after the job runs (cron as root) it kicks off a kill -USR2 $(pidof suricata). It's at this point that it seems to enter the failed state. When we investigate, we see the service status as a green active (exited) state. If we restart the service, we get an entry in suricata.log: "<Error> - [ERRCODE: SC_ERR_INITIALIZATION(45)] - pid file '/var/run/suricata.pid' exists but appears stale. Make sure Suricata is not running and then remove /var/run/suricata.pid. Aborting!"
Stopping the service, blowing away the stale pid as described in the log, and starting the service again clears the issue.
Let me know what I can do to help further.
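The manual workaround described above (stop, remove the stale PID file, start) could be made safer by refusing to delete the file while the recorded process is still alive. The sketch below is only an illustration under that assumption: `remove_stale_pidfile` is a hypothetical helper, and the default path is the one quoted in the log message.

```shell
#!/bin/sh
# Remove a PID file only when the PID it names is no longer alive.
# Hypothetical helper; the default path is taken from the log message above.
remove_stale_pidfile() {
    pidfile=${1:-/var/run/suricata.pid}
    [ -e "$pidfile" ] || return 0           # nothing to clean up
    read -r pid _ < "$pidfile" || true       # tolerate a missing trailing newline
    case $pid in
        ''|*[!0-9]*) ;;                      # unusable content: treat as stale
        *)
            # kill -0 sends no signal; it only tests whether the PID exists
            # and can be signalled by the caller.
            if kill -0 "$pid" 2>/dev/null; then
                echo "refusing: PID $pid is still running" >&2
                return 1
            fi
            ;;
    esac
    rm -f -- "$pidfile"
}
```

Something along these lines could be called from the start action of a service script before launching Suricata, so that a leftover file from an unclean exit does not block the next start.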
Updated by Jason Ish about 5 years ago
Dylan Walter wrote:
I am fairly certain I'm having this issue.
I'm running Suricata 5.0.0 on Ubuntu 16.04 LTS installed from the apt repositories.
We run in af-packet from systemd as root.
I have ~55 identical devices (hardware, Ubuntu version, patch level, config file) standardized with Ansible.
We've only seen this occur at 2 of our locations (odd considering they're identical and update on the same regular schedule). We do signature updates with oinkmaster and after the job runs (cron as root) it kicks off a kill -USR2 $(pidof suricata) it's at this point that it seems to enter the failed state. When we investigate we see the service status as a green active (exited) state. If we restart the service we get an entry in suricata.log: "<Error> - [ERRCODE: SC_ERR_INITIALIZATION(45)] - pid file '/var/run/suricata.pid' exists but appears stale. Make sure Suricata is not running and then remove /var/run/suricata.pid. Aborting!"
Stopping the service, blowing away the stale pid as described in the log, and starting the service again clears the issue.
Let me know what I can do to help further.
Are you using our provided systemd unit file or creating your own? If using your own, can you add it here please?
Updated by Dylan Walter about 5 years ago
I'm fairly certain it's the provided one, but I'm attaching my /etc/init.d/suricata anyway.
Updated by Jason Ish about 5 years ago
Dylan Walter wrote:
I'm fairly certain it's the provided one, but I'm attaching my /etc/init.d/suricata anyway.
This looks like an issue with the init script, which is not provided by Suricata itself but by the package instead.
Suricata does contain a sample systemd unit file that handles this case. I'll try to ping the appropriate people to see where to take this.
Updated by Jason Ish about 5 years ago
New issue created for this, https://redmine.openinfosecfoundation.org/issues/3330, as I believe it's specific to the init file.