Suricata starts under known conditions that yield no data
I've been working with Suricata for the last year, and I think its behavior of starting under known failure conditions isn't in keeping with UNIX/Linux best practices for daemons or server programs.
Suricata starts even when the ethX devices it references are not UP or not in PROMISC mode. In essence this is a successful start with NO DATA. I actually thought I had modified the init script to guard against this, but I got surprised again. It fails without PF_RING, yet it will start when the interface is, for all intents and purposes, not usable. It would be better if Suricata handled this itself or at least failed appropriately. My thinking is that editing the suricata.yaml file does not by itself guarantee effective operation, and these are things Suricata has the permissions to check. Likewise, if a directory it relies on fails to mount, it should fail as well.
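To make the kind of start-up check I mean concrete, here is a minimal sketch in Python (not Suricata code). It assumes a Linux host where the interface flags are exposed in /sys/class/net/<iface>/flags; the function names and the choice to treat "not UP" as an error and "not PROMISC" as a warning are my own assumptions for illustration.

```python
import os

IFF_UP = 0x1         # interface is administratively up (value from linux/if.h)
IFF_PROMISC = 0x100  # interface is in promiscuous mode (value from linux/if.h)


def interface_state(iface, sysfs_root="/sys/class/net"):
    """Return (is_up, is_promisc) by reading the interface's sysfs flags file."""
    flags_path = os.path.join(sysfs_root, iface, "flags")
    with open(flags_path) as f:
        # The file holds a hex string such as "0x1003".
        flags = int(f.read().strip(), 16)
    return bool(flags & IFF_UP), bool(flags & IFF_PROMISC)


def preflight_interface(iface, sysfs_root="/sys/class/net"):
    """Return a list of (severity, message) tuples; an empty list means OK.

    Hypothetical policy: a missing or down interface is an error,
    a non-promiscuous interface is only a warning.
    """
    try:
        is_up, is_promisc = interface_state(iface, sysfs_root)
    except (OSError, ValueError) as exc:
        return [("error", f"{iface}: cannot read interface flags: {exc}")]

    problems = []
    if not is_up:
        problems.append(("error", f"{iface}: interface is not UP"))
    if not is_promisc:
        problems.append(("warning", f"{iface}: interface is not in PROMISC mode"))
    return problems
```

An init script or wrapper could call preflight_interface("eth0") before launching the daemon and exit non-zero if any entry has severity "error", so that a process check reflects whether capture can actually work.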
It will also start when it cannot write to its files. This case may be more nuanced: with http, dns, and unified2 logging configured, it started without being able to create a new unified2 file, while it could continue to append to the existing http/dns logs, which had write permissions. (We had to move the data volumes, and the root directory /var/log/suricata didn't have the correct permissions, but all of the copied files under it did.)
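The distinction between "can append to existing logs" and "can create new files" is easy to test for. Below is a rough Python sketch of such a check (again not Suricata code). The function names are hypothetical, and the log file names in the usage note are assumptions. It tries to create a temporary file in the directory rather than only inspecting permission bits, since that is the operation that failed in my case.

```python
import os
import tempfile


def can_create_files(log_dir):
    """Return True if a new file can actually be created in log_dir."""
    try:
        # The temporary file is removed automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=log_dir, prefix=".preflight-"):
            pass
    except OSError:
        return False
    return True


def preflight_log_dir(log_dir, existing_logs=()):
    """Return a list of problem strings; an empty list means OK."""
    if not os.path.isdir(log_dir):
        return [f"{log_dir}: not a directory (missing or not mounted?)"]

    problems = []
    if not can_create_files(log_dir):
        problems.append(f"{log_dir}: cannot create new files")
    for name in existing_logs:
        path = os.path.join(log_dir, name)
        # Only files that already exist are checked for write access.
        if os.path.exists(path) and not os.access(path, os.W_OK):
            problems.append(f"{path}: exists but is not writable")
    return problems
```

For example, preflight_log_dir("/var/log/suricata", ["http.log", "dns.log"]) would have reported the directory problem on my systems even though the copied log files themselves were writable.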
In essence I had 2 different systems start, hold open files, and look OK to the admin (and ME!), but hours later I had no data, or incomplete data, for the configured purpose (which Suricata can see in its suricata.yaml file).
If it runs with no errors when it shouldn't, a process check is of no use, because it will just mask the flaws. It also makes the System V init scripts useless in theory, because Suricata will execute but not really perform its task. I think a consistent check of the ethX devices may be overkill, but a well-intentioned check on start-up with a fail condition is more in keeping with other UNIX daemons. If it can no longer write to a file it depends upon, I would also expect it to go into a fail state. It's up to the engineer to design and the admin to support, but the current behavior makes it difficult to put into a support model for anyone but the engineer.
The above is well-intentioned criticism from one viewpoint.
Updated by Victor Julien about 3 years ago
I think an interface not being in PROMISC mode is not a reason to refuse to start. It can be valid to monitor just the server's own in/out traffic.
The UP case is a good point. In my mind I go back and forth between bringing it UP and warning/erroring.
The logging issue is handled in #2386.