Optimization #1277
Trigger second live rule-reload while first one is in progress
Description
We have some autotests in which I'm testing suricata now, before we take it into production.
There are some scripts that trigger a USR2 signal because the .yaml has changed, for example the HOME_NET var changed because of the ppp0 device receiving a new IP.
The problem is that the script may be called again because the ppp0 device received a newer IP, which triggers USR2 again.
Depending on the ruleset and system the live rule-reload from the first time might not have finished yet, so the second USR2 is just going to be "rejected".
I would suggest that the second USR2 signal is stored and after the first reload is completed the second one should reload to make sure the proper IP is in HOME_NET.
An alternative would be some way to detect if the rule-reload is still in progress, so a script could wait. Having a logwatch is one possibility, but kinda messy.
Updated by Andreas Herz over 10 years ago
Just to describe the scenario in more detail, and why I have this request at all:
- Server with dynamic IP (ppp0 for example)
- UI that allows to activate/deactivate .rules files
- ifup-post script for the dynamic interface
The following scenario might happen:
- UI changes the .yaml config to include foobar.rules and thus sends a SIGUSR2 to suricata so the new rules get activated.
- While the reload is in progress (it takes some seconds), the dynamic interface comes up and receives a new dynamic IP. The ifup script changes HOME_NET in .yaml so the new dynamic IP is included, and then also sends a SIGUSR2 to suricata so the new HOME_NET value gets activated.
- Since the second SIGUSR2 is ignored, the running suricata process ends up with a valid but outdated config. In this example HOME_NET still has the old value it had when the rules were changed.
The script could check in /var/log/suricata.log whether the last line is "rule reload complete", but that's far from a reliable way to make sure the most recent reload is done.
Andreas Herz wrote:
I would suggest that the second USR2 signal is stored and after the first reload is completed the second one should reload to make sure the proper IP is in HOME_NET.
After digging into the code one possible way is to extend "SignalHandlerSigusr2Idle" like:
void SignalHandlerSigusr2Idle(int sig)
{
    if (run_mode == RUNMODE_UNKNOWN || run_mode == RUNMODE_UNITTEST) {
        SCLogInfo("Ruleset load signal USR2 triggered for wrong runmode");
        return;
    }

    SCLogInfo("Ruleset load in progress. New ruleset load "
              "allowed after current is done");

    // wait until the reload is done
    while (UtilSignalIsHandler(SIGUSR2, SignalHandlerSigusr2Idle)) {
        usleep(500000);
    }

    // start a new reload
    SignalHandlerSigusr2();
    return;
}
(just an example, needs more logic like when a third SIGUSR2 comes etc.)
This could result in the following behaviour:
- Receiving SIGUSR2 -> start reload
- Receiving SIGUSR2 #2 while reload in progress -> wait for reload to finish, to reload again
This results in a sane active suricata config with the most recent values.
An alternative would be some way to detect if the rule-reload is still in progress, so a script could wait. Having a logwatch is one possibility, but kinda messy.
Since there is no way (at least none that I know of) to check if the rule-reload is done, besides reading logfiles, which could lead to other errors, I could also think of some other ways:
- create a lockfile that could be checked from external scripts, although the scripts would then have to wait
- send some answer/return value that could be checked
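The lockfile idea could look roughly like the sketch below. This is an assumption about how it might be done, not something suricata provides: the function names are hypothetical, and the lock path would have to be agreed on between the daemon and the scripts. The daemon side creates the file atomically before a reload and removes it afterwards; an external script only checks whether the file exists.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Daemon side: create the lockfile before starting a reload.
 * O_CREAT | O_EXCL makes creation fail if the file already exists,
 * so the call returns -1 when a reload is already marked as running.
 * On success the open file descriptor is returned.
 */
static int reload_lock_acquire(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
}

/* Daemon side: close and remove the lockfile once the reload is done. */
static void reload_lock_release(const char *path, int fd)
{
    close(fd);
    unlink(path);
}

/* Script side: a reload counts as "in progress" while the file exists. */
static bool reload_in_progress(const char *path)
{
    return access(path, F_OK) == 0;
}
```

One known weakness of this approach is that a crash during the reload leaves a stale lockfile behind, so a real implementation would need some way to detect and clean that up.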
A workaround is to restart suricata every time, but since the rule-reload feature exists, makes more sense and is just nice to have, I would like to improve this part.
So I would appreciate any opinion about this issue and what would be the preferable way to deal with it. I will play around with the codebase to see if I can find some other (better) way of dealing with that besides patching SignalHandlerSigusr2Idle.
Updated by Victor Julien over 10 years ago
An additional option would be to use the unix socket interface. A start has been made to expose the reload feature to it, but it's currently commented out. Check unix-manager.c for the line with:
UnixManagerRegisterCommand("reload-rules", UnixManagerReloadRules, NULL, 0);
If the unix socket interface gives a reliable way to indicate the reload is complete, I think this could be a good method.
Updated by Andreas Herz over 10 years ago
Victor Julien wrote:
An additional option would be to use the unix socket interface. A start has been made to expose the reload feature to it, but it's currently commented out. Check unix-manager.c for the line with:
[...]
If the unix socket interface gives a reliable way to indicate the reload is complete, I think this could be a good method.
I made a pull request to make the USR2 handling more sane:
https://github.com/inliniac/suricata/pull/1132
I had a discussion with Eric on IRC and he agreed that, from a user's perspective, a signal sent while suricata is running shouldn't be ignored: the user gets no response and just expects the reload to be triggered. There are some scenarios where several USR2 signals can occur, and these should result in the most recent config being active after all work is done. For example, a dynamic interface comes up (first trigger) and the dynamic IP is assigned, which takes some time (second trigger). The same can happen at startup and boot time.
But I also think that the unix interface should provide this feature, too.
Updated by Andreas Herz about 9 years ago
- Assignee set to Andreas Herz
- Target version set to 70
Updated by Andreas Herz almost 9 years ago
I would work on converting my patch to 3.0, but we should decide how we want to handle the USR2 reloads. The last time we discussed this on IRC we had two solutions: using some sort of counter that is incremented for every USR2 signal and triggers reloads until the counter is 0, or just recording that one or more USR2 signals are pending and then triggering a single additional reload. Any suggestions?
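To make the difference between the two options concrete, here is a small model of both policies. It is only an illustration with hypothetical names (reload_policy, reload_queue and so on), it is not taken from the patch, and it leaves out signal safety entirely.

```c
#include <stdbool.h>

/* The two policies discussed above. */
typedef enum {
    RELOAD_COUNT_EVERY, /* one reload per received USR2 */
    RELOAD_COALESCE     /* at most one extra reload, however many arrive */
} reload_policy;

typedef struct {
    reload_policy policy;
    unsigned pending; /* reloads still owed */
} reload_queue;

/* Record one incoming USR2 according to the chosen policy. */
static void reload_queue_signal(reload_queue *q)
{
    if (q->policy == RELOAD_COUNT_EVERY) {
        q->pending++;
    } else {
        q->pending = 1;
    }
}

/*
 * Ask whether another reload should be started now. Consumes one
 * pending request and returns true, or returns false if none is left.
 */
static bool reload_queue_take(reload_queue *q)
{
    if (q->pending == 0) {
        return false;
    }
    q->pending--;
    return true;
}
```

With three signals queued, the counting policy runs three reloads and the coalescing policy runs one. If every reload reads the current config from disk, as in the scenario described in this ticket, the extra reloads of the counting policy would all load the same final config.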
Updated by Andreas Herz over 8 years ago
This has been working in production for quite some time now:
Updated by Victor Julien over 8 years ago
- Status changed from New to Closed
- Target version changed from 70 to 3.2beta1