Project

General

Profile

Actions

Bug #809

closed

Live Rules Update Corruption - ELSA

Added by Brian Olson almost 11 years ago. Updated about 8 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Target version:
-
Affected Versions:
Effort:
Difficulty:
Label:

Description

I've been working on routing suricata alerts to ELSA via rsyslog for aggregation an analysis. I recently began using USR2 to update the running ruleset immediately following and oinkmaster update. What I found was that alerts would be corrupted right after the USR2 signal. Example alerts:

GOOD

   May  8 18:02:31 HOSTNAME snort[7905]: [1:20147:16] ET POLICY.....

After USR2 (BAD)

   May  8 18:10:31 HOSTNAME ??M#001[8056]: 6]: [1:20147:0] ET POLICY...

As you can see, after the USR2 signal is processed, the "identity" field in syslog is hosed as well as what appears to be an echo following the PID ("6]:") a few characters later. The identity is set to "snort" because ELSA requires it. Everything else seems fine. I have been able to replicate this repeatedly on several Ubuntu12.04 LTS hosts running suricata 1.4.1. I have since created a work-around for this by changing "suricata" to "snort" via syslog-ng on the ELSA system, but wanted to report the bug.

Thanks!

Actions #1

Updated by Peter Manev almost 11 years ago

  • Subject changed from Live Rules Update Corruption to Live Rules Update Corruption - ELSA

Hi,

 I have since created a work-around for this by changing "suricata" to "snort" via syslog-ng on the ELSA system, but wanted to report the bug.

So if by doing that on the ELSA system, fixes the things for you - isn't it more likely that this could be a parsing issue on the ELSA system ?

If you do a packet capture of the logs being send - do you see anything corrupted?

Thanks

Actions #2

Updated by Brian Olson almost 11 years ago

Hi Peter,

Ahhh...I didn't state this specifically (whoops...sorry), but it only happens when the identity in suricata is set to "snort". It may happen with other strings as well, however, I only had it set to snort in order to make ELSA happy. As soon as I change the identity back to "suricata" and restarted the daemon, it was good, including after a USR2.

When the identity was set to "snort", cycling the daemon would fix the issue until a USR2 signal was received.

This is all local to the system between suricata and the syslog. I only mentioned ELSA as this is the reason I was changing the identity suricata was reporting.

Actions #3

Updated by Peter Manev almost 11 years ago

Hi Brian,

 As soon as I change the identity back to "suricata" and restarted the daemon, it was good, including after a USR2.

So in other words if you set up the identity as suricata in the suricata.yaml - everything is fine, correct? (even after USR2 signal)

I am sorry but I do not see how is this a Suricata bug (are you supposed to set it as "identity snort" in the suricata.yaml )? (maybe I am misunderstanding...)

Thank you

Actions #4

Updated by Brian Olson almost 11 years ago

Yes, with "suricata" as the identity, everything works great. If the identity is changed to "snort," the USR2 signal seems to corrupt the syslog alerting (I didn't check fast.log).

With ELSA, you're supposed to change the identity from suricata to snort in the suricata.yaml file.

Actions #5

Updated by Peter Manev almost 11 years ago

If you do the same test - but you restart Suri (instead of using the USR2 reload signal for rule update ) - would you observe the same errs?

Actions #6

Updated by Brian Olson almost 11 years ago

No errors with a full restart of suri, only with USR2.

Actions #7

Updated by Victor Julien almost 11 years ago

  • Target version deleted (1.4.1)
Actions #8

Updated by Peter Manev almost 11 years ago

Hi Brian,

Can you please post (or share privately if you will) the exact workaround lines in syslog?

Thank you

Actions #9

Updated by Brian Olson almost 11 years ago

Hi Peter,

This is the code I use on the ELSA system (receives syslog messages from suricata system via rsyslog):

rewrite suricata_to_snort {
subst("suricata", "snort", value("PROGRAM"));
};

< -- SNIP -- >
Log {
rewrite(suricata_to_snort);
< -- SNIP -- >

This only changes the value of "program" to "snort" since ELSA requires "snort" and changing the identity on suri itself produces bad alerts.

Actions #10

Updated by Victor Julien over 10 years ago

  • Target version set to TBD
Actions #11

Updated by Demetris Demetriou over 9 years ago

Came across this bug recently. Using suricata on pfsense (pfsense package) and I'm aggregating the logs to a remote rsyslog server. The issue happens at random, but mostly when triggering a live reload of the rules (not always).

Relevant log entries (the ^number is "[ pid ]" without the spaces, bug system changes it):
The following are logs taken by a remote rsyslog server, exactly at the moment the bug happened:

2014-10-08T05:54:51+03:00 host suricata[66280]: [1:9000001:1] Incoming connection to unused service over TCP [Classification: Attempted Information Leak] [Priority: 2] {TCP} yyy.yyy.yyy.yyy:63555 -> xxx.xxx.xxx.xxx:445
2014-10-08T05:54:54+03:00 host suricata[66280]: [1:9000001:1] Incoming connection to unused service over TCP [Classification: Attempted Information Leak] [Priority: 2] {TCP} yyy.yyy.yyy.yyy:63555 ->xxx.xxx.xxx.xxx:445
2014-10-08T05:57:13+03:00 host ▒M-^C▒([66280]: [1:2101616:9] GPL DNS named version attempt [Classification: Attempted Information Leak] [Priority: 2] {UDP} yyy.yyy.yyy.yyy:46858
 -> 213.7.202.53:53
2014-10-08T05:57:23+03:00 host ▒M-^C▒([66280]: [1:9000800:1] Incoming unwanted HTTP connection TCP [Classification: Attempted Information Leak] [Priority: 2] {TCP} yyy.yyy.yyy.yyy:51121 -> xxx.xxx.xxx.xxx:80

The only modifications I've made to the logs above are obfuscating my host (replaced with host), my IP (replaced with xxx.xxx.xxx.xxx) and the remote IP (replaced with yyy.yyy.yyy.yyy). All other characters are logged exactly as shown (yes, even the funny looking ones).

The corrupted syslog tag does not always contain funny characters. I've seen it contain cleartext, and the cleartext was "moving", as if being truncated and added to the tag. For example it would go like this (F=funny characters):

FFFFFF
FFFFF/
FFFF/v
FFF/var
FF/var
F/varF

Contacted the pfsense package manager and he confirmed that the package does not affect the logging in any way. Logs are sent straight from the binary to the remote server.

It's a scary bug, since it clearly shows that something is getting overflowed somewhere. Please consider a quick patch for this.

Actions #12

Updated by Demetris Demetriou over 9 years ago

More logs showing that the tag field is not static, but changes++
(strikethrough not added by me. As before, only hostname, local IP and remote IP have been changed)

2014-10-14T17:32:53+03:00 host1 ▒n▒-▒▒▒-[55992]: [1:9004430:1] Incoming unwanted HTTP/S connection TCP [Classification: Attempted Information Leak] [Priority: 2] {TCP} yyy.yyy.yyy.yyy:52497 -> xxx.xxx.xxx.xxx:443
2014-10-14T17:37:29+03:00 host1 `m▒-▒7n-[55992]: [1:9000001:1] Incoming connection to unused service over TCP [Classification: Attempted Information Leak] [Priority: 2] {TCP} yyy.yyy.yyy.yyy:80 -> xxx.xxx.xxx.xxx:745
2014-10-14T17:37:29+03:00 host1 `m▒-▒7n-[55992]: [1:9900000:1] Suspicious activity:privileged port>privileged port [Classification: Potentially Bad Traffic] [Priority: 2] {TCP} yyy.yyy.yyy.yyy:80 -> xxx.xxx.xxx.xxx:745
2014-10-14T17:39:00+03:00 host1 ^D[55992]: [1:9000002:1] Incoming connection to unused service over UDP [Classification: Attempted Information Leak] [Priority: 2] {UDP} yyy.yyy.yyy.yyy:42242 -> xxx.xxx.xxx.xxx:17
2014-10-14T17:40:09+03:00 host1 ^D[55992]: [1:9009950:1] Incoming unwanted POP3/S connection TCP [Classification: Attempted Information Leak] [Priority: 2] {TCP} yyy.yyy.yyy.yyy:38357 -> xxx.xxx.xxx.xxx:995
2014-10-14T17:43:23+03:00 host1 !U▒/▒^E[55992]: [1:9000001:1] Incoming connection to unused service over TCP [Classification: Attempted Information Leak] [Priority: 2] {TCP} yyy.yyy.yyy.yyy:1346 -> xxx.xxx.xxx.xxx:445
2014-10-14T17:43:26+03:00 host1 !U▒/▒^E[55992]: [1:9000001:1] Incoming connection to unused service over TCP [Classification: Attempted Information Leak] [Priority: 2] {TCP} yyy.yyy.yyy.yyy:1346 -> xxx.xxx.xxx.xxx:445
2014-10-14T17:45:19+03:00 host1 !U▒/▒^E[55992]: [1:9000001:1] Incoming connection to unused service over TCP [Classification: Attempted Information Leak] [Priority: 2] {TCP} yyy.yyy.yyy.yyy:4089 -> xxx.xxx.xxx.xxx:445
2014-10-14T17:45:22+03:00 host1 !U▒/▒^E[55992]: [1:9000001:1] Incoming connection to unused service over TCP [Classification: Attempted Information Leak] [Priority: 2] {TCP} yyy.yyy.yyy.yyy:4089 -> xxx.xxx.xxx.xxx:445
2014-10-14T17:46:11+03:00 host1 [55992]: [1:9000001:1] Incoming connection to unused service over TCP [Classification: Attempted Information Leak] [Priority: 2] {TCP} yyy.yyy.yyy.yyy:1516 -> xxx.xxx.xxx.xxx:445
2014-10-14T17:46:14+03:00 host1 [55992]: [1:9000001:1] Incoming connection to unused service over TCP [Classification: Attempted Information Leak] [Priority: 2] {TCP} yyy.yyy.yyy.yyy:1516 -> xxx.xxx.xxx.xxx:445
2014-10-14T17:49:20+03:00 host1 @\f-▒▒f-[55992]: [1:9000001:1] Incoming connection to unused service over TCP [Classification: Attempted Information Leak] [Priority: 2] {TCP} yyy.yyy.yyy.yyy:1417 -> xxx.xxx.xxx.xxx:445
2014-10-14T17:49:23+03:00 host1 @\f-▒▒f-[55992]: [1:9000001:1] Incoming connection to unused service over TCP [Classification: Attempted Information Leak] [Priority: 2] {TCP} yyy.yyy.yyy.yyy:1417 -> xxx.xxx.xxx.xxx:445
Actions #13

Updated by Victor Julien over 9 years ago

  • Description updated (diff)
Actions #14

Updated by Victor Julien over 9 years ago

The alert-syslog code directly uses strings from the Conf API, but the reload reinitializes that. So I think what we're seeing it use to freed memory / dangling pointers.

@Jason Borden, we can quickly fix this by using local copies, but I guess we need to consider if the global pattern of long term referencing Conf* memory is sane.

Actions #15

Updated by Victor Julien over 9 years ago

  • Status changed from New to Assigned
  • Assignee set to Jason Ish
  • Target version changed from TBD to 3.0RC2

Jason assigned it to you for now, feel free to pass back to me.

Actions #16

Updated by Jason Ish over 8 years ago

Victor Julien wrote:

Jason assigned it to you for now, feel free to pass back to me.

Victor, is this specific case still an issue? I'm simply dumping the memory address of the ident on each log, and it stays static even after a SIGUSR2, pehaps related to your load at prefix modifications?

Actions #17

Updated by Victor Julien over 8 years ago

I don't think this is still an issue, but it should be checked to be sure. I did change how the yaml conf tree is handled with reloads. Previously the main tree could disappear on reload. That shouldn't happen anymore.

Actions #18

Updated by Jason Ish over 8 years ago

Victor Julien wrote:

I don't think this is still an issue, but it should be checked to be sure. I did change how the yaml conf tree is handled with reloads. Previously the main tree could disappear on reload. That shouldn't happen anymore.

I think we're good now. I figured the main tree remained after looking at the code. I also did a large number of USR2 waiting for the memory address of the returned confnode to change, and as expected, it didn't.

Sorry if I missed it, but is there documentation on exactly what is replaced during a live reload?

Actions #19

Updated by Victor Julien over 8 years ago

  • Target version changed from 3.0RC2 to 70
Actions #20

Updated by Victor Julien about 8 years ago

  • Status changed from Assigned to Closed
  • Target version deleted (70)
Actions

Also available in: Atom PDF