Support #1781

fast.log stops updating

Added by Jon Zeolla almost 8 years ago. Updated almost 8 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-

Description

Several hours after starting my Suricata service, fast.log stops being appended to without the service crashing, and it stays that way for hours until I do a manual service restart. I've attached some relevant information, but because the timing is unpredictable (anywhere between 7 and 24 hours after service start) it's hard for me to capture a gdb session; I am working on that now. There is no core dump and the service stays up; fast.log simply stops being written to. stats.log continues to receive updates, but all of the values are static until the instance is restarted.

The service is not failing outright, which makes it more difficult to monitor, maintain, and debug than if it simply crashed and core dumped. Some additional details are below, and I'd be happy to gather more as requested. I also idle in #suricata as jzeolla if it's easier to chat there and then post the results of any conversation in this thread.

Dell PowerEdge R730
- 12 cores * 2 processors w/HT (48 total)
- 192GB RAM
- 300GB usable disk space, with usage < 30%

Ubuntu 14.04.4 LTS, x86_64
- 3.13.0-85-generic

Suricata 3.0.1 RELEASE
- See attached "build-info.txt" for the output of `suricata --build-info`
- See attached "sanitized suricata.yaml" for the related suricata.yaml
- See attached "sanitized suricata.log" for a not-yet-crashed log file (as far as I can tell, when this issue hits, nothing is written to suricata.log until the service gets restarted).


Files

sanitized suricata.yaml (4 KB) sanitized suricata.yaml Jon Zeolla, 05/09/2016 10:47 AM
build-info.txt (2.82 KB) build-info.txt Jon Zeolla, 05/09/2016 11:53 AM
sanitized suricata.log (20.1 KB) sanitized suricata.log Jon Zeolla, 05/09/2016 11:55 AM
stats.log (3.72 MB) stats.log Jon Zeolla, 05/09/2016 12:51 PM
sanitized suricata 2.yaml (4 KB) sanitized suricata 2.yaml Jon Zeolla, 05/09/2016 01:06 PM
stats.log (1.63 MB) stats.log Jon Zeolla, 05/14/2016 09:58 AM
sanitized_suricata_2.log (21.1 KB) sanitized_suricata_2.log Jon Zeolla, 05/18/2016 07:12 PM
stats.log (1.06 MB) stats.log Jon Zeolla, 05/19/2016 06:06 AM
Actions #1

Updated by Victor Julien almost 8 years ago

Can you add stats.log records?

Actions #2

Updated by Jon Zeolla almost 8 years ago

Here's yesterday's log; you can see the numbers freeze starting at 5/8/2016 -- 13:47:29.

Actions #3

Updated by Victor Julien almost 8 years ago

The tcp.reassembly_gap counter is really high: 23M gaps vs 42M sessions. This means a great number of your TCP sessions are not fully tracked. The cause appears to be the memcap:

tcp.reassembly_memuse | Total | 12884901882

This means all 12GB is in use, leading to many segments not being considered for reassembly:

tcp.segment_memcap_drop | Total | 233438924
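
For reference, the cap being hit here is the reassembly memcap in the stream section of suricata.yaml. A minimal sketch of where it lives, with the 12gb figure matching the memuse counter above (the layout follows the stock config, not the attached file):

  stream:
    reassembly:
      memcap: 12gb    # total memory allowed for TCP stream reassembly; once
                      # exhausted, new segments are dropped from reassembly
                      # (counted as tcp.segment_memcap_drop)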

Actions #4

Updated by Jon Zeolla almost 8 years ago

Good catch - that was a leftover from before we added more RAM to the box (previously we were using 100% of our memory). I just increased the memcap from 12gb to 30gb and it's back up and running now. The new suricata.yaml is attached.

Actions #5

Updated by Victor Julien almost 8 years ago

  • Tracker changed from Bug to Support

Actions #6

Updated by Jon Zeolla almost 8 years ago

I'm still encountering the same issue. Here's a new stats.log, with the issue becoming apparent at 01:54:54. I've modified suricata.yaml to prune TCP connections more aggressively, and hopefully that will help - any other suggestions?
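
For context, one common way to prune TCP flows more aggressively is to shorten the TCP entries in the flow-timeouts section of suricata.yaml. The sketch below only illustrates that kind of change; the values are assumptions, not the ones actually applied here:

  flow-timeouts:
    tcp:
      new: 30                    # seconds before a not-yet-established flow is pruned
      established: 300           # much shorter than the stock default
      closed: 10
      emergency-new: 10          # tighter limits once memcaps come under pressure
      emergency-established: 100
      emergency-closed: 5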

I also feel it would be better for Suricata to crash outright rather than fail silently the way it has been. Thoughts?

Thanks,

Jon

Actions #7

Updated by Andreas Herz almost 8 years ago

Did you try other output formats besides fast.log?
Just to see whether it's related only to fast.log or to eve/JSON etc. as well.

Actions #8

Updated by Jon Zeolla almost 8 years ago

I have added parallel JSON output. I will report back when I have more information. Thanks.
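
A sketch of what running both outputs side by side looks like in the outputs section of suricata.yaml (filenames and the types list are the usual defaults, assumed rather than taken from the attached config):

  outputs:
    - fast:
        enabled: yes
        filename: fast.log
    - eve-log:
        enabled: yes
        filetype: regular        # plain file output (other options include syslog and unix_stream)
        filename: eve.json
        types:
          - alert                # mirror the alerts that fast.log carries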

Actions #9

Updated by Jon Zeolla almost 8 years ago

We encountered the issue again in the middle of the night last night, and the fast.log behavior matches eve: both stopped writing at exactly the same time and only resumed after a service restart.

The last write to fast.log was:
05/17/2016-02:32:22.611623

The last write to eve.json was:
"timestamp":"2016-05-17T02:32:22.611623-0400"

Actions #10

Updated by Peter Manev almost 8 years ago

Can you try the following changes/adjustments and see if they make a difference:

- max-pending-packets: 65534
- mpm-algo: ac-ks

in the af-packet section:
- rollover: no
- #use-emergency-flush: yes
- #checksum-checks: no
- ring-size: 65534

in the detect-engine section:
- sgh-mpm-context: auto

in the flow section:
- memcap: 1gb
- #prune-flows: 1

in the stream section:
- prealloc-sessions: 100000
- midstream: false

then start Suricata in verbose mode (-v) and please share the suricata.log file.
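
Pulling those suggestions together, a rough sketch of the affected parts of suricata.yaml (nesting follows the stock config; the interface name is a placeholder):

  max-pending-packets: 65534
  mpm-algo: ac-ks                # see the follow-up below; not available in the 3.0.1 release

  af-packet:
    - interface: eth0            # placeholder
      ring-size: 65534
      rollover: no
      #use-emergency-flush: yes
      #checksum-checks: no

  detect-engine:
    - sgh-mpm-context: auto

  flow:
    memcap: 1gb
    #prune-flows: 1

  stream:
    prealloc-sessions: 100000
    midstream: false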

Actions #11

Updated by Jon Zeolla almost 8 years ago

I gave that a shot and got:
18/5/2016 -- 19:56:25 - <Error> - [ERRCODE: SC_ERR_INVALID_YAML_CONF_ENTRY(139)] - Invalid mpm algo supplied in the yaml conf file: "ac-ks"

After reviewing the current suricata.yaml.in in master, I changed mpm-algo to ac-bs, as it appears to be the closest to what you suggested. Attached is my suricata.log from startup. My initial impression is that restarts are much faster now; I will monitor and report back soon with an update.
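
For the record, the resulting setting in suricata.yaml (ac-bs being the closest Aho-Corasick variant available in the 3.0.1 release):

  mpm-algo: ac-bs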

Actions #12

Updated by Peter Manev almost 8 years ago

Yes, they should be indeed.
Sorry, ac-ks is only available in git (I rarely use stable :) ).

A silly question - you are not running out of disk space, right? (It has happened to me, hence the question.)

Actions #13

Updated by Victor Julien almost 8 years ago

Jon, could you add another stats.log from when it stops alerting again?

Actions #14

Updated by Jon Zeolla almost 8 years ago

Peter,

I don't believe that has ever been an issue. We're at roughly 30% usage on the disk the logs write to, and stats.log continually writes the same metrics to disk while fast.log/eve.json is frozen.

Peter & Victor,

We got through the night without issue (and at very low memory usage), but here is the stats.log anyway. I've been evaluating whether there are any ill effects from the new config. I will report back with a stats.log if it stops reporting again.

Actions #15

Updated by Jon Zeolla almost 8 years ago

That last modification has made major improvements to our stability. I've rolled back our interim workarounds (scheduled restarts) and reverted to using USR2 for signature reloading. I will let this bake for a couple of days and then review stats.log again to ensure drops are reasonable.

Actions #16

Updated by Jon Zeolla almost 8 years ago

Things appear to be much better now; you can close the ticket. Would you mind pointing me to some documentation regarding the mpm-algo change?

Actions #17

Updated by Victor Julien almost 8 years ago

  • Status changed from New to Closed
  • Priority changed from High to Normal