Bug #2993
closed
Suricata 5.0.0beta1 memory allocation of 4294966034 bytes failed
Added by Anonymous almost 7 years ago. Updated over 6 years ago.
Description
Today suricata stopped after running for almost 9h:
May 27 12:05:22 server suricata[28762]: memory allocation of 4294966034 bytes failed
May 27 12:05:25 server systemd[1]: suricata.service: main process exited, code=killed, status=6/ABRT
May 27 12:05:25 server systemd[1]: Unit suricata.service entered failed state.
May 27 12:05:25 server systemd[1]: suricata.service failed.
Could not find a core file, though I configured a home directory and core dump in suricata.yaml.
As far as I can see, the system was not running out of any resources.
# suricata -V
This is Suricata version 5.0.0-beta1 RELEASE
# ldd `which suricata`
linux-vdso.so.1 => (0x00007ffd769c7000)
libhtp.so.2 => /lib/libhtp.so.2 (0x00007fd563c32000)
librt.so.1 => /lib64/librt.so.1 (0x00007fd563a2a000)
libm.so.6 => /lib64/libm.so.6 (0x00007fd563728000)
liblz4.so.1 => /lib64/liblz4.so.1 (0x00007fd563513000)
libGeoIP.so.1 => /lib64/libGeoIP.so.1 (0x00007fd5632e3000)
libmagic.so.1 => /lib64/libmagic.so.1 (0x00007fd5630c6000)
libcap-ng.so.0 => /lib64/libcap-ng.so.0 (0x00007fd562ec0000)
libpfring.so.1 => /usr/local/lib/libpfring.so.1 (0x00007fd562c28000)
libpcap.so.1 => /usr/local/lib/libpcap.so.1 (0x00007fd562955000)
libnet.so.1 => /lib64/libnet.so.1 (0x00007fd56273b000)
libjansson.so.4 => /lib64/libjansson.so.4 (0x00007fd56252e000)
libyaml-0.so.2 => /lib64/libyaml-0.so.2 (0x00007fd56230e000)
libhs.so.5 => /usr/local/lib64/libhs.so.5 (0x00007fd5619f5000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fd561793000)
libz.so.1 => /lib64/libz.so.1 (0x00007fd56157d000)
liblzma.so.5 => /lib64/liblzma.so.5 (0x00007fd561357000)
libssl3.so => /lib64/libssl3.so (0x00007fd561105000)
libsmime3.so => /lib64/libsmime3.so (0x00007fd560ede000)
libnss3.so => /lib64/libnss3.so (0x00007fd560bb1000)
libnssutil3.so => /lib64/libnssutil3.so (0x00007fd560981000)
libplds4.so => /lib64/libplds4.so (0x00007fd56077d000)
libplc4.so => /lib64/libplc4.so (0x00007fd560578000)
libnspr4.so => /lib64/libnspr4.so (0x00007fd56033a000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd56011e000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fd55ff1a000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fd55fd04000)
libc.so.6 => /lib64/libc.so.6 (0x00007fd55f937000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd563e51000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fd55f630000)
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.6 (Maipo)
Release: 7.6
Codename: Maipo
Kernel 3.10.0-957.12.1.el7.x86_64
pfring : 7.5.0 dev
Files
suricata.yaml (75.8 KB), added by Anonymous, 05/28/2019 08:25 AM
Updated by Peter Manev almost 7 years ago (#1)
Hi,
Thank you for the report!
Was this a default config run, or were changes made to the memcap/ring settings?
Updated by Anonymous almost 7 years ago (#2)
# cat /etc/modprobe.d/pf_ring.conf
options pf_ring transparent_mode=0 min_num_slots=32767 enable_tx_capture=0
Updated by Andreas Herz almost 7 years ago (#3)
- Assignee set to OISF Dev
- Target version set to Support
Can you share some more details, like suricata --build-info and the configuration file? It would be perfect if you can reproduce it with a coredump.
At https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Debugging you can find more information on how to debug this.
Updated by Anonymous almost 7 years ago (#4)
- File suricata.yaml suricata.yaml added
Config file attached. Maybe the settings are not correct for creating a core dump, because I was not able to find one after the crash yesterday?
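When no core file appears after an abort, two things are worth checking besides suricata.yaml. This is a generic checklist (the RHEL 7 abrt behavior mentioned in the comment is an assumption about typical installs, not taken from this system):

```shell
# Per-process core size limit; "0" means the kernel writes no core at all.
ulimit -c
# Where the kernel delivers cores. On RHEL 7 this is often a pipe to the
# abrt hook, in which case the core ends up under /var/spool/abrt (or is
# dropped) rather than in Suricata's working directory.
cat /proc/sys/kernel/core_pattern
```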
# suricata --build-info
This is Suricata version 5.0.0-beta1 RELEASE
Features: PCAP_SET_BUFF PF_RING AF_PACKET HAVE_PACKET_FANOUT LIBCAP_NG LIBNET1.1 HAVE_HTP_URI_NORMALIZE_HOOK PCRE_JIT HAVE_NSS HAVE_LIBJANSSON TLS MAGIC RUST
SIMD support: SSE_4_2 SSE_4_1 SSE_3
Atomic intrisics: 1 2 4 8 16 byte(s)
64-bits, Little-endian architecture
GCC version 4.8.5 20150623 (Red Hat 4.8.5-36), C version 199901
compiled with -fstack-protector
compiled with _FORTIFY_SOURCE=2
L1 cache line size (CLS)=64
thread local storage method: __thread
compiled with LibHTP v0.5.30, linked against LibHTP v0.5.30
Suricata Configuration:
AF_PACKET support: yes
eBPF support: no
XDP support: no
PF_RING support: yes
NFQueue support: no
NFLOG support: no
IPFW support: no
Netmap support: no
DAG enabled: no
Napatech enabled: no
WinDivert enabled: no
Unix socket enabled: yes
Detection enabled: yes
Libmagic support: yes
libnss support: yes
libnspr support: yes
libjansson support: yes
liblzma support: yes
hiredis support: no
hiredis async with libevent: no
Prelude support: no
PCRE jit: yes
LUA support: no
libluajit: no
libgeoip: yes
Non-bundled htp: no
Old barnyard2 support: no
Hyperscan support: yes
Libnet support: yes
liblz4 support: yes
Rust support: yes
Rust strict mode: no
Rust debug mode: no
Rust compiler: rustc 1.34.0
Rust cargo: cargo 1.34.0
Python support: yes
Python path: /bin/python2.7
Python version:
Python distutils yes
Python yaml yes
Install suricatactl: yes
Install suricatasc: yes
Install suricata-update: yes
Profiling enabled: no
Profiling locks enabled: no
Development settings:
Coccinelle / spatch: no
Unit tests enabled: no
Debug output enabled: no
Debug validation enabled: no
Generic build parameters:
Installation prefix: /usr
Configuration directory: /etc/suricata/
Log directory: /var/log/suricata/
--prefix /usr
--sysconfdir /etc
--localstatedir /var
--datarootdir /usr/share
Host: x86_64-pc-linux-gnu
Compiler: gcc (exec name) / gcc (real)
GCC Protect enabled: yes
GCC march native enabled: yes
GCC Profile enabled: no
Position Independent Executable enabled: no
CFLAGS -g -O2 -march=native -I${srcdir}/../rust/gen/c-headers
PCAP_CFLAGS -I/usr/local/include
SECCFLAGS -fstack-protector -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security
Updated by Peter Manev almost 7 years ago (#5)
How much memory does the box have in total, btw? (And is it only Suricata running, or is there something else that can be resource intensive?)
Updated by Anonymous almost 7 years ago (#6)
64 GB RAM available, and indeed an ELK stack consumes almost half of it, but this setup has been running stably for years. On the other hand, we want/need to move the Elasticsearch setup to a newly built cluster.
Updated by Peter Manev almost 7 years ago (#7)
I think that is a good candidate cause: under some conditions (e.g. an intense ES search) the extra allocation Suricata needs may not be available. In general, yes, it would be better if things were separated, as I suspect resource contention would also occur (if it has not already) in terms of CPU too.
Updated by Victor Julien almost 7 years ago (#8)
I don't think this has anything to do with the system or what else is running on it. The error suggests Suricata tried to allocate almost 4 GB in a single malloc, which is certainly a bug. It smells like an int underflow.
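The failed size fits the underflow theory: 4294966034 = 2^32 - 1262, which is exactly what an unsigned 32-bit subtraction produces when the subtrahend exceeds the minuend by 1262. A minimal sketch (the operand values are illustrative, not taken from Suricata's code):

```rust
fn main() {
    // Illustrative values: any pair where the second is 1262 larger than
    // the first reproduces the size from the log.
    let expected_offset: u32 = 1000;
    let current_offset: u32 = 2262;
    // In release builds, u32 subtraction wraps instead of going negative:
    let gap_size = expected_offset.wrapping_sub(current_offset);
    // Matches "memory allocation of 4294966034 bytes failed".
    assert_eq!(gap_size, 4_294_966_034);
    println!("{}", gap_size);
}
```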
Updated by Anonymous almost 7 years ago (#9)
Indeed. I had a look at some system resources when the crash occurred, but neither Zabbix nor sar mentioned a lack of memory or CPU. OK, it is a busy system, but that keeps the dust away ;)
Updated by Peter Manev almost 7 years ago (#10)
Does it happen repeatedly?
Updated by Anonymous almost 7 years ago (#11)
Running v5 since May 2nd, it has happened twice so far:
2019-05-27T12:05:22.331122+02:00 server suricata: memory allocation of 4294966034 bytes failed
2019-05-21T09:44:12.279565+02:00 server suricata: memory allocation of 4294926257 bytes failed
Suricata is restarted daily to rotate logs.
Updated by Peter Manev almost 7 years ago (#12)
Is it possible to correlate the system/RAM utilization at that time? I was thinking of making sure it is not system/memory related.
Updated by Anonymous almost 7 years ago (#13)
May 27 crash at 12:05 :
CPU %user %nice %system %iowait %steal %idle
11:30:02 AM all 38.47 5.57 12.01 3.43 0.00 40.53
11:40:02 AM all 37.36 5.61 11.57 4.06 0.00 41.40
11:50:01 AM all 38.84 5.73 11.63 4.01 0.00 39.79
12:00:02 PM all 38.99 5.63 12.03 3.78 0.00 39.57
12:10:01 PM all 17.96 2.53 5.88 2.80 0.00 70.82
12:20:01 PM all 1.27 0.08 0.46 0.17 0.00 98.02
12:30:01 PM all 1.06 0.07 0.43 0.05 0.00 98.39
kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
11:30:02 AM 429940 65502432 99.35 2784 2184880 42437912 57.10 27393028 3474244 87552
11:40:02 AM 434328 65498044 99.34 2732 2262996 42416296 57.07 27314036 3541212 26828
11:50:01 AM 533528 65398844 99.19 2568 1451616 42286528 56.90 27520568 3257140 27148
12:00:02 PM 598556 65333816 99.09 2700 1085900 42580564 57.29 27543920 3160296 78272
12:10:01 PM 26531992 39400380 59.76 6572 4556032 35589444 47.89 2030348 4324928 2220
12:20:01 PM 25655172 40277200 61.09 23952 5361284 35609344 47.91 2549072 4669684 3396
12:30:01 PM 25304540 40627832 61.62 24840 5701996 35586908 47.88 2782596 4777068 2944
kbswpfree kbswpused %swpused kbswpcad %swpcad
11:30:02 AM 2970544 5418060 64.59 232280 4.29
11:40:02 AM 2714704 5673900 67.64 229024 4.04
11:50:01 AM 2390440 5998164 71.50 249900 4.17
12:00:02 PM 2020524 6368080 75.91 220356 3.46
12:10:01 PM 3570620 4817984 57.43 91680 1.90
12:20:01 PM 3576252 4812352 57.37 121428 2.52
12:30:01 PM 3579580 4809024 57.33 124324 2.59
May 21 crash at 09:44 :
CPU %user %nice %system %iowait %steal %idle
09:00:01 AM all 33.32 7.42 9.71 2.61 0.00 46.94
09:10:01 AM all 32.51 6.73 10.10 2.17 0.00 48.49
09:20:02 AM all 32.09 6.75 10.27 2.50 0.00 48.40
09:30:02 AM all 32.20 6.41 9.55 2.37 0.00 49.46
09:40:02 AM all 34.08 6.48 10.17 4.98 0.00 44.29
09:50:01 AM all 16.93 2.75 5.31 2.47 0.00 72.54
10:00:02 AM all 0.94 0.08 0.45 0.05 0.00 98.49
kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
09:00:01 AM 684396 65247976 98.96 6620 12489868 39250032 52.81 21870676 8354172 313188
09:10:01 AM 768752 65163620 98.83 6884 10145636 40047656 53.88 22893100 7287660 417328
09:20:02 AM 560148 65372224 99.15 6848 7709440 40094496 53.95 24282280 6187248 390432
09:30:02 AM 669136 65263236 98.99 5820 6772564 39834916 53.60 24672960 5765980 261164
09:40:02 AM 436924 65495448 99.34 2392 774276 40230372 54.13 27868580 3037652 32116
09:50:01 AM 29049272 36883100 55.94 25652 2155684 35554456 47.84 2226176 1659824 1304
10:00:02 AM 28781336 37151036 56.35 26612 2405776 35554344 47.84 2534504 1611564 96
Updated by Anonymous almost 7 years ago (#14)
kbswpfree kbswpused %swpused kbswpcad %swpcad
09:00:01 AM 3479724 4908880 58.52 301672 6.15
09:10:01 AM 3473240 4915364 58.60 300224 6.11
09:20:02 AM 3468888 4919716 58.65 278480 5.66
09:30:02 AM 3457368 4931236 58.78 240940 4.89
09:40:02 AM 2628548 5760056 68.67 122640 2.13
09:50:01 AM 3331300 5057304 60.29 125436 2.48
10:00:02 AM 3338468 5050136 60.20 128456 2.54
Updated by Andreas Herz almost 7 years ago (#15)
Might be worth testing AF_PACKET instead of PF_RING, and maybe a more recent kernel?
Updated by Victor Julien over 6 years ago (#16)
I think it would be good to set a maximum on what a single allocation can try to get. Determining what that maximum should be is of course the trick.
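Such a cap could be sketched like this. The constant, the 256 MB value, and the helper name are all assumptions for illustration, not Suricata API:

```rust
// Hypothetical cap on a single allocation; the limit is an assumption.
const MAX_SINGLE_ALLOC: usize = 256 * 1024 * 1024; // 256 MB, illustrative

// Returns None instead of attempting an absurd allocation, so the caller
// can treat it as a parser error rather than aborting the whole process.
fn bounded_zeroed_buf(n: usize) -> Option<Vec<u8>> {
    if n > MAX_SINGLE_ALLOC {
        return None;
    }
    Some(vec![0u8; n])
}

fn main() {
    assert!(bounded_zeroed_buf(1024).is_some());
    // A wrapped 32-bit size like the one in the log is rejected:
    assert!(bounded_zeroed_buf(4_294_966_034).is_none());
}
```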
Updated by Victor Julien over 6 years ago (#17)
Possibly related:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007fb803076801 in __GI_abort () at abort.c:79
#2 0x00005619886e5367 in std::sys::unix::abort_internal ()
#3 0x00005619886f39fd in rust_oom ()
#4 0x0000561988708b67 in alloc::alloc::handle_alloc_error ()
#5 0x000056198860a181 in alloc::raw_vec::RawVec<T,A>::allocate_in (cap=4294965836, zeroed=<optimized out>, a=...) at /usr/src/rustc-1.37.0/src/liballoc/raw_vec.rs:99
#6 alloc::raw_vec::RawVec<T>::with_capacity_zeroed (cap=4294965836) at /usr/src/rustc-1.37.0/src/liballoc/raw_vec.rs:145
#7 <u8 as alloc::vec::SpecFromElem>::from_elem (elem=0, n=4294965836) at /usr/src/rustc-1.37.0/src/liballoc/vec.rs:1635
#8 alloc::vec::from_elem (elem=0, n=4294965836) at /usr/src/rustc-1.37.0/src/liballoc/vec.rs:1614
#9 suricata::smb::smb::SMBState::parse_tcp_data_tc_gap (self=0x7fb7f47b6630, gap_size=<optimized out>) at src/smb/smb.rs:1734
#10 rs_smb_parse_response_tcp_gap (state=state@entry=0x7fb7f47b6630, input_len=<optimized out>, input_len@entry=4294965836) at src/smb/smb.rs:1864
#11 0x00005619883f766e in SMBTCPParseResponse (f=0x56198a88b380, state=0x7fb7f47b6630, pstate=0x7fb7f4611760, input=0x0, input_len=4294965836, local_data=0x0, flags=24 '\030') at app-layer-smb.c:68
#12 0x00005619883f6e06 in AppLayerParserParse (tv=tv@entry=0x56199f507920, alp_tctx=0x7fb7f429eeb0, f=f@entry=0x56198a88b380, alproto=8, flags=<optimized out>, input=input@entry=0x0,
input_len=input_len@entry=4294965836) at app-layer-parser.c:1225
#13 0x00005619883b1cd6 in AppLayerHandleTCPData (tv=tv@entry=0x56199f507920, ra_ctx=ra_ctx@entry=0x7fb7f429ea90, p=p@entry=0x7fb7f4268180, f=0x56198a88b380, ssn=ssn@entry=0x7fb7f4359620,
stream=stream@entry=0x7fb7fd3ae748, data=<optimized out>, data@entry=0x0, data_len=<optimized out>, flags=<optimized out>) at app-layer.c:601
#14 0x00005619885483d0 in ReassembleUpdateAppLayer (dir=<optimized out>, p=<optimized out>, stream=<optimized out>, ssn=<optimized out>, ra_ctx=<optimized out>, tv=<optimized out>)
at stream-tcp-reassemble.c:1012
#15 StreamTcpReassembleAppLayer (tv=tv@entry=0x56199f507920, ra_ctx=ra_ctx@entry=0x7fb7f429ea90, ssn=ssn@entry=0x7fb7f4359620, stream=<optimized out>, stream@entry=0x7fb7f4359630, p=p@entry=0x7fb7f4268180,
dir=dir@entry=UPDATE_DIR_PACKET) at stream-tcp-reassemble.c:1123
#16 0x000056198854905c in StreamTcpReassembleHandleSegment (tv=tv@entry=0x56199f507920, ra_ctx=0x7fb7f429ea90, ssn=ssn@entry=0x7fb7f4359620, stream=0x7fb7f4359630, p=p@entry=0x7fb7f4268180,
pq=pq@entry=0x7fb7f429e738) at stream-tcp-reassemble.c:1788
#17 0x000056198853e33f in HandleEstablishedPacketToClient (stt=<optimized out>, pq=<optimized out>, p=<optimized out>, ssn=<optimized out>, tv=<optimized out>) at stream-tcp.c:2408
#18 StreamTcpPacketStateEstablished (tv=0x56199f507920, p=0x7fb7f4268180, stt=stt@entry=0x7fb7f429e730, ssn=0x7fb7f4359620, pq=0x7fb7f429e738) at stream-tcp.c:2645
#19 0x0000561988541870 in StreamTcpStateDispatch (tv=0x56199f507920, p=0x7fb7f4268180, stt=0x7fb7f429e730, ssn=0x7fb7f4359620, pq=0x7fb7f429e738, state=<optimized out>) at stream-tcp.c:4650
#20 0x0000561988543bd9 in StreamTcpPacket (tv=0x56199f507920, p=0x7fb7f4268180, stt=0x7fb7f429e730, pq=0x7fb7f428d5f8) at stream-tcp.c:4838
#21 0x0000561988544505 in StreamTcp (tv=tv@entry=0x56199f507920, p=p@entry=0x7fb7f4268180, data=<optimized out>, pq=pq@entry=0x7fb7f428d5f8, postpq=postpq@entry=0x0) at stream-tcp.c:5174
#22 0x00005619884cc4b2 in FlowWorker (tv=0x56199f507920, p=0x7fb7f4268180, data=0x7fb7f428d5d0, preq=0x56199f507d00, unused=<optimized out>) at flow-worker.c:245
#23 0x000056198855308c in TmThreadsSlotVarRun (tv=tv@entry=0x56199f507920, p=p@entry=0x7fb7f4268180, slot=slot@entry=0x56199f507b80) at tm-threads.c:130
#24 0x000056198852ae4e in TmThreadsSlotProcessPkt (p=0x7fb7f4268180, s=0x56199f507b80, tv=0x56199f507920) at tm-threads.h:162
#25 AFPReadFromRing (ptv=<optimized out>) at source-af-packet.c:1025
#26 0x000056198852ecd6 in ReceiveAFPLoop (tv=0x56199f507920, data=<optimized out>, slot=<optimized out>) at source-af-packet.c:1589
#27 0x000056198855430d in TmThreadsSlotPktAcqLoop (td=0x56199f507920) at tm-threads.c:335
#28 0x00007fb803cf36db in start_thread (arg=0x7fb7fd3b0700) at pthread_create.c:463
#29 0x00007fb80315788f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
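Frames #7 through #10 suggest the SMB gap handler builds a zero-filled buffer of gap_size bytes (alloc::vec::from_elem with n=4294965836; note that 2^32 - 4294965836 = 1460, a typical TCP MSS, which again points at a wrapped 32-bit size). A sketch of that shape, not the actual smb.rs code:

```rust
// Sketch of what the backtrace suggests: the gap handler hands the parser
// a zero-filled buffer of gap_size bytes.
fn handle_gap(gap_size: u32) -> Vec<u8> {
    // With a wrapped gap_size such as 4294965836 (= 2^32 - 1460), this
    // single vec![0; n] allocation fails and aborts via rust_oom.
    vec![0u8; gap_size as usize]
}

fn main() {
    // Sane gap sizes behave as expected.
    assert_eq!(handle_gap(8).len(), 8);
}
```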
Updated by Victor Julien over 6 years ago (#18)
- Status changed from New to Assigned
- Assignee changed from OISF Dev to Victor Julien
- Target version changed from Support to 5.0.2
- Affected Versions 5.0.0, 5.0.1 added
Updated by Victor Julien over 6 years ago (#19)
- Status changed from Assigned to Closed